Re: Is lex/yacc the right tool for this problem

arnold@skeeve.com (Aharon Robbins)
27 Jan 2003 23:26:23 -0500

          From comp.compilers


From: arnold@skeeve.com (Aharon Robbins)
Newsgroups: comp.compilers
Date: 27 Jan 2003 23:26:23 -0500
Organization: Pioneer Consulting, Ltd.
References: 03-01-163
Keywords: parse
Posted-Date: 27 Jan 2003 23:26:23 EST

Sony Antony <sonyantony@hotmail.com> wrote:
>I have a huge file with lines of the form
>89234758979hfjhkjh39485893475398789576945349856789
>hgdstfh3478567356h45g64674569468457694645u6ui68945
>389478976984596875649864645987597954795879498657
>
>
>( I just typed garbage with the keyboard. But the real data files will
>be similar closely packed digits and alphabets without space,
>signifying different pieces of data like name, time,date, amount etc.)
>Each of these lines are data. The first 3 characters represent the
>type of the line. For each given type, the remaining positions are
>different types of data packed closely without space, in a way
>specific for that type. ( None of the data is encrypted though )
>
>I am required to extract certain fields of data when certain
>conditions are met.
>Typically a set of rules like
>1.If type == 123 && (( amount < 35 ) || ( customer == unknown ) ) =>
>fetch amount, date, duration
>2. If type == 234 && (( weight > 100 ) && ( height < 567 ) ) => fetch
>name, weight, height


You want something that will let you extract the columns into
variables and then do your logic test. You can do this with gawk and
its FIELDWIDTHS variable, something along these lines:


# this rule is run for each input line
{
    type = substr($0, 1, 3)    # first three chars
    type = type + 0            # make numeric
    if (type == 123) {
        extract1()
        logic1()
    } else if (type == 234) {
        extract2()
        logic2()
    } # etc...
}


function extract1()
{
    FIELDWIDTHS = "3 2 5 7"    # whatever
    $0 = $0                    # force $0 to be reparsed
    amount = $2
    customer = $3              # assign fields to variables for readability
    # ...
}


function logic1()
{
    if (amount == 42 && customer == "whatever") {
        # ...
    }
}


....


Undoubtedly perl, python or tcl could be used too. Lex & yacc are
likely to be overkill for this job. You could probably even do it in
C using some straightforward sscanf calls on your input line.
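
For comparison, here is a rough sketch of the same extract-then-test approach in Python, one of the alternatives mentioned above. The field names, widths, and the LAYOUTS/RULES tables are made up for illustration; the real record layouts are not given in this thread. The condition follows rule 1 from the question.

```python
# Hypothetical layouts: (field name, width) for each record type.
# The real widths are not given in the thread; these are placeholders.
LAYOUTS = {
    "123": [("type", 3), ("amount", 4), ("customer", 7),
            ("date", 8), ("duration", 3)],
}


def split_fixed(line, layout):
    """Cut a line into named fields of the given widths, left to right."""
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = line[pos:pos + width]
        pos += width
    return fields


def rule_123(f):
    """Rule 1 from the question: fetch amount, date, duration on a match."""
    if int(f["amount"]) < 35 or f["customer"].strip() == "unknown":
        return {k: f[k] for k in ("amount", "date", "duration")}
    return None


# One rule function per record type (only type 123 is sketched here).
RULES = {"123": rule_123}


def process(line):
    """Dispatch on the first three characters, then extract and test."""
    rec_type = line[:3]
    if rec_type not in LAYOUTS:
        return None  # unknown record type: nothing to fetch
    fields = split_fixed(line, LAYOUTS[rec_type])
    return RULES[rec_type](fields)
```

Calling process() on each line of the file gives either the fetched fields or None. Adding a record type means adding one entry to each table, which plays the same role as the extractN()/logicN() pairs in the gawk version.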


Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold@skeeve.com
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 928 569 9018
Nof Ayalon Cell Phone: +972 51 297-545
D.N. Shimshon 99785 ISRAEL

