Re: 4GL language design, was Writing a recursive descent parser in C

"Bill Rayer" <lingolanguage@hotmail.com>
11 Dec 2001 21:31:13 -0500

          From comp.compilers


From: "Bill Rayer" <lingolanguage@hotmail.com>
Newsgroups: comp.compilers
Date: 11 Dec 2001 21:31:13 -0500
Organization: Virgin Net Usenet Service
References: 01-11-146 01-12-008 01-12-020 01-12-040
Keywords: parse, design, comment
Posted-Date: 11 Dec 2001 21:31:13 EST

Dear Newsgroup


> > I'm interested that some 4GLs mix up the scanning and parsing stages.
> > What 4GLs do you consider to be most deficient in this way? And what
>
> Most have been fortunately dropped from use, but a good example might
> be various flavors of Basic implemented in the 1970s for a range of
> minicomputers. The use of postfix type characters by older Basics and
> as implemented in these products is one confusion of the scanning and
> parsing phases because the handling of the postfix type operator
> belongs in no clear and decidable sense to neither the scanner or the
> parser. [snip]


I did read Kemeny & Kurtz's book "Back to Basic" and understood that
the only type character they wanted was $ for string. They disapproved
of the large number of type characters used by other Basics (e.g.
Microsoft QuickBasic has six that I can recall). But the type
character was always part of the identifier; it was never intended as
a separate symbol.
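
To make that concrete, here is a minimal Python sketch of a scanner rule
that treats the type character as the last character of the identifier
token, so the parser never sees it as a separate symbol. The name
scan_identifier and the particular suffix set are my own assumptions for
illustration, not taken from any specific Basic.

```python
import re

# Hypothetical Basic-like identifier: a letter, then letters or digits,
# then an optional type character. The suffix set here is illustrative only.
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*[$%&!#]?")

def scan_identifier(text, pos=0):
    """Return the identifier token starting at pos, or None if there is none."""
    m = IDENT_RE.match(text, pos)
    return m.group() if m else None
```

With a rule like this, NAME$ and COUNT% each come out of the scanner as a
single token, which is how I understand the type character was meant to work.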


I was interested in your comments about mixing scanning and parsing
because I'm reading the XML syntax (www.w3.org/TR/REC-xml). Putting
aside XML's merits, I was uneasy reading the syntax as I can't tell
whether it mixes the scanner and the parser or not! I'm used to
syntaxes that work on two levels - you define the tokens ("begin",
"end", identifier, signed_integer etc), then you define the syntax
that says how the tokens fit together (block ::= BEGIN statement ";"
END etc). The tokens are processed in the scanner and the syntax is
represented by the structure of recursive subroutines.


At this point I should add my compiler writing experience is limited
to recursive descent parsers in Pascal and Delphi. As was ably
explained at the start of this thread, it's easy to write an RDP if you
can define a language on two levels: (1) the tokens which are
definable using regular expressions and (2) the syntax using EBNF.
Given this information, the code follows naturally.
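
As a rough sketch of what I mean by the two levels, here it is in Python
rather than Pascal. The toy grammar is my own assumption, loosely based
on the block example above:

    block     ::= BEGIN statement { ";" statement } END
    statement ::= block | identifier | signed_integer

The names scan, Parser and parse are made up for this illustration.

```python
import re

# Level 1: the tokens, each definable by a regular expression.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<integer>[+-]?[0-9]+)
  | (?P<semi>;)
""", re.VERBOSE)

KEYWORDS = {"begin", "end"}

def scan(text):
    """Return a list of (kind, value) tokens; whitespace is discarded here."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == "ws":
            continue  # the grammar below never hears about spaces
        if kind == "word":
            kind = value.lower() if value.lower() in KEYWORDS else "identifier"
        tokens.append((kind, value))
    tokens.append(("eof", ""))
    return tokens

# Level 2: the syntax, one method per EBNF production.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i][0]

    def expect(self, kind):
        tok = self.tokens[self.i]
        if tok[0] != kind:
            raise SyntaxError(f"expected {kind}, found {tok[0]}")
        self.i += 1
        return tok[1]

    def block(self):
        # block ::= BEGIN statement { ";" statement } END
        self.expect("begin")
        items = [self.statement()]
        while self.peek() == "semi":
            self.expect("semi")
            items.append(self.statement())
        self.expect("end")
        return ("block", items)

    def statement(self):
        # statement ::= block | identifier | signed_integer
        if self.peek() == "begin":
            return self.block()
        if self.peek() == "identifier":
            return ("name", self.expect("identifier"))
        return ("int", int(self.expect("integer")))

def parse(text):
    p = Parser(scan(text))
    tree = p.block()
    p.expect("eof")
    return tree
```

Calling parse("begin x; begin 42 end end") gives a nested tuple for the
outer block containing the name x and an inner block. The point is that
whitespace is thrown away in scan, so none of the productions mention it.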


What bothers me with XML is having a separate production for space
(production [3]). I always thought if tokens are separated by
whitespace, an EBNF syntax never had to worry about spaces. But XML
specifies tags similar to:


    '<' Name S? '>'


i.e. an opening pointy bracket followed immediately by a Name
production (similar to a normal identifier), followed by an optional
space production (S, one or more whitespace characters), followed by a
closing pointy bracket.
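
For comparison, here is a small Python sketch of a recognizer that works
directly on characters and handles S itself, with no separate
whitespace-skipping scanner. NAME and S are simplified ASCII-only
approximations that I have assumed for illustration (the real Name
production allows many more characters), and match_simple_tag is a
made-up helper name.

```python
import re

# Simplified, ASCII-only stand-ins for the XML productions (assumptions).
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"   # roughly Name
S = r"[ \t\r\n]+"                        # S: one or more whitespace characters

# '<' Name S? '>' written out at the character level.
TAG_RE = re.compile(rf"<(?P<name>{NAME})(?:{S})?>")

def match_simple_tag(text):
    """Return the tag name if text has the form '<' Name S? '>', else None."""
    m = TAG_RE.fullmatch(text)
    return m.group("name") if m else None
```

This accepts "<para>" and "<para >" but rejects "< para>", because no
space is allowed between '<' and the Name. A scanner that silently
dropped whitespace between tokens could not make that distinction, which
may be why the syntax has to mention S explicitly.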


So by having a separate production for spaces, does XML mix up the
scanning and parsing stages? And does it matter if they do? I would be
interested in anyone's views on this, not least because I'm trying to
modify a parser to work with it!


Regards
Bill Rayer
[Parsing XML is indeed pretty yucky. -John]

