Re: DFA Lexer Generation From BNF

idbaxter@semdesigns.com
Sat, 17 Nov 2007 10:11:34 -0800 (PST)

          From comp.compilers

Related articles
Code generation from AST lssilva@gmail.com (Lucas S. Silva) (2007-11-10)
DFA Lexer Generation From BNF pbm@oct.net (Paul B Mann) (2007-11-10)
Re: DFA Lexer Generation From BNF cfc@shell01.TheWorld.com (Chris F Clark) (2007-11-11)
Re: DFA Lexer Generation From BNF gneuner2/@/comcast.net (George Neuner) (2007-11-12)
Re: DFA Lexer Generation From BNF pbm@oct.net (Paul B Mann) (2007-11-16)
Re: DFA Lexer Generation From BNF bobduff@shell01.TheWorld.com (Robert A Duff) (2007-11-16)
Re: DFA Lexer Generation From BNF idbaxter@semdesigns.com (2007-11-17)

From: idbaxter@semdesigns.com
Newsgroups: comp.compilers
Date: Sat, 17 Nov 2007 10:11:34 -0800 (PST)
Organization: Compilers Central
References: 07-11-033 07-11-038
Keywords: lex
Posted-Date: 18 Nov 2007 01:08:09 EST

On Nov 10, 11:23 pm, "Paul B Mann" <p...@oct.net> wrote:
> Does anyone know of a lexer generator whose input is a BNF grammar
> instead of regular expressions?
>
> Paul Mann
>
> [DFA's aren't adequate to recognize BNF, which is why parser
> generators use a DFA and a stack or other more powerful machines. I
> suppose you could limit it to a BNF subset equivalent to regexps, but
> what would be the point? -John]


Another perspective is, "why bother using DFAs?" The usual, and good,
argument is speed. But on modern computers there is less pressure on
raw lexing speed, and if you aren't processing enormous amounts of
source, then convenience may be the deciding factor.


The ASF+SDF project (also known as "XT" and shipped as part of the
Stratego program transformation tool) uses "scannerless GLR parsers":
a single unified GLR parsing algorithm does both lexing and parsing.
See www.cs.uu.nl/research/techreps/repo/CS-2005/2005-052.pdf In
particular, if I understand it correctly, they simply let you write
the grammar using BNF rules all the way down to tokens and whitespace.
This is an extremely clean way to specify a language.
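As a rough sketch of what "BNF all the way down" means, here is a toy
character-level parser for a hypothetical little language (a
semicolon-separated list of identifiers). It is hand-rolled recursive
descent rather than GLR, so it only illustrates the idea that
whitespace and tokens are ordinary grammar rules, not a separate
lexer:

```python
# Sketch of "scannerless" parsing: the grammar goes all the way down
# to characters, with no separate lexer. Toy grammar (hypothetical):
#   list  ::= ws ident (ws ';' ws ident)* ws
#   ident ::= letter letter*
#   ws    ::= (' ' | tab | newline)*

def parse(text):
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def ws():                      # whitespace is just another rule
        nonlocal pos
        while peek() in " \t\n":
            pos += 1

    def ident():                   # ident ::= letter letter*
        nonlocal pos
        start = pos
        while peek().isalpha():
            pos += 1
        if pos == start:
            raise SyntaxError("expected identifier at %d" % pos)
        return text[start:pos]

    ws()
    items = [ident()]
    ws()
    while peek() == ";":
        pos += 1
        ws()
        items.append(ident())
        ws()
    if pos != len(text):
        raise SyntaxError("trailing input at %d" % pos)
    return items

print(parse("foo ; bar;baz"))   # -> ['foo', 'bar', 'baz']
```

A real scannerless GLR parser generates this machinery from the
grammar and, unlike the sketch above, tolerates the ambiguity that
character-level grammars routinely introduce.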


But the apparent advantage goes beyond that. In many real-world
situations, one language is embedded in another (SQL in COBOL,
JavaScript and PHP in HTML, ...). If one has essentially separate GLR
grammars for each, composing them is close to trivial: you simply add
a rule to the containing language that introduces an escape indication
and allows a nonterminal from the contained language. My personal
opinion is that this is a truly pretty solution.


If the escapes are carefully designed, you really can't end up with
any ambiguities. Most escapes used in real languages with embedded
sublanguages already have this property; otherwise they wouldn't be
useful as escapes.
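To make the composition idea concrete, here is a small sketch
(hypothetical languages and escape syntax, hand-written rather than
generated from grammars). The host grammar is just words; one extra
rule recognizes the escape "${ ... }" and hands the enclosed
characters to the embedded sublanguage's parser, with no lexer
coordination needed:

```python
# Sketch of grammar composition via an escape rule (hypothetical
# languages). Host: word*  where  word ::= ident | '${' expr '}'.
# Embedded sublanguage: additive arithmetic over digits.

def parse_expr(s):
    """Embedded sublanguage: digits and '+' only (a stand-in grammar)."""
    total = 0
    for part in s.split("+"):
        part = part.strip()
        if not part.isdigit():
            raise SyntaxError("bad expression: %r" % s)
        total += int(part)
    return ("expr", total)

def parse_command(text):
    """Host language; the '${' branch below is the one composing rule."""
    out, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
        elif text.startswith("${", pos):         # the escape rule
            end = text.index("}", pos)
            out.append(parse_expr(text[pos + 2:end]))
            pos = end + 1
        else:                                     # plain host word
            start = pos
            while pos < len(text) and not text[pos].isspace():
                pos += 1
            out.append(("word", text[start:pos]))
    return out

print(parse_command("add ${1 + 2} items"))
# -> [('word', 'add'), ('expr', 3), ('word', 'items')]
```

Note how the escape marker "${" is unambiguous in the host language:
nothing else in it begins that way, which is exactly the property that
makes the composition clean.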


This gets one back to the efficiency vs. convenience question. We
build a tool (the DMS Software Reengineering Toolkit) that also uses
GLR parsing. But we use conventional lexical definitions because our
interest is in large-scale applications, where parsing speed matters.
So we give up some convenience in specification for speed. [Of
course, we can always simply use the GLR parser at the character
level. And we'll probably go back someday and look at unifying the two
ideas.]


Ira Baxter, CTO
www.semanticdesigns.com

