Opinions about "epsilon" Symbols in Parse Trees

"Evangelos Drikos" <drikosv@otenet.gr>
9 Jun 2005 15:44:58 -0400

          From comp.compilers

Related articles
Opinions about "epsilon" Symbols in Parse Trees drikosv@otenet.gr (Evangelos Drikos) (2005-06-09)
Re: Opinions about "epsilon" Symbols in Parse Trees schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-06-10)
Re: Opinions about "epsilon" Symbols in Parse Trees DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-10)
Re: Opinions about "epsilon" Symbols in Parse Trees mefrill@yandex.ru (mefrill) (2005-06-12)
Re: Opinions about "epsilon" Symbols in Parse Trees news4e71@yahoo.com (0x4e71) (2005-06-12)
Re: Opinions about "epsilon" Symbols in Parse Trees DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-12)
Re: Opinions about "epsilon" Symbols in Parse Trees drikosv@otenet.gr (eDrikos) (2005-06-13)
| List of all articles for this month |

From: "Evangelos Drikos" <drikosv@otenet.gr>
Newsgroups: comp.compilers
Date: 9 Jun 2005 15:44:58 -0400
Organization: An OTEnet S.A. customer
Keywords: parse, question
Posted-Date: 09 Jun 2005 15:44:58 EDT

Hi all,


I'm writing a parser generator that includes an algorithm for
ambiguous grammars but there is a point under question as it doesn't
represent "epsilon" symbols in the Parse Tree. Yet, for ambiguous
grammars the program creates the correct number of parse trees.


So, I would like to have some opinions that will help me to understand
the topic better.


I searched on Yahoo and found a paper on the topic by Marwan Shaban,
"A Hybrid GLR Algorithm for Parsing with Epsilon Grammars". This
refers to "An Efficient Augmented-Context-Free Parsing Algorithm",
written by Masaru Tomita.


As I understood both tried to found ways and represent epsilon symbols
correctly in ambiguous grammars. However these papers didn't help
me. I need a less scientific and more practical help.


To make my question more specific I give two examples below.


The program accepts Syntax Rules of the form:


<integer> ::= [ - | + ] { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 } .


The above mentioned rule means that an integer is a sequence of one or
more digits with an optional sign as prefix.


Apparently, when the parser reads an unsigned integer, it doesn't represent
any "sign" on the Parse Tree.


The Syntax Rule can be restated with the following two Syntax Rules:


(1) <sign> ::= - | + | NONE /* where NONE is the "epsilon" */


(2) <integer> ::= <sign> { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 } .


Also, in this case, if the parser read an unsigned integer it doesn't
represent it on the Parse Tree.


Though, it could (or must) probably reduce the rule "<sign> -> NONE".


It was a straight forward example. But, let us define the Syntax Rule
of an identifier that consists of a <Latin Letter> optionally followed
by a sequence of <Latin Letter> or <digit>.


The e-free Syntax Rule:
"<Identifier> ::= <Latin Letter> [ { <Latin Letter> | <digit> } . ] "
can be restated as:


<Identifier> ::= <Identifier start> <Identifier part>
<Identifier start> ::= <Latin Letter>
<Identifier part> ::= { <Latin Letter> | <digit> | NONE } .
/* where NONE is the "epsilon" */


How many times should the Parser reduce the production "<Identifier
part>-> NONE" when it read an identifier consisting of only one <Latin
Letter>?


None, one and why not four times?


Thanks in advance,


Evangelos Drikos,
Software Engineer (Specializing in SAP; Compilers is a hobby)
Athens, Greece.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.