Re: LL1 program ???

Jerry Leichter <leichter@smarts.com>
21 Jul 1998 11:14:15 -0400

          From comp.compilers

Related articles
LL1 program ??? Andrea.Palmieri@ii.uam.es (Aulas de =?iso-8859-1?Q?Inform=E1tica?=) (1998-07-01)
Re: LL1 program ??? msrao@india.ti.com (Srikanth M) (1998-07-03)
Re: LL1 program ??? joachim.durchholz@munich.netsurf.de (Joachim Durchholz) (1998-07-03)
Re: LL1 program ??? janaki@csa.iisc.ernet.in (Janaki S) (1998-07-10)
Re: LL1 program ??? napi@cel.mimos.my (1998-07-11)
Re: LL1 program ??? joachim.durchholz@munich.netsurf.de (Joachim Durchholz) (1998-07-13)
Re: LL1 program ??? rod.bates@boeing.com (Rodney M. Bates) (1998-07-20)
Re: LL1 program ??? leichter@smarts.com (Jerry Leichter) (1998-07-21)
Re: LL1 program ??? jamz@my-dejanews.com (1998-07-24)
| List of all articles for this month |

From: Jerry Leichter <leichter@smarts.com>
Newsgroups: comp.compilers
Date: 21 Jul 1998 11:14:15 -0400
Organization: System Management ARTS
References: 98-07-009 98-07-033 98-07-098 98-07-110 98-07-129
Keywords: parse, AST, comment

I don't think you're missing anything. The points you identify are
exactly correct. The easiest way to see this is in the definition you
mention at the start: If, formally, a language is just a subset of the
strings over some alphabet, then a "programming language" isn't a
"language"! If you consider the expression:


a - b - c


in almost any "programming language", it's not just a set of symbols;
there's an implied grouping that says that the expression is to be
evaluated left to right (unless it's actually in APL, in which it's to
be evaluated right to left).


The difference is fundamentally one of semantics, not syntax. At the
least, when we say the evaluation is left-to-right, we're saying that
the two expressions:


a - b - c and ( a - b ) - c


are "equivalent" in some sense. Obviously, they are not equivalent as
character strings! They *are* equivalent as abstract syntax trees, so
AST's in some way capture a small piece of the language's semantics.


Of course, we can continue and say that, while the strings 1 + 1 and 2
are distinct, they are equivalent. This notion of equivalence isn't
captured by AST's, though it often *is* captured within a compiler by
AST's after constant folding - and it's still only the beginning of a
full semantic theory of the language.


Formal language theory deals with both "languages" as sets of strings,
and grammars. Almost all the interesting properties we talk about in
formal language theory are primarily properties of grammars, not of
languages. Thus, grammars, not languages, are LL(1) or ambiguous. The
two levels aren't completely disconnected, since not all languages have
LL(1) grammars (and there are languages all of whose grammars are
ambiguous); but for the most part, the language is "carried along for
the ride" in a theory that is mainly interested in the grammars.


BTW, ambiguity - or lack thereof - is a property of a grammar which ties
it to a bit of semantics: Once you've chosen an unambiguous grammar,
you have a unique AST for any string in the language. Note the levels
of abstraction: First we have a language - a set of strings, with many
possible enumerations, most of them ineffective (in the computational
theory sense). Then we choose to describe it using a grammar. Now we
have an effective procedure for testing membership, which gives us a
witness to membership - a derivation sequence (in general, of course,
*many* derivation sequences). Then we choose a particular unambiguous
grammar. Now each member has a *unique* canonical witness to member-
ship, a derivation tree. Given the unique witness, we can construct an
AST, and then ascribe certain properties to strings - e.g., we can
identify as "equivalent" strings with the same AST. This gives us the
beginning of a semantics, which is in the choice of grammar, not in the
original strings.


Actually, the theory doesn't even "top out" here: The next step is to
"label" the AST's with attributes. There's a well-developed theory of
attribute grammars, which can capture a good deal of the semantics of a
programming language (though there are limits to how far you can go in
*practice* because of the complexity of the general algorithms).


-- Jerry
[Aren't you missing a step where you extract the AST from the concrete
syntax? -John]
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.