Re: type or identifier fundamental parsing issue - Need help from parsing experts (Torben Ægidius Mogensen)
Wed, 11 Jul 2012 12:17:27 +0200

          From comp.compilers


From: (Torben Ægidius Mogensen)
Newsgroups: comp.compilers
Date: Wed, 11 Jul 2012 12:17:27 +0200
Organization: - Supporting Open source
References: 12-07-004 12-07-006
Keywords: parse, practice
Posted-Date: 11 Jul 2012 16:36:21 EDT

George Neuner writes:

> There are no hard and fast rules. There typically is efficiency to be
> gained by classifying identifiers as early as is practical, but the
> notion that such classification *must* be done at parse time simply is
> ridiculous. Simply do whatever is most convenient.
> That said, ambiguity such as you describe tends to make a language
> complex and difficult for programmers to understand. This is not
> necessarily a bad thing, but it should be justified by a measurable
> gain in expressive power.

I agree. If making even a partial parse requires classification of
identifiers based on declarations that can be arbitrarily far away,
this is not so much a problem for the compiler (which can in most
cases easily remember all previous declarations and use these to make
this classification) but for the programmer (who can't).

This is a problem in C, where a*b; can be a declaration of b to be a
pointer to a value of type a or an expression statement that
multiplies two values depending on whether a is a type or a variable.
It is even worse in C++, where a<b,c>(d) can be either a call to a
template function or a comma expression consisting of two comparisons.
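
To make the dependence on earlier declarations concrete, here is a
minimal sketch (not a real C parser). The function name
classify_statement and the set type_names are hypothetical; the set
stands in for whatever symbol table a compiler would have built from
declarations seen so far.

```python
def classify_statement(stmt, type_names):
    """Classify a statement of the form 'x*y;' as a C parser would have to.

    type_names is a hypothetical set of identifiers already declared as
    types (for example via typedef). The same token sequence gets a
    different reading depending on what this set contains.
    """
    left, right = stmt.rstrip(";").split("*")
    left, right = left.strip(), right.strip()
    if left in type_names:
        # 'a' names a type, so 'a*b;' declares b as a pointer to a.
        return f"declaration: {right} is a pointer to {left}"
    # Otherwise 'a*b;' is an expression statement.
    return f"expression: {left} multiplied by {right}"
```

Calling classify_statement("a*b;", {"a"}) gives the declaration
reading, while classify_statement("a*b;", set()) gives the expression
reading. The text being parsed is identical in both calls; only the
remembered declarations differ, which is exactly the information a
human reader may not have at hand.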

Even in SML, which is otherwise a very clean design that is easy for
humans to parse, the lack of syntactic distinction between variables and
nullary constructors can make it hard to know if a pattern is a binding
instance of a variable or a constructor pattern. So I prefer the Haskell
approach that distinguishes variables and constructors by case: upper
case indicates constructors and lower case indicates variables.

So my advice is to either make the syntax such that you don't need to
classify identifiers or make the classification local, such as by
upper/lower case, initial letter (like in Fortran), a suffixed $ or %
(like in BASIC) or some other feature of the name.
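
The local classification rules mentioned above can each be written as
a function of the name alone, with no symbol table. The following is
a rough sketch; the function names are made up for illustration, the
Fortran rule is the classic implicit-typing default (initial letters
I through N are INTEGER, everything else REAL), and the BASIC rule
covers only the two suffixes named above.

```python
def haskell_kind(name):
    # Haskell: an initial upper-case letter marks a constructor,
    # anything else (lower case or underscore) a variable.
    return "constructor" if name[0].isupper() else "variable"

def fortran_implicit_type(name):
    # Classic Fortran implicit typing: I..N means INTEGER, otherwise REAL.
    return "INTEGER" if name[0].upper() in "IJKLMN" else "REAL"

def basic_type(name):
    # BASIC: a trailing $ marks a string, a trailing % an integer.
    if name.endswith("$"):
        return "string"
    if name.endswith("%"):
        return "integer"
    return "numeric (dialect default)"
```

Each function looks only at the identifier itself, so a reader can
apply the same rule at a glance, wherever the declaration happens to
be.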

Similarly, if you allow declaration of infix operators with different
precedences, it can be hard for a reader of a program to parse an
expression without knowing the precedences. If the precedence
declarations can be arbitrarily far away from the expression, this is a
problem for the programmer (but not the compiler). An elegant solution
(IMO) is employed by O'Caml: Infix operators are built from a limited
set of symbols and the first symbol in an operator name indicates its
precedence: +=-< has the same precedence as +, <-=+ has the same
precedence as < and so on. So all you need to recall is the precedences
of the standard operators. This can be bad enough if there are dozens
of operators with over a dozen different precedences (like in C++), but
if you keep the number modest, it is no problem. I think Wirth went too
far in restricting the number of precedence levels in Pascal, but
anything over 8 is probably too many.
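
The leading-symbol rule can be sketched as a lookup on the start of
the operator name. This is an approximation based on my reading of the
precedence table in the OCaml manual, not a complete implementation:
it covers only user-definable infix symbols and ignores built-in
special cases such as && and ||. The name infix_level and the numeric
levels are assumptions made for illustration; only the relative order
of the levels matters.

```python
# Precedence levels keyed by an operator's leading character(s).
# A higher number binds tighter. "**" must be tested before "*".
_LEVELS = [
    ("**", 5),
    ("*", 4), ("/", 4), ("%", 4),
    ("+", 3), ("-", 3),
    ("@", 2), ("^", 2),
    ("=", 1), ("<", 1), (">", 1), ("|", 1), ("&", 1), ("$", 1),
]

def infix_level(op):
    """Return the precedence level implied by how the operator starts."""
    for prefix, level in _LEVELS:
        if op.startswith(prefix):
            return level
    raise ValueError(f"not an infix operator covered by this sketch: {op!r}")
```

With this table, infix_level("+=-<") equals infix_level("+") and
infix_level("<-=+") equals infix_level("<"), matching the two examples
above. A reader needs to remember only the handful of standard
operators to place any user-defined one.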

An IDE can, of course, help a human to parse a program text, but that
only works on a screen and it may take extra time (for the programmer)
to process the information provided by the IDE, which is often in the
form of mouse-pointer information, colouring or matching brackets. So,
ideally, the program text should be easy to parse for a human without
computer assistance.

