Re: Philosophical question regarding statement terminators

Chris F Clark <cfc@world.std.com>
15 Nov 2000 00:04:41 -0500

          From comp.compilers


From: Chris F Clark <cfc@world.std.com>
Newsgroups: comp.compilers
Date: 15 Nov 2000 00:04:41 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 00-11-069 00-11-096
Keywords: syntax
Posted-Date: 15 Nov 2000 00:04:40 EST

For context, I wrote previously:
>To have a language without statement terminators (or statement
>separators) and without line continuations and still having an
>unambiguous grammar, one must have distinct statement starting tokens
>that can be recognized as starting a new statement (rather than
>continuing the list).


To which William B. Clodius asked:


> Is this really true?
>
> In some of the languages I am familiar with it appears that a
> substantial subset could be parsed into statements using a rule such
> as:
>
> If a name is identified (preferably, but not necessarily, after an end
> of line) such that no open brackets remain, and the name is not
> immediately preceded by an operator or punctuation then a new
> statement begins.


In the languages you are familiar with, two identifiers in a row
cannot be part of a list, so those languages fulfill the condition I
described. The second of two sequential identifiers can be
recognized as not continuing the "list" (i.e., the expression) of the
previous statement, and thus must begin a new statement (or be a
syntax error).
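
As a rough illustration of the quoted rule, here is a minimal Python sketch of a statement splitter. The token pattern, the function name split_statements, and the test used for "previous token is an operand" are assumptions made for this example and are not taken from any particular language definition.

```python
import re

# Hypothetical token pattern: names, integers, or any single non-space symbol.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|\S)")

def split_statements(source):
    """Split source into statements using the quoted rule: a name starts
    a new statement when no brackets are open and the previous token is
    not an operator or punctuation mark."""
    statements, current, depth = [], [], 0
    for tok in TOKEN.findall(source):
        is_name = tok[0].isalpha() or tok[0] == "_"
        # The previous token counts as an operand if it is a name, a number,
        # or a closing bracket (assumption for this sketch).
        prev = current[-1] if current else ""
        prev_is_operand = bool(prev) and (
            prev[0].isalnum() or prev[0] == "_" or prev in (")", "]")
        )
        if is_name and depth == 0 and prev_is_operand:
            statements.append(current)
            current = []
        if tok in ("(", "["):
            depth += 1
        elif tok in (")", "]"):
            depth -= 1
        current.append(tok)
    if current:
        statements.append(current)
    return [" ".join(s) for s in statements]
```

With this sketch, split_statements("a = b c = d") returns ["a = b", "c = d"]: the second identifier in a row is taken as the start of a new statement, exactly because no expression form lets it continue the previous one.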


Not to keep picking on BASIC, but it is a good example for this
point, because it has lists that can be extended. One of the lists
that occurs in that language is the list of "print items" in a PRINT
statement. These are expressions separated by commas (or semicolons),
and the list may end with either an expression or one of the
separators. The following statements are all legal PRINT statements.


100 PRINT "hello world"
110 PRINT "i = "; i
120 PRINT "what to do next?";


The key difficulty in removing the line numbers and the end-of-line
dependency from BASIC is that when you come to a separator (comma or
semicolon) you don't know whether the next token is part of the print
items or the start of a distinct statement. For instance, consider the
next example, written in a "BASIC" derivative that removes the line
numbers and end-of-line rules. Is the "i" after the semicolon part of
the PRINT statement or the start of a new LET statement? Only upon
seeing the = can one be sure that it is the second case (and not even
then if the dialect has = as an expression operator).


PRINT "i =" ;
i
= 10
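
The lookahead problem in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword set, the function name classify_after_separator, and the rule that = is never an expression operator are all hypothetical choices for a made-up dialect, not features of any real BASIC.

```python
# Hypothetical statement-starting keywords for the made-up dialect.
KEYWORDS = {"PRINT", "LET", "GOTO", "IF", "FOR", "NEXT", "END"}

def classify_after_separator(tokens):
    """Classify the tokens that follow a PRINT separator (comma or
    semicolon) in a dialect with no line numbers and no end-of-line rule.

    Assumes '=' is never an expression operator, so an identifier
    followed by '=' can only begin an implicit LET statement."""
    if not tokens:
        return "end of PRINT"
    first = tokens[0]
    if first in KEYWORDS:
        # A distinct statement-starting token settles it immediately.
        return "new statement"
    if not first[0].isalpha():
        # A string or numeric literal can only be another print item.
        return "print item"
    # An identifier alone is not enough: the token after it decides.
    if len(tokens) > 1 and tokens[1] == "=":
        return "new LET statement"
    return "print item"
```

In this sketch a keyword resolves the question with one token, while a bare identifier such as i needs a second token of lookahead. If the dialect also allowed = inside expressions, even that second token would not be enough.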


Now, let's look at removing semicolons from a C-derived language.
First, your case "a = b c = d" is unambiguous in a C dialect without
semicolons, since C has no expression form with two identifiers
in sequence and no intervening operator. So far, so good. However,
it takes just one ambiguous case to break the scheme.


Having seen the problematic kind of list in BASIC, it is now easy to
construct a problem case for our C dialect. In C, the problem is any
expression operator that is legal as both a prefix (or suffix) and an
infix operator. Three operators (*, -, and &) come to mind readily.
Thus, the following fragment is ambiguous (unless we have some other
rule to resolve it) in our C dialect.


a = b * c = d /* does this parse as "a = b; *c = d;"
or "a = (b * (c = d));" */
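
To make the ambiguity concrete, the following Python sketch enumerates every way a token sequence can be cut into consecutive expression statements. The expression shape it recognizes (names joined by infix operators, each name optionally preceded by prefix operators) is a deliberately tiny assumption for illustration; it ignores precedence, lvalue checks, and everything else in real C, and the names is_expression and statement_splits are hypothetical.

```python
PREFIX_OPS = {"*", "-", "&"}                  # legal in prefix position
INFIX_OPS = {"=", "*", "-", "&", "+", "/"}    # legal between operands

def is_expression(tokens):
    """Recognize: unary (INFIX unary)*, where unary is PREFIX* NAME."""
    i, n = 0, len(tokens)
    while True:
        while i < n and tokens[i] in PREFIX_OPS:
            i += 1
        if i >= n or not tokens[i].isidentifier():
            return False
        i += 1
        if i == n:
            return True
        if tokens[i] not in INFIX_OPS:
            return False
        i += 1

def statement_splits(tokens):
    """Return every way to cut tokens into consecutive expression statements."""
    if not tokens:
        return [[]]
    results = []
    for cut in range(1, len(tokens) + 1):
        head = tokens[:cut]
        if is_expression(head):
            for rest in statement_splits(tokens[cut:]):
                results.append([" ".join(head)] + rest)
    return results
```

Under these assumptions, statement_splits("a = b c = d".split()) finds a single split, while statement_splits("a = b * c = d".split()) finds two: one that ends the first statement after b and one that keeps all seven tokens in a single statement. The second result exists only because * is accepted in both prefix and infix position.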


Hope this helps,
-Chris


*****************************************************************************
Chris Clark                    Internet   :  compres@world.std.com
Compiler Resources, Inc.       Web Site   :  http://world.std.com/~compres
3 Proctor Street               voice      :  (508) 435-5016
Hopkinton, MA 01748 USA        fax        :  (508) 435-4847  (24 hours)

