Dynamic tokens & syntax

Patrick Herring <ph@anweald.exnet.co.uk>
Fri, 15 Apr 1994 14:40:41 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: Patrick Herring <ph@anweald.exnet.co.uk>
Keywords: parse, question, comment
Organization: Anweald Systems
Date: Fri, 15 Apr 1994 14:40:41 GMT

I'm trying to write (just to see if I can) a lexer/parser for a Rexx
interpreter that can deal with in-line instructions that change tokens and
syntax. This is motivated in part by ANSI currently standardising Rexx.
ANSI will want to define the symbol character set internationally. It
would also be nice to fix unusual tab chars by adding to the 'blank'
token, and to be able to add new syntax while providing a fix against
breaking existing programs.


The Rexx language has an OPTIONS statement that provides for
implementation-specific commands to the interpreter from within the
program source code. I want to be able to do:


          say '£ not ok in symbols' /* £ is the gbp symbol BTW */
          options Symbol_char '£'
          call it_has_£_in_it /* label ::= symbol : BTW */
          options NoSymbol_char '£'
          say '£ not ok in symbols'
          exit


    it_has_£_in_it:
          say '£ currently ok in symbols'
          return


and the like. In other words, the OPTIONS statements are interpreted in
run-time order, and they affect the lexing and parsing of normal code.
That code must still be parsed ahead up to a point, so that calls to
internal labels can be resolved and so that the parser can actually move
forward according to the unchanged/standard/default syntax.
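
To make the idea concrete, here is a minimal sketch in Python of a lexer
whose symbol character set can be changed while it runs. Everything in it
is an assumption made for illustration: the names Lexer, add_symbol_char
and remove_symbol_char are hypothetical, and the default character set is
a stand-in, not the real Rexx definition.

```python
import string

# Stand-in default symbol characters; NOT the real Rexx symbol set.
DEFAULT_SYMBOL_CHARS = set(string.ascii_letters + string.digits + "_")
POUND = "\u00a3"  # the pound sign used in the example above


class Lexer:
    """Toy lexer whose notion of a symbol character can change at run time."""

    def __init__(self):
        self.symbol_chars = set(DEFAULT_SYMBOL_CHARS)

    def add_symbol_char(self, ch):
        # Hypothetical effect of: options Symbol_char '<ch>'
        self.symbol_chars.add(ch)

    def remove_symbol_char(self, ch):
        # Hypothetical effect of: options NoSymbol_char '<ch>'
        self.symbol_chars.discard(ch)

    def tokens(self, text):
        out, i = [], 0
        while i < len(text):
            ch = text[i]
            if ch.isspace():
                i += 1
            elif ch in self.symbol_chars:
                j = i
                while j < len(text) and text[j] in self.symbol_chars:
                    j += 1
                out.append(("SYMBOL", text[i:j]))
                i = j
            else:
                # 'not any other token'
                out.append(("UNKNOWN", ch))
                i += 1
        return out


lexer = Lexer()
name = "it_has_" + POUND + "_in_it"
before = lexer.tokens(name)    # three tokens, the middle one UNKNOWN
lexer.add_symbol_char(POUND)
after = lexer.tokens(name)     # a single SYMBOL token
lexer.remove_symbol_char(POUND)
restored = lexer.tokens(name)  # back to three tokens
```

The same text lexes differently depending on which OPTIONS statements have
run so far, which is why the order of interpretation matters.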


I've been thinking that the way to do this is to have delayed parsing. An
'unknown' token is lexed as 'not any other token', which the parser can
use to flag a clause as 'parse not complete'. Then there can be an
up-front lex/parse pass (for efficiency and to get internal labels),
together with re-parsing of unfinished clauses in run-time order, when
the language definition for that clause is fully known. I may have a
problem with recognising line continuations: e.g. ,\t\n is one if \t is a
BlankChar, but you won't know that until run-time unless it's
standard/default. (Rexx has line continuations (,) and clause delimiters
(;) and a lot of context-dependent semantics, so to get internal labels
you need to get whole clauses.) I'm not yet sure how to do dynamically
changing syntax, though obviously ambiguity is a bit of a problem.
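
A rough sketch of the delayed-parsing scheme, in Python, might look like
the following. It is a toy under stated assumptions, not a Rexx
implementation: one clause per line, no strings, comments, continuations
or ';' delimiters, and the names Clause, first_pass and resolve_label are
all hypothetical. It also simplifies the re-parse step by re-lexing every
unfinished clause when an unknown label is looked up, rather than strictly
one clause at a time in run-time order.

```python
import string

# Stand-in default symbol characters; NOT the real Rexx symbol set.
DEFAULT_SYMBOL_CHARS = frozenset(string.ascii_letters + string.digits + "_")
POUND = "\u00a3"


def tokenize(text, symbol_chars):
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in symbol_chars:
            j = i
            while j < len(text) and text[j] in symbol_chars:
                j += 1
            tokens.append(("SYMBOL", text[i:j]))
            i = j
        elif ch == ":":
            tokens.append(("COLON", ch))
            i += 1
        else:
            tokens.append(("UNKNOWN", ch))  # 'not any other token'
            i += 1
    return tokens


class Clause:
    def __init__(self, text, symbol_chars):
        self.text = text
        self.reparse(symbol_chars)

    def reparse(self, symbol_chars):
        self.tokens = tokenize(self.text, symbol_chars)
        # A clause with any UNKNOWN token is flagged 'parse not complete'.
        self.complete = all(kind != "UNKNOWN" for kind, _ in self.tokens)

    def label(self):
        """Return the label name if this is a complete 'symbol :' clause."""
        kinds = [kind for kind, _ in self.tokens]
        if self.complete and kinds == ["SYMBOL", "COLON"]:
            return self.tokens[0][1]
        return None


def first_pass(source, symbol_chars):
    """Up-front pass: collect labels from the clauses that parse fully."""
    clauses = [Clause(line, symbol_chars)
               for line in source.splitlines() if line.strip()]
    labels = {c.label(): n for n, c in enumerate(clauses)
              if c.label() is not None}
    return clauses, labels


def resolve_label(name, clauses, labels, symbol_chars):
    """Run-time lookup: re-parse unfinished clauses under current rules."""
    if name not in labels:
        for n, clause in enumerate(clauses):
            if not clause.complete:
                clause.reparse(symbol_chars)
                if clause.label() is not None:
                    labels[clause.label()] = n
    return labels.get(name)


target = "it_has_" + POUND + "_in_it"
source = "\n".join(["call " + target, "exit", "plain:", "return",
                    target + ":", "return"])

clauses, labels = first_pass(source, DEFAULT_SYMBOL_CHARS)
early = dict(labels)  # only 'plain' is known after the up-front pass
missing = resolve_label(target, clauses, labels, DEFAULT_SYMBOL_CHARS)
found = resolve_label(target, clauses, labels,
                      DEFAULT_SYMBOL_CHARS | {POUND})
```

Note what the sketch leaves out: it never invalidates a clause that was
completed under a definition that is later switched off again, and it
does nothing about the continuation problem, where even the clause
boundaries depend on the current BlankChar set.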


I'm posting in case somebody can tell me it's an established fact that
this is impossible for any language (or necessarily too inefficient). So
far it doesn't seem to be but I'm self-taught from the Dragon book so I
may well have missed something. Sound too ambitious? Been done before?


Yours, Patrick
[This sort of peekahead is a well-known technique. Nearly 25 years ago IBM's
Assembler H for the 360/370 series needed to be able to scan ahead through
the input to look for forward definitions of macros. I believe that it
tokenized but deferred any sort of syntax check. -John]