a generated parser/scanner for fixed form FORTRAN

From: Evangelos Drikos <firstname.lastname@example.org>
Date: Wed, 17 Jul 2013 17:55:22 +0300
Organization: An OTEnet S.A. customer
Keywords: parse, Fortran, LALR
Posted-Date: 17 Jul 2013 14:07:26 EDT

The parser/scanner generator "Syntaxis.jar" supports a new feature that
enables an LALR' parser and a generated scanner to parse fixed form
FORTRAN programs. I think this feature is not FORTRAN specific.

Below, I describe in detail the lexical issues and then the grammar
modifications that a FORTRAN 2008 free source form LALR' parser/scanner
needs in order to parse fixed form programs as well. The lexical issues
I've identified fall into three categories:
1) Statements that don't have delimiters (e.g. GO TO label, STOP code).
2) Keywords followed by another keyword or a name before a delimiter.
   2.1) before '=' (NON; e.g. NON INTRINSIC and NON OVERRIDABLE).
   2.2) before '(' (SYNC, IMPLICIT, SUBROUTINE, FUNCTION, DO, CALL, prefix)
   2.3) before '%' (CALL or DATA; e.g. CALL a%b)
   2.4) before '[' (CODIMENSION; e.g. CODIMENSION a[*])
3) Special cases.
   3.1) The keyword FORMAT (an issue in both fixed & free form).
   3.2) The well known DO issue (DO [label] name=exp,exp).
   3.3) An integer before a binary-defined-operator.
   3.4) An entity declaration that looks like a function statement.
   3.5) Hollerith Constants (FORTRAN 77).

First, we modify the grammar to distinguish the names used at the
beginning of a statement (name-l) from the names used elsewhere (name-l
or name-r). Then we can solve the lexical issues per category:
1) A name-l must be followed by ':', '(', '%', '=', or '['.
2) For the concatenated keywords/names before '=', '(', '[', and '%' we:
   2.1) accept optional spacing; this also needs a semantic action.
   2.2-2.4) scan for "]=", "%name=", ")="; if not found it is a name-r.
3) For each special case:
   3.1) We scan for "]=", "%name=", or ")="; if found it is a name-l.
   3.2) If a name-l like "DO*" is followed by "=exp," it must be further
        followed by "name=". As expressions are not regular, we set a
        limit of four levels of nested parentheses for the first
        expression of the "loop-control". If the limit is exceeded, the
        lexer returns a name-l.
   3.3) A real-literal-constant cannot be followed by a letter or a dot.
   3.4) We use a token name-f that begins with FUNCTION/SUBROUTINE and
        is followed by '('name-list/dummy-arg-list')'. If the parser
        cannot shift the keyword FUNCTION, we return a name-l (semantic
        action).
   3.5) We parse Hollerith Constants in the hand-coded file reader.
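
To make case 3.2 more concrete, here is a minimal Python sketch of one
possible reading of that check. It is not the code generated by
Syntaxis.jar. The function name classify_do, the constant MAX_DEPTH and
the regular expression are hypothetical, and the sketch assumes the
statement contains no character literals or Hollerith constants. After
removing blanks, it looks for "DO [label] name=" and then searches the
first expression for a comma outside parentheses, giving up once the
nesting goes deeper than four levels.

```python
import re

MAX_DEPTH = 4  # nesting limit for the first loop-control expression

# "DO", an optional label, a scalar name, then "=" (blanks already removed).
_DO_PREFIX = re.compile(r"DO(\d*)([A-Z][A-Z0-9_]*)=")


def classify_do(stmt: str) -> str:
    """Return 'DO' if stmt looks like a DO loop, otherwise 'name-l'."""
    text = stmt.replace(" ", "").upper()  # blanks are not significant here
    m = _DO_PREFIX.match(text)
    if m is None:
        return "name-l"
    depth = 0
    for ch in text[m.end():]:
        if ch == "(":
            depth += 1
            if depth > MAX_DEPTH:
                return "name-l"  # too deeply nested: fall back to a name
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return "DO"  # "=exp," found: this is a loop-control
    return "name-l"  # no top-level comma: an ordinary assignment
```

With this sketch, "DO 10 I = 1, 5" is classified as a DO statement,
while "DO 10 I = 1.5" and "DO10I=A(1,2)" are treated as assignments to
a name-l.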

The new feature mentioned above is that the lexer can return a shorter
match as an alternative token. As the parser cannot shift a name-r at
the beginning of a statement, it can optionally request a shorter match.
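
The shorter-match idea can be illustrated with a small Python sketch.
It is only a model of the behaviour described above, not the Syntaxis
implementation. The names KEYWORDS, candidates and next_token are
hypothetical, and the set "shiftable" stands in for whatever the real
parser uses to tell the lexer which tokens it can shift in its current
state.

```python
import re

KEYWORDS = ("GOTO", "STOP", "PAUSE")  # illustrative subset only
NAME = re.compile(r"[A-Z][A-Z0-9]*")


def candidates(text, pos):
    """Return all (kind, lexeme) matches at pos, longest first."""
    found = []
    m = NAME.match(text, pos)
    if m:
        found.append(("name-r", m.group()))
    for kw in KEYWORDS:
        if text.startswith(kw, pos):
            found.append((kw, kw))
    found.sort(key=lambda t: len(t[1]), reverse=True)
    return found


def next_token(text, pos, shiftable):
    """Return the longest match the parser can shift, or None."""
    for kind, lexeme in candidates(text, pos):
        if kind in shiftable:
            return kind, lexeme  # may be a shorter alternative match
    return None
```

For the blank-free statement "GOTO10", the longest match is the name-r
"GOTO10". At the beginning of a statement the parser cannot shift a
name-r, so next_token("GOTO10", 0, {"GOTO", "STOP", "PAUSE"}) falls
back to the shorter keyword match "GOTO".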

To validate the solution, I've extended the grammar with three obsolete
FORTRAN 77 statements (ASSIGNED GOTO, ASSIGN, and PAUSE) and tested it
with the programs found at: www.itl.nist.gov/div89/ctg/fortran_form.htm

Clearly, the table-driven scanner has some disadvantages, but one can
mechanically translate it into a hard-coded optimized scanner.