Re: Reachability of DFA part

Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Sat, 21 Dec 2019 19:58:27 +0100

          From comp.compilers

Related articles
Reachability of DFA part borucki.andrzej@gmail.com (Andy) (2019-12-20)
Re: Reachability of DFA part 493-878-3164@kylheku.com (Kaz Kylheku) (2019-12-20)
Re: Reachability of DFA part borucki.andrzej@gmail.com (Andy) (2019-12-21)
Re: Reachability of DFA part DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-12-21)
Reachability of DFA part christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-12-23)
Re: Reachability of DFA part 493-878-3164@kylheku.com (Kaz Kylheku) (2019-12-24)
| List of all articles for this month |

From: Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Newsgroups: comp.compilers
Date: Sat, 21 Dec 2019 19:58:27 +0100
Organization: Compilers Central
References: 19-12-008 19-12-009 19-12-012
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="91817"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, Pascal
Posted-Date: 21 Dec 2019 14:04:56 EST

Am 21.12.2019 um 10:15 schrieb Andy:
> W dniu piątek, 20 grudnia 2019 17:41:04 UTC+1 użytkownik Kaz Kylheku
> napisał:


> Machine finally accept all strings begin from "ab" but "ba" will unused.
> This is similar to definition of comment: in Pascal. comment begin at { and
> end of }, careless definition is {*} which mark as comment to rest of file.


Pascal is a too general term, with no special implementation implied.


A Pascal lexer typically is handcrafted or generated by CoCo/r, not with
lex/yacc or regular expressions.




> Good definition would be {[^}]*}
> Complexity of problem increases when comment ends with string len >1, for
> example C: */ or Pascal *)


Digraphs are a problem with many old languages, including C. They are
intended as direct replacements for other characters, conversion
performed before or in tokenization.




> if we renaming : /->a *->b other->c
> then bad definition will ab(a|b|b)*ba and good definition is complicated:
> ab(b|(a|c)*b*)*a (if I not make mistake)
>
> Commments should maybe be defined in other way, especially comments can be
> nested in Object Pascal. Comment nesting can using stack or simply counter.


> I see, in Pascal is using counter. Difference: Pascal has two types of multiline
> comments { } and (* *)


For digraphs see above. Nested comments are problematic only with single
pass lexer generators. With multiple stages for character substitution,
whitespace etc. no problems are known with Pascal tokens.




> If we use stack, closing comment type must be equal last open comment type,
> for counter - only count comments of type first opening, example
> { { (* } *) }


This again is implementation specific.




If you mean scannerless parsers with embedded regular expressions, they
have several problems with traditional languages. Traditional
tokenization has minor known problems with whitespace (including
comments), numbers and identifiers, which have been discussed and solved
since long. All other tokens are literals which deserve no sophisticated
lexer. In Pascal/Delphi even the dot tokens '.', '..', '...' don't cause
problems, unlike C where '..' is missing.




IMO a language should be constructed for easy compilation, with simple
terminal definitions for handcrafted lexers, or with a fully specified
conflict free formal token grammar, the latter type either with a
separate or embedded lexer. The first type also is human readable, while
the second type tends to result in write-only languages or describes
non-verbal grammars like for DNA. What's a formal token definition worth
if it cannot be proofed error free?


DoDi


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.