Re: What stage should entities be resolved?

Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Mon, 14 Mar 2022 19:43:22 +0100

          From comp.compilers

From: Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Newsgroups: comp.compilers
Date: Mon, 14 Mar 2022 19:43:22 +0100
Organization: Compilers Central
References: 22-03-019 22-03-025 22-03-028
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="56396"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, design
Posted-Date: 14 Mar 2022 14:50:21 EDT

On 3/12/22 1:11 PM, Christopher F Clark wrote:
> Contrary to what one might assume from my previous posting on this topic,
> I agree with Dodi.
>
> Sometimes, the right answer is another phase. To keep your lexer
> simple, it can be useful to have a separate phase that deals with
> "character" issues, whether that is transforming UTF-8 extensions into
> unique code points (or actual characters representing glyphs possibly
> accented, i.e. resolving the combining code points into canonical
> versions) or taking sequences like &amp; or \n or whatever into single
> tokens (or characters). That *can* make the whole process simpler and
> faster.


I think of these "phases" as "filters". My C parser likewise had a
number of filter levels, each handling one aspect of preprocessing in
detail, such as macro substitution and conditional compilation. The
parser calls the top-level filter to get the next C token; that filter
in turn calls the lower-level filters until every level has returned
enough information about the next token to parse.
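
As a rough illustration of such a filter chain (this is not the original
C parser; every function name below is hypothetical and the token rules
are heavily simplified), a pull-based pipeline might be sketched in
Python with generators. Each level asks the level below for input only
when it needs more to produce its next item:

```python
# Hypothetical three-level filter chain; all names are illustrative only.

def char_filter(text):
    """Lowest level: yield characters, removing backslash-newline splices."""
    i = 0
    while i < len(text):
        if text.startswith("\\\n", i):
            i += 2  # drop the splice and continue on the next physical line
            continue
        yield text[i]
        i += 1

def raw_token_filter(chars):
    """Middle level: group characters into word or punctuation tokens."""
    buf = ""
    for ch in chars:
        if ch.isalnum() or ch == "_":
            buf += ch
            continue
        if buf:
            yield buf
            buf = ""
        if not ch.isspace():
            yield ch
    if buf:
        yield buf

def macro_filter(tokens, macros):
    """Top level: replace object-like macro names (no rescanning)."""
    for tok in tokens:
        if tok in macros:
            yield from macros[tok]
        else:
            yield tok

def token_source(text, macros):
    """Stack the filters; the parser only ever sees the top level."""
    return macro_filter(raw_token_filter(char_filter(text)), macros)

# The "parser" pulls one token at a time from the top-level filter.
source = token_source("x = MA\\\nX + 1;", {"MAX": ["100"]})
tokens = list(source)
print(tokens)
```

Because the levels are lazy, no intermediate text is ever written out;
the parser drives the whole chain one token at a time.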


Microsoft's sloppy interpretation of the preprocessor as a
self-contained stage revealed that the newer C standards disallow a
stand-alone C preprocessor. Such a separate preprocessor can
synthesize tokens like "//" that would never occur in a strict
implementation with an embedded preprocessor. Even if this is not
stated explicitly in the standard, it turned out to be a side effect of
the lexer implementation.
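
One way such a "//" could arise can be shown with a toy model. The
sketch below is an assumption-laden illustration, not a description of
any particular compiler: the macro name SLASH, the toy lexer, and both
helper functions are invented for the example. It compares expanding
macros on an already tokenized stream (embedded) with expanding them as
text that is lexed afterwards (stand-alone):

```python
import re

def lex(text):
    """Toy lexer: '//' starts a comment that runs to the end of the line."""
    tokens = []
    for m in re.finditer(r"//[^\n]*|\w+|\S", text):
        if not m.group().startswith("//"):
            tokens.append(m.group())
    return tokens

def expand_tokens(tokens, macros):
    """Embedded style: substitute on tokens, so token boundaries survive."""
    out = []
    for tok in tokens:
        out.extend(macros.get(tok, [tok]))
    return out

def expand_text(text, macros):
    """Stand-alone style: substitute in the text, which is lexed later."""
    def repl(m):
        body = macros.get(m.group())
        return "".join(body) if body is not None else m.group()
    return re.sub(r"\w+", repl, text)

macros = {"SLASH": ["/"]}   # hypothetical: #define SLASH /
source = "a SLASH/ b"

embedded = expand_tokens(lex(source), macros)
standalone = lex(expand_text(source, macros))

print(embedded)     # two separate "/" tokens remain
print(standalone)   # the text "a // b" was re-lexed; "// b" became a comment
```

In the token-based path the two "/" stay distinct tokens, while the
text-based path glues them into a comment introducer. A stand-alone
preprocessor has to take extra care, for example by emitting a space
between such tokens, to avoid this.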


DoDi

