Re: Best language for implementing compilers?

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Mon, 11 Mar 2019 10:49:08 -0700 (PDT)

          From comp.compilers

Related articles
[20 earlier articles]
Re: Best language for implementing compilers? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-03-10)
Re: Best language for implementing compilers? bc@freeuk.com (Bart) (2019-03-10)
Re: Best language for implementing compilers? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-03-10)
Re: Best language for implementing compilers? gneuner2@comcast.net (George Neuner) (2019-03-10)
Re: Best language for implementing compilers? gneuner2@comcast.net (George Neuner) (2019-03-10)
Re: Best language for implementing compilers? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-03-11)
Re: Best language for implementing compilers? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-03-11)
Re: Best language for implementing compilers? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-03-12)
Re: Best language for implementing compilers? mertesthomas@gmail.com (2019-03-12)
Re: Best language for implementing compilers? bc@freeuk.com (Bart) (2019-03-13)
| List of all articles for this month |

From: Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups: comp.compilers
Date: Mon, 11 Mar 2019 10:49:08 -0700 (PDT)
Organization: Compilers Central
References: 19-02-002 19-02-004 19-02-006 19-03-009 19-03-010
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="4857"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, performance
Posted-Date: 12 Mar 2019 00:41:02 EDT
In-Reply-To: 19-03-010

On Sunday, March 10, 2019 at 9:07:41 PM UTC-4, Bart wrote:
> On 10/03/2019 11:13, Christopher F Clark wrote:
> > On Tuesday, February 12, 2019 at 9:43:46 AM UTC-5, Bart wrote:
> >> On 08/02/2019 23:36, George Neuner wrote:
> >>> On Fri, 8 Feb 2019 12:20:18 +0000, "Costello, Roger L."
> >>> <costello@mitre.org> wrote:
> >>>
> >>>> What is it about ML that makes it such a good language for
implementing
> >>>> compilers?
> >>>
> >>> Compiling involves a lot of pattern matching, and pattern matching is
> >>> a native feature of ML.
> >>
> >> You mean for tokenising and parsing? That would be a small part of
> >> compilation (the easy bit, in my view), although it seems to be a
> >> preoccupation of this group.
> >
> > This thread has gone down a rabbit hole. The pattern matching in ML is
not
> > used for tokenizing and parsing.
> ....
>
> > If you want really fast tokenizing and parsing, you turn it into a DFA
(PDA
> > for a parser) and implement the engine for doing so in assembly
> > language/machine code.
>
> How fast are we talking about?


I haven't measured in a long time, so I can't quote any numbers. However, as
I recall, you can lex a buffer in roughly the same time you can access it via
getc rather than reading with fread if your lexer code is tight. In fact, the
fetching of the characters is often a significant factor in the lexing time.
The other significant factors are the time spent in calls (to either the I/O
library or passing a token back to the parser. So, really fast lexers
actually often concentrate on that, minimizing both (e.g. reading large
buffers and batching up a whole set of tokens to pass to the parser rather
than one at a time).


> Which comes to an insignificant amount of the runtime of most compilers.


That probably depends upon the complexity of the other parts. The last time I
benchmarked lexer and parser times was for a compiler that dealt with a
language for industrial automation. In that compiler, before tweaking, the
lexer and parser took up about a third of the total compilation time.
Admittedly the language was very simple. After tweaking we got it down to
around 10% of the time. The only other bit of code in the compiler took up
10% in our profiling runs was the writing out of the resulting compiled file.




> Parsing needs to recognise keywords in order to do its job


I consider recognizing keywords and hashing identifiers both to be part of the
lexer's responsibility, but don't necessarily mean you should do them in the
DFA.


> > I'm talking about what you do afterwards. That's where the bulk of the
> > work is done.
>
> OK. Clearly my compilers work differently, or at least that aspect of it
> is in a very crude form, as I don't recall needing to do any such
> pattern matching.


You probably aren't trying to support multiple front ends and back ends with
the same compiler. You probably aren't attempting much optimization either.
If neither of those applies, you can often make your IR (intermediate
representation) fairly close to the source language AST (abstract syntax
tree). Then, you don't need pattern matching to do rewrites.


That said, see the response that describes how to do code generation as
rewrites. People have been doing that since the 1970s at least. It can make
specifying a code generator easy.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.