Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support

"BartC" <bc@freeuk.com>
Wed, 22 Aug 2012 14:04:20 +0100

          From comp.compilers

Related articles
Bison deterministic LALR(1) parser for Java/C hsad005@gmail.com (2012-08-17)
Re: Bison deterministic LALR(1) parser f DrDiettrich1@aol.com (Hans-Peter Diettrich) (2012-08-18)
Re: Bison deterministic LALR(1) parser f anton@mips.complang.tuwien.ac.at (2012-08-20)
Re: Bison deterministic LALR(1) parser f cr88192@hotmail.com (BGB) (2012-08-21)
Re: Bison deterministic LALR(1) pa bc@freeuk.com (BartC) (2012-08-22)
Re: Bison deterministic LALR(1) parser f cr88192@hotmail.com (BGB) (2012-08-26)
Bison deterministic LALR parser for Java/C++ bc@freeuk.com (BartC) (2012-08-29)
speeding up C recompilation, was Re: Bison deterministic LALR cr88192@hotmail.com (BGB) (2012-09-04)
Re: C include handling, was Bison deterministic LALR marcov@toad.stack.nl (Marco van de Voort) (2012-09-05)

From: "BartC" <bc@freeuk.com>
Newsgroups: comp.compilers
Date: Wed, 22 Aug 2012 14:04:20 +0100
Organization: A noiseless patient Spider
References: 12-08-005 12-08-006 12-08-009 12-08-014
Keywords: parse, performance
Posted-Date: 22 Aug 2012 15:57:42 EDT

"BGB" <cr88192@hotmail.com> wrote in message
> On 8/20/2012 8:35 AM, Anton Ertl wrote:


>> Code generation and optimization do not change the relation between
>> the time spent in scanning and in parsing. Moreover, if the compiler
>> spends most of the time in code generation, optimization and/or
>> "more", there is even less reason to worry about parsing speed.
>
>
> (sorry in advance for sort of wandering off on a tangent...).
>
>
> a major thing that I think skews things so much in the case of C is just
> how much stuff can get pulled in by the preprocessor, the bulk of which
> ends up quickly being discarded by subsequent stages (and much of the
> rest is only minimally processed by later stages).


I think much of that is the fault of the language and over-reliance on the
pre-processor. Instead of relying on neat language features to minimise the
sizes of headers, everything has to be spelled out in a long-winded way.
Lack of proper namespaces (in lists of enumerations for example) means
identifiers are long and unwieldy, which doesn't help.


> if we spend 1000ms preprocessing and parsing the code, and 10ms or 100ms
> compiling it, where has most of the time gone?...


But even given all that, there are ways of dealing with huge header files so
that it is not necessary to repeatedly tokenise and parse the same headers
over and over again (for recompiling the same module, or compiling many
modules all sharing the same headers).
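
As a rough illustration only (every type and name below is invented for the
example), a compiler driver could keep an in-memory cache of already-tokenised
headers, keyed by path and modification time, and replay the saved token list
whenever the same header is pulled in again:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

typedef struct Token { int kind; char *text; } Token;

typedef struct HeaderCacheEntry {
    char   path[512];
    time_t mtime;                /* invalidate when the file changes     */
    Token *tokens;               /* tokens produced the first time round */
    size_t ntokens;
    struct HeaderCacheEntry *next;
} HeaderCacheEntry;

static HeaderCacheEntry *cache_head;

/* Stand-in for the real lexer; a real compiler tokenises the file here. */
static Token *lex_file(const char *path, size_t *ntokens_out)
{
    (void)path;
    *ntokens_out = 0;
    return NULL;                 /* stub: produces no tokens in this sketch */
}

const Token *get_header_tokens(const char *path, size_t *ntokens_out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return NULL;

    /* Hit: same path, unchanged mtime -> reuse the saved token list. */
    for (HeaderCacheEntry *e = cache_head; e; e = e->next) {
        if (strcmp(e->path, path) == 0 && e->mtime == st.st_mtime) {
            *ntokens_out = e->ntokens;
            return e->tokens;
        }
    }

    /* Miss: lex once, remember the result for later includes/modules. */
    HeaderCacheEntry *e = calloc(1, sizeof *e);
    if (!e)
        return NULL;
    snprintf(e->path, sizeof e->path, "%s", path);
    e->mtime   = st.st_mtime;
    e->tokens  = lex_file(path, &e->ntokens);
    e->next    = cache_head;
    cache_head = e;

    *ntokens_out = e->ntokens;
    return e->tokens;
}

Keeping the cache at the raw-token level, before macro expansion, also
sidesteps most of the #if problem: conditional compilation and expansion
still run per translation unit, and only the lexing work is shared.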


I've no idea whether many C compilers actually bother, though; perhaps it's
easier to just recommend a faster computer...


> OTOH, in my scripting language (which doesn't use headers), I have
> generally been looking at millisecond-range compile times.


Well, that's more like it. Compilation time simply shouldn't be an issue,
not for compiling a single module anyway.


And it has never been a problem for me, ever, no matter what hardware I was
using, partly thanks to using my own tools.


I only get a delay when there are interdependencies and I just recompile
everything for simplicity. Then I might have to twiddle my thumbs for 5-10
seconds. But then, I now have to rely on some external tools..


> although admittedly most of my compilers have tended to be pretty dumb,
> typically working like (for the upper end):
> split into tokens and parse the code directly into an AST;
> perform basic simplifications on the AST (evaluating constant
> expressions);
> perform basic type-analysis / inference (generally forward propagation
> driven by declarations and assignment operations);
> flatten this out into a bytecode-style format (binary serialization is
> optional).
>
> this is then currently followed by a backend which translates the
> bytecode into threaded-code and runs this (generally also pretty fast,
> and generally functions/methods are translated on call).
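
(As an aside, for anyone who hasn't met threaded code: the following is a
minimal, made-up sketch of the general technique, translating flat bytecode
once into a table of handler pointers and then dispatching through it. It is
not meant to reflect the implementation described above; all names are
invented.)

#include <stdio.h>

typedef struct VM {
    int    stack[256];
    int    sp;                   /* operand stack pointer                    */
    size_t pc;                   /* index into the threaded instruction list */
} VM;

typedef struct Instr {
    void (*op)(VM *, struct Instr *);  /* handler, resolved at translate time */
    int  operand;
} Instr;

static void op_push(VM *vm, Instr *i)
{
    vm->stack[vm->sp++] = i->operand;
    vm->pc++;
}

static void op_add(VM *vm, Instr *i)
{
    (void)i;
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
    vm->pc++;
}

static void op_halt(VM *vm, Instr *i)
{
    (void)i;
    vm->pc = (size_t)-1;         /* sentinel: stop the dispatch loop */
}

/* Bytecode opcodes as they might come out of the earlier stages. */
enum { BC_PUSH, BC_ADD, BC_HALT };

/* Translate once: map each opcode to its handler pointer, so the dispatch
   loop never has to decode opcodes again. */
static void translate(const int *bytecode, size_t n, Instr *out)
{
    for (size_t i = 0; i < n; i += 2, out++) {
        switch (bytecode[i]) {
        case BC_PUSH: out->op = op_push; out->operand = bytecode[i + 1]; break;
        case BC_ADD:  out->op = op_add;  break;
        case BC_HALT: out->op = op_halt; break;
        }
    }
}

static void run(VM *vm, Instr *code)
{
    while (vm->pc != (size_t)-1)
        code[vm->pc].op(vm, &code[vm->pc]);
}

int main(void)
{
    /* "2 + 3" as flat bytecode: opcode/operand pairs (the operand is unused
       for ADD and HALT, kept only for a fixed-width encoding). */
    int   bytecode[] = { BC_PUSH, 2, BC_PUSH, 3, BC_ADD, 0, BC_HALT, 0 };
    Instr code[4];
    VM    vm = {0};

    translate(bytecode, sizeof bytecode / sizeof bytecode[0], code);
    run(&vm, code);
    printf("result = %d\n", vm.stack[0]);    /* prints 5 */
    return 0;
}

With gcc one would more likely use computed gotos for the dispatch, but the
overall shape is the same.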


I have two current compilers: one generates native code, and this one
generates bytecode for a dynamically typed, non-type-inferred language:


source -> lexer/parser -> types -> code generator -> optim ->
binary bytecode file


The type pass does very little here, mainly checking l-values and reducing
constant expressions. The optim pass does very little too, just reducing
some common bytecode combinations into one. Nevertheless, the resulting
bytecode, even with a straightforward, non-JITing interpreter, can make a
basic lexer run at some 3M tokens/second, as I mentioned in another post.
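
To make "reducing some common bytecode combinations into one" concrete, here
is a minimal, made-up sketch of such a pass; the opcodes are invented, and it
ignores branch-target fix-ups, which a real pass would also have to handle:

#include <stddef.h>

/* Invented opcodes; the real instruction set will differ. */
enum {
    BC_LOADVAR,      /* operand: variable index                       */
    BC_PUSHCONST,    /* operand: constant value                       */
    BC_ADD,          /* no operand                                    */
    BC_ADDCONST,     /* fused form: PUSHCONST k; ADD  ->  ADDCONST k  */
    BC_HALT
};

typedef struct { int op, operand; } BC;

/* Rewrites the bytecode in place and returns the new length. */
size_t peephole(BC *code, size_t n)
{
    size_t out = 0;
    for (size_t in = 0; in < n; in++) {
        if (in + 1 < n &&
            code[in].op == BC_PUSHCONST && code[in + 1].op == BC_ADD) {
            code[out].op      = BC_ADDCONST;   /* fuse the pair          */
            code[out].operand = code[in].operand;
            out++;
            in++;                              /* skip the absorbed ADD  */
        } else {
            code[out++] = code[in];
        }
    }
    return out;
}

The same shape handles other frequent pairs (load/load, compare/branch and
so on); which pairs are worth fusing is mostly a profiling question.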


The native code compiler is more involved (I hate these ones! But I need one
to implement the interpreter):


source -> lexer/parser -> names -> types -> intermediate code generator ->
target code generator -> optim -> asm source file


The optim stage does some peephole stuff, but I haven't gone as far as
allocating some variables to registers. Last time I checked, it was
perhaps 40-50% slower than gcc -O1, averaged over ~20
integer/floating-point-intensive benchmarks. That might do me.


--
Bartc
[I've seen C compilers that keep preparsed versions of headers. Dunno
what they do with #if. Also see Microsoft's C# and other .NET languages,
that put all of the type info in the objects, so you can use the object
as a compiled include file. -John]


