Re: Modularize compiler construction?

"BGB / cr88192" <>
Sun, 24 Jan 2010 20:44:48 -0700

          From comp.compilers


From: "BGB / cr88192" <>
Newsgroups: comp.compilers
Date: Sun, 24 Jan 2010 20:44:48 -0700
References: 10-01-080
Keywords: design
Posted-Date: 25 Jan 2010 00:31:16 EST

"Peng Yu" <> wrote in message
> It seems that the current compiler construction tools (at least in
> bison and flex) are still very primitive. ...

> I'm wondering if there are any on going research on this topic.
> [In fairness, bison and flex are based on designs from the 1970s.
> There's plenty of newer tools available to anyone who looks for
> them. -John]


However, splitting a language or compiler into a number of modules is a
general step in the right direction.

For example, one can take the ASTs from the parser and use them for
different sorts of tasks:
producing IL to send off to the lower end;
gathering information from the source code to be used for producing
standalone metadata, doing analysis, ...

as well, one can have a general-purpose assembler, into which one can
feed ASM originating from different sources: the main compiler; isolated
special-purpose code generators; user-supplied hand-optimized or
special-purpose code fragments; ...

as well, a low-level code generator which may be used for multiple
input languages, ...

The great weak point as I see it is that there is so little
standardization in all this. often with a compiler, people only care
about the particular implementation (say, MSVC vs GCC vs LLVM, ...)
rather than how it works: providing consistent de-facto
representations for different pieces of data, as well as
general-purpose APIs which can be used with multiple possible
implementations.
FWIW, the existence of things like ELF, COFF, textual ASM, x86 machine
code, ... is just as valuable as the implementations which produce and
accept them.

consider how sad it would be if everything from the HLL to the CPU was all
controlled by the same company, and was an entirely closed system focused
simply around "the needs of end users".

hence, as I see it, it would be better if these core technologies were
better liberated from the "tyranny" of particular implementations. but,
alas, this also requires that people think more about the components and
how they work and fit together, rather than simply in terms of the
particular implementation.

as an example, one can note the difference between, say, MS's implementation
of the .NET framework, and ECMA-335. the implementation is needed, yes,
however, the standard is a similarly powerful tool:
it allows people besides MS to implement it (and maybe, hopefully
eventually, develop a "good" open-source implementation).

however, it is sad in one way:
Mono is the main open-source implementation, but the people behind Mono can hardly
see the world beyond Mono. they see their implementation, but fail to really
take note of what it does (and could) represent (ok, if their code were a
little cleaner, if it were something other than plain GPL, ...).

but, back to the prior point, even with this, the world does not need Mono
per se, as there can in turn be more implementations (addressing different
goals, or adapted to different needs), ...

much the same as with the JVM and x86.

also some more standardized AST formats would be nice.
pseudo-Scheme and XML work ok I guess though, so these are what I am
using... (Scheme-based AST's are used for my high-level scripting languages,
and XML is used in my main C compiler, and likely for Java and C# frontends
if they ever get written...).

the main advantage of XML is that it tends to be a little more flexible than
S-Expressions, but is also a little more awkward and not as efficient (the
particular implementation I am using is custom-written for this purpose and
involves a few special optimizations so as not to kill performance...).

sadly, I am not aware of any particular standards WRT doing AST's in XML...

well, and my signature-strings and metadata-database formats are hardly
standardized (although, the former was derived from both the IA-64 ABI
name-mangling and the JVM's signature strings, and the metadata database
(or, at least, its external representation) was derived fairly closely from
the ".ini" and ".reg" formats from Windows...).

I guess I am uncreative WRT design, as I usually feel more comfortable using
something common and familiar as a "design template" (or at least, in cases
where I can't use it directly for some reason).

but, yes, this points out another thing:
XML is itself a standardized representation;
it would be really sad (and silly) if a person were to equate XML with a
particular implementation thereof (although, I guess there are so many
implementations that it would be difficult to fall into this trap).

then again, I have seen that a lot of people seem to mentally equate deflate
with zlib, or JPEG loading/saving with libjpeg, even though neither is
really needed (I have support for both formats in my case, but don't use
either library).

similarly, earlier today I ended up having to write a few lines of code
to make my GC pretend it was Boehm's (and, within the past few days, had
considered the vague possibility of a similar shim to make Boehm's GC
pretend it was mine, although this would be technically slightly less
trivial, but probably still < 1 kloc...).
