|Good practical language and OS agnostic text? email@example.com (2012-04-17)|
|Re: Good practical language and OS agnostic text? firstname.lastname@example.org (Uli Kusterer) (2012-04-21)|
|Re: Good practical language and OS agnostic text? email@example.com (BGB) (2012-04-21)|
|Re: Recursive descent parsing and optimization, was Good practical lan firstname.lastname@example.org (BartC) (2012-04-22)|
|Re: Recursive descent parsing and optimization, was Good practical lan email@example.com (Dmitry A. Kazakov) (2012-04-22)|
|Re: Recursive descent parsing and optimization, was Good practical lan firstname.lastname@example.org (BGB) (2012-04-22)|
|Re: Recursive descent parsing and optimization, was Good practical lan email@example.com (Bartc) (2012-04-23)|
|Date:||Sun, 22 Apr 2012 12:51:44 +0100|
|Organization:||A noiseless patient Spider|
|References:||12-04-019 12-04-056 12-04-060|
|Posted-Date:||22 Apr 2012 10:29:33 EDT|
"BGB" <firstname.lastname@example.org> wrote in message
> On 4/21/2012 2:22 AM, Uli Kusterer wrote:
>> - Recursive descent parsers: It's the obvious way to write a parser.
> although I use recursive descent, the above sounds different from what I
> usually do.
> checks for and handles vaious statement types
> calls ReadExpression
> (actually, this is a tower of functions, one for each precedence
> level, working from lowest to highest precedence)
Sounds like it's influenced by the C grammar, which defines expressions
using something like 13 or 17 layers of precedence.
Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
don't use any precedence in the grammar at all; I have precedence as an
attribute of an operator, and an expression can be parsed with a single
Or rather two: readfactor(priority), and readterm(). Readfactor() deals with
the binary operators linking successive terms, while readterm() does all
the real work (since my syntax doesn't distinguish between expressions and
statements, that's quite a big workload).
>> - Tokenizing: Essentially grab all the words in your source text and
>> build an array with an entry for each so you can more quickly walk
>> forward and backward without having to scan individual characters.
> partly agreed, except that it isn't really strictly necessary to build
> an array up-front.
> most of my parsers don't bother using an array, but instead just call
> the tokenizer function directly to peek and parse tokens based on the
> current "stream position" (generally a raw char pointer in my language
I've tried reading all the tokens in a separate pass, but didn't really like
it. And it takes a lot more storage as well, especially with macro
Instead I read them as I go along, but with provision for a one-symbol
>> - Syntax tree: Build a tree structure that represents the parsed program.
> typically, things like simplifying expressions, propagating constants,
> ... is done in a stage I have typically called "reduction", which
> happens between parsing and prior to producing (bytecode) output.
I use the following passes (which seem to be fairly typical):
Syntax analysis (lexing and parsing)
Name resolution (a recent introduction for me)
Type analysis (static type checks and coercions, constant folding)
Code generation (to intermediate code or to byte-code)
Final pass (from intermediate code to the target code)
Usually invoked one after the other for the entire module, where a
compile-time expressions is needed, then the first three passes have to be
done immediately (and the result had better be a constant value..)
I use the same structure now when generating byte-code (originally such a
compiler was just single-pass). Because such code is usually dynamically
typed, the type analysis pass only needs a nominal amount of work, but still
takes care of a few things (l-values for example).
> some past experiments (special purpose tests), have implied that
> potentially threaded code can pull off "reasonably solid" interpreter
> performance (potentially within 3-5x of native).
Assuming you're talking about dynamic typing, I found it difficult to get
within 3-5x, unless some sort of type hinting is used, or perhaps you're
comparing with not too highly optimised native code. Or it's code that is
memory-intensive, then memory access will dominate.
Return to the
Search the comp.compilers archives again.