Re: Recursive descent parsing and optimization, was Good practical language and OS agnostic text?

"BartC" <bc@freeuk.com>
Sun, 22 Apr 2012 12:51:44 +0100

          From comp.compilers

Related articles
Good practical language and OS agnostic text? compilers@is-not-my.name (2012-04-17)
Re: Good practical language and OS agnostic text? ulimakesacompiler@googlemail.com (Uli Kusterer) (2012-04-21)
Re: Good practical language and OS agnostic text? cr88192@hotmail.com (BGB) (2012-04-21)
Re: Recursive descent parsing and optimization, was Good practical lan bc@freeuk.com (BartC) (2012-04-22)
Re: Recursive descent parsing and optimization, was Good practical lan mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2012-04-22)
Re: Recursive descent parsing and optimization, was Good practical lan cr88192@hotmail.com (BGB) (2012-04-22)
Re: Recursive descent parsing and optimization, was Good practical lan bartc@freeuk.com (Bartc) (2012-04-23)
| List of all articles for this month |

From: "BartC" <bc@freeuk.com>
Newsgroups: comp.compilers
Date: Sun, 22 Apr 2012 12:51:44 +0100
Organization: A noiseless patient Spider
References: 12-04-019 12-04-056 12-04-060
Keywords: parse, design
Posted-Date: 22 Apr 2012 10:29:33 EDT

"BGB" <cr88192@hotmail.com> wrote in message
> On 4/21/2012 2:22 AM, Uli Kusterer wrote:


>> - Recursive descent parsers: It's the obvious way to write a parser.


> although I use recursive descent, the above sounds different from what I
> usually do.


> ReadStatement:
> checks for and handles various statement types
> calls ReadExpression
>
> ReadExpression:
> (actually, this is a tower of functions, one for each precedence
> level, working from lowest to highest precedence)


Sounds like it's influenced by the C grammar, which defines expressions
using somewhere around 15 to 17 layers of precedence, depending on how
you count.


Beyond about 3-4 levels, I found that unmanageable. For expression syntax, I
don't use any precedence in the grammar at all; I have precedence as an
attribute of an operator, and an expression can be parsed with a single
function.


Or rather two: readfactor(priority) and readterm(). Readfactor() deals with
the binary operators linking successive terms, while readterm() does all
the real work (since my syntax doesn't distinguish between expressions and
statements, that's quite a big workload).
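

To show what I mean, here's a minimal, self-contained sketch of the
scheme (sometimes called "precedence climbing"). The readfactor/readterm
split is as described above; everything else (single-digit operands,
evaluating on the fly instead of building a tree) is invented just to
keep the example short:

#include <stdio.h>

static const char *p;                /* input cursor */

static int readfactor(int priority);

/* priority is an attribute of the operator, not a grammar level */
static int priority_of(int op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;    /* not a binary operator */
    }
}

/* readterm: one operand, here just a digit or a parenthesised
   expression; in the real parser this is where most of the work lives */
static int readterm(void) {
    while (*p == ' ') p++;
    if (*p == '(') {
        p++;                         /* skip '(' */
        int v = readfactor(0);
        p++;                         /* skip ')' (input assumed valid) */
        return v;
    }
    return *p++ - '0';
}

/* readfactor: loop over binary operators binding more tightly than
   'priority', recursing with each operator's own priority */
static int readfactor(int priority) {
    int lhs = readterm();
    for (;;) {
        while (*p == ' ') p++;
        int op = *p, pr = priority_of(op);
        if (pr <= priority) return lhs;
        p++;                         /* consume the operator */
        int rhs = readfactor(pr);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        }
    }
}

int main(void) {
    p = "1 + 2 * (3 + 4)";
    printf("%d\n", readfactor(0));   /* prints 15 */
    return 0;
}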


>> - Tokenizing: Essentially grab all the words in your source text and
>> build an array with an entry for each so you can more quickly walk
>> forward and backward without having to scan individual characters.


> partly agreed, except that it isn't really strictly necessary to build
> an array up-front.


> most of my parsers don't bother using an array, but instead just call
> the tokenizer function directly to peek and parse tokens based on the
> current "stream position" (generally a raw char pointer in my language
> parsers).


I've tried reading all the tokens in a separate pass, but didn't really like
it. And it takes a lot more storage as well, especially with macro
expansions.


Instead I read them as I go along, but with provision for a one-symbol
look-ahead.
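

In outline it's nothing more than the scanner hidden behind a
peek/consume pair. A rough sketch (not my actual code, and with a
deliberately crude scanner):

#include <ctype.h>
#include <stdio.h>

typedef struct { int kind; const char *start; int len; } Token;
enum { T_EOF, T_NAME, T_NUM, T_PUNCT };

static const char *src;              /* the raw stream position */
static Token lookahead;              /* the one-symbol look-ahead */
static int   have_lookahead;

/* read one token at src, advancing it */
static Token scan(void) {
    while (isspace((unsigned char)*src)) src++;
    Token t = { T_EOF, src, 0 };
    if (*src == '\0') return t;
    if (isalpha((unsigned char)*src)) {
        t.kind = T_NAME;
        while (isalnum((unsigned char)*src)) src++;
    } else if (isdigit((unsigned char)*src)) {
        t.kind = T_NUM;
        while (isdigit((unsigned char)*src)) src++;
    } else {
        t.kind = T_PUNCT;
        src++;
    }
    t.len = (int)(src - t.start);
    return t;
}

/* peek without consuming; tokens are only scanned on demand */
Token peektoken(void) {
    if (!have_lookahead) { lookahead = scan(); have_lookahead = 1; }
    return lookahead;
}

/* consume the token that peektoken() saw, or scan a fresh one */
Token nexttoken(void) {
    Token t = peektoken();
    have_lookahead = 0;
    return t;
}

int main(void) {
    src = "count := count + 1";
    for (Token t = nexttoken(); t.kind != T_EOF; t = nexttoken())
        printf("%.*s\n", t.len, t.start);
    return 0;
}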


>> - Syntax tree: Build a tree structure that represents the parsed
>> program. [...]


> typically, things like simplifying expressions, propagating constants,
> ... are done in a stage I have typically called "reduction", which
> happens after parsing and prior to producing (bytecode) output.


I use the following passes (which seem to be fairly typical):


Syntax analysis (lexing and parsing)
Name resolution (a recent introduction for me)
Type analysis (static type checks and coercions, constant folding)
Code generation (to intermediate code or to byte-code)
Final pass (from intermediate code to the target code)


Usually they're invoked one after the other for the entire module. Where a
compile-time expression is needed, though, the first three passes have to
be done on the spot (and the result had better be a constant value...).
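

The driver logic amounts to something like this (a stub sketch: the
pass names and the stand-in tree type are invented for the example,
and parsing is assumed to have happened already):

#include <stdio.h>

typedef struct { int is_const; int value; } Ast;   /* stand-in tree */

static void resolve_names(Ast *t) { (void)t; }
static void analyse_types(Ast *t) { t->is_const = 1; t->value = 42; }
static void gen_code(Ast *t)      { (void)t; }     /* to byte-code */
static void final_pass(Ast *t)    { (void)t; }     /* to target code */

/* the normal case: every pass in order, over the whole module */
static void compile_module(Ast *m) {
    resolve_names(m);
    analyse_types(m);
    gen_code(m);
    final_pass(m);
}

/* a compile-time expression gets only the front passes, on the spot,
   and the result had better have folded to a constant */
static int eval_const_expr(Ast *e) {
    resolve_names(e);
    analyse_types(e);            /* constant folding happens in here */
    if (!e->is_const) { fprintf(stderr, "constant expected\n"); return 0; }
    return e->value;
}

int main(void) {
    Ast m = {0}, e = {0};
    compile_module(&m);
    printf("compile-time value: %d\n", eval_const_expr(&e));
    return 0;
}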


I use the same structure now when generating byte-code (originally such a
compiler was just single-pass). Because such code is usually dynamically
typed, the type analysis pass only needs a nominal amount of work, but still
takes care of a few things (l-values for example).


> some past experiments (special purpose tests), have implied that
> potentially threaded code can pull off "reasonably solid" interpreter
> performance (potentially within 3-5x of native).


Assuming you're talking about dynamic typing, I've found it difficult to
get within 3-5x, unless some sort of type hinting is used, or unless
you're comparing with not-too-highly-optimised native code. Or unless the
code is memory-intensive, in which case memory access will dominate
either way.
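

For reference, the threaded dispatch being talked about looks roughly
like this toy (it needs the GCC/Clang computed-goto extension, and the
opcodes and byte-code are invented for the example):

#include <stdio.h>

static int run(void) {
    enum { PUSH, ADD, PRINT, HALT };
    static const int code[] = { PUSH, 2, PUSH, 3, ADD, PRINT, HALT };

    /* each opcode indexes straight to its handler's address, so
       dispatch is one indirect jump rather than a switch */
    void *labels[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
    int stack[16], sp = 0;
    const int *ip = code;

#define NEXT goto *labels[*ip++]
    NEXT;
op_push:  stack[sp++] = *ip++;            NEXT;
op_add:   sp--; stack[sp-1] += stack[sp]; NEXT;
op_print: printf("%d\n", stack[sp-1]);    NEXT;   /* prints 5 */
op_halt:  return stack[--sp];
#undef NEXT
}

int main(void) { run(); return 0; }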


--
Bartc

