Oilexer Early Release

Alexander Morou <alexander.morou@gmail.com>
Sun, 12 Jul 2015 14:31:45 -0500

          From comp.compilers

From: Alexander Morou <alexander.morou@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 12 Jul 2015 14:31:45 -0500
Organization: Compilers Central
Keywords: lex, available
Posted-Date: 13 Jul 2015 23:43:37 EDT

I've posted a very early release of Oilexer on Codeplex:
https://oilexer.codeplex.com/releases/view/616236


This release is very early, so there are no error recovery details yet.
It is capable of detecting failure points, but I just haven't made up
my mind on the specific error recovery strategy I'm going to use.


It exports to C# in the form of multiple .cs files and requires no
library dependencies. So if OILexer completes its processing on a
grammar, and you instructed it to export C# files, the output should
just compile: create a new C# project in Visual Studio, drag the files
*onto a node* of the Solution Explorer for that project, add a little
code to specify a file to parse, and build.


The sample OILexer grammar would be built thusly, from a command prompt
in the folder you extract it to:
OILexer.exe "Samples\Oilexer\Oilexer.oilexer" -ex:cs


It provides two things once you call a specific ParseRULENAMEHERE method:
1. The AST node of the parse method you called.
      a. The AST serves to provide you access to the items you captured in
            the grammar. If you don't specify any captures, all it points to
            is the context.
2. The context, which the AST node always points to: the concrete set of
      symbols represented by that parse. This is the fluff and other stuff
      a grammar needs to make it less ambiguous.


The approach is LL(*) with support for direct and indirect left recursion
through the use of a symbol stream (rather than only a standard token
stream).
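
As a rough illustration of why a symbol stream helps with left recursion,
here is a toy sketch in Python. It is an assumption about the general
technique, not a description of OILexer's actual algorithm: the rule, the
function name parse_expr, and the ("Expr", value) symbol shape are all
hypothetical. The parser first matches the non-recursive alternative,
puts the reduced rule back at the front of the stream as a symbol, and
then keeps growing it, so it never has to call itself on the left.

```python
from typing import List, Tuple, Union

# A stream item is either a raw token (str) or a reduced rule symbol,
# represented here as the tuple ("Expr", value).
Symbol = Union[str, Tuple[str, int]]


def parse_expr(stream: List[Symbol]) -> int:
    """Parse  Expr ::= Expr '-' Num | Num  without recursing on the left."""
    stream = list(stream)  # work on a copy

    # Seed: the non-recursive alternative, Expr ::= Num.
    if not stream or not isinstance(stream[0], str) or not stream[0].isdigit():
        raise SyntaxError("expected a number")
    stream[0] = ("Expr", int(stream[0]))  # reduce and put the symbol back

    # Grow: while the stream starts with  Expr '-' Num , reduce those three
    # items to a single, larger Expr symbol at the front of the stream.
    while (len(stream) >= 3
           and stream[1] == "-"
           and isinstance(stream[2], str)
           and stream[2].isdigit()):
        left = stream[0][1]
        stream[0:3] = [("Expr", left - int(stream[2]))]

    if len(stream) != 1:
        raise SyntaxError(f"unexpected {stream[1]!r}")
    return stream[0][1]


result = parse_expr(["10", "-", "3", "-", "2"])  # groups as (10 - 3) - 2
```

Because each reduction wraps the previous Expr symbol, the result is
left-associative, which is what the left-recursive rule asks for.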


There are a few known issues:
1. Follow ambiguities which consume required calling rule tokens within
      a reduction of a prediction have a chance to guess wrong and consume
      too greedily. This causes a false positive parse failure on valid
      sentences of a grammar. This will be tackled after error recovery.


2. Certain heavily intertwined left recursive sets of rules might exit
      prematurely because the stack sniffing I currently use is overly
      cautious, causing it to bail. This is the focus after #1.


3. The #Root and other preprocessor constants observed in the samples
      appear to be required to a degree, as they can potentially yield
      bad paths on the output. I suspect this is an easy fix; the simple
      solution for now is to start from a sample.


4. Heavily left-recursive rule sets that go 20+ levels in their definition
      can yield poor parse time for heavily nested sentences.


5. A lot of things are likely buggy and incomplete; I welcome any and all
      feedback.
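
For anyone wondering what known issue 1 looks like in practice, here is a
toy reproduction in Python. It is a hand-written sketch, not OILexer
output, and the two rules are made up: Items can match the same token the
calling rule requires right after it, so a greedy Items eats the token
the caller needed.

```python
def parse_items_greedy(tokens, pos):
    """Items ::= 'x'*   (consumes every 'x' it sees)."""
    while pos < len(tokens) and tokens[pos] == "x":
        pos += 1
    return pos


def parse_caller(tokens):
    """Caller ::= Items 'x'   (the trailing 'x' is required)."""
    pos = parse_items_greedy(tokens, 0)
    if pos < len(tokens) and tokens[pos] == "x":
        return pos + 1 == len(tokens)
    return False  # Items already ate the 'x' that Caller needed


# "xxx" is a valid sentence (Items = "xx", then the required 'x'),
# but the greedy choice makes the parse fail.
rejected = parse_caller(list("xxx"))
```

A fix has to stop Items one token early whenever the caller still needs
an 'x', which is the kind of follow-aware decision the prediction would
need to make.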

