Scanner generator re2c: DFA construction algorithm?

stefaan_willems_re2c@yahoo.com (Stefaan Willems)
22 Jun 2003 23:19:50 -0400

          From comp.compilers

Related articles
Scanner generator re2c: DFA construction algorithm? stefaan_willems_re2c@yahoo.com (2003-06-22)
| List of all articles for this month |

From: stefaan_willems_re2c@yahoo.com (Stefaan Willems)
Newsgroups: comp.compilers
Date: 22 Jun 2003 23:19:50 -0400
Organization: http://groups.google.com/
Keywords: lex, question
Posted-Date: 22 Jun 2003 23:19:50 EDT

Can anybody give me a reference to the theory behind the construction
of the deterministic finite state automation (DFA) as implemented in
re2c (v0.9.1 source distribution available from
http://www.tildeorg.slash/re2c).


I am in particularly puzzled by the usage of the data structures
CharSet cs and Ins ins[], which seem to be set up as intermediates in
the DFA construction, as you can see below:


[typedef union Ins, file ins.h]


union Ins {
        struct {
                byte tag;
                byte marked;
                void *link;
        } i;
        struct {
                ushort value;
                ushort bump;
                void *link;
        } c;
};


[func genCode, file actions.cc]


void genCode(ostream& o, RegExp *re){
        CharSet cs;
        uint j;


        memset(&cs, 0, sizeof(cs));
        for (j = 0; j < nChars; ++j) {
                cs.rep[j] = &cs.ptn[0];
                cs.ptn[j].nxt = &cs.ptn[j+1];
        }
        cs.freeHead = &cs.ptn[1];
        *(cs.freeTail = &cs.ptn[nChars-1].nxt) = NULL;
        cs.ptn[0].card = nChars;
        cs.ptn[0].nxt = NULL;


        re->split(cs);


        Char rep[nChars];
        for(j = 0; j < nChars; ++j){
        if (!cs.rep[j]->nxt)
                cs.rep[j]->nxt = &cs.ptn[j];
                rep[j] = (Char) (cs.rep[j]->nxt - &cs.ptn[0]);
        }


        re->calcSize(rep);


        Ins *ins = new Ins[re->size+1];
        memset(ins, 0, (re->size+1)*sizeof(Ins));
        re->compile(rep, ins);
        Ins *eoi = &ins[re->size];
        eoi->i.tag = GOTO;
        eoi->i.link = eoi;


        optimize(ins);
        for (j = 0; j < re->size;) {
                unmark(&ins[j]);
                if (ins[j].i.tag == CHAR){
                        j = (Ins*) ins[j].i.link - ins;
                } else {
                        j++;
                }
        }


        DFA *dfa = new DFA(ins, re->size, 0, 256, rep);
        dfa->emit(o);


        delete dfa;
        delete [] ins;
}


I have tried to contact the authors at the e-mail addresses supplied
on the web page, but these do either no longer exist or are no longer
used. If anybody knows how to get in touch with them, please let me
know.


I appreciate any help. Thanks,


Stefaan


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.