Re: Parallel lexers by chunking the input

anton@mips.complang.tuwien.ac.at (Anton Ertl)
Sat, 26 Mar 2022 09:27:05 GMT

          From comp.compilers

Related articles
Parallel lexers by chunking the input christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-03-24)
Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (2022-03-25)
Re: Parallel lexers by chunking the input gah4@u.washington.edu (gah4) (2022-03-25)
Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (2022-03-25)
Parallel lexers by chunking the input christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-03-26)
Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (2022-03-26)

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: Sat, 26 Mar 2022 09:27:05 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 22-03-058 22-03-064 22-03-065 22-03-067
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="54981"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, parallel
Posted-Date: 26 Mar 2022 10:16:26 EDT

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>[I don't understand where you get only two passes. If you divide the input
>into arbitrary sized chunks, you could potentially start at any state in the
>scanner and while most of those states would quickly fail, that's a lot of
>startup overhead for each block. -John]


The first pass runs a DFA over each chunk. No startup overhead.


Here's a simple example. For simplicity, consider an input alphabet
consisting only of the characters ", a, and 1, and the following lex
specification:


\"[a1]*\"    return T_STRING;
a+           return T_ID;
1+           return T_NUM;


This results in the following transition table (the original state
machine); we start in state B:


             current input char
state        "       a       1
B            C       D       E
C            B       C       C
D            C       D       E
E            C       D       E


On some of these transitions, you get the scanner actions, but that's
not important for the next step:


We now want to produce a continuation automaton whose states track,
for each possible start state of the original automaton, what the
resulting state of the original automaton would be after the input
seen so far; i.e., a state EBCD means that the original automaton
would take B (the first state) to E, C to B, D to C, and E to D; the
start state of this automaton is BCDE (i.e., the identity mapping):


             current input char
state        "       a       1
BCDE         CBCC    DCDD    ECEE
CBCC         BCBB    CDCC    CECC
DCDD         CBCC    DCDD    ECEE
ECEE         CBCC    DCDD    ECEE
BCBB         CBCC    DCDD    ECEE
CDCC         BCBB    CDCC    CECC
CECC         BCBB    CDCC    CECC
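
In code, a continuation state is just an array that maps each original
state to the original automaton's current state, and its transition is
the original transition applied componentwise. Here is a minimal C
sketch of that idea (the state/symbol encodings and names are mine,
not taken from any existing code); it reproduces the first row of the
table above:

#include <stdio.h>

enum { B, C, D, E, NSTATES };              /* original DFA states   */
enum { QUOTE, LETTER_A, DIGIT_1, NSYMS };  /* input symbols ", a, 1 */

/* transition table of the original automaton (the table above)      */
static const int delta[NSTATES][NSYMS] = {
    /* "  a  1 */
    {  C, D, E },   /* B */
    {  B, C, C },   /* C */
    {  C, D, E },   /* D */
    {  C, D, E },   /* E */
};

/* A continuation state maps each original start state to the state
   the original automaton would be in after the input seen so far.   */
typedef struct { int to[NSTATES]; } cont_state;

/* One step of the continuation automaton: apply the original
   transition to every component of the mapping.                     */
static cont_state cont_step(cont_state m, int sym)
{
    cont_state r;
    for (int s = 0; s < NSTATES; s++)
        r.to[s] = delta[m.to[s]][sym];
    return r;
}

int main(void)
{
    cont_state id = {{ B, C, D, E }};      /* start state BCDE       */

    /* reproduce the first row of the continuation table above       */
    for (int sym = 0; sym < NSYMS; sym++) {
        cont_state r = cont_step(id, sym);
        for (int s = 0; s < NSTATES; s++)
            putchar("BCDE"[r.to[s]]);
        putchar(' ');
    }
    putchar('\n');                         /* prints: CBCC DCDD ECEE */
    return 0;
}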


And we have the following mapping from start-original x
end-continuation -> end-original states:


end-continuation   start-original state
state              B       C       D       E
BCDE               B       C       D       E
CBCC               C       B       C       C
DCDD               D       C       D       D
ECEE               E       C       E       E
BCBB               B       C       B       B
CDCC               C       D       C       C
CECC               C       E       C       C
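
In the array representation sketched above this mapping needs no table
of its own: the end state of the continuation automaton already is the
mapping, so the lookup is a single array access. A minimal sketch
(same hypothetical encodings as above):

#include <stdio.h>

enum { B, C, D, E, NSTATES };   /* original DFA states (my encoding) */

/* A continuation state maps each original start state to the state
   the original automaton would be in now.                            */
typedef struct { int to[NSTATES]; } cont_state;

/* The lookup start-original x end-continuation -> end-original is
   just applying the mapping to the start state.                      */
static int original_end(cont_state cont_end, int start_original)
{
    return cont_end.to[start_original];
}

int main(void)
{
    cont_state cecc = {{ C, E, C, C }};            /* end state CECC */
    printf("%c\n", "BCDE"[original_end(cecc, C)]); /* prints E       */
    return 0;
}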


In the first pass we use the continuation automaton. For simplicity,
let's just work with three chunks. For the first chunk, we know the
original start state, so we actually don't need to use the
continuation automaton, but we need it for the other chunks. For the
last chunk, we actually don't need the end state of the continuation
automaton (because it's only needed for determining the original
start state of the next chunk), so we only need to process the
continuation automaton for chunks 2...n-1. We start these chunks with
the start state BCDE. So we process the first chunk with the original
automaton (with scanner actions) on core 1 while we process the second
chunk with the continuation automaton on core 2. In the end we get,
say, the end state C for chunk 1, and the end state CECC for chunk 2.


We can now look up the original end state of chunk 2 from these two
end states: C x CECC -> E (row CECC, column C in the mapping above).
In general, we can continue this up to chunk n-1. This is the
sequential step.


Now we can process (in parallel) chunk 2 and chunk 3 with the original
automaton (including scanner actions): chunk 2 starts in state C,
while chunk 3 starts in state E.
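
Putting the pieces together, here is a hedged sketch of the whole
scheme in C for the three-chunk example (the chunk contents, names,
and encodings are mine, and scanner actions are omitted). It runs
everything sequentially; the comments mark the two loops whose
iterations are independent and could run on separate cores.

#include <stdio.h>
#include <string.h>

enum { B, C, D, E, NSTATES };   /* original DFA states (my encoding) */

/* transition of the original automaton, indexed by state and input   */
static int delta(int state, char c)
{
    static const char syms[] = "\"a1";
    static const int t[NSTATES][3] = {
        { C, D, E },   /* B */
        { B, C, C },   /* C */
        { C, D, E },   /* D */
        { C, D, E },   /* E */
    };
    return t[state][strchr(syms, c) - syms];
}

typedef struct { int to[NSTATES]; } cont_state;

/* run the original automaton over one chunk (the scanner actions
   would be performed here as well)                                   */
static int run_original(int start, const char *chunk)
{
    int s = start;
    for (size_t i = 0; chunk[i] != '\0'; i++)
        s = delta(s, chunk[i]);
    return s;
}

/* run the continuation automaton over one chunk, starting from BCDE  */
static cont_state run_continuation(const char *chunk)
{
    cont_state m = {{ B, C, D, E }};
    for (size_t i = 0; chunk[i] != '\0'; i++)
        for (int s = 0; s < NSTATES; s++)
            m.to[s] = delta(m.to[s], chunk[i]);
    return m;
}

int main(void)
{
    /* the input  a1"a"a111  split arbitrarily into three chunks;
       this matches the worked example: chunk 1 ends in original
       state C, chunk 2 ends in continuation state CECC               */
    enum { NCHUNKS = 3 };
    const char *chunk[NCHUNKS] = { "a1\"a", "\"a1", "11" };
    int start[NCHUNKS], end[NCHUNKS];
    cont_state cont[NCHUNKS];

    /* first pass: chunk 1 with the original automaton, chunks
       2..n-1 with the continuation automaton; these iterations are
       independent and can run on separate cores                      */
    end[0] = run_original(B, chunk[0]);
    for (int i = 1; i < NCHUNKS - 1; i++)
        cont[i] = run_continuation(chunk[i]);

    /* sequential step: propagate original states across chunks       */
    start[0] = B;
    start[1] = end[0];
    for (int i = 1; i < NCHUNKS - 1; i++)
        start[i + 1] = cont[i].to[start[i]];   /* e.g. C x CECC -> E */

    /* second pass: chunks 2..n with the original automaton (and the
       scanner actions); again the iterations are independent         */
    for (int i = 1; i < NCHUNKS; i++)
        end[i] = run_original(start[i], chunk[i]);

    for (int i = 0; i < NCHUNKS; i++)
        printf("chunk %d: start %c, end %c\n",
               i + 1, "BCDE"[start[i]], "BCDE"[end[i]]);
    return 0;
}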


If you look at the continuation automaton, there is no startup
overhead, no NFA overhead, no need to resync. You do have the CPU
overhead of running the continuation automaton and the sequential
component in addition to the original automaton; that's the price you
pay for parallelism.


I would be surprised if nobody has come up with this before. But if
nobody has, maybe I should write up a paper about it.


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

