Related articles:

Parallel Parsing of Parenthesis (email summary), rockwell@nova.umd.edu (1994-01-28)

Re: Parallel Parsing of Parenthesis (email summary), Jonathan Hardwick (1994-02-02)

Newsgroups: comp.compilers,comp.parallel

From: rockwell@nova.umd.edu (Raul Deluth Miller)

Keywords: parallel, parse, summary

Organization: University of Maryland University College

Date: Fri, 28 Jan 1994 01:42:27 GMT

I think it's time I posted the email responses I received to my parallel

parenthesis matching query. I don't have explicit permission to repost,

so I've separated the From lines from the rest of the text [From lines

slightly edited, in cases where I noticed a correction in the signature

line].

From lines are in ASCII collating order; the text is in the order I received it.

I've left in all references to the parallel prefix solution -- they don't

take up much space.

Raul D. Miller

<rockwell@nova.umd.edu>

From: "Dean P. McCullough" <dpmccul@zombie.ncsc.mil>

From: A.D.Ben-Dyke@computer-science.birmingham.ac.uk

From: Bryan Carpenter <dbc@ecs.soton.ac.uk>

From: David Neto <neto@cs.toronto.edu>

From: Guy Blelloch <guyb@BLELLOCH.PC.CS.CMU.EDU>

From: Jonas Barklund <jonas@csd.uu.se>

From: Jonathan Hill <hilly@dcs.qmw.ac.uk>

From: Suresh Kalathur <suresh@cs.brandeis.edu>

From: Theo.Norvell@comlab.ox.ac.uk

From: j.p.m.devreught@cp.tn.tudelft.nl (Hans de Vreught)

From: jamf@lix.polytechnique.fr (Juarez Assumpcao Muylaert Filho)

From: jch@GS69.SP.CS.CMU.EDU

From: josh@cs.rutgers.edu (J Storrs Hall)

From: mic@dcs.ed.ac.uk

From: mnelson@cis.udel.edu

From: qiu@jupiter.drev.dnd.ca (Ke Qiu)

From: seyfarth@whale.st.usm.edu (Ray Seyfarth)

From: sscott@mcs.kent.edu

From: tmb@idiap.ch (Thomas M. Breuel)

From: trefftz@cps.msu.edu (Christian Trefftz)

......................................................................

An idea...:

If you were to assign the values:

1 to (

-1 to )

and add them up, then the parentheses are balanced exactly when the

total sum is 0 and the partial sums are never negative.

You can use a parallel prefix operation for this.
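
[A minimal serial sketch of this check, added for illustration; it is not
part of the reply. A plus_scan would compute the partial sums in parallel.]

def balanced(s):
    """True iff the parentheses in s are balanced."""
    total = 0
    partial = []                     # running partial sums, one per character
    for ch in s:
        if ch == '(':
            total += 1
        elif ch == ')':
            total -= 1
        partial.append(total)
    # balanced iff the final sum is 0 and no partial sum goes negative
    return total == 0 and all(p >= 0 for p in partial)

# balanced("(()())") -> True,  balanced("())(") -> False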

......................................................................

This problem was covered in a grad-level course taught by Guy Blelloch

here at CMU a couple of years ago, and I believe again in a course

taught at DAGS last year. I attach the appropriate tex(t) from the

course notes to the end of this email -- LaTeX won't accept it without

our weird and wonderful macros, but you can get the gist of it just by

reading the text. The full text of the report is available either via

WWW (http://www.cs.cmu.edu:8001/afs/cs.cmu.edu/project/scandal/public/mosaic/mosaic.html

gets to our project home page), or ftp to debubly.scandal.cs.cmu.edu,

cd /afs/cs.cmu.edu/scandal/public/papers, and get the file

CMU-CS-93-115.ps.Z.

-------------------- cut here --------------------

\section{Parsing}

\subsection{Example: parsing a context free language}

This example outlines an approach one might take to do parallel parsing

of expressions in a simple parenthesized LISP-like language. An

expression to be parsed looks like:

\begin{quote}

\begin{verbatim}

(defun (foo a) (+ a 7))

\end{verbatim}

\end{quote}

The first step in parsing such an expression involves breaking the

expression up into tokens (i.e., {\em tokenizing}). In a LISP-like

language, this can be accomplished by doing a one-character lookahead at

each character position. End-of-token flags are set at character

positions preceding a space, at spaces, and at parentheses.
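
As a rough illustration (not part of the original notes), the following
serial sketch sets the end-of-token flags; each position depends only on
itself and its successor, so the loop body can run at every position in
parallel. The sketch also flags positions immediately preceding a
parenthesis, which token boundaries such as {\tt a)} require.

\begin{quote}
\begin{verbatim}
# Sketch: end-of-token flags via one-character lookahead (Python).
def end_of_token_flags(s):
    n = len(s)
    flags = [False] * n
    for i in range(n):                     # independent per position
        nxt = s[i + 1] if i + 1 < n else ' '
        if s[i] in ' ()' or nxt in ' ()':  # at spaces/parens, or preceding one
            flags[i] = True
    return flags
\end{verbatim}
\end{quote}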

There are a couple of extensions to the basic algorithm:

\begin{itemize}

\item Integers are harder to recognize because the end of an integer

is more dependent on context. However, they can still be

handled with a limited lookahead at the appropriate character

positions.

\item Comments are of the form ``{\tt ;...$\backslash$n}''. We can

handle these by creating a vector of flags in which {\tt 1}'s

are placed at locations containing a semicolon. Comments can

then be flagged by doing a segmented {\tt or\_scan} over this

vector, with the scan being reset at character positions

containing a {\tt $\backslash$n}. Of course, this approach

would also capture comments that are enclosed within text

strings, so all strings must be flagged before applying this

procedure (a sketch of the {\tt or\_scan} step follows this list).

\end{itemize}
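
A serial stand-in for the comment-flagging {\tt or\_scan} (again, not part
of the original notes); the loop carries the ``or'' value that the
segmented scan would propagate:

\begin{quote}
\begin{verbatim}
# Sketch: flag the characters belonging to ";...\n" comments (Python).
def comment_flags(s):
    flags = []
    in_comment = False              # the running value of the or_scan
    for ch in s:
        if ch == '\n':
            in_comment = False      # segment boundary: reset the scan
        in_comment = in_comment or (ch == ';')
        flags.append(in_comment)
    return flags
\end{verbatim}
\end{quote}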

Once the expression has been tokenized, its parentheses need to be

matched. {\em Parenthesis matching} cannot be done using a scan alone.

This is basically due to the same reason that parenthesis matching

cannot be done using a regular expression---a scan operation, like a

regular expression, keeps a constant amount of state, and parentheses

cannot be matched using only a constant amount of state. A context-free

language, or something with a ``stack'', is required.

However, a scan can serve as the first step in parenthesis matching. A

{\tt plus\_scan} over a vector containing a 1 for every ``{\tt (}'' and a

$-1$ for every ``{\tt )}'' will number every opening and closing

parenthesis with the appropriate nesting depth number. The parentheses

are balanced if an equal number of {\tt (}'s and {\tt )}'s occur at each

nest depth.

If all of the parentheses at a given depth are grouped together, then a

string of the form ``{\tt ()()()()()()()...}'' results, i.e., matching

becomes a trivial matter of looking at the neighboring parenthesis.

Once the parentheses have been checked for balance, they can be grouped

together simply by sorting them according to their depth numbers. A

radix sort is particularly appropriate for this task as the maximum

depth number is known in advance, and is less than the size of the input

string. Unfortunately, however, a radix sort won't complete in $\log n$

steps. If this becomes a problem, a completely different approach is

required, such as divide and conquer (the input is divided and

parentheses are matched in each partition; unmatched parentheses must be

dealt with in the merge process). The algorithm is rather tricky to

code, but it has the virtue of being work efficient if implemented

properly.
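
A serial sketch of the depth-number-and-sort approach (not from the
original notes; Python's stable {\tt sorted} stands in for the radix sort,
and the input is assumed to be a balanced string containing only
parentheses):

\begin{quote}
\begin{verbatim}
def paren_match(s):
    depth, running = [], 0
    for ch in s:                    # the plus_scan step
        if ch == '(':
            running += 1
            depth.append(running)   # '(' numbered after its +1
        else:
            depth.append(running)   # ')' numbered before its -1
            running -= 1
    # group parentheses of equal depth; a stable sort keeps them in
    # "()()()..." order within each depth
    order = sorted(range(len(s)), key=lambda i: depth[i])
    match = [None] * len(s)
    for a, b in zip(order[0::2], order[1::2]):
        match[a], match[b] = b, a   # neighbors in the sorted order match
    return match                    # paren_match("(())") == [3, 2, 1, 0]
\end{verbatim}
\end{quote}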

\subsection{Parsing regular languages}

The standard way of parsing a regular language is to convert the regular

expression describing it into a finite state automaton (FSA), and to

simulate the execution of this FSA when given the string to be parsed as

input. Thus, the task of parsing a regular language in parallel is the

same as that of simulating the execution of a FSA in parallel.

Recall that a FSA consists of an input tape, a head that reads the input

tape, a finite set of states, and a transition table describing, for

each state, what states may be reached from it if an appropriate symbol

is read from the input tape. One way to represent the transition table

is to construct a set of $m \times m$ boolean matrices $T_{ij}$, such

that there is one boolean matrix for each letter of the input alphabet

($m$ is the number of states in the automaton). For each input symbol

$\sigma$, the boolean matrix $T^{\sigma}$ is defined as:

\[

T^{\sigma}_{ij} = \left\{ \begin{array}{ll}

1 & \mbox{if state $j$ may result if $\sigma$ is

read in state $i$} \\

0 & \mbox{otherwise.}

\end{array}

\right.

\]

Observe that if the automaton being described happens to be a

deterministic FSA, each of the matrices will simply be a permutation of

the states, and could more easily be represented by a list of pairs.

The behavior of the FSA for a certain input string can be computed in

parallel by doing a {\tt matrix\_multiply\_reduce} over the string of

boolean matrices obtained by replacing each character in the input with

its corresponding boolean transition matrix.\footnote{If all the

intermediate states of the FSA need to be known, a {\tt

matrix\_multiply\_scan} can be used instead.} The result of this reduce

operation is a matrix specifying what state the FSA will terminate in,

given the input string in question and a particular initial state. The

terminating state corresponding to the appropriate initial state can

then be looked up in the table. Observe that the problem can be solved

in this manner only because matrix multiplication is {\em associative},

thereby allowing the reduction to be computed in a tree-like fashion.
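
As a concrete sketch (not part of the original notes), the reduction over
boolean transition matrices might look as follows; the {\tt reduce} here is
serial, but because boolean matrix multiplication is associative it could
equally be evaluated as a tree:

\begin{quote}
\begin{verbatim}
from functools import reduce

def bool_matmul(a, b):              # boolean matrix product
    m = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(m))
             for j in range(m)] for i in range(m)]

def run_fsa(trans, string):
    """trans[c] is the m x m boolean matrix for symbol c.  The result R
    has R[i][j] true iff reading the string in start state i can leave
    the automaton in state j."""
    m = len(next(iter(trans.values())))
    identity = [[i == j for j in range(m)] for i in range(m)]
    return reduce(bool_matmul, (trans[c] for c in string), identity)
\end{verbatim}
\end{quote}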

Unfortunately, each of the boolean matrices has $m^2$ entries, where $m$

is the number of states in the FSA. This means that $n$ matrix

multiplications must be performed, each with $m^3$ cost, yielding a work

complexity of $O(nm^{3})$. If the FSA in question is deterministic and

permutations are used to represent the transition tables, the work

complexity becomes $O(nm)$---recall that permutations can be composed in

linear time, and that composition of permutations is associative.

However, the analogous serial algorithm does $O(n)$ work, and thus

neither of the parallel algorithms is work efficient, although in cases

where $m$ is small they remain practical. Some work has been done to

show that $O(nm)$ is the best work complexity with which this problem

can be solved in parallel. Note, however, that the parallel algorithm

provides more information than is obtained from the serial

algorithm---it provides the resultant state for {\em all} start states.

If a serial algorithm were to compute this rather than provide the

result for just one start state, it would have to do the same amount of

work as the parallel algorithm.

In conclusion, some parsing problems can be solved efficiently in

parallel (such as the LISP-like parsing in the earlier example);

solutions to other parsing problems may not be completely work efficient

but are nevertheless often practical.

\section{For More Information}

Selection is discussed further in J\'aj\'a~\cite[pages 181--184]{jaja}.

......................................................................

I'm sorry that I don't have an exact reference for you, but here goes:

I recall reading a paper about a year ago about doing such a thing on

a PRAM (I can't remember which variety). I think the author was

Prabakhar Ragde, and the paper was in some conference proceedings in

the last three years. Check the STOC and FOCS proceedings. I believe

he proved an inverse Ackerman lower bound that matched an

already-known inverse Ackerman upper bound.

......................................................................

Depending on exactly what your question is, this problem can be

handled very straightforwardly using parallel prefix. E.g. the

problem "given a particular parenthesis, where is the matching

parenthesis" could be done by treating '(' as +1 and ')' as -1, and

computing a prefix sum for all points between parentheses. Finding a

match to a '(' would turn into "find the first point to the right of

the point preceding the '(' of interest that has the same sum and

return the ')' to its left", which requires O(log n) time and O(n) work

for the preprocessing and each query. Communication would be purely

binary tree structured and require passing one number each way on each

link for each query.

This is just off the top of my head; let me know if you get a better

method. My area of research is parallelization of compiler back ends,

not front ends.
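
[Editorially added sketch, not the poster's code: one serial reading of
the query step above in Python, where sums[i] is the prefix sum at the
point just before character i.]

def match_of(s, p):
    """Index of the ')' matching the '(' at index p (assumes s[p] == '(')."""
    sums, running = [0], 0
    for ch in s:
        running += 1 if ch == '(' else -1 if ch == ')' else 0
        sums.append(running)             # sums[i+1] is the point after s[i]
    target = sums[p]                     # sum at the point preceding the '('
    for q in range(p + 1, len(sums)):    # first later point with an equal sum
        if sums[q] == target:
            return q - 1                 # the ')' just to its left
    return None

# match_of("(a(b)c)", 0) -> 6,  match_of("(a(b)c)", 2) -> 4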

......................................................................

I apparently do not understand the complexity of this problem.

For a sub segment of the input string, I wish to generate two numbers,

the first is the number of close parens that must be matched with open

parens to the left of the current sub segment, and the second is the

number of open parens that must be matched with close parens to the

right of the current sub segment.

The string can be divided between the processors, and each processor

can independently compute the two numbers for its sub segment.

Adjacent sub segments can be combined.

Let l_i, r_i be the two numbers for sub segment i. (Assuming that i

is even and) combining sub segments i and i+1 yield:

If r_i <= l_(i+1)

l'_(i/2) = l_i + l_(i+1) - r_i

r'_(i/2) = r_(i+1)

Else if r_i > l_(i+1)

l'_(i/2) = l_i

r'_(i/2) = r_i + r_(i+1) - l_(i+1)

Thus in a tree reduction, I can compute an overall pair of numbers for

the total string in log(n) time such that the string is balanced iff

the pair of numbers are both 0.
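
[Editorially added sketch, not the poster's code: the pair computation and
combine rule above in Python. The combine is associative, so it can be
applied in a tree reduction as described.]

def summarize(segment):
    """(l, r): unmatched ')' needing a match to the left, and unmatched
    '(' needing a match to the right, for one sub segment."""
    l = r = 0
    for ch in segment:
        if ch == '(':
            r += 1
        elif ch == ')':
            if r > 0:
                r -= 1      # matches an earlier '(' in this segment
            else:
                l += 1      # must be matched to the left of this segment
    return (l, r)

def combine(left, right):
    (l1, r1), (l2, r2) = left, right
    if r1 <= l2:
        return (l1 + l2 - r1, r2)
    else:
        return (l1, r1 + r2 - l2)

# from functools import reduce
# reduce(combine, map(summarize, ["((", ")()", ")"])) == (0, 0)   # balanced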

......................................................................

Look into

Berkman, O., et al. "Some doubly logarithmic optimal parallel

algorithms based on finding all nearest smaller values", TR

UMIACS-TR-88-79, Univ. of Md, Inst. for Advanced Computer

Studies, College Park, MD.

I found a reference to that in JaJa's parallel algorithms text.

......................................................................

The problem can trivially be solved in logarithmic time in a parallel

prefix model of computation: replace every "(" by +1, every ")" by -1,

and every other symbol by "0". Then, compute a +-scan operation. If

the highest-numbered processor contains a "0" and no processor contains

a negative number, the parentheses are balanced. Because of this, you

can also solve the same problem in logarithmic time in a variety of

other parallel models of computation.

......................................................................

Try to contact Prof. Jacques Cohen (jc@cs.brandeis.edu).

......................................................................

Have you looked at:

* section 3.1

* section 4.5

of

A. Gibbons, W. Rytter, Efficient Parallel Algorithms, Cambridge

University Press, 1988.

Both sections are closely related to your problem and both lead to an

O(log n) time algorithm using O(n / log n) processors on a PRAM.

......................................................................

I have a tech report which describes a data-parallel algorithm for

this problem - it's actually written in a functional language, but the

mapping to an explicit parallel algorithm is obvious (and discussed).

Here is the uuencoded, compressed postscript of the report, which has

now been published as CSR-29-93 of the Computer Science Department,

University of Edinburgh.

I hope this is of interest, and I'd be pleased to hear of anything

else you're told about which might be relevant to me!

[uuencoded, compressed PostScript report elided]

......................................................................

Sorry if this is a little vague, as it's not my subject area,

but I think some of the work undertaken on the FFP project covered

bracket matching. A good intro to the subject is the following:

@InProceedings{Partain:Scan,

author = {William D. Partain},

title = {Normal-Order Reduction using Scan Primitives},

institution = {Glasgow University},

booktitle = {Functional Programming},

year = {1990},

pages = {218--226},

annote = {Discusses how tree-based parallel reduction can

be a potentially useful basis for FP. Describes

how a scan based algorithm for finding bound and

free variables within lambda expressions can be

very cheap using special purpose hardware (FFP

Machine - a linear array of small cells, each

containing one symbol). Briefly describes some

of the approaches taken to improve efficiency of

tree reduction, mentioning the problem of pointer

following. The parallelism is gained on single

node reduction - no attempt is made to reduce more

than one node at once. },

got = {In Library, Classmark QA 76.6G},

}

Some other references are

@Article{Mago79a,

author = {Gyula A. Mag\'{o}},

title = {A Network of MicroProcessors to Execute Reduction

Languages},

journal = {Computer and Information Sciences},

year = {1979},

month = {October},

part = {I},

volume = {8},

number = {5},

pages = {349--385},

}

@Article{Mago79b,

author = {Gyula A. Mag\'{o}},

title = {A Network of MicroProcessors to Execute Reduction

Languages},

journal = {Computer and Information Sciences},

year = {1979},

month = {December},

part = {II},

volume = {8},

number = {6},

pages = {435--471},

}

@InProceedings{Mago89,

author = {Gyula A. Mag\'o and Donald F. Stanat},

title = {The {FFP} Machine},

booktitle = {High-Level Language Computer Architectures},

year = {1989},

pages = {430--468},

}

......................................................................

This is easily accomplished using scan (parallel prefix). We do it in

the Rutgers CAM, see also Guy Blelloch's work.

......................................................................

Here's the NESL code for parenthesis matching:

function paren_match(foo) =

let

left_flag = {foo == `(: foo};

val = {select(left_flag, 1, -1): left_flag};

sc = {btoi(l) + b: l in left_flag; b in plus_scan(val)};

rnk = permute(index(#sc), rank(sc));

ret = interleave(odd_elts(rnk), even_elts(rnk))

in permute(ret, rnk) $

It is in the example directory. It takes a string of parentheses and

returns, for each parenthesis, a pointer to its match.

......................................................................

Skillicorn and Barnard of Queen's University in Kingston did some work

on CFG (and subsets like LL(1), LR) parsing on PRAMS. I'm sure they

have at least a tech report on it. Not quite what you were looking

for, I know.

......................................................................

Section 4.4 of ``Efficient Parallel Algorithms'' by Gibbons and

Rytter is all about this problem, in case you aren't aware of

that work. I read it once, but I've forgotten the details...

They're only interested in the basic parallel algorithm, I think,

not communication costs.

......................................................................

In the tech. report ``List Homomorphic parallel algorithms for

bracket matching'' [1], Murray Cole develops a parallel algorithm that

solves the problem of matching brackets such as ({[]}) [it's an

interesting paper, well worth getting :-]

When I read the report, it occurred to me that the problem could be

solved by a parallel LL(1) parser [2,3,4]. The solution Murray gave to

the bracket problem looks very similar to the technique I used in the

LL(1) parser---but maybe that's because we are both using a higher

order functional language, and both solutions decompose into the scanning

of a similar function across the input to be matched.

[1] @techreport{cole:bracket,

author = {Cole, Murray},

institution = {Dept. of Computer Science, University of Edinburgh},

number = {CSR-29-93},

title = {List homomorphic parallel algorithms for bracket matching.},

year = {1993}

}

[2] @article{barnard:parse,

author = {Barnard, D. B. and Skillicorn, D. T.},

journal = {Information Processing Letters},

month = may,

title = {Parallel parsing on the {C}onnection {M}achine},

year = {1989}

}

[3] @article{hill:parse,

author = {Hill, Jonathan. M. D.},

journal = {Parallel Computing},

month = jul,

number = {6},

pages = {699-714},

title = {Parallel lexical analysis and parsing on the {AMT} {D}istributed

{A}rray {P}rocessor},

volume = {18},

year = {1992}

}

[4] @techreport{hill:dpGlue,

author = {Hill, Jonathan. M. D.},

institution = {{QMW CS}},

month = dec,

note = {Available by {FTP} from ftp.dcs.qmw.ac.uk

in {/pub/cpc/jon\_hill/dpGlue.ps}},

number = {611},

title = {{D}ata {P}arallel {H}askell: {M}ixing old and new glue},

year = {1992}

}

......................................................................

There was some work here at KSU regarding the writing of a compiler on

a massively parallel machine - one dissertation. They published some

papers in IPPS 92 and 93 I believe. The authors' names would be

Chandra Rajappa (sp?) of either KSU or Cleveland State and/or Jerry

Potter of KSU. Also some of the work on structure codes by Jerry

Potter may answer some of your questions - in the same publications.

......................................................................

My knowledge about this is very close to zero, where zero means no

information! :-) However, last week I attended a talk by Jacques Cohen

from Brandeis University. The title was "Generating Massively

Parallel Compilers". Cohen uses parenthesis matching to do this in a

very cute way for the connection machine. I think you should try

contacting him. Unfortunately, I don't have his e-mail address, but I

suppose you can find it by using netfind.

......................................................................

Just saw the post by torbenm@diku.dk (Torben AEgidius Mogensen) in

comp.theory in which you asked for info on parallel balanced parenthesis

matching. I know of some work done in this area.

1. The problem can be solved by formulating it into a general prefix

computation problem and then by solving the GPC problem on a host

of parallel machines (hypercube, ...); see the following reference:

Parallel General Prefix Computations with Geometric, Algebraic, and

Other Applications,

F. Springsteel and I. Stojmenovic,

International Journal of Parallel Programming, Vol 18, No. 6, 1989,

pp. 485-503.

2. I think Dr. Sajal Das did some work in this area (I was in his seminar

where he gave a talk on this subject). His work was published in

Information Processing Letters, probably in 1991. He is with

Univ. of North Texas (try das@ponder.csci.unt.edu).

......................................................................

I gave a problem like yours as an (advanced) exercise on our parallel

programming course. The problem was to be solved in *LISP, to be run

on a Connection Machine. I can give you my solution (assuming I can

still find it) but we made no effort to optimize or even study the

resulting communication patterns. Just as Torben suggested, my

solution was based on a parallel scan.
