# Re: computing the first set (concretely)

## SM Ryan <wyrmwif@tsoft.org>12 Mar 2006 13:55:57 -0500

From comp.compilers

Related articles
computing the first set (concretely) aegis@mad.scientist.com (aegis) (2006-03-11)
Re: computing the first set (concretely) DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-03-12)
Re: computing the first set (concretely) wyrmwif@tsoft.org (SM Ryan) (2006-03-12)
| List of all articles for this month |

 From: SM Ryan Newsgroups: comp.compilers Date: 12 Mar 2006 13:55:57 -0500 Organization: Quick STOP Groceries References: 06-03-031 Keywords: parse Posted-Date: 12 Mar 2006 13:55:57 EST

# I know how to compute the first set however I am conflicted on how I
# should represent the syntactical specification of my language in order
# to compute the first set programmatically.
#
# I was thinking along the lines of a two dimensional matrix by where
# the rows are indexed by a particular nonterminal to find all terminals
# for the given production.
#
# It would look something like:
#
# int table[X][Y] = { { ... } ,
# { expr, '+', digit },
# { 0, 1, 2, 3, 4, 5 },
# { ... }
# };
#
# and where my specification would look like:
#
# ...
# expr: expr '+' digit ;
# digit: 0 | 1 | 2 | 3 | 4 | 5 ;

E: S | E + S
S: SIGN P
SIGN: + | - |
P: a | ( E )

(1) Create a directed graph p => q if production p can produce production q.
E => S, E => P, E => E, E => SIGN
S => S, S => P, S => E, S => SIGN
P => S, P => P, P => E, P => SIGN
SIGN
(2) Find strongly connected components and reduce the above graph to a DAG
{E,S,P} => {SIGN}
{SIGN}
(3) You can now easily compute NULLABLE(X) = X can produce an empty string
Start from the bottom of the reduced DAG and examine all alternatives
of all the productions in that SCC. Ignore alternatives that include
a production in the same SCC; if any other alternative is empty or
consists solely of nullable production, all productions in the current
SCC are nullable.
NULLABLE({SIGN}) = NULLABLE(+) | NULLABLE(-) | NULLABLE()
= true
NULLABLE({E,S,P}) = NULLABLE(a) -- the only nonrecursive alternative
= false
(4) Compute WEIGHT(X) = length of minimum nonempty terminal expansion of X,
or 0 if X is always empty.
WEIGHT(SIGN) = min(WEIGHT(+),WEIGHT(-)) = 1
Again use the SCCs of the => graph to identify recursive alternatives.
WEIGHT(P) = min(WEIGHT(a),WEIGHT((E)))
The (E) alternative is recursive because E and P
are in the same SCC of the => graph. And because
not NULLABLE(P), we know the recursive alternative
cannot be any lighter than the nonrecursive alternative.
= min(WEIGHT(a)) = 1

WEIGHT(S) = min(WEIGHT(SIGN P))
When a member of an alternative is NULLABLE, it might
have an empty expansion while the alternative as a
whole is not empty. Bifurcate the alternative for
nullable member, including and excluding the nullable
members: S: SIGN P | P
= min(WEIGHT(SIGN)+WEIGHT(P),WEIGHT(P))
= min(1+1,1) = 1
WEIGHT(E) = min(WEIGHT(S),WEIGHT(E+S))
= min(WEIGHT(S)) = 1
(5) To compute FIRSTk(s), the first k terminals of string s,
Construct the sets F1 = {}, G1 = {PREFIXk(s)}
The PREFIXk(s) is the minimal prefix of s such that
WEIGHT(PREFIXk(s))>=k, or PREFIXk(s) = s
E: S | E + S
PREFIX2(S) = S
PREFIX2(E+S) = E+
S: SIGN P
Because NULLABLE(SIGN), again bifurcate
for nullable members
PREFIX2(SIGN P) = SIGN P | P
SIGN: + | - |
PREFIX2(+) = +
PREFIX2(-) = -
P: a | ( E )
PREFIX2(a) = a
PREFIX2((E)) = (E

Now having Fi and Gi, construct F[i+1] and G[i+1] by
If Gi is empty, Fi = FIRSTk(s) and stop.

Otherwise choose any string s in Gi.
If s is a terminal string,
F[i+1] = F[i] + {s}
G[i+1] = G[i] - {s}
If s=uXv, u is empty or terminals only,
X is the production X: x1|x2|....|xn
Let zr = PREFIXk(u xr v)
F[i+1] = F[i]
G[i+1] = G[i] + {zr: zr is not in any Gj, j<=i}

FIRST2(E)
1 F1 = {}; G1 = {PREFIX2(E)} = {E}
2 Take E from G1.
E: S | E+S
PREFIX2(S) = S
PREFIX2(E+S) = E+
G2 = {S,E+}
3 Take E+ from G2
E: S | E+S
PREFIX2(S+S) = S+
PREFIX2(E+S) = E+, already in G2
G3 = {S,S+}
4 Take S from G3
S: SIGN P
PREFIX2(SIGN P) = SIGN P | P
G4 = {SIGN P,P,S+}
5 take SIGN P from G4
SIGN: + | - |
PREFIX2(+P) = +P
PREFIX2(-P) = -P
G5 = {+P,-P,P,S+}
6 take S+ from G5
S: SIGN P
PREFIX2(SIGN P+) = SIGN P, already in G4
PREFIX2(P+) = P+
G6 = {+P,-P,P,P+}
7 take +P from G6
P: a | (E)
PREFIX2(+a) = +a
PREFIX2(+(E)) = +(
G7 = {+a,+(,-P,P,P+}
8 F8 = {+a}
G8 = {+(,-P,P,P+}
9 F9 = {+a,+(}
G9 = {-P,P,P+}
10 take -P from G9
P: a | (E)
PREFIX2(-a) = -a
PREFIX2(-(E)) = -(
G10 = {-a,-(,P,P+}
11 F11 = {+a,+(,-a}
G11 = {-(,P,P+}
12 F12 = {+a,+(,-a,-(}
G12 = {P,P+}
13 take P from G12
P: a | (E)
PREFIX2(a) = a
PREFIX2((E)) = (E
G13 = {a,(E,P+}
14 F14 = {+a,+(,-a,-(,a}
G14 = {(E,P+}
15 take P+ from G14
P: a | (E)
PREFIX2(a+) = a
PREFIX2((E)+) = (E, already in G14
G15 = {(E,a+}
16 take (E from G15
E : S | E+S
PREFIX2((S) = (S
PREFIX2((E+S) = (E, already in G14
G16 = {(S,a+}
17 take (S from G16
S : SIGN P
PREFIX2((SIGN P) = (SIGN | (P
G17 = {(SIGN,(P,a+}
18 take (SIGN P from G17
SIGN: +|-|
PREFIX2((+P) = (+
PREFIX2((-P) = (-
G18 = {(+,(-,(P,a+}
19 F19 = {+a,+(,-a,-(,a,(+}
G19 = {(-,(P,a+}
20 F20 = {+a,+(,-a,-(,a,(+,(-}
G20 = {(P,a+}
21 F21 = {+a,+(,-a,-(,a,(+,(-,a+}
G21 = {(P}
22 take (P from G21
P: a | (E)
PREFIX2((a) = (a
PREFIX2(((E)) = ((
G22 = {(a,((}
23 F23 = {+a,+(,-a,-(,a,(+,(-,a+,(a}
G23 = {((}
24 F24 = {+a,+(,-a,-(,a,(+,(-,a+,(a,((}
G24 = {}
25 stop

FIRST2(E) = +a | +( | -a | -( | a | (+ | (- | (a | (( | a+

--
SM Ryan http://www.rawbw.com/~wyrmwif/
Title does not dictate behaviour.

Post a followup to this message