Loading [MathJax]/jax/output/HTML-CSS/jax.js

8.Parsing Introduction§

Parsing: Deciding if xΣ is in L(G) for some CFG G.

Consider the CFG G:
SAa
AAAABaλ
BBBabλ
Is ba in L(G)? Running time?
How do you determine whether a string is in L(G)?
Note: ba is not in L(G) for this G!
Try all possible derivations, but don’t know when to stop. This runs forever!

Introduction Example (2)

Same grammar without lambda-rules:
Remove λ-rules, then unit productions, and then useless productions from the grammar G above. New grammar G is:
SAaa
AAAABaAaBaa
BBBaBaab
Is ba in L(G)? Running time?
Try all possible derivations, there will be at most |w| rounds. NOTE THIS IS NOT LINEAR TIME, IT TAKES A LONG TIME. Actual time is |w|p where p is the maximum number of rules for any variable.

Introduction Example (3)

Grammar:
SAaa
AAAABaAaBaa
BBBaBaab
Consider string baa.
Goal: We would like to only try the rules that give us the derivation and ignore false paths. This would be fast!
SAaBaabaa

Top-down Parser

Bottom-up Parser

Making Parse Tables

We want to construct simple tables that tell us:
When you are working on this rule, and see this input, do something
We will use the functions FIRST and FOLLOW to aid in computing parse tables.
Notation that we will use in defining FIRST and FOLLOW.
G=(V,T,S,P)
w,v(VT)
aT
X,A,BV
XI(VT)+

The function FIRST

Definition: FIRST(w)= the set of terminals that begin strings derived from w.
If wav then
a is in FIRST(w)
If wλ then
λ is in FIRST(w)

To compute FIRST (1)

1. FIRST(a)={a} where a is a terminal.
2. FIRST(X) where X is a variable.
(a) If Xaw then
a is in FIRST(X)
(b) If Xλ then
λ is in FIRST(X)
(c) If XAw and λFIRST(A) then
Everything in FIRST(w) is in FIRST(X)

To compute FIRST (2)

3. In general, FIRST(X1X2X3...XK)=
* FIRST(X1)
*  FIRST(X2) if λ is in FIRST(X1)
*  FIRST(X3) if λ is in FIRST(X1)
and λ is in FIRST(X2)
*  FIRST(XK) if λ is in FIRST(X1)
and λ is in FIRST(X2)
… and λ is in FIRST(XK1)
*  {λ} if λFIRST(XJ) for all J
(where XI represents a terminal or a variable)

To compute FIRST (3)

We will be computing FIRST(w) where w is the right hand side of a rule.
Thus, we will need to compute FIRST(X) for each symbol X (either terminal or variable) that appears in the right hand side of a rule.

Example (1)

L={anbmcn:n0,0m1}
SaScB
Bbλ
FIRST(B)={b,λ}
Using Bb gives that b is in FIRST(B).
Using Bλ gives that λ is in FIRST(B).
FIRST(S)={a,b,λ}
Using SaSc gives that a is in FIRST(S).
Using SB and λ is in FIRST(B) gives that everything in FIRST(B) is in FIRST(S), so b and λ are in FIRST(S).
FIRST(Sc)={a,b,c}

Example (2a)

SBCDaD
ACEBaA
Bbλ
CdBλ
DcAλ
EefE

Example (2b)

FIRST(B)=

FIRST(C)=

FIRST(D)=

FIRST(E)=

FIRST(A)=

FIRST(S)=

The function FOLLOW

Definition: FOLLOW(X)= set of terminals that can appear to the right of X in some derivation.
(We only compute FOLLOW for variables.)
If SwAav then
a is in FOLLOW(A)
(where w and v are strings of terminals and variables, a is a terminal, and A is a variable)

Computing FOLLOW

  1. $ is in FOLLOW(S)

  2. If AwBv and vλ then

    FIRST(v){λ} is in FOLLOW(B)

  3. If AwB or AwBv and λ is in FIRST(v) then

    FOLLOW(A) is in FOLLOW(B)

  4. λ is never in FOLLOW

Example (1)

SaScB
Bbλ

Reminder: λ is never in a FOLLOW set.

FOLLOW(S)={$,c}

$ goes into FOLLOW(S) by rule 1. Then c goes into FOLLOW(S) by rule 2 since SaSc and FIRST(c)={c}.

FOLLOW(B)={$,c}

By rule 3 and SB, FOLLOW(S) is added to FOLLOW(B).

Example (2a)

SBCDaD
ACEBaA
Bbλ
CdBλ
DcAλ
EefE

Example (2b)

FOLLOW(S)=

FOLLOW(A)=

FOLLOW(B)=

FOLLOW(C)=

FOLLOW(D)=

FOLLOW(E)=

LL(k) Parsing

We discussed this in principle before. Now we want to operationalize it.
Note: A language is not LL(k), a grammar is.
L={aiabcii>0}
G1=SaSc{aaa}
Saabc{aab}
G2=SaA
ASc{aa}
Aabc{ab}
G3=SaaAc
AaAc{a}
Ab{b}

LL parsing process

First, convert CFG to PDA
L={anbbn:n0}
SaSbb
lt10pda1
The PDA is nondeterministic.
Use lookahead to make it deterministic: determine which rewrite rule to use.

Parsing routine (for this grammar)

symbol is the lookahead symbol and $ is the end-of-string marker:

state = s
push(S)
state = q
read(symbol)                               obtain the lookahead symbol
while top-of-stack <> z do                 while stack is not empty
   case top-of-stack of
   S: if symbol == a then                  cases for variables
            { pop(); push(aSb) }           replace S by aSb
         else if symbol == b then
            { pop(); push(b) }             replace S by b
         else error
      a: if symbol <> a, then error        cases for terminals
         else { pop(); read(symbol) }      pop a, get next lookahead
      b: if symbol <> b, then error
         else { pop(); read(symbol) }      pop b, get next lookahead
      end case
end while
pop()                                      pop z from the stack
if symbol <> $ then error
state = f

LL Parse Table

When the grammar is large, the parsing routine will have many cases.
Alternatively, store the information for which rule to apply in a table.
Rows: variables
Columns: terminals, $ (end of string marker)
LL[i,j] contains the right-hand-side of a rule. This right-hand-side is pushed onto the stack when the left-hand-side of the rule is the variable representing the i th row and the lookahead is the symbol representing the j th column.
For any CFG that we can specify by this type of parse table, we can use a generic parser to determine if strings are in this language.
Gets rid of use of states

Parse Table Example

Parse table for

L={anbbn:n0}
SaSbb
ab$SaSbberror
Example strings:

aabbb

b

A generic parsing routine

(LL[,] is the parse table.):

push(S)
read(symbol)                                         obtain the lookahead symbol
while stack not empty do
   case top-of-stack of
      terminal:
         if top-of-stack == symbol
            then { pop(); read(symbol) }             pop terminal and get next lookahead
         else
            error
      variable:
         if LL[top-of-stack, symbol] <> error
            then { pop(),                            pop the lhs
                   push(LL[top-of-stack,symbol]) }   push the rhs
            else
               error
      end case
end while
if symbol <> $, then error

Example

SaSb
Sc
abc$SaSberrorcerror
When S is on the stack and a is the lookahead, replace S by aSb
When S is on the stack and b is the lookahead, there is an error (there must be a c between the a ‘s and b ‘s)
When S is on the stack and $ is the lookahead, then there is an error (S must be replaced by at least one terminal)
When S is on the stack, and c is the lookahead, then S should be replaced by c.

Example

SAcBc
AaAbλ
Bb
When the grammar has a λ-rule, it can be difficult to compute parse tables.
In this example, A can disappear (due to Aλ).
So when S is on the stack, it can be replaced by Ac if either “a” or “c” are the lookahead, or it can be replaced by Bc if “b” is the lookahead.

Constructing an LL parse table

1. For each rule Aw
a. For each a in FIRST(w)
add w to LL[A,a]
b. If λ is in FIRST(w)
add w to LL[A,b] for each b in FOLLOW(A)
where bT{$}
2. Each undefined entry is an error.

Example (1): Need FIRST and FOLLOW

SaScB
Bbλ

We have already calculated FIRST and FOLLOW for this Grammar:

FIRSTFOLLOWSa,b,λ$,cBb,λ$,c

Example (2): Compute Parse Table

For SaSc, FIRST(aSc)={a}, so add aSc to LL[S,a] by step 1a.
For SB,
FIRST(B)={b,λ}
FOLLOW(S)={$,c}
By step 1a, add B to LL[S,b]
By step 1b, add B to LL[S,c] and LL[S,$]
For Bb, FIRST(b)={b}, so by step 1a add b to LL[B,b]
For Bλ FIRST(λ)={λ} and FOLLOW(B)={$,c},
so by step 1b add λ to LL[B,c] and add λ to LL[B,$].

Example (3): Sample Trace

abc$SaScBBBBerrorbλλ

Parse string: aacc

aaSSBSSccccStack:S_c_c_c_c_c_c_c_symbol:aaaacccc

where a is the second a in the string and symbol is the lookahead symbol. This table is an LL(1) table because only 1 symbol of lookahead is needed.

Example (4): Sample Trace

Trace aabcc

aaSSBbSScccccStack:S_c_c_c_c_c_c_c_c_symbol:aaaabbbcc

where a is the second a in the string and symbol is the lookahead symbol. This table is an LL(1) table because only 1 symbol of lookahead is needed.

LL(k) Can’t Parse All CFGs

L={an:n0}{anbn:n0}
SA
SB
AaA
Aλ
BaBb
Bλ
This grammar cannot be recognized by an LL(k) parser for any k.
Consider the string aabb. Need 3 lookahead to realize that we want SB.
For aaabbb, we need 4 lookahead.
For anbn, we need n lookahead.

Conclusion

There are some CFL’s that have no LL(k) Parser

There are some languages for which some grammars have LL(k) parsers and some don’t.