41.2.1.3. Convert CFG to NPDA
NOTE: This is not the same construction method we used before.
This method will apply to any CFG, even those that are not in GNF.
Idea: To derive a string with a CFG, start with
the start symbol and repeatedly apply production
rules until the string is derived.
In order to simulate this process with an NPDA, start by pushing the
start symbol on the stack.
Whenever a production rule \(A \rightarrow w\) would be applied,
the variable \(A\) should be on top of the stack.
\(A\) is popped (or replaced) and the right hand side of the rule,
\(w\) , is pushed onto the stack.
Whenever a terminal is on top of the stack, if it matches the next
symbol in the input string, then it is popped from the stack.
If it does not match, then this string is not in the language of the
grammar.
If starting with the start symbol \(S\) , one can apply replacement
rules, match all the terminals in the input string and empty the
stack, then the string is in the language.
Note
Just mention this stuff and then draw NPDA.
The constructed NPDA:
Three states: \(s, q, f\)
As usual, start in state \(s\)
Push \(S\) on stack, move into \(q\)
All rewrite rules in state \(q\) :
If left-hand-side of rewrite rule on top of stack, replace it
with right-hand-side of rewrite rule and stay in state \(q\)
Additional rules in \(q\) to recognize terminals:
Read input symbol, pop input symbol, stay in state \(q\)
Pop \(z\) from stack, move into \(f\) , accept
Example 41.2.1
\(L = \{a^nbb^n: n \ge 0 \}\)
\(S \rightarrow aSb \mid b\)
Note that this is nondeterministic.
You have to use the lookahead to decide which transition to take,
in a sense adding determinism by using extra information.
A parsing routine for this grammar:
symbol
is the lookahead symbol and $ is the end-of-string marker:
state = s
push ( S )
state = q
read ( symbol ) obtain the lookahead symbol
while top - of - stack <> z do while stack is not empty
case top - of - stack of
S : if symbol == a then cases for variables
{ pop (); push ( aSb ) } replace S by aSb
else if symbol == b then
{ pop (); push ( b ) } replace S by b
else error
a : if symbol <> a , then error cases for terminals
else { pop (); read ( symbol ) } pop a , get next lookahead
b : if symbol <> b , then error
else { pop (); read ( symbol ) } pop b , get next lookahead
end case
end while
pop () pop z from the stack
if symbol <> $ then error
state = f
What are the drawbacks?
For a larger grammar, case statement can get quite long
Can put the case statement into a generic routine
41.2.1.5. A generic parsing routine
Idea: To replace a variable on the top of the stack with
its appropriate right-hand-side, use the lookahead
and the left-hand-side to look up the right-hand-side in the LL parse
table.
(LL[,]
is the parse table.):
push ( S )
read ( symbol ) obtain the lookahead symbol
while stack not empty do
case top - of - stack of
terminal :
if top - of - stack == symbol
then { pop (); read ( symbol ) } pop terminal and get next lookahead
else
error
variable :
if LL [ top - of - stack , symbol ] <> error
then { pop (), pop the lhs
push ( LL [ top - of - stack , symbol ]) } push the rhs
else
error
end case
end while
if symbol <> $ , then error
Note
For previous example, try the following traces:
Parse the string: aabbb
Parse the string: b
Example 41.2.3
\[ \begin{align}\begin{aligned}S \rightarrow aSb
S \rightarrow c\\\begin{split}\begin{array}{l||l|l|l|l}
&a&b&c&\$ \\ \hline \hline
S & aSb & \mbox{error} & c & \mbox{error} \\
\end{array}\end{split}\end{aligned}\end{align} \]
In this example, it is clear that when \(S\) is on the
stack and \(a\) is the lookahead, replace \(S\) by
\(aSb\) .
When \(S\) is on the stack and \(b\) is the lookahead,
there is an error, because there must be a \(c\) between the
\(a\) ‘s and \(b\) ‘s.
When \(S\) is on the stack and $ is the lookahead,
then there is an error, since \(S\) must be replaced by at
least one terminal.
When \(S\) is on the stack, and
\(c\) is the lookahead, then \(S\) should be replaced by
\(c\) .
Example 41.2.4
\[S \rightarrow Ac \mid Bc
A \rightarrow aAb \mid \lambda
B \rightarrow b\]
When the grammar has a \(\lambda\) -rule, it
can be difficult to compute parse tables.
In this example,
\(A\) can disappear (due to \(A \rightarrow \lambda\) ),
so when \(S\) is on the stack, it can be replaced by \(Ac\)
if either “a” or “c” are the lookahead, or it can be replaced
by \(Bc\) if “b” is the lookahead.
We will use the following functions FIRST and FOLLOW to aid in
computing the table.
41.2.1.6. To construct an LL parse table LL[rows,cols]
Note
Refresh memory as to what parse table is.
For each rule \(A \rightarrow w\)
For each a in FIRST(w)
add w to LL[A,a]
If \(\lambda\) is in FIRST(w)
add \(w\) to LL[A,b] for each \(b\) in FOLLOW(A)
where \(b \in T \cup \{\$\}\)
Each undefined entry is an error.
Example 41.2.5
\(S \rightarrow aSc \mid B\)
\(B \rightarrow b \mid \lambda\)
We have already calculated FIRST and FOLLOW for this Grammar:
\[\begin{split}\begin{array}{c|l|l}
& FIRST & FOLLOW\\ \hline \hline
S & a, b, \lambda & \$, c \\
B & b, \lambda & \$, c \\
\end{array}\end{split}\]
To Compute the LL Parse Table for this example:
For \(S \rightarrow aSc\) ,
\(\mbox{FIRST}(aSc) = \{a\}\) , so add \(aSc\) to
LL[S,a]
by step 1a.
For \(S \rightarrow B\) ,
\(\mbox{FIRST}(B) = \{b, \lambda \}\)
\(\mbox{FOLLOW}(S) = \{\$, c\}\)
By step 1a, add \(B\) to LL[S,b]
By step 1b, add \(B\) to LL[S,c]
and LL[S,$]
For \(B \rightarrow b\) ,
\(\mbox{FIRST}(b) = \{b\}\) , so by step 1a add \(b\) to LL[B,b]
For \(B \rightarrow \lambda\)
\(\mbox{FIRST}(\lambda) = \{ \lambda \}\) and
\(\mbox{FOLLOW}(B) = \{\$, c\}\) , so by step 1b
add \(\lambda\) to LL[B,c]
and add \(\lambda\)
to LL[B,$]
.
LL(1) Parse Table
\[\begin{split}\begin{array}{c||c|c|c|c}
& a & b & c & \$ \\ \hline \hline
S & aSc & B & B & B \\ \hline
B & \mbox{error} & b & \lambda & \lambda
\end{array}\end{split}\]
Parse string: \(aacc\)
\[\begin{split}\begin{array}{lcccccccc}
&&&&a \\
&&a&&S &S &B \\
&&S& S& c& c& c& c \\
\mbox{Stack:} & \underline{S} & \underline{c} & \underline{c} & \underline{c}
& \underline{c} & \underline{c} & \underline{c} & \underline{c} \\
\mbox{symbol:} & a & a & a' & a' & c & c& c& c' \\
\end{array}\end{split}\]
where \(a'\) is the second \(a\) in the string and symbol
is
the lookahead symbol.
This table is an LL(1) table because only 1 symbol of lookahead is needed.
Example 41.2.6
Trace \(aabcc\)
\[\begin{split}\begin{array}{lccccccccc}
&&&&a \\
&&a&&S &S &B & b\\
&&S& S& c& c& c& c & c \\
\mbox{Stack:} & \underline{S} & \underline{c} & \underline{c} & \underline{c}
& \underline{c} & \underline{c} & \underline{c} & \underline{c}
& \underline{c} \\
\mbox{symbol:} & a & a & a' & a' & b & b& b& c & c' \\
\end{array}\end{split}\]
where \(a'\) is the second \(a\) in the string and symbol
is the lookahead symbol.
This table is an LL(1) table because only 1 symbol of lookahead is needed.
Example 41.2.7
Construct Parse Table for:
\(L = \{a^nb^nca^mcb^m : n \ge 0, m \ge 0\}\)
\(S \rightarrow AcB\)
\(A \rightarrow aAb\)
\(A \rightarrow \lambda\)
\(B \rightarrow aBb\)
\(B \rightarrow c\)
\(\mbox{FIRST}(A) = \{a, \lambda\}\)
\(\mbox{FIRST}(S) = \{a, c\}\)
NOTE: \(\lambda\) is not in \(\mbox{FIRST}(S)\)
\(\mbox{FIRST}(B) = \{a, c\}\)
\(\mbox{FOLLOW}(A) = \{b, c\}\)
\(\mbox{FOLLOW}(S) = \{\$\}\)
\(\mbox{FOLLOW}(B) = \{b, \$\}\)
To compute the parse table:
For \(S \rightarrow AcB\) ,
\(\mbox{FIRST}(AcB) = \{a, c\}\) so add \(AcB\) to
LL[S,a]
and LL[S,c]
For \(A \rightarrow aAb\) ,
\(\mbox{FIRST}(aAb) = \{a\}\) so add \(aAb\) to LL[A,a]
For \(A \rightarrow \lambda\) ,
\(\mbox{FIRST}(\lambda) = \{\lambda\}\) and
\(\mbox{FOLLOW}(A) = \{b, c\}\) so add \(\lambda\)
to LL[A,b]
and LL[A,c]
For \(B \rightarrow aBb\) ,
\(\mbox{FIRST}(aBb) = \{a\}\) so add \(aBb\) to LL[B,a]
For \(B \rightarrow c\) ,
\(\mbox{FIRST}(c) = \{c\}\) so add \(c\) to LL[B,c]
All other entries are errors.
\[\begin{split}\begin{array}{c||c|c|c|c}
&a &b &c & \$ \\ \hline \hline
S &AcB &\mbox{error} &AcB &\mbox{error} \\ \hline
A &aAb &\lambda &\lambda &\mbox{error} \\ \hline
B &aBb &\mbox{error} &c &\mbox{error} \\
\end{array}\end{split}\]
parse string: \(abcacb\)
parse string: \(cc\)
parse string: \(abcab\) (not in language)
Example 41.2.8
\(L = \{a^nb^nca^mcb^m: n \ge 1, m \ge 1 \}\)
\(S \rightarrow AcB\)
\(A \rightarrow aAb\)
\(A \rightarrow ab\)
\(B \rightarrow aBb\)
\(B \rightarrow acb\)
\[\begin{split}\begin{array}{c|l|l}
& \mbox{FIRST} & \mbox{FOLLOW} \\ \hline \hline
S & a & \$ \\ \hline
A & a & c,b \\ \hline
B & a & b,\$ \\
\end{array}\end{split}\]
Note that FIRST and FOLLOW are quite easy to calculate since
there are no \(\lambda\) rules!
In this case, you don’t need FOLLOW to construct the parse table.
Try to construct LL(1) Parse table
\[\begin{split}\begin{array}{c||c|c|c|c}
&a &b &c & \$ \\ \hline \hline
S &AcB &\mbox{error} &\mbox{error} &\mbox{error} \\ \hline
A &aAb &\mbox{error} &\mbox{error} &\mbox{error} \\
&ab & & & \\ \hline
B &aBb &\mbox{error} &\mbox{error} &\mbox{error} \\
& acb & & & \\
\end{array}\end{split}\]
Note that you don`t know which rewrite rule to apply to replace
\(A\) and \(B\) with just one lookahead symbol.
\(A\) has two choices and both use a lookahead of ‘a’.
There are two entries in the LL(1) parse table for T[A,a]
.
Thus, there is no LL(1) parse table.
This means the grammar is not LL(1)!.
We will try to use 2 symbols of lookahead.
For example, the string \(aabbcaacbb\) cannot be parsed with
just one lookahead.
LL(2) Parse Table:
\[\begin{split}\begin{array}{c||c|c|c|c|c|c|c}
&aa &ab &ac & a\$ & b & c & \$ \\ \hline \hline
S &AcB &AcB & error & error &error &error &error \\ \hline
A &aAb &ab & error & error &error &error &error \\ \hline
B &aBb &error & acb & error &error &error &error \\
\end{array}\end{split}\]
There are no conflicts (only one rule in each entry of the table).
This is an LL(2) parser - need two lookahead symbols.
parse string: \(aabbcacb\)
\[\begin{split}\begin{array}{lcccccccccccc}
&&&a&&a \\
&&&A&A&b&b\\
&&A & b& b& b& b& b&&&a\\
&&c&c&c&c&c&c&c&&c&c\\
\mbox{Stack:} & \underline{S} & \underline{B}& \underline{B}& \underline{B}
& \underline{B}& \underline{B}& \underline{B}& \underline{B}& \underline{B}
& \underline{b} & \underline{b} & \underline{b} \\
\mbox{symbol:} & aa & aa &aa & ab & ab & bb & bc& ca& ac &ac & cb & b\$ \\
\end{array}\end{split}\]
Note the leftmost derivation!
Also note that the two lookahead symbols are used whenever there is
a variable on top of the stack.
An LL(k) parser needs \(k\) lookahead symbols.
Note
Mention that LL parser doesn’t work if the grammar is left recursive.
Example 41.2.9
\(L = \{a^n: n \ge 0 \} \cup \{a^nb^n: n \ge 0 \}\)
\(S \rightarrow A\)
\(S \rightarrow B\)
\(A \rightarrow aA\)
\(A \rightarrow \lambda\)
\(B \rightarrow aBb\)
\(B \rightarrow \lambda\)
This grammar cannot be recognized by an LL(k) parser for any
\(k\) !
Consider the string \(aabb\) .
You would need 3 lookahead to realize that you want to use
\(S \rightarrow B\) .
Consider the string \(aaabbb\) , you would need 4 lookahead.
Consider string \(a^nb^n\) , you would need \(n\) lookahead.
There is no (constant) \(k\) such that \(k\) lookahead
works for every string in the language.
Example 41.2.11
\(S \rightarrow bbCd \mid Bcc\)
\(B \rightarrow bB \mid b\)
\(C \rightarrow cC \mid c\)
This grammar is LL(5).
We don’t know which S rule to apply with the string \(bbccd\)
or \(bbcc\$\) until you have seen the fifth symbol.
This grammar cannot be recognized by an LL(k) parser.
When the lookahead is \(b\) , don’t know which rule to
apply, either the second or third.
Comments:
There are some CFL’s that have no LL(k) Parser
There are some languages for which some grammars have
LL(k) parsers and some don’t.