Definition: A linear grammar has at most one variable on the
right hand side of any production.
Thus, right linear and left linear grammars are also linear
grammars.
This grammar is not a linear grammar, as there is a choice of
which variable to replace.
To write an algorithm to perform replacements, we need some order.
We will see this when we look at parsing algorithms.
Derivations
Definition: Leftmost derivation: in each step of a derivation,
replace the leftmost variable. (See derivation 1 above.)
Definition: Rightmost derivation: in each step of a derivation,
replace the rightmost variable. (See derivation 2 above.)
Derivation Trees (also known as “parse trees”): A derivation tree
represents a derivation, but does not show the order in which
productions were applied.
Example
A derivation tree for \(G = (V, T, S, P)\):
Root is labeled \(S\)
Leaves are labeled \(x\), where \(x \in T \cup \{\lambda\}\)
Non-leaf vertices labeled \(A, A \in V\)
For rule \(A \rightarrow a_1a_2a_3\ldots a_n\), where
\(A \in V, a_i \in (T \cup V \cup \{\lambda\})\),
Example
\(G = (\{S, A, B\}, \{a, b, c\}, S, P)\)
\(S \rightarrow AcB\)
\(A \rightarrow aAa\ |\ \lambda\)
\(B \rightarrow Bbb\ |\ \lambda\)
Derivation trees do not denote the order variables are
replaced!
But we can get a leftmost or rightmost derivation from looking at
tree.
Definitions: Partial derivation tree - subtree of derivation tree.
If partial derivation tree has root \(S\) then it represents a
sentential form.
Leaves from left to right in a derivation tree form the yield of
the tree.
If \(w\) is the yield of a derivation tree, then it must be that
\(w \in L(G)\).
The yield for the example above is \(aacbb\).
Examples
A partial derivation tree that has root S:
The yield of this example is \(aAacB\) (which is a sentential form).
A partial derivation tree that does not have root S:
Membership problem (1)
Membership: Given CFG \(G\) and string \(w \in \Sigma^*\),
is \(w \in L(G)\)?
If we can find a derivation of \(w\), then we would know that
\(w\) is in \(L(G)\).
Motivation:
\(G\) is the grammar for Java.
\(w\) is your Java program.
Is \(w\) syntactically correct?
This is (part of) what a compiler does.
You write a program, you compile it, and the compiler finds all your
syntax mistakes.
(It also “translates” the program into “bytecode” to be
executed.
We won’t talk much about that aspect of compilers in this class.)
Example
\(G = (\{S\}, \{a, b\}, S, P), P =\)
\(S \rightarrow SS\ |\ aSa\ |\ b\ |\ \lambda\)
\(L_1 = L(G) = \{w \in \Sigma^* |\ \mbox{strings with an even number of a's}\}\)
Is \(abbab \in L(G)\)?
Exhaustive Search Algorithm
For all \(i = 1, 2, 3, \ldots\)
Examine all sentential forms yielded by \(i\) substitutions
Example of Derivation (1)
Is \(abbab \in L(G)\)?
\(i = 1\)
1. \(S \Rightarrow SS\)
2. \(S \Rightarrow aSa\)
3. \(S \Rightarrow b\)
4. \(S \Rightarrow \lambda\)
Example of Derivation (2)
\(i=2\)
1. \(S \Rightarrow SS \Rightarrow SSS\)
2. \(S \Rightarrow SS \Rightarrow aSaS\)
3. \(S \Rightarrow SS \Rightarrow bS\)
4. \(S \Rightarrow SS \Rightarrow S\)
5. \(S \Rightarrow aSa \Rightarrow aSSa\)
…
Note: Will we find \(w\)? How long will it take? If we just do leftmost
derivations, then for \(i = 2\), 8 of length 2.
When \(i = 6\) we will find the derivation of \(w\).
\(S \Rightarrow SS \Rightarrow aSaS \Rightarrow aSSaS \Rightarrow abSaS \Rightarrow abba \Rightarrow abbab\)
Derivation: Strings Not in Language
Question: What happens if \(w\) is not in \(L(G)\)?
When do we stop the loop in the algorithm and know for sure that
\(w\) is not going to be derived?
\(S \Rightarrow SS ... \Rightarrow SSSSSSSSSS ... \Rightarrow S\)
Hard to determine that \(baaba\) is not in \(L(G)\).
We want to consider special forms of context free grammars such that
we can determine when strings are or are not in the language.
Easy to write a context-free grammar and then convert it into
a special form such that it will be easier to test membership.
CFG Theorem (1)
Theorem: If CFG \(G\) does not contain rules of the form
If \(a = 2\) and \(b = 4\), then the “result” of this
expression is \((2+4)*2 = 12\).
Ambiguity Example (3)
There are two distinct derivation trees for the same string. Thus the
grammar is ambiguous. The string can have different meanings depending
on which way it is interpreted.
If \(G\) is a grammar for Java programs and \(w\) is Bob’s
Java program, he doesn’t want one compiler to give one meaning to
his program and another compiler to interpret his program
differently. Disaster!
Rewriting the Grammar (1)
Rewrite the grammar as an unambiguous grammar. (Specifically, with the
meaning that multiplication has higher precedence than addition.)
Try to get a derivation tree with the other meaning of \(a+b*c\), when
\(*\) is closer to the root of the tree.
\(E \Rightarrow T \Rightarrow T*F ...\)
Then the only way to include a “\(+\)”
before the multiplication is if the addition is enclosed in
parenthesis. Thus, there is only one meaning that is accepted.
Unambiguous Grammars
Definition: If \(L\) is CFL and \(G\) is an unambiguous
CFG such that \(L = L(G)\), then \(L\) is unambiguous.
<<Why are we studying CFL? Because we want to be able to represent
syntactically correct programs.>>
Backus-Naur Form of a grammar:
Nonterminals are enclosed in brackets \(<>\)
For “\(\rightarrow\)” use instead “\(::=\)”
Sample C++ Program::
main () {
int a; int b; int sum;
a = 40; b = 6; sum = a + b;
cout << "sum is "<< sum << endl;
}
Programming Language (1)
“Attempt” to write a CFG for C++ in BNF
(Note: \(<\mbox{program}>\) is start symbol of grammar: