Processing math: 100%
Close
Close Window

OpenDSA Stand-alone Modules

Chapter 0 modules

Show Source |    | About   «  0.272. Context-Free Languages   ::   Contents   ::   0.274. Pushdown Automata  »

Transforming Grammars

1. Transforming Grammars

We use grammars to represent a programming language. Want to know: Is a given string (or program x) valid (syntactically correct)? Same as asking if it is in the language (the membership problem).

Last time we showed that if we could transform a CFG into a CFG with no λ-productions, and no rules like AB, then we could determine if w is in or not in L(G) in 2|w| rounds, each step adding a terminal or increasing in length.

This works, but it is not fast, that is, not linear!

We will look at lots of methods for transforming grammars. Some will be forms that are easier to work with, some are easier to use in proofs.

Key question: Are there ways to transform (restrict) CFGs such that
1) We can process efficiently
2) without restricting the power of CFGs

Note

Ask yourself: What does it mean “without restricting the power of CFGs”?

Specifically, we will look at restrictions on the right hand side of the production rules. We want to be able to automatically transform an arbitrary CFG into an equivalent restricted CFG.

We will consider CFL without λ. It would be easy to add λ to any grammar by adding a new start symbol S0,

S0Sλ

Substitution Theorem Let G be a CFG. Suppose G contains

Ax1Bx2

where xi(VT), A and B are different variables, and B has the productions

By1|y2||yn.

Then, we can construct G from G by deleting

Ax1Bx2

from P and adding to it

Ax1y1x2|x1y2x2||x1ynx2.

Then, L(G)=L(G).

Question: Why don’t we also delete B rules?

Answer: These might be used by another production

Question: What if A and B are the same?

SaBabecomesSaaSaaaaBaSaBaSa

Then the B productions become useless productions.

Definition: A production of the form AAx, AV,x(VT) is left recursive.

Example: Previous expression grammar was left recursive. (from Chapter 5 notes)

It is left recursive because it has SOME left recursive productions. Not all of the productions are left recursive.

EE+TT
TTFF
FI(E)
Iab

A top-down parser (like LL parsing) would want to derive the leftmost terminal as soon as possible. But in the left recursive grammar above, in order to derive a sentential form that has the leftmost terminal, we have to derive a sentential form that has other terminals in it.

Derivation of a+b+a+a is:

EE+TE+T+TE+T+T+Ta+T+T+T

NOTE: nicer to have a grammar that would derive the a first without all the other +’s.

We will eliminate the left recursion so that we can derive a sentential form with the leftmost terminal and no other terminals.

Theorem (Removing Left recursion) Let G=(V,T,S,P) be a CFG. Divide productions for variable A into left-recursive and non left-recursive productions:

AAx1Ax2Axn
Ay1y2ym

where xi, yi are in (VT).

Any derivation will start with yi.

A is not a prefix of any of the second type.

Then G=(V{Z},T,S,P) and P replaces rules of form above by

AyiyiZ,i=1,2,,m
ZxixiZ,i=1,2,,n

Note

Consider a derivation: y3x7x1x3

Example 1

EE+TTbecomesETTZZ+T+TZTTFFbecomesTFFYYFFY

When you get rid of left-recursion, the grammar is in the appropriate form for a top-down parser, but the grammar has more variables and productions.

Now, the derivation of a+b+a+a is:

ETZFZIZaZ

Didn’t have to look at any other terminals yet!

Useless productions

SaBbA
AaA
BSa
CcBca

What can you say about this grammar?

A, S, and B are useless variables since they can’t derive a string of terminals. C is useless because you can’t get SxCyw, where wT.

Theorem: (useless productions) Let G be a CFG. Then  G that does not contain any useless variables or productions such that L(G)=L(G).

To Remove Useless Productions:

Let G=(V,T,S,P).

I. Compute V1= {Variables that can derive strings of terminals}
1. V1=
2. Repeat until no more variables added
* For every AV with Ax1x2xn, xi(TV1), add A to V1
3. P1= all productions in P with symbols in (V1T)

Then G1=(V1,T,S,P1) has no variables that can’t derive strings.

Now we need to get rid of productions we can’t use.

  1. Draw Variable Dependency Graph

For AxBy, draw AB.

Draw A in a circle, B in a circle, and an arc from A to B.

Remove productions for V if there is no path from S to V in the dependency graph. Resulting Grammar G is such that L(G)=L(G) and G has no useless productions.

Example 2

SaBbA
AaA
BSab
CcBca
DbCb
EAab

V1={B,C,D,E,S}, A is useless.

G1:    SaBBSaaCcBca

Dependency graph:

uselessgraph
G:    SaBBSab
1 / 17 Settings
<<<>>>

Suppose we need to remove the useless productions for the following grammar.

  1. S
  2. aB
  1. S
  2. bA
  1. A
  2. aA
  1. B
  2. Sa
  1. B
  2. b
  1. C
  2. cBc
  1. C
  2. a
  1. D
  2. bCb
  1. E
  2. Aa
  1. E
  2. b
Proficient Saving... Error Saving
Server Error
Resubmit

Q: How would you implement II? How do you know which nodes are accessible from S? Use DFS or BFS.

NOTE: Last time talked about simpler CFG that had no λ-productions, now we will show how to get rid of them.

Theorem (remove λ productions) Let G be a CFG with λ not in L(G). Then a CFG G having no λ-productions such that L(G)=L(G).

To Remove λ -productions

1. Let Vn={A production Aλ}
2. Repeat until no more additions
* if BA1A2Am and AiVn for all i, then put B in Vn
THUS, Vn={AAλ}
3. Construct G with productions P such that
* If Ax1x2xmP,m1, then put all productions formed when xj is replaced by λ (for all xjVn) such that |rhs|1 into P.

Example 3

SAb
ABCBAa
Bbλ
CcCλ

Vn={B,C,A}

G:     SAbbABCBBCBBCBBCAaaBbCcCc

NOTE: Don’t add Aλ!

1 / 10 Settings
<<<>>>

Suppose we need to remove $\lambda$ productions form the following grammar

  1. S
  2. AcB
  1. A
  2. aAa
  1. A
  2. λ
  1. B
  2. Bbb
  1. B
  2. λ
Proficient Saving... Error Saving
Server Error
Resubmit

Definition: Unit Production

AB

where A,BV.

Consider removing unit productions:

Suppose we have

AB     becomes     AaabBaab

But what if we have

AB     becomes     ACBCBACACB

But we don’t get rid of unit-productions!

Theorem (Remove unit productions) Let G=(V,T,S,P) be a CFG without λ-productions. Then CFG G=(V,T,S,P) that does not have any unit-productions and L(G)=L(G).

To Remove Unit Productions:

1. Find for each A, all B such that AB
(Draw a dependency graph howing relationship of Unit productions. Just draw arc for each AB rule.
Draw A in a circle, B in a circle, and an arc from A to B.)
2. Construct G=(V,T,S,P) by
(a) Put all non-unit productions in P
(b) For all AB such that By1y2ynP, put Ay1y2ynP
Run DFS with A as root.
Note the star in AB
Never put a unit production in P.

Example 4

SAB
AB
BCBb
CAcDa
DA
unitgraph
After a)SABBBbCcDa
G:SABABbcDaBBbcDaCcBbDaDcBbDa
1 / 13 Settings
<<<>>>

To remove unit productions, we need to identify all unit productions using a dependency graph.

  1. S
  2. AB
  1. A
  2. B
  1. B
  2. C
  1. B
  2. Bb
  1. C
  2. A
  1. C
  2. c
  1. C
  2. Da
  1. D
  2. A
Proficient Saving... Error Saving
Server Error
Resubmit

Theorem: Let L be a CFL that does not contain λ. Then a CFG for L that does not have any useless productions, λ-productions, or unit-productions.

Proof:

1. Remove λ-productions
2. Remove unit-productions
3. Remove useless productions

Note order is very important. Removing λ-productions can create unit-productions! QED.

There are additional examples in the book.

Definition: A CFG is in Chomsky Normal Form (CNF) if all productions are of the form

ABC  orAa

where A,B,CV and aT.

Why would you want to put a grammar in this form? Because it is easier to work with in proofs.

Theorem 1 :

Any CFG G with λ not in L(G) has an equivalent grammar in CNF.

Proof:

1. Remove λ-productions, unit productions, and useless productions.
2. For every right-hand-side of length >1, replace each terminal xi by a new variable Cj and add the production Cjxi.
Note: All productions are in the correct form or the right-hand-side is a string of variables.
3. Replace every right-hand-side of length >2 by a series of productions, each with right-hand-side of length 2. QED.

Example 5

SCBcd
Bb
CCce
(after step 1)G:SCBC1C2BbCCC3eC1cC2dC3c(after step 2)G:SCZ1Z1BZ2Z2C1C2BbCCC3eC1cC2dC3c

NOTE: Can get rid of λ-productions and unit productions first!

1 / 13 Settings
<<<>>>

To remove unit productions, we need to identify all unit productions using a dependency graph.

  1. S
  2. CBcd
  1. B
  2. b
  1. C
  2. Cc
  1. C
  2. e
Proficient Saving... Error Saving
Server Error
Resubmit

Definition: A CFG is in Greibach normal form (GNF) if all productions have the form

Aax

where aT and xV

This is like an s-grammar (or simple grammar), except the s-grammar definition includes a further restriction that any pair (A,a) can occur at most in one rule.

This is so that you wouldn’t have to backtrack (only one choice to match the derivation of a string). So it very restrictive.

Theorem 2

For every CFG G with λ not in L(G), a grammar in GNF.

Proof:

1. Rewrite grammar in CNF.
2. Relabel Variables A1,A2,An
3. Eliminate left recursion and use substitution to get all productions into the form:
AiAjxj,j>i
ZiAjxj,jn
Aiaxi
where aT,xiV, and Zi are new variables introduced for left recursion.
Use Theorems 6.1 and 6.2 to get rid of left recursion.
4. All productions with An are in the correct form, Anaxn. Use these productions as substitutions to get An1 productions in the correct form. Repeat with An2, An3, etc until all productions are in the correct form.

WHAT YOU SHOULD KNOW: know forms, GNF, CNF, unit production, left recursion, etc. Do not need to memorize rules for transforming, but should understand how to do it.

   «  0.272. Context-Free Languages   ::   Contents   ::   0.274. Pushdown Automata  »

nsf
Close Window