.. _Grammars1:
.. raw:: html
.. |--| unicode:: U+2013 .. en dash
.. |---| unicode:: U+2014 .. em dash, trimming surrounding whitespace
:trim:
.. This file is part of the OpenDSA eTextbook project. See
.. http://algoviz.org/OpenDSA for more details.
.. Copyright (c) 2012-13 by the OpenDSA Project Contributors, and
.. distributed under an MIT open source license.
.. avmetadata::
:author: David Furcy and Tom Naps
.. odsalink:: AV/PL/AV/parseTree.css
.. odsalink:: AV/PL/main.css
=================
Grammars - Part 1
=================
.. (M 2/1/16)
RP 1 part 1
-----------
Topics in this module:
1. Terminology of grammars
2. Derivations and parse trees
Grammars provide a formalism for expressing the syntax of programming
languages. That syntax is consequently used to parse, that is,
determine the correctness, of a "program" in the language. A grammar
is composed of the following three elements.
* A set of terminals. These terminals represent the tokens --
characters, or groups of characters that logically belong
together, such as operator symbols, delimiters, keywords, variable
names -- that ultimately comprise the program or expression being
parsed. In the case of algebraic expressions, the terminals would
be variables, numeric constants, parentheses, and the various
operators that are allowed.
* A set of non-terminals. These non-terminals represent the various
grammatical constructs within the language we are parsing. In
particular, one non-terminal is designated as the start symbol for
the grammar.
* A set of productions. The productions are formal rules defining
the syntactical composition of the non-terminals from the
previous point. The productions take the form:
.. math::
\begin{eqnarray*}
&::=& String \; of \; terminals \; and/or \; nonterminals\\
\end{eqnarray*}
We say that the non-terminal on the left of such a production *derives* the string on the right.
An example of a context-free grammar should help to clarify this three-part definition. By convention the non-terminal on the left of the first production is the start symbol, and that is what ultimately must be parsed to have a complete expression in the language. Hence in the example below, :math:`` is the start symbol.
.. _eg1:
Example Grammar 1
^^^^^^^^^^^^^^^^^
.. math::
\begin{eqnarray*}
&::=& \\
&|& + \\
&|& - \\
&|& * \\
&|& / \\
&::=& \\
&|& ( ) \\
&:==& A | B | C | \ldots | X | Y | Z
\end{eqnarray*}
This is essentially a grammar for algebraic expressions with variables
(that is, the :math:`` non-terminal) allowed to be a single upper-case
letter. When reading a grammar, the vertical bar :math:`|` means
"or". Hence :math:`` can be A or B or C ... The :math:``
non-terminal must either be a :math:`` or a parenthesized
:math:``. A derivation of the expression :math:`A + B * C`
according to this grammar proceeds as illustrated in the following
slide show, with the final result being a *parse tree*. You should step
through all the slides, making sure that at each step you understand
the production that is being applied to "grow" the parse tree.
.. inlineav:: parseTree4 ss
:output: show
Note that, in a complete parse tree, leaf nodes are always terminals,
and a traversal of the tree that would output these leaf nodes
would reproduce the expression being parsed. This is indicated by the red
highlighting in the above slide-show.
The following set of four review problems for this module should be completed before you go on. In these review problems, the symbol :math:`\epsilon` is used to represent the *empty production*. When :math:`\epsilon` appears on the right of a production, it means that one of the possibilities for the non-terminal on the left side of the production is for it to derive the empty string, that is, the string with no characters. This is typically used when the syntax for the language being parsed allows the option of the non-terminal not appearing at all. Often with productions that are recursive, it provides a way for the recursion to bottom out -- similar to the way a recursive termination condition would work in a recursive algorithm.
The first problem is about building a parse tree given a grammar and a string.
.. avembed:: Exercises/PL/RP1part1.html ka
:module: Grammars1
:long_name: RP set #1, question #1
:points: 1.0
:required: True
:threshold: 1.0
:exer_opts: JXOP-debug=true&JOP-lang=en&JXOP-code=java_generic
RP 1 part 2
-----------
This problem is about determining whether a given string can be
generated by a given grammar.
.. avembed:: Exercises/PL/RP1part2.html ka
:module: Grammars1
:long_name: RP set #1, question #2
:points: 1.0
:required: True
:threshold: 1.0
:exer_opts: JXOP-debug=true&JOP-lang=en&JXOP-code=java_generic
RP 1 part 3
-----------
This problem is about identifying properties of all of the strings in
a language defined by a given grammar.
.. avembed:: Exercises/PL/RP1part3.html ka
:module: Grammars1
:long_name: RP set #1, question #3
:points: 1.0
:required: True
:threshold: 1.0
:exer_opts: JXOP-debug=true&JOP-lang=en&JXOP-code=java_generic
RP 1 part 4
-----------
This problem is about precisely characterizing the whole language
generated by a given grammar.
.. avembed:: Exercises/PL/RP1part4.html ka
:module: Grammars1
:long_name: RP set #1, question #4
:points: 1.0
:required: True
:threshold: 1.0
:exer_opts: JXOP-debug=true&JOP-lang=en&JXOP-code=java_generic
.. odsascript:: AV/PL/AV/parseTree4.js