|
|
(One intermediate revision by one other user not shown) |
Line 1: |
Line 1: |
| {{refimprove|date=February 2012}}
| | Msvcr71.dll is an significant file which assists help Windows procedure different components of the program including important files. Specifically, the file is selected to help run corresponding files in the "Virtual C Runtime Library". These files are important inside accessing any settings which support the different applications and programs in the program. The msvcr71.dll file fulfills many important functions; nonetheless it's not spared from getting damaged or corrupted. Once the file gets corrupted or damaged, the computer will have a hard time processing and reading components of the program. Nonetheless, consumers want not panic because this issue will be solved by following several procedures. And I may show we several tips regarding Msvcr71.dll.<br><br>Windows Defender - this does come standard with various Windows OS Machines, nevertheless otherwise could be download from Microsoft for free. It usually aid protect against spyware.<br><br>So, this advanced dual scan is not only 1 of the better, however it is actually equally freeware. And as of all this that numerous regard CCleaner 1 of the greater registry products inside the marketplace today. I would add which I personally prefer Regcure for the easy reason which it has a greater interface plus I learn for a fact which it is ad-ware without charge.<br><br>In purchase to remove the programs on your computer, Windows Installer should be in a healthy state. If its installation is corrupted we can obtain error 1721 in Windows 7, Vista plus XP during the program removal task. Simply re-registering its component files would resolve your issue.<br><br>Use a [http://bestregistrycleanerfix.com/regzooka regzooka]. This can look your Windows registry for three kinds of keys which really can hurt PC performance. These are: duplicate, lost, plus corrupted.<br><br>Software errors or hardware errors which happen whenever running Windows plus intermittent mistakes are the general reasons for a blue screen physical memory dump. New software or motorists which have been installed or changes in the registry settings are the typical s/w causes. Intermittent errors refer to failed program memory/ hard disk or over heated processor plus these too may cause the blue screen physical memory dump error.<br><br>When the registry is corrupt or full of errors, the signs may be felt by the computer owner. The slow performance, the frequent program crashes and the nightmare of all computer owners, the blue screen of death.<br><br>There is a lot a good registry cleaner will do for a computer. It will check for plus download updates for Windows, Java and Adobe. Keeping changes current is an important piece of superior computer health. It can also safeguard your personal and business privacy plus a online safety. |
| In [[formal language theory]], a '''context-free grammar''' ('''CFG''')
| |
| is a [[formal grammar]] in which every [[Production (computer science)|production rule]] is of the form
| |
| :''V'' → ''w''
| |
| where ''V'' is a ''single'' [[nonterminal]] symbol, and ''w'' is a string of [[Terminal and nonterminal symbols|terminal]]s and/or nonterminals (''w'' can be empty). A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal. It does not matter which symbols the nonterminal is surrounded by, the single nonterminal on the left hand side can always be replaced by the right hand side.
| |
| | |
| [[formal language|Language]]s generated by context-free grammars are known as [[context-free language]]s (CFL). Different Context Free grammars can generate the same context free language. It is important to distinguish properties of the language (intrinsic properties) from properties of a particular grammar (extrinsic properties). Given two context free grammars, the [[#Language equality|language equality]] question (do they generate the same language?) is [[Decidability (logic)|undecidable]].
| |
| | |
| Context-free grammars are important in [[linguistics]] for describing the structure of sentences and words in [[natural language]], and in [[computer science]] for describing the structure of [[programming language]]s and other formal languages.
| |
| | |
| In [[linguistics]], some authors use the term '''[[phrase structure grammar]]''' to refer to context-free grammars, whereby phrase structure grammars are distinct from [[dependency grammar]]s. In [[computer science]], a popular notation for context-free grammars is [[Backus–Naur Form]], or ''BNF''.
| |
| | |
| == Background ==
| |
| | |
| Since the time of [[Pāṇini]], at least, linguists have described the [[grammar]]s of languages in terms of their [[block structure]], and described how sentences are [[recursion|recursively]] built up from smaller phrases, and eventually individual words or word elements. An essential property of these block structures is that logical units never overlap. For example, the sentence:
| |
| : John, whose blue car was in the garage, walked to the grocery store.
| |
| can be logically parenthesized as follows:
| |
| : (John, ((whose blue car) (was (in the garage))), (walked (to (the grocery store)))).
| |
| A context-free grammar provides a simple and mathematically precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks, capturing the "block structure" of sentences in a natural way. Its simplicity makes the formalism amenable to rigorous mathematical study. Important features of natural language syntax such as [[agreement (linguistics)|agreement]] and [[reference]] are not part of the context-free grammar, but the basic recursive structure of sentences, the way in which clauses nest inside other clauses, and the way in which lists of adjectives and adverbs are swallowed by nouns and verbs, is described exactly.
| |
| | |
| The formalism of context-free grammars was developed in the mid-1950s by [[Noam Chomsky]],<ref name="hu106">{{harvtxt|Hopcroft|Ullman|1979}}, p. 106.</ref> and also their [[Chomsky hierarchy|classification as a special type]] of [[formal grammar]] (which he called [[phrase-structure grammar]]s).<ref name="chomsky1956">{{citation
| |
| | last = Chomsky | first = Noam | authorlink =
| |
| | title = Three models for the description of language
| |
| | journal = Information Theory, IEEE Transactions
| |
| | volume = 2 | issue = 3 | pages = 113–124
| |
| | publisher = | date = Sep 1956
| |
| | url = http://ieeexplore.ieee.org/iel5/18/22738/01056813.pdf?isnumber=22738&prod=STD&arnumber=1056813&arnumber=1056813&arSt=+113&ared=+124&arAuthor=+Chomsky%2C+N.
| |
| | archiveurl = http://www.chomsky.info/articles/195609--.pdf
| |
| | archivedate = 2013-10-18
| |
| | doi = 10.1109/TIT.1956.1056813| id = | accessdate = 2007-06-18}}</ref> What Chomsky called a phrase structure grammar is also known now as a constituency grammar, whereby constituency grammars stand in contrast to [[dependency grammar]]s. In Chomsky's [[generative grammar]] framework, the syntax of natural language was described by context-free rules combined with transformation rules.
| |
| | |
| Block structure was introduced into computer [[programming language]]s by the [[Algol (programming language)|Algol]] project (1957–1960), which, as a consequence, also featured a context-free grammar to describe the resulting Algol syntax. This became a standard feature of computer languages, and the notation for grammars used in concrete descriptions of computer languages came to be known as [[Backus-Naur Form]], after two members of the Algol language design committee.<ref name="hu106"/> The "block structure" aspect that context-free grammars capture is so fundamental to grammar that the terms syntax and grammar are often identified with context-free grammar rules, especially in computer science. Formal constraints not captured by the grammar are then considered to be part of the "semantics" of the language.
| |
| | |
| Context-free grammars are simple enough to allow the construction of efficient [[list of algorithms#Parsing|parsing algorithm]]s which, for a given string, determine whether and how it can be generated from the grammar. An [[Earley parser]] is an example of such an algorithm, while the widely used [[LR parser|LR]] and [[LL parser]]s are simpler algorithms that deal only with more restrictive subsets of context-free grammars.
| |
| | |
| == Formal definitions ==
| |
| A context-free grammar ''G'' is defined by the 4-[[tuple]]:<ref>The notation here is that of {{harvtxt|Sipser|1997}}, p. 94. {{harvtxt|Hopcroft|Ullman|1979}} (p. 79) define context-free grammars as 4-tuples in the same way, but with different variable names.</ref>
| |
| | |
| <math>G = (V\,, \Sigma\,, R\,, S\,)</math>
| |
| where
| |
| # <math>V\, </math> is a finite set; each element <math> v\in V</math> is called ''a non-terminal character'' or a ''variable''. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by <math>G\, </math>.
| |
| # <math>\Sigma\,</math> is a finite set of ''terminal''s, disjoint from <math>V\,</math>, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar <math>G\, </math>.
| |
| # <math>R\,</math> is a finite relation from <math>V\,</math> to <math>(V\cup\Sigma)^{*}</math>, where the asterisk represents the [[Kleene star]] operation. The members of <math>R\,</math> are called the ''(rewrite) rule''s or ''production''s of the grammar. (also commonly symbolized by a <math>P\,</math>)
| |
| # <math>S\,</math> is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of <math>V\,</math>.
| |
| | |
| === Production rule notation ===
| |
| A [[Formal grammar#The syntax of grammars|production rule]] in <math>R\,</math> is formalized mathematically as a pair <math>(\alpha, \beta)\in R</math>, where <math>\alpha \in V</math> is a non-terminal and <math>\beta \in (V\cup\Sigma)^{*}</math> is a [[string (computer science)|string]] of variables and/or terminals; rather than using [[ordered pair]] notation, production rules are usually written using an arrow operator with <math>\alpha</math> as its left hand side and <math>\beta</math> as its right hand side:
| |
| <math>\alpha\rightarrow\beta</math>.
| |
| | |
| It is allowed for <math>\beta</math> to be the [[empty string]], and in this case it is customary to denote it by ε. The form <math>\alpha\rightarrow\varepsilon</math> is called an ε-production.<ref>{{harvtxt|Hopcroft|Ullman|1979}}, pp. 90–92.</ref>
| |
| | |
| It is common to list all right-hand sides for the same left-hand side on the same line, using | (the [[pipe symbol]]) to separate them. Rules <math>\alpha\rightarrow \beta_1</math> and <math>\alpha\rightarrow\beta_2</math> can hence be written as <math>\alpha\rightarrow\beta_1\mid\beta_2</math>.
| |
| | |
| === Rule application ===
| |
| For any strings <math>u, v\in (V\cup\Sigma)^{*}</math>, we say <math>u\,</math> directly yields <math>v\,</math>, written as <math>u\Rightarrow v\,</math>, if <math>\exists (\alpha, \beta)\in R</math> with <math>\alpha \in V</math> and <math>u_{1}, u_{2}\in (V\cup\Sigma)^{*}</math> such that <math>u\,=u_{1}\alpha u_{2}</math> and <math>v\,=u_{1}\beta u_{2}</math>. Thus, <math>\! v</math> is the result of applying the rule <math>\! (\alpha, \beta)</math> to <math>\! u</math>.
| |
| | |
| === Repetitive rule application ===
| |
| For any <math>u, v\in (V\cup\Sigma)^{*}, </math> we say <math>u</math> yields <math>v</math> written as <math>u\stackrel{*}{\Rightarrow} v</math> (or <math>u\Rightarrow\Rightarrow v\,</math> in some textbooks), if <math>\exists \ u_{1}, u_{2}, \cdots u_{k}\in (V\cup\Sigma)^{*}, k\geq 0</math> such that <math>u\Rightarrow u_{1}\Rightarrow u_{2}\cdots\Rightarrow u_{k}\Rightarrow v</math>
| |
| | |
| === Context-free language ===
| |
| The language of a grammar <math>G = (V\,, \Sigma\,, R\,, S\,)</math> is the set
| |
| :<math>L(G) = \{ w\in\Sigma^{*} : S\stackrel{*}{\Rightarrow} w\}</math>
| |
| | |
| A language <math>L\,</math> is said to be a context-free language (CFL), if there exists a CFG <math>G\,</math>, such that <math>L\,=\,L(G)</math>.
| |
| | |
| === Proper CFGs ===
| |
| A context-free grammar is said to be ''proper'', if it has
| |
| * no ''inaccessible'' symbols: <math>\forall N \in V: \exists \alpha,\beta \in (V\cup\Sigma)^*: S \stackrel{*}{\Rightarrow} \alpha{N}\beta</math>
| |
| * no ''unproductive'' symbols: <math>\forall N \in V: \exists w \in \Sigma^*: N \stackrel{*}{\Rightarrow} w</math>
| |
| * no ε-productions: <math>\forall N \in V, w \in \Sigma^*: (N, w) \in R \Rightarrow w \neq</math> ε <br> (the right-arrow in this case denotes logical "implies" and not grammatical "yields")
| |
| * no cycles: <math>\neg\exists N \in V: N \stackrel{*}{\Rightarrow} N</math>
| |
| | |
| === Example ===
| |
| The grammar <math>G = (\{S\} , \{a, b\}, P, S)</math>, with productions
| |
|
| |
| :S → aSa,
| |
| :S → bSb,
| |
| :S → ε,
| |
|
| |
| is context-free. It is not proper since it includes an ε-production. A typical derivation in this grammar is
| |
| :S → aSa → aaSaa → aabSbaa → aabbaa.
| |
| This makes it clear that
| |
| <math>L(G) = \{ww^R:w\in\{a,b\}^*\}</math>. | |
| The language is context-free, however it can be proved that it is not [[Regular language|regular]].
| |
| | |
| == Examples ==
| |
| {{unreferenced section|date=June 2013}}
| |
| === Well-formed parentheses ===
| |
| | |
| The canonical example of a context free grammar is parenthesis matching, which is representative of the general case. There are two terminal symbols "(" and ")" and one nonterminal symbol S. The production rules are
| |
| | |
| :S → SS
| |
| :S → (S)
| |
| :S → ()
| |
| | |
| The first rule allows Ss to multiply; the second rule allows Ss to become enclosed by matching parentheses; and the third rule terminates the recursion.
| |
| | |
| === Well-formed nested parentheses and square brackets ===
| |
| | |
| A second canonical example is two different kinds of matching nested parentheses, described by the productions:
| |
| | |
| :S → SS
| |
| :S → ()
| |
| :S → (S)
| |
| :S → []
| |
| :S → [S]
| |
| | |
| with terminal symbols [ ] ( ) and nonterminal S.
| |
| | |
| The following sequence can be derived in that grammar:
| |
| : ([ [ [ ()() [ ][ ] ] ]([ ]) ])
| |
| | |
| However, there is no context-free grammar for generating all sequences of two different types of parentheses, each separately balanced disregarding the other, but where the two types need not nest inside one another, for example:
| |
| | |
| : [ ( ] )
| |
| | |
| or
| |
| | |
| : [ [ [ [(((( ] ] ] ]))))(([ ))(([ ))([ )( ])( ])( ])
| |
| | |
| === A regular grammar ===
| |
| Every regular grammar is context-free, but not all context-free grammars are regular. The following context-free grammar, however, is also regular.
| |
| | |
| :S → a
| |
| :S → aS
| |
| :S → bS
| |
| | |
| The terminals here are ''a'' and ''b'', while the only non-terminal is S.
| |
| The language described is all nonempty strings of <math>a</math>s and <math>b</math>s that end in <math>a</math>.
| |
| | |
| This grammar is [[regular grammar|regular]]: no rule has more than one nonterminal in its right-hand side, and each of these nonterminals is at the same end of the right-hand side.
| |
| | |
| Every regular grammar corresponds directly to a [[nondeterministic finite automaton]], so we know that this is a [[regular language]].
| |
| | |
| Using pipe symbols, the grammar above can be described more tersely as follows:
| |
| | |
| :S → a | aS | bS
| |
| | |
| === Matching pairs ===
| |
| In a context-free grammar, we can pair up characters the way we do with [[bracket]]s. The simplest example:
| |
| | |
| :S → aSb
| |
| :S → ab
| |
| | |
| This grammar generates the language <math> \{ a^n b^n : n \ge 1 \} </math>, which is not [[regular language|regular]] (according to the [[pumping lemma for regular languages]]).
| |
| | |
| The special character ε stands for the empty string. By changing the above grammar to
| |
| :S → aSb | ε
| |
| we obtain a grammar generating the language <math> \{ a^n b^n : n \ge 0 \} </math> instead. This differs only in that it contains the empty string while the original grammar did not.
| |
| | |
| === Algebraic expressions ===
| |
| Here is a context-free grammar for syntactically correct [[Infix notation|infix]] algebraic expressions in the variables x, y and z:
| |
| # S → x
| |
| # S → y
| |
| # S → z
| |
| # S → S + S
| |
| # S → S - S
| |
| # S → S * S
| |
| # S → S / S
| |
| # S → ( S )
| |
| | |
| This grammar can, for example, generate the string
| |
| | |
| :( x + y ) * x - z * y / ( x + x )
| |
| | |
| as follows:
| |
| | |
| :S (the start symbol)
| |
| : → S - S (by rule 5)
| |
| : → S * S - S (by rule 6, applied to the leftmost S)
| |
| : → S * S - S / S (by rule 7, applied to the rightmost S)
| |
| : → ( S ) * S - S / S (by rule 8, applied to the leftmost S)
| |
| : → ( S ) * S - S / ( S ) (by rule 8, applied to the rightmost S)
| |
| : → ( S + S ) * S - S / ( S ) (etc.)
| |
| : → ( S + S ) * S - S * S / ( S )
| |
| : → ( S + S ) * S - S * S / ( S + S )
| |
| : → ( x + S ) * S - S * S / ( S + S )
| |
| : → ( x + y ) * S - S * S / ( S + S )
| |
| : → ( x + y ) * x - S * y / ( S + S )
| |
| : → ( x + y ) * x - S * y / ( x + S )
| |
| : → ( x + y ) * x - z * y / ( x + S )
| |
| : → ( x + y ) * x - z * y / ( x + x )
| |
| | |
| Note that many choices were made underway as to which rewrite was going to be performed next.
| |
| These choices look quite arbitrary. As a matter of fact, they are, in the sense that the string finally generated is always the same. For example, the second and third rewrites
| |
| | |
| : → S * S - S (by rule 6, applied to the leftmost S)
| |
| : → S * S - S / S (by rule 7, applied to the rightmost S)
| |
| | |
| could be done in the opposite order:
| |
| | |
| : → S - S / S (by rule 7, applied to the rightmost S)
| |
| : → S * S - S / S (by rule 6, applied to the leftmost S)
| |
| | |
| Also, many choices were made on which rule to apply to each selected <code>S</code>.
| |
| Changing the choices made and not only the order they were made in usually affects which terminal string comes out at the end.
| |
| | |
| Let's look at this in more detail. Consider the [[parse tree]] of this derivation:
| |
|
| |
| <code language="asciitree"> | |
| | |
| S
| |
| |
| |
| /|\
| |
| S - S
| |
| / \
| |
| /|\ /|\
| |
| S * S S / S
| |
| / | | \
| |
| /|\ x /|\ /|\
| |
| ( S ) S * S ( S )
| |
| / | | \
| |
| /|\ z y /|\
| |
| S + S S + S
| |
| | | | |
| |
| x y x x
| |
| | |
| </code> | |
| | |
| Starting at the top, step by step, an S in the tree is expanded, until no more unexpanded <code>S</code>es (non-terminals) remain.
| |
| Picking a different order of expansion will produce a different derivation, but the same parse tree.
| |
| The parse tree will only change if we pick a different rule to apply at some position in the tree.
| |
| | |
| But can a different parse tree still produce the same terminal string,
| |
| which is <code>( x + y ) * x - z * y / ( x + x )</code> in this case?
| |
| Yes, for this particular grammar, this is possible.
| |
| Grammars with this property are called [[ambiguous grammar|ambiguous]].
| |
| | |
| For example, <code>x + y * z</code> can be produced with these two different parse trees:
| |
|
| |
| <code language="asciitree">
| |
| | |
| S S
| |
| | |
| |
| /|\ /|\
| |
| S * S S + S
| |
| / \ / \
| |
| /|\ z x /|\
| |
| S + S S * S
| |
| | | | |
| |
| x y y z
| |
| | |
| </code>
| |
| | |
| However, the ''language'' described by this grammar is not inherently ambiguous:
| |
| an alternative, unambiguous grammar can be given for the language, for example:
| |
| | |
| :T → x
| |
| :T → y
| |
| :T → z
| |
| :S → S + T
| |
| :S → S - T
| |
| :S → S * T
| |
| :S → S / T
| |
| :T → ( S )
| |
| :S → T
| |
| | |
| (once again picking <code>S</code> as the start symbol). This alternative grammar will produce <code>x + y * z</code> with a parse tree similar to the left one above, i.e. implicitly assuming the association <code>(x + y) * z</code>, which is not according to standard operator precedence. More elaborate, unambiguous and context-free grammars can be constructed that produce parse trees that obey all desired [[Operator-precedence parser|operator precedence]] and associativity rules.
| |
| | |
| === Further examples ===
| |
| | |
| ==== Example 1 ====
| |
| A context-free grammar for the language consisting of all strings over {a,b} containing an unequal number of a's and b's:
| |
| :S → U | V
| |
| :U → TaU | TaT
| |
| :V → TbV | TbT
| |
| :T → aTbT | bTaT | ε
| |
| Here, the nonterminal T can generate all strings with the same number of a's as b's, the nonterminal U generates all strings with more a's than b's and the nonterminal V generates all strings with fewer a's than b's.
| |
| | |
| ==== Example 2 ====
| |
| Another example of a non-regular language is <math> \{ b^n a^m b^{2n} : n \ge 0, m \ge 0 \} </math>. It is context-free as it can be generated by the following context-free grammar:
| |
| :S → bSbb | A
| |
| :A → aA | ε
| |
| | |
| ==== Other examples ====
| |
| The [[First-order_logic#Formation_rules|formation rules]] for the terms and formulas of formal logic fit the definition of context-free grammar, except that the set of symbols may be infinite and there may be more than one start symbol.
| |
| | |
| === Derivations and syntax trees ===<!-- This section is linked from [[LR parser]] -->
| |
| | |
| A ''derivation'' of a string for a grammar is a sequence of grammar rule applications that transforms the start symbol into the string.
| |
| A derivation proves that the string belongs to the grammar's language.
| |
| | |
| A derivation is fully determined by giving, for each step:
| |
| * the rule applied in that step
| |
| * the occurrence of its left hand side to which it is applied
| |
| For clarity, the intermediate string is usually given as well.
| |
| | |
| For instance, with the grammar:
| |
| | |
| (1) S → S + S
| |
| (2) S → 1
| |
| (3) S → a
| |
| | |
| the string
| |
| | |
| 1 + 1 + a
| |
| | |
| can be derived with the derivation:
| |
| | |
| S
| |
| → (rule 1 on first S)
| |
| S+S
| |
| → (rule 1 on second S)
| |
| S+S+S
| |
| → (rule 2 on second S)
| |
| S+1+S
| |
| → (rule 3 on third S)
| |
| S+1+a
| |
| → (rule 2 on first S)
| |
| 1+1+a
| |
| | |
| Often, a strategy is followed that deterministically determines the next nonterminal to rewrite:
| |
| * in a ''leftmost derivation'', it is always the leftmost nonterminal;
| |
| * in a ''rightmost derivation'', it is always the rightmost nonterminal.
| |
| Given such a strategy, a derivation is completely determined by the sequence of rules applied. For instance, the leftmost derivation
| |
| | |
| S
| |
| → (rule 1 on first S)
| |
| S+S
| |
| → (rule 2 on first S)
| |
| 1+S
| |
| → (rule 1 on first S)
| |
| 1+S+S
| |
| → (rule 2 on first S)
| |
| 1+1+S
| |
| → (rule 3 on first S)
| |
| 1+1+a
| |
| | |
| can be summarized as
| |
| | |
| rule 1, rule 2, rule 1, rule 2, rule 3
| |
| | |
| The distinction between leftmost derivation and rightmost derivation is important because in most [[parsing|parser]]s the transformation of the input is defined by giving a piece of code for every grammar rule that is executed whenever the rule is applied. Therefore it is important to know whether the parser determines a leftmost or a rightmost derivation because this determines the order in which the pieces of code will be executed. See for an example [[LL parser]]s and [[LR parser]]s.
| |
| | |
| A derivation also imposes in some sense a hierarchical structure on the string that is derived. For example, if the string "1 + 1 + a" is derived according to the leftmost derivation:
| |
| | |
| :S → S + S (1)
| |
| : → 1 + S (2)
| |
| : → 1 + S + S (1)
| |
| : → 1 + 1 + S (2)
| |
| : → 1 + 1 + a (3)
| |
| | |
| the structure of the string would be:
| |
| | |
| : { { 1 }<sub>S</sub> + { { 1 }<sub>S</sub> + { a }<sub>S</sub> }<sub>S</sub> }<sub>S</sub>
| |
| where { ... }<sub>S</sub> indicates a substring recognized as belonging to S. This hierarchy can also be seen as a tree:
| |
| | |
| <code language="asciitree">
| |
| S
| |
| /|\
| |
| / | \
| |
| / | \
| |
| S '+' S
| |
| | /|\
| |
| | / | \
| |
| '1' S '+' S
| |
| | |
| |
| '1' 'a'
| |
| </code>
| |
| | |
| This tree is called a ''[[parse tree]]'' or "concrete syntax tree" of the string, by contrast with the [[abstract syntax tree]]. In this case the presented leftmost and the rightmost derivations define the same parse tree; however, there is another (rightmost) derivation of the same string
| |
| | |
| :S → S + S (1)
| |
| : → S + a (3)
| |
| : → S + S + a (1)
| |
| : → S + 1 + a (2)
| |
| : → 1 + 1 + a (2)
| |
| | |
| and this defines the following parse tree:
| |
| | |
| <code language="asciitree">
| |
| S
| |
| /|\
| |
| / | \
| |
| / | \
| |
| S '+' S
| |
| /|\ |
| |
| / | \ |
| |
| S '+' S 'a'
| |
| | |
| |
| '1' '1'
| |
| </code>
| |
| | |
| If, for certain strings in the language of the grammar, there is more than one parsing tree, then the grammar is said to be an ''[[ambiguous grammar]]''. Such grammars are usually hard to parse because the parser cannot always decide which grammar rule it has to apply. Usually, ambiguity is a feature of the grammar, not the language, and an unambiguous grammar can be found that generates the same context-free language. However, there are certain languages that can only be generated by ambiguous grammars; such languages are called ''[[inherently ambiguous language]]s''.
| |
| | |
| == Normal forms ==
| |
| Every context-free grammar that does not generate the empty string can be transformed into one in which no rule has the empty string as a product [a rule with ε as a product is called an ε-production]. If it does generate the empty string, it will be necessary to include the rule <math>S \rarr \epsilon</math>, but there need be no other ε-rule. Every context-free grammar with no ε-production has an equivalent grammar in [[Chomsky normal form]] or [[Greibach normal form]]. "Equivalent" here means that the two grammars generate the same language.
| |
| | |
| Because of the especially simple form of production rules in Chomsky Normal Form grammars, this normal form has both theoretical and practical implications. For instance, given a context-free grammar, one can use the Chomsky Normal Form to construct a [[polynomial-time]] algorithm that decides whether a given string is in the language represented by that grammar or not (the [[CYK algorithm]]).
| |
| | |
| == Undecidable problems ==
| |
| Some questions that are undecidable for wider classes of grammars become decidable for context-free grammars; e.g. the emptiness problem (whether the grammar generates any terminal strings at all), is undecidable for [[context-sensitive grammar]]s, but decidable for context-free grammars.
| |
| | |
| Still, many problems remain undecidable. Examples are:
| |
| | |
| === Universality ===
| |
| | |
| Given a CFG, does it generate the language of all strings over the alphabet of terminal symbols used in its rules?<ref name="sipser">{{harvtxt|Sipser|1997}}, Theorem 5.10, p. 181.</ref><ref name="hu281"/>
| |
| | |
| A reduction can be demonstrated to this problem from the well-known undecidable problem of determining whether a [[Turing machine]] accepts a particular input (the [[Halting problem]]). The reduction uses the concept of a ''[[computation history]]'', a string describing an entire computation of a [[Turing machine]]. A CFG can be constructed that generates all strings that are not accepting computation histories for a particular Turing machine on a particular input, and thus it will accept all strings only if the machine doesn't accept that input.
| |
| | |
| === Language equality ===
| |
| Given two CFGs, do they generate the same language?<ref name="hu281">{{harvtxt|Hopcroft|Ullman|1979}}, p. 281.</ref><ref name="eom"/>
| |
| | |
| The undecidability of this problem is a direct consequence of the previous: it is impossible to even decide whether a CFG is equivalent to the trivial CFG defining the language of all strings.
| |
| | |
| === Language inclusion ===
| |
| Given two CFGs, can the first one generate all strings that the second one can generate?<ref name="hu281"/><ref name="eom"/>
| |
| | |
| If this problem was decidable, then language equality could be decided, too: two CFGs G1 and G2 generate the same language if L(G1) is a subset of L(G2) and L(G2) is a subset of L(G1).
| |
| | |
| === Being in a lower or higher level of the Chomsky hierarchy ===
| |
| | |
| Using [[Greibach's theorem]], it can be shown that the two following problems are undecidable:
| |
| | |
| * Given a [[context-sensitive grammar]], does it describe a context-free language?
| |
| * Given a context-free grammar, does it describe a [[regular language]]?<ref name="hu281"/><ref name="eom">{{citation|title=Encyclopaedia of mathematics: an updated and annotated translation of the Soviet "Mathematical Encyclopaedia"|first=Michiel|last=Hazewinkel|publisher=Springer|year=1994|isbn=978-1-55608-003-6|at=Vol. IV, p. 56|url=http://books.google.com/books?id=s9F71NJxwzoC&pg=PA56}}.</ref>
| |
| | |
| === Grammar ambiguity ===
| |
| Given a CFG, is it [[Ambiguous grammar|ambiguous]]?
| |
| | |
| The undecidability of this problem follows from the fact that if an algorithm to determine ambiguity existed, the [[Post correspondence problem]] could be decided, which is known to be undecidable.
| |
| | |
| === Language disjointness ===
| |
| | |
| Given two CFGs, is there any string derivable from both grammars?
| |
| | |
| If this problem was decidable, the undecidable [[Post correspondence problem]] could be decided, too: given strings <math>\alpha_1, \ldots, \alpha_N, \beta_1, \ldots, \beta_N</math> over some alphabet <math>\{a_1, \ldots, a_k\}</math>, let the grammar G1 consist of the rule
| |
| :S → <math>\alpha_1</math> S <math>\beta_1^{rev}</math> | ... | <math>\alpha_N</math> S <math>\beta_N^{rev}</math> | <math>b</math>;
| |
| where <math>\beta_i^{rev}</math> denotes the [[String_(computer_science)#Reversal|reversed]] string <math>\beta_i</math> and <math>b</math> doesn't occur among the <math>a_i</math>; and let grammar G2 consist of the rule
| |
| :T → <math>a_1</math> T <math>a_1</math> | ... | <math>a_k</math> T <math>a_k</math> | <math>b</math>;
| |
| Then the Post problem given by <math>\alpha_1, \ldots, \alpha_N, \beta_1, \ldots, \beta_N</math> has a solution if and only if L(G1) and L(G2) share a derivable string.
| |
| | |
| == Extensions ==
| |
| An obvious way to extend the context-free grammar formalism is to allow nonterminals to have arguments, the values of which are passed along within the rules. This allows natural language features such as [[Agreement (linguistics)|agreement]] and [[reference]], and programming language analogs such as the correct use and definition of identifiers, to be expressed in a natural way. E.g. we can now easily express that in English sentences, the subject and verb must agree in number. In computer science, examples of this approach include [[affix grammar]]s, [[attribute grammar]]s, [[indexed grammar]]s, and Van Wijngaarden [[two-level grammar]]s. Similar extensions exist in linguistics.
| |
| | |
| An '''extended context-free grammar''' is one in which the right-hand side of the production rules is allowed to be a [[regular expression]] over the grammar's terminals and nonterminals. Extended context-free grammars describe exactly the context-free languages.<ref>{{cite web | url=http://www.engr.mun.ca/~theo/Courses/fm/pub/context-free.pdf | title=A Short Introduction to Regular Expressions and Context-Free Grammars | accessdate=August 24, 2012 | author=Norvell, Theodore | pages=4}}</ref>
| |
| | |
| Another extension is to allow additional terminal symbols to appear at the left hand side of rules, constraining their application. This produces the formalism of [[context-sensitive grammar]]s.
| |
| | |
| == Subclasses ==
| |
| | |
| There are a number of important subclasses of the context-free grammars:
| |
| | |
| * [[LR parser|LR(''k'')]] grammars (also known as [[deterministic context-free grammar]]s) allow [[parsing]] (string recognition) with [[deterministic pushdown automaton|deterministic pushdown automata]], but they can only describe [[deterministic context-free language]]s.
| |
| | |
| * [[SLR grammar|Simple LR]], [[LALR parser|Look-Ahead LR]] grammars are subclasses that allow further simplification of parsing.
| |
| | |
| * [[LL parser|LL(''k'') and LL(''*'')]] grammars allow parsing by direct construction of a leftmost derivation as described above, and describe even fewer languages.
| |
| | |
| * [[Simple grammar]]s are a subclass of the LL(1) grammars mostly interesting for its theoretical property that language equality of simple grammars is decidable, while language inclusion is not.
| |
| | |
| * [[Bracketed grammar]]s have the property that the terminal symbols are divided into left and right bracket pairs that always match up in rules.
| |
| | |
| * [[Linear grammar]]s have no rules with more than one nonterminal in the right hand side.
| |
| | |
| * [[Regular grammar]]s are a subclass of the linear grammars and describe the [[regular language|regular]] languages, i.e. they correspond to [[finite automaton|finite automata]] and [[regular expression]]s.
| |
| | |
| LR parsing extends LL parsing to support a larger range of grammars; in turn, [[GLR parser|generalized LR parsing]] extends LR parsing to support arbitrary context-free grammars. On LL grammars and LR grammars, it essentially performs LL parsing and LR parsing, respectively, while on nondeterministic grammars, it is as efficient as can be expected. Although GLR parsing was developed in the 1980s, many new language definitions and [[parser generator]]s continue to be based on LL, LALR or LR parsing up to the present day.
| |
| | |
| ==Linguistic applications==
| |
| | |
| [[Noam Chomsky|Chomsky]] initially hoped to overcome the limitations of context-free grammars by adding [[transformational grammar|transformation rules]].<ref name="chomsky1956"/>
| |
| | |
| Such rules are another standard device in traditional linguistics; e.g. [[grammatical voice|passivization]] in English. Much of [[generative grammar]] has been devoted to finding ways of refining the descriptive mechanisms of phrase-structure grammar and transformation rules such that exactly the kinds of things can be expressed that natural language actually allows. Allowing arbitrary transformations doesn't meet that goal: they are much too powerful, being [[Turing complete]] unless significant restrictions are added (e.g. no transformations that introduce and then rewrite symbols in a context-free fashion).
| |
| | |
| Chomsky's general position regarding the non-context-freeness of natural language has held up since then,<ref name="shieber1985">{{citation | title=Evidence against the context-freeness of natural language | year=1985 | last=Shieber | first=Stuart | journal=Linguistics and Philosophy | volume=8 | pages=333–343 | url=http://www.eecs.harvard.edu/~shieber/Biblio/Papers/shieber85.pdf | doi=10.1007/BF00630917 | issue=3}}.</ref> although his specific examples regarding the inadequacy of context-free grammars in terms of their weak generative capacity were later disproved.<ref name="pullum-gazdar1982">{{citation | title=Natural languages and context-free languages | year=1982 | last=Pullum | first=Geoffrey K. | coauthors=Gerald Gazdar | journal=Linguistics and Philosophy | volume=4 | pages=471–504 | doi=10.1007/BF00360802 | issue=4}}.</ref>
| |
| [[Gerald Gazdar]] and [[Geoffrey Pullum]] have argued that despite a few non-context-free constructions in natural language (such as [[cross-serial dependencies]] in [[Swiss German]]<ref name="shieber1985"/> and [[reduplication]] in [[Bambara language|Bambara]]<ref name="culy1985">{{citation | title=The Complexity of the Vocabulary of Bambara | year=1985 | last=Culy | first=Christopher | journal=Linguistics and Philosophy | volume=8 | pages=345–351 | doi=10.1007/BF00630918 | issue=3}}.</ref>), the vast majority of forms in natural language are indeed context-free.<ref name="pullum-gazdar1982"/>
| |
| | |
| == See also ==
| |
| * [[Parsing expression grammar]]
| |
| * [[Stochastic context-free grammar]]
| |
| * [[Context-free grammar generation algorithms|Algorithms for context-free grammar generation]]
| |
| * [[Pumping lemma for context-free languages]]
| |
| | |
| === Parsing algorithms ===
| |
| * [[CYK algorithm]]
| |
| * [[GLR parser]]
| |
| * [[LL parser]]
| |
| * [[Earley algorithm]]
| |
| | |
| ==Notes==
| |
| {{Reflist|colwidth=30em}}
| |
| | |
| ==References==
| |
| *{{citation|first1=John E.|last1=Hopcroft|author1-link=John Hopcroft|first2=Jeffrey D.|last2=Ullman|author2-link=Jeffrey Ullman|title=Introduction to Automata Theory, Languages, and Computation|publisher=Addison-Wesley|year=1979}}. Chapter 4: Context-Free Grammars, pp. 77–106; Chapter 6: Properties of Context-Free Languages, pp. 125–137.
| |
| *{{citation|authorlink = Michael Sipser | first = Michael | last = Sipser | year = 1997 | title = Introduction to the Theory of Computation | publisher = PWS Publishing | isbn = 0-534-94728-X}}. Chapter 2: Context-Free Grammars, pp. 91–122; Section 4.1.2: Decidable problems concerning context-free languages, pp. 156–159; Section 5.1.1: Reductions via computation histories: pp. 176–183.
| |
| * {{cite book| author=J. Berstel, L. Boasson| title=Context-Free Languages| year=1990| volume=B| pages=59–102| publisher=Elsevier| editor=Jan van Leeuwen| series=Handbook of Theoretical Computer Science}}
| |
| | |
| {{Formal languages and grammars}}
| |
| | |
| {{DEFAULTSORT:Context-Free Grammar}}
| |
| [[Category:1956 in computer science]]
| |
| [[Category:Compiler construction]]
| |
| [[Category:Formal languages]]
| |
| [[Category:Programming language topics]]
| |
| [[Category:Wikipedia articles with ASCII art]]
| |