This disclosure pertains to a technique for inducing and applying a context free grammar. In general, a context free grammar (CFG) is described by a tuple G=(V, Σ, R, S). V defines a set of non-terminal symbols (NTs) that identify different syntactic categories. S is a distinguished non-terminal symbol (the start symbol) associated with a sentence as a whole. Σ defines a set of terminals that identify the actual words in a sentence. R defines a set of rules, each having the form NT→γ, where NT corresponds to any non-terminal symbol and γ corresponds to any sequence of non-terminal symbols and terminals. For example, a top-level rule may specify S→NP VP, which indicates that a sentence is produced by a combination of a noun phrase (NP) and a verb phrase (VP). Other rules may specify permissible constructions of noun phrases and verb phrases, and so on. Each rule may also have a probability value associated with it, in which case the CFG corresponds to a probabilistic CFG (PCFG).
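By way of illustration only, the tuple G=(V, Σ, R, S) may be represented as a simple data structure. The following Python sketch is not part of the disclosed technique; the names `Rule`, `CFG`, `is_well_formed`, `is_proper_pcfg`, and the toy grammar `TOY` are hypothetical and chosen for this example. It assumes that, in a PCFG, the probabilities of all rules sharing the same left-hand non-terminal sum to one, which is the usual convention but is not stated above.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str            # NT: the non-terminal being rewritten
    rhs: tuple          # γ: a sequence of non-terminals and terminals
    prob: float = 1.0   # optional rule probability (used for a PCFG)


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # V
    terminals: frozenset     # Σ
    rules: tuple             # R
    start: str               # S

    def is_well_formed(self):
        """Check that every rule only mentions symbols from V and Σ."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and self.nonterminals.isdisjoint(self.terminals)
            and all(r.lhs in self.nonterminals for r in self.rules)
            and all(sym in symbols for r in self.rules for sym in r.rhs)
        )

    def is_proper_pcfg(self):
        """Assumed convention: probabilities per left-hand side sum to 1."""
        totals = defaultdict(float)
        for r in self.rules:
            totals[r.lhs] += r.prob
        return all(math.isclose(t, 1.0) for t in totals.values())


# Hypothetical toy grammar built around the S -> NP VP example.
TOY = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    terminals=frozenset({"the", "dog", "cat", "sees"}),
    rules=(
        Rule("S", ("NP", "VP")),
        Rule("NP", ("Det", "N")),
        Rule("VP", ("V", "NP")),
        Rule("Det", ("the",)),
        Rule("N", ("dog",), 0.5),
        Rule("N", ("cat",), 0.5),
        Rule("V", ("sees",)),
    ),
    start="S",
)
```

In this sketch, `TOY.is_well_formed()` and `TOY.is_proper_pcfg()` both hold for the toy grammar, whereas a grammar containing a rule whose left-hand side is not in V would fail the first check.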
Overall, a CFG generates a language L. The language L corresponds to the set of all sentences that can be expressed using the CFG. For example, a CFG may specify the rules used to construct any sentence in the English language. A developer designs such a grammar with the aim of inclusiveness, that is, with the intent of encompassing every grammatical construct that is found in the English language, while ideally excluding all sequences of words that do not correspond to grammatical English constructions.
A parser may be run on a sentence to indicate whether the sentence conforms to the rules specified in a particular CFG. A sentence which conforms to the rules is said to be grammatical with respect to the CFG. Otherwise, the sentence is said to be ungrammatical. When operating in this role, the parser may be referred to as a recognizer.
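The recognizer role may be sketched, in one non-limiting example, using the well-known CYK chart algorithm. This is an illustrative choice and not necessarily the parser contemplated by this disclosure. The sketch assumes the grammar is in Chomsky normal form, meaning every rule is either NT→NT NT or NT→terminal; the function name `recognize` and the toy rule lists `BINARY` and `LEXICAL` are hypothetical.

```python
def recognize(words, binary_rules, lexical_rules, start="S"):
    """Return True if `words` is grammatical with respect to the grammar.

    Assumes Chomsky normal form:
      binary_rules  -- list of (lhs, (left_nt, right_nt))
      lexical_rules -- list of (lhs, terminal)
    """
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds every non-terminal that derives words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = {lhs for lhs, term in lexical_rules if term == word}
    # Build longer spans from pairs of adjacent shorter spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, (left, right) in binary_rules:
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j].add(lhs)
    return start in chart[0][n]


# Hypothetical toy grammar in Chomsky normal form.
BINARY = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))]
LEXICAL = [("Det", "the"), ("N", "dog"), ("N", "cat"), ("V", "sees")]

grammatical = recognize("the dog sees the cat".split(), BINARY, LEXICAL)
ungrammatical = recognize("dog the sees cat the".split(), BINARY, LEXICAL)
```

Under this toy grammar, the first call reports the sentence as grammatical and the second reports it as ungrammatical. A recognizer of this kind only answers yes or no; recovering the parse tree itself would require storing back-pointers in the chart.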