A dependency parser may analyze syntax and build a data structure (e.g., a parse tree, an abstract syntax tree, or another hierarchical structure) implicit in the input tokens. Many modern parsers are at least partly statistical and rely on a corpus of training data that has already been annotated (e.g., parsed by hand), such as a treebank. This approach allows the parser to gather information about the frequency with which various constructions occur in specific contexts and to build an inductive statistical model that allows the parser to create (e.g., induce, propose, hypothesize, etc.) grammatical structures (parses) for previously unseen sentences.
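As a minimal sketch of the idea above, a dependency parse can be stored as a flat list of head indices, and construction frequencies can be gathered from an annotated corpus by simple counting. The toy treebank, the relation labels, and the helper names `children_of` and `relation_frequencies` are illustrative assumptions, not part of the disclosed subject matter.

```python
from collections import Counter

# Toy "treebank": each sentence is a list of (word, head_index, relation) triples.
# head_index is the 1-based position of the head word; 0 marks the root.
# The sentences and annotations are illustrative assumptions only.
TOY_TREEBANK = [
    [("the", 2, "det"), ("dog", 3, "nsubj"), ("barks", 0, "root")],
    [("a", 2, "det"), ("cat", 3, "nsubj"), ("sleeps", 0, "root")],
    [("dogs", 2, "nsubj"), ("chase", 0, "root"), ("cats", 2, "obj")],
]


def children_of(sentence):
    """Turn the flat head-index encoding into a head -> [dependents] mapping."""
    tree = {}
    for position, (_word, head, _rel) in enumerate(sentence, start=1):
        tree.setdefault(head, []).append(position)
    return tree


def relation_frequencies(treebank):
    """Relative frequency of each dependency relation across the treebank."""
    counts = Counter(rel for sentence in treebank for (_w, _h, rel) in sentence)
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()}


tree = children_of(TOY_TREEBANK[0])
freqs = relation_frequencies(TOY_TREEBANK)
```

A statistical parser would typically condition such counts on much richer context (word forms, part-of-speech tags, distance between head and dependent); the sketch only shows where the frequencies come from.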
The speed and accuracy of dependency parsers render them useful for downstream natural language processing tasks. These tasks include, but are not limited to, question answering, sentiment analysis, and machine translation reordering. Such downstream tasks may pertain to specialized applications whose requirements may differ in some ways from those of general-purpose applications. For example, areas with specific jargon (e.g., medicine, patent law, engineering, etc.) may require a parse of a given sentence that differs from the most correct generic parse. As an illustration, the word “chocolate” may require a parse that translates it into the word for confectionary chocolate in another language when translating a generic document, but the same word may require a parse that translates it into the equivalent of “dark brown” when translating a document specific to the color trades, such as painters, dyers, clothiers, etc.
Examples of parsers include graph-based parsers, transition-based parsers, chart parsers, etc., or a combination thereof. A graph-based parser may generate a parser model that ranks candidate dependency graphs and may subsequently search for the dependency graph with the most desirable ranking. A transition-based parser may rank transitions between parser states based on the parse history and subsequently search for the highest-scoring transition sequence that derives a complete dependency graph. Transition-based parsers rely on machine learning to induce a model for predicting the transition sequence used by the parser to construct the dependency graph. A chart parser is a type of parser suitable for ambiguous grammars, including grammars of natural languages. It may use a dynamic programming approach in which partial hypothesized results are stored in a structure called a chart and can be re-used. In accordance with embodiments of the disclosed subject matter, a chart parser may use the Cocke-Younger-Kasami (CYK) algorithm. The CYK algorithm considers every possible subsequence of the sequence of words and sets a Boolean, P[i,j,k], to true if the subsequence of length j starting at word i can be generated from the non-terminal symbol R_k of the grammar. Once it has considered subsequences of length 1, it may go on to subsequences of length 2, and so on. For subsequences of length 2 and greater, it may consider every possible partition of the subsequence into two parts and determine whether there is some production A → B C such that B matches the first part and C matches the second part. If so, it may record A as matching the whole subsequence. Once this process is completed, the sentence may be recognized by the grammar if the subsequence containing the entire sentence is matched by the start symbol.
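The CYK procedure described above can be sketched as follows. This is an illustrative recognizer under stated assumptions, not the disclosed implementation: the grammar is assumed to be in Chomsky normal form, the Boolean array P[i,j,k] is replaced by an equivalent table of sets of non-terminals, and the toy grammar and the name `cyk_recognize` are hypothetical.

```python
def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical_rules: dict mapping a word to the set of non-terminals A with A -> word
    binary_rules:  list of (A, B, C) triples, one per production A -> B C
    """
    n = len(words)
    if n == 0:
        return False
    # table[length][i] holds the non-terminals that generate the subsequence
    # of `length` words starting at position i (0-based).
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    # Subsequences of length 1 are matched by lexical productions.
    for i, word in enumerate(words):
        table[1][i] = set(lexical_rules.get(word, ()))
    # Longer subsequences: try every partition into two non-empty parts.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[split][i]
                right = table[length - split][i + split]
                for parent, b, c in binary_rules:
                    if b in left and c in right:
                        table[length][i].add(parent)
    # The sentence is recognized if the start symbol spans all of it.
    return start in table[n][0]


# Hypothetical toy grammar, for illustration only.
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]

accepted = cyk_recognize("the dog chased the cat".split(), LEXICAL, BINARY)
rejected = cyk_recognize("dog the chased".split(), LEXICAL, BINARY)
```

The table is the "chart": each cell is filled once and re-used by every longer subsequence that contains it, which is what gives the approach its cubic running time in the sentence length.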
Parser data can include parser training data and parser model parameters. Parser training data can include a set of <sentence, reference parse tree> pairs, <word, reference word> pairs, etc. Parser model parameters can include a set of statistical parameters that the parser can use to score candidate parses, e.g., to compute an intrinsic parser metric for a candidate parse. These parameters can be trained (modified) using parser training data. For example, in the baseline parser, the likelihood of “red” being labeled as an adjective given that it is followed by “car” might be 0.2. After retraining, the likelihood may increase, say, to 0.7. The parser may then be better at parsing not only the specific sentence, “the red car is faster than the blue car,” but also any sentence containing “red car”, “blue car”, or the other grammatical constructions of the specific sentence that are also present in other sentences. Parser data can be modified in other ways. For example, parse trees can be reordered, dependency statistics may be changed, etc. The effect of such modifications can include increasing the likelihood that a subsequent parse reflects at least some of the properties of one or more elements of a training set. Examples of parser data can include phrases, training data, weighting factors, phrase tables, properties of the words, information about the syntactic structure of the phrase (such as dependencies), the grammar, etc., or a combination thereof. A phrase can include any number of words, numbers, characters, punctuation, or other such entities, or a combination thereof. Within the parser, a phrase or phrases can be associated with structures and/or additional information (e.g., attributes, etc.) such as hierarchies, rules, parse trees, part-of-speech tags, counts, probabilities, semantic categories, etc., or a combination thereof.
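The effect of retraining on a model parameter can be illustrated with a simple relative-frequency estimate. The sketch below is an assumption for illustration only: the class `TagModel`, the tag names, and the example counts are hypothetical and were chosen so that the estimate moves from 0.2 to 0.7 as in the example above; an actual parser may use a different statistical model.

```python
from collections import Counter


class TagModel:
    """Relative-frequency estimate of P(tag | word, next_word)."""

    def __init__(self):
        self.context_counts = Counter()  # (word, next_word) -> count
        self.tagged_counts = Counter()   # (word, next_word, tag) -> count

    def train(self, examples):
        """Add (word, next_word, tag) observations to the model's counts."""
        for word, next_word, tag in examples:
            self.context_counts[(word, next_word)] += 1
            self.tagged_counts[(word, next_word, tag)] += 1

    def likelihood(self, word, next_word, tag):
        """Estimated probability of `tag` for `word` when followed by `next_word`."""
        seen = self.context_counts[(word, next_word)]
        if seen == 0:
            return 0.0
        return self.tagged_counts[(word, next_word, tag)] / seen


# Hypothetical baseline data: "red" before "car" is tagged ADJ once in five cases.
baseline_data = [("red", "car", "ADJ")] + [("red", "car", "NOUN")] * 4
# Hypothetical retraining data in which the ADJ reading dominates.
retraining_data = [("red", "car", "ADJ")] * 13 + [("red", "car", "NOUN")] * 2

model = TagModel()
model.train(baseline_data)
before = model.likelihood("red", "car", "ADJ")  # 1 of 5 observations

model.train(retraining_data)
after = model.likelihood("red", "car", "ADJ")   # 14 of 20 observations
```

Because the parameter is keyed on the construction ("red" followed by "car") rather than on a whole sentence, the updated estimate applies to any sentence that contains that construction.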