Automatic machine translators that translate phrases from one language to another often include parsers. A parser can be a component of a machine translator that analyzes syntax and builds a data structure (e.g., a parse tree, abstract syntax tree or other hierarchical structure) implicit in the input tokens, such as elements of a source language to be translated into a target language. Many modern parsers are at least partly statistical and rely on a corpus of training data that has already been annotated (parsed by hand), such as a treebank. Other training data can be un-annotated target phrases that are known to be good translations of a given source phrase. Processing training data allows the parser to gather information about the frequency with which various constructions occur in specific contexts and to build an inductive statistical model that allows the parser to create (e.g., induce, propose, hypothesize) grammatical structures (parses) for previously unseen sentences.
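As one illustration of gathering frequency information from annotated data, the sketch below estimates rule probabilities by relative frequency over a tiny hand-made treebank. The tree encoding (nested tuples), the two example trees, and the names TOY_TREEBANK, count_rules and estimate_rule_probs are assumptions made for this example only; relative-frequency estimation is one common choice and is not necessarily the model used by any particular parser.

```python
from collections import Counter

# Hypothetical toy treebank (an assumption for illustration):
# each tree is (label, child, child, ...); a leaf is a plain string token.
TOY_TREEBANK = [
    ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
    ("S", ("NP", "cats"), ("VP", ("V", "sleep"))),
]


def count_rules(tree, counts):
    """Recursively count each production: parent label -> child labels/tokens."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_rule_probs(treebank):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


rule_probs = estimate_rule_probs(TOY_TREEBANK)
for (lhs, rhs), p in sorted(rule_probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")
```

A parser could then use such probabilities to rank competing structures for a previously unseen sentence; a practical system would typically also need smoothing for constructions that never occur in the training data.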
A parser's statistical parameters can be improved by comparing a machine-generated candidate parse of a source phrase with a reference, such as a target-language phrase that is known to be a good translation of the same source phrase. The differences between the machine-generated candidate and the reference can then be used to adjust the statistical parameters of the parser so that it generates better outputs for subsequent source phrases.
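The adjustment described above can be sketched with a perceptron-style update, in which weights move toward features observed in the reference and away from features observed in the candidate. This update rule is an assumption chosen for illustration; the text does not specify a particular rule, nor how features would be derived from a candidate parse or a reference translation. The feature names, the starting weights, and the functions perceptron_update and score are hypothetical.

```python
def perceptron_update(weights, candidate_feats, reference_feats, lr=1.0):
    """Return new weights shifted toward the reference and away from the candidate.

    Both feature arguments map a feature name to its count. How those counts
    are extracted is outside this sketch (an assumption).
    """
    updated = dict(weights)
    for name in set(candidate_feats) | set(reference_feats):
        delta = reference_feats.get(name, 0) - candidate_feats.get(name, 0)
        if delta:
            updated[name] = updated.get(name, 0.0) + lr * delta
    return updated


def score(weights, feats):
    """Linear model score: sum of weight * feature count."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


# Hypothetical feature counts for one training example.
weights = {"verb_before_object": 0.5, "object_before_verb": 0.0}
candidate = {"verb_before_object": 1}   # what the parser currently prefers
reference = {"object_before_verb": 1}   # what the known good translation implies

new_weights = perceptron_update(weights, candidate, reference)
print(new_weights)
print(score(new_weights, reference) > score(new_weights, candidate))
```

After the update, the model scores the reference-consistent features above the candidate's, so a similar subsequent source phrase would be more likely to receive the preferred analysis.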
The quality of the parses generated by a statistical parser can depend upon the extent to which it has been trained with good training data. Hand-annotated training data can be expensive to produce. However, un-annotated training data that includes known good translations of source phrases can be less expensive and still be effective in improving parser parameters. Such known good reference translations are often human-generated.