A machine translator can employ computational linguistics to automatically translate a phrase from one natural language to another. Although such translation can be done by substituting words in one natural language for words in the other, the resulting translations are usually poor because word-for-word substitution does not account for differences in linguistic typology, for idioms, or for linguistic anomalies that require special treatment.
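The weakness of word-for-word substitution can be illustrated with a minimal sketch. The three-entry English-to-Spanish lexicon and the function name word_for_word are hypothetical and chosen only for illustration. English places the adjective before the noun while Spanish typically places it after, so plain substitution keeps the wrong word order.

```python
# Hypothetical toy lexicon (an assumption for illustration only).
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}


def word_for_word(phrase: str, lexicon: dict) -> str:
    """Replace each source word with its lexicon entry, keeping source order.

    Words missing from the lexicon are passed through unchanged.
    """
    return " ".join(lexicon.get(word, word) for word in phrase.lower().split())


# Substitution preserves English word order, so the adjective stays
# in front of the noun instead of following it as Spanish expects.
literal = word_for_word("the red car", LEXICON)
print(literal)  # el rojo coche
```

The expected Spanish phrase is "el coche rojo", so the output is understandable but ungrammatical, even though every individual word was translated correctly.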
Machine translation can use a technique based on linguistic rules. Rule-based techniques parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. Depending on the nature of the intermediary representation, an approach can include interlingual machine translation (e.g., text to be translated is first transformed into an interlingua, i.e., an abstract language-independent representation) or transfer-based machine translation (e.g., applying sets of linguistic rules that are defined as correspondences between the structure of the source language and that of the target language). These techniques can require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
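A transfer-based approach of the kind described above might be sketched as three stages: analysis into a symbolic representation, structural transfer, and generation. This is a toy sketch under stated assumptions, not a description of any particular system. The lexicon, the tag names (DET, ADJ, NOUN), the single adjective-noun reordering rule, and all function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical lexicon: source word -> (part-of-speech tag, target word).
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "car": ("NOUN", "coche"),
}

Tagged = Tuple[str, str]  # (part-of-speech tag, source word)


def analyze(phrase: str) -> List[Tagged]:
    """Build the intermediary representation: a list of tagged source words.

    Raises KeyError for words that are not in the toy lexicon.
    """
    return [(LEXICON[word][0], word) for word in phrase.lower().split()]


def transfer(tagged: List[Tagged]) -> List[Tagged]:
    """Apply one structural rule: source ADJ NOUN becomes target NOUN ADJ."""
    result: List[Tagged] = []
    i = 0
    while i < len(tagged):
        is_adj_noun = (
            i + 1 < len(tagged)
            and tagged[i][0] == "ADJ"
            and tagged[i + 1][0] == "NOUN"
        )
        if is_adj_noun:
            result.extend([tagged[i + 1], tagged[i]])  # swap the pair
            i += 2
        else:
            result.append(tagged[i])
            i += 1
    return result


def generate(tagged: List[Tagged]) -> str:
    """Emit target-language words for the reordered representation."""
    return " ".join(LEXICON[word][1] for _, word in tagged)


def translate(phrase: str) -> str:
    return generate(transfer(analyze(phrase)))


print(translate("the red car"))  # el coche rojo
```

Even this single rule needs part-of-speech information in the lexicon, which suggests why practical rule-based systems can require extensive lexicons and large rule sets.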
A parser is a component of a machine translator that analyzes syntax and builds a data structure (often some kind of parse tree, abstract syntax tree or other hierarchical structure) that captures the structure implicit in the input tokens, such as elements of a source language to be translated into a target language. Many modern parsers are at least partly statistical and rely on a corpus of training data that has already been annotated (parsed by hand). This approach allows the parser to gather information about the frequency with which various constructions occur in specific contexts and to construct translation rules. The quality of translations generated using a parser can depend on the quality of the parser data. Examples of parser data include phrases, training data, weighting factors, phrase tables, properties of the words, information about the syntactic structure of the phrase (such as dependencies), the grammar, etc., or a combination thereof. A “phrase” can include any number of words, numbers, characters, punctuation or other such entities or combination thereof. Within the parser, a phrase or phrases can be associated with structures and/or additional information (e.g., attributes, etc.) such as hierarchies, rules, parse trees, part-of-speech tags, counts, probabilities, semantic categories, etc., or combinations thereof.
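As one hedged illustration of parser data, the sketch below derives a small phrase table with counts and relative-frequency probabilities from hand-annotated phrase pairs. The annotated pairs, the function name build_phrase_table, and the table layout are assumptions made for this example. Real parser data may also carry parse trees, part-of-speech tags, weighting factors, and other attributes.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical hand-annotated corpus of (source phrase, target phrase) pairs.
ANNOTATED_PAIRS = [
    ("red car", "coche rojo"),
    ("red car", "coche rojo"),
    ("red car", "carro rojo"),
    ("the", "el"),
]


def build_phrase_table(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Count how often each target phrase was annotated for a source phrase
    and convert the counts to relative-frequency probabilities."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1

    table: Dict[str, Dict[str, Dict[str, float]]] = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        table[source] = {
            target: {"count": count, "prob": count / total}
            for target, count in target_counts.items()
        }
    return table


phrase_table = build_phrase_table(ANNOTATED_PAIRS)

# Pick the most probable annotated translation for a source phrase.
best = max(phrase_table["red car"].items(), key=lambda item: item[1]["prob"])
print(best[0])  # coche rojo
```

Because the probabilities come directly from the annotated pairs, errors or gaps in the annotation carry straight through to the table, which is one way the quality of parser data can affect translation quality.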