The present invention relates to natural language processing. In particular, the present invention relates to dictionaries used in syntactic parsing of text.
A natural language parser is a program that takes a segment of natural language text (that is, text in a human language such as English), usually a sentence, and produces a data structure usually referred to as a parse tree. The parse tree typically represents the syntactic relationships between the words in the input segment.
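As an illustration only, a parse tree can be held in a simple recursive node structure. In the sketch below, the class name `ParseNode`, its fields, and the labels `Sentence`, `NP`, `VP`, `Det`, `Noun` and `Verb` are hypothetical choices and are not part of the described invention. The tree is built by hand to show the data structure; no parsing is performed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParseNode:
    """One node of a parse tree: a phrase label or a part-of-speech label.

    Leaf nodes also carry the input word they cover (hypothetical layout).
    """
    label: str
    children: List["ParseNode"] = field(default_factory=list)
    word: str = ""

    def to_brackets(self) -> str:
        # Render the subtree in labeled-bracket notation, e.g. (NP (Det The) (Noun dog)).
        if self.word:
            return f"({self.label} {self.word})"
        inner = " ".join(child.to_brackets() for child in self.children)
        return f"({self.label} {inner})"

    def leaves(self) -> List[str]:
        # Collect the covered words left to right; this recovers the input segment.
        if self.word:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


# Hand-built tree for "The dog barks" (written by hand, not produced by a parser).
tree = ParseNode("Sentence", [
    ParseNode("NP", [ParseNode("Det", word="The"), ParseNode("Noun", word="dog")]),
    ParseNode("VP", [ParseNode("Verb", word="barks")]),
])

print(tree.to_brackets())  # (Sentence (NP (Det The) (Noun dog)) (VP (Verb barks)))
print(tree.leaves())       # ['The', 'dog', 'barks']
```

The nesting of the nodes is what encodes the syntactic relationships: "The" and "dog" are grouped under one noun phrase, which in turn combines with the verb phrase to form the sentence.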
The parsing process relies on a dictionary that enumerates morphological, syntactic, and semantic properties of words in a given language. Using the dictionary, the parser is able to segment text into individual words, identify a standardized form for each word (the lemma), and identify likely parts of speech for each word. This information is then used when constructing the parse tree.
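A minimal sketch of the three dictionary-driven steps named above (segmenting text into words, finding each word's lemma, and listing its likely parts of speech) follows. The dictionary contents, the function names `segment` and `look_up`, and the regular-expression segmentation rule are all hypothetical simplifications; a real dictionary would record many more morphological, syntactic, and semantic properties per entry.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: surface form -> candidate (lemma, part of speech) analyses.
DICTIONARY: Dict[str, List[Tuple[str, str]]] = {
    "the": [("the", "Det")],
    "dogs": [("dog", "Noun"), ("dog", "Verb")],  # ambiguous: "the dogs" vs. "he dogs her"
    "barked": [("bark", "Verb")],
}


def segment(text: str) -> List[str]:
    # Naive segmentation (assumption): runs of letters or apostrophes count as words;
    # punctuation is discarded.
    return re.findall(r"[A-Za-z']+", text)


def look_up(text: str) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Pair each word with its candidate (lemma, part of speech) analyses.

    Words missing from the dictionary get an empty list of analyses.
    """
    return [(word, DICTIONARY.get(word.lower(), [])) for word in segment(text)]


for word, analyses in look_up("The dogs barked."):
    print(word, analyses)
```

Each word may receive several candidate analyses, and the parser uses them when building the parse tree. A word absent from the dictionary yields an empty list, which leaves the parser with no lemma or part-of-speech information for that word.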
Traditionally, dictionaries have been created by hand by one or more linguists. However, creating dictionaries in this manner is time-consuming and labor-intensive. To reduce the work needed to create a dictionary or to add new entries to an existing one, a number of learning techniques have been developed that automatically build certain portions of the dictionary. However, each of these techniques updates the dictionary in a separate phase, typically including manual review, only after a full training corpus has been analyzed. In other words, the dictionary is not updated dynamically, so it cannot reflect what has been learned from the corpus until that later phase is complete. As a result, the dictionary is less complete than desired.