The present invention relates to natural language processing. In particular, the present invention relates to processing text to identify the semantics of the text.
The goal of natural language processing is to decipher the meaning, or semantics, of a free-form linguistic input. Some systems attempt to identify the semantics of a natural language input by applying semantic rules directly to the individual words in the input. Because words can have multiple meanings, these rules are very complex and difficult to develop. For example, the word “room” can be a verb, as in “Bill asked if he could room with Jake”, or a noun, as in “The table is in the living room”. To determine which sense of the word is being used, the rules must examine the other words present in the text. Because a large number of words can be used with each sense of a word, a large number of rules are needed. As a result, direct application of semantic rules to the words in the text is considered unworkable in many environments.
To reduce the number of rules that are needed, many systems perform a syntactic analysis to identify the parts of speech of the words in the text and the syntactic relationships between the words before identifying the semantics of the text. The parts of speech can then be used as conditions in the rules instead of using the words directly. Thus, instead of having to list every possible noun in a rule, the rule can be written to simply require a noun. This normalization greatly reduces the complexity of the semantic rules.
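By way of illustration only, the contrast between word-level conditions and part-of-speech conditions can be sketched as follows. The sketch is not taken from any particular prior system: the sense labels, the context-word lists, the tag names, and the function names `sense_by_words` and `sense_by_pos` are all hypothetical, and the part-of-speech tags are assumed to have been supplied by a prior syntactic analysis.

```python
from typing import List, Optional, Tuple

# Word-level rules (hypothetical): each sense of "room" must list every
# context word that may immediately precede it. The lists are never complete.
WORD_RULES = {
    "lodge": {"could", "would", "will", "to"},
    "chamber": {"the", "a", "living", "dining"},
}

# Part-of-speech rules (hypothetical): one condition per sense suffices.
POS_RULES = {"VERB": "lodge", "NOUN": "chamber"}


def sense_by_words(words: List[str], target: str = "room") -> Optional[str]:
    """Pick a sense by matching the word just before the target against lists."""
    i = words.index(target)
    prev = words[i - 1].lower() if i > 0 else ""
    for sense, contexts in WORD_RULES.items():
        if prev in contexts:
            return sense
    return None  # no rule listed this context word


def sense_by_pos(tagged: List[Tuple[str, str]], target: str = "room") -> Optional[str]:
    """Pick a sense from the target's part-of-speech tag alone."""
    for word, pos in tagged:
        if word == target:
            return POS_RULES.get(pos)
    return None


# Tags below are assumed to come from a separate syntactic analysis.
verb_use = [("Bill", "NOUN"), ("asked", "VERB"), ("if", "CONJ"), ("he", "PRON"),
            ("could", "AUX"), ("room", "VERB"), ("with", "PREP"), ("Jake", "NOUN")]
unlisted = [("a", "DET"), ("spare", "ADJ"), ("room", "NOUN")]

print(sense_by_words([w for w, _ in verb_use]))  # matched via the listed word "could"
print(sense_by_words([w for w, _ in unlisted]))  # "spare" is not listed, so no match
print(sense_by_pos(unlisted))                    # the NOUN condition still applies
```

In this sketch the word-level rule fails on “a spare room” because “spare” was never enumerated, whereas the single noun condition covers it without any additional rule.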
However, using a separate syntactic parse produces a two-stage evaluation process in which the content of the entire sentence must be fully considered in each stage, which makes the process slow. In addition, if the syntactic parse fails because the input is poorly formed, fragmentary, or erroneous, the semantic process will also fail.
Furthermore, the semantic rules have been difficult to write because they have been expressed as a series of logical comparisons and operations. This makes adding new semantic structures time-consuming.