The present invention relates to natural language understanding. In particular, the present invention relates to semantic and syntactic parsing of text strings.
In natural language understanding, computerized language systems attempt to identify a logical representation for a text string. In some systems, a semantic or meaning-based representation is formed by performing a semantic parse of the text. In other systems, a syntactic or grammar-based representation is formed by performing a syntactic parse of the text.
In many systems, the logical representation takes the form of a parse tree that has the words of the text as leaves and that has a token at each of the other nodes in the tree. Each token represents a logical abstraction of the words and lower-level tokens that it spans. A single token is at the root of the tree and spans the entire text.
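As an illustration only, the tree structure described above can be sketched in Python. The class name `Node`, the token names (`ShowFlights`, `Destination`, `City`), and the sample text are hypothetical and are not taken from any particular system; they merely show words as leaves, tokens at the interior nodes, and a root token that spans the entire text.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """An interior node of a parse tree: a token plus the items it spans."""
    token: str
    # Children are either words (plain strings, the leaves) or other tokens.
    children: List[Union["Node", str]] = field(default_factory=list)

    def words(self) -> List[str]:
        """Return, in order, the words spanned by this token."""
        spanned: List[str] = []
        for child in self.children:
            if isinstance(child, str):
                spanned.append(child)
            else:
                spanned.extend(child.words())
        return spanned


# Hypothetical semantic parse of the text "flights to Boston".
tree = Node("ShowFlights", [
    "flights",
    Node("Destination", ["to", Node("City", ["Boston"])]),
])

root_span = tree.words()                    # the root spans the whole text
city_span = tree.children[1].children[1].words()
```

In this sketch the root token `ShowFlights` spans all three words, while the `City` token spans only the last one.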
In one type of parse, the parse tree is formed by selecting one word at a time from the text string. With each word, the parser first identifies those tokens that begin with the word. These tokens are then added as possible partial parses for the text string. In addition, the parser determines if any partial parses can be extended based on the word. At times, a word will complete a parse for a token. When this occurs, the completed token is used to identify other tokens that begin with the completed token. In addition, the partial parses are examined to determine if they can be extended by the completed token.
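The word-at-a-time procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular parser: the grammar in `RULES`, the `Partial` record, and the function `parse` are all hypothetical, and the grammar is assumed to contain no cyclic single-symbol rules. The sketch deliberately scans every rule and every partial parse for each word or completed token, which is the exhaustive search the background attributes to earlier systems.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical toy grammar: each token is defined by a sequence of
# words and/or other tokens.
RULES = [
    ("City", ("boston",)),
    ("Destination", ("to", "City")),
    ("ShowFlights", ("flights", "Destination")),
]


class Partial(NamedTuple):
    token: str             # token this partial parse is trying to build
    rhs: Tuple[str, ...]   # symbols the token requires, in order
    dot: int               # how many of those symbols are matched so far
    start: int             # index of the first word spanned
    end: int               # index just past the last word spanned
    children: tuple        # matched words and completed sub-trees


def parse(words: List[str]) -> list:
    """Return trees for every token that spans the entire word list."""
    partials: List[Partial] = []
    full = []
    for i, word in enumerate(words):
        # The agenda holds the new word and any tokens it completes.
        agenda = [(word, i, i + 1, word)]
        while agenda:
            sym, start, end, tree = agenda.pop()
            # Tokens that begin with this word or completed token.
            candidates = [Partial(tok, rhs, 1, start, end, (tree,))
                          for tok, rhs in RULES if rhs[0] == sym]
            # Existing partial parses that this symbol can extend
            # (every partial parse is examined, with no indexing).
            candidates += [p._replace(dot=p.dot + 1, end=end,
                                      children=p.children + (tree,))
                           for p in partials
                           if p.end == start and p.rhs[p.dot] == sym]
            for p in candidates:
                if p.dot == len(p.rhs):
                    # The token is complete: feed it back into the agenda.
                    done = (p.token, list(p.children))
                    agenda.append((p.token, p.start, p.end, done))
                    if p.start == 0 and p.end == len(words):
                        full.append(done)
                else:
                    partials.append(p)
    return full


result = parse("flights to boston".split())
```

Here the word "boston" completes the `City` token, which in turn completes `Destination` and then `ShowFlights`, so a single word can trigger a chain of completions.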
In the past, the process of identifying tokens that could use a completed token was computationally intensive and therefore slowed the parsing of the text. In addition, to identify which partial parses could be extended by a word or completed token, prior art systems examined all of the possible partial parses. Since there can be a large number of possible partial parses, this also slowed the parsing of the text.
Lastly, for systems that have a large number of semantic or syntactic tokens, a large number of hypothesis tokens can be generated during the parse. These hypotheses make the parse more complex and consume a substantial amount of memory in the parsing system. An effective tool is therefore needed for managing parsing hypotheses.