The present invention relates to natural language processing. In particular, the present invention relates to syntactic parsing of text.
A natural language parser is a program that takes a segment of natural language text (i.e., text in a human language such as English), usually a sentence, and produces a data structure usually referred to as a parse tree. This parse tree typically represents the syntactic relationships between the words in the input segment.
In general, natural language parsers build the parse trees by applying syntax rules to the input text segment. Parsers apply these rules in either a “top-down” or a “bottom-up” manner.
In a bottom-up parser, all of the possible parts of speech for the individual words of the input text are first identified to form a set of word tokens. The parser then attempts to combine the individual word tokens into larger syntactic structures, such as noun phrases and verb phrases, by applying syntax rules to the tokens. The resulting larger structures represent candidate nodes for the parse tree. The parser continues to try to build larger and larger structures by applying syntactic rules to previously identified candidate nodes. A full parse is achieved when a node spans the entire text segment.
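The bottom-up procedure described above can be sketched as a small chart parser. This is an illustrative sketch only, not the parser of the present invention. It assumes a toy lexicon, binary rules of the form (left category, right category) → parent category, and no unary rules. `LEXICON`, `RULES` and `bottom_up_parse` are hypothetical names. Each chart cell holds the categories of the candidate nodes that span a given range of word tokens.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to every part of speech it may take.
LEXICON = {
    "the": ["Det"],
    "dog": ["Noun", "Verb"],
    "barks": ["Verb", "Noun"],
}

# Hypothetical binary syntax rules: (left category, right category) -> parent category.
RULES = {
    ("Det", "Noun"): "NP",
    ("NP", "Verb"): "S",
}


def bottom_up_parse(words):
    """Return the categories of candidate nodes that span the entire segment."""
    n = len(words)
    if n == 0:
        return set()
    # chart[(i, j)] holds the categories of candidate nodes spanning words[i:j].
    chart = {}
    # Step 1: form one word token per possible part of speech of each word.
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(LEXICON.get(word, []))
    # Step 2: apply rules to adjacent candidate nodes, building shorter spans first
    # so that every larger structure is built from previously identified nodes.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            categories = set()
            for k in range(i + 1, j):
                for left, right in product(chart[(i, k)], chart[(k, j)]):
                    parent = RULES.get((left, right))
                    if parent is not None:
                        categories.add(parent)
            chart[(i, j)] = categories
    # A full parse exists when some candidate node spans the whole segment.
    return chart[(0, n)]
```

For example, `bottom_up_parse(["the", "dog", "barks"])` yields `{"S"}`. The readings "dog" as a verb and "barks" as a noun are formed as word tokens but never combine into a spanning node.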
During the construction of the nodes, attribute-value pairs that describe the structure represented by each node are created. For example, a first token attribute and a last token attribute are associated with each node to indicate the first token and the last token that the node spans. Additionally, attributes such as “head”, which indicates the primary element of a noun phrase or a verb phrase, and “psmods”, which indicates the modifiers found after the head, can be included for a node. The number and type of attributes associated with a node are unlimited and are controlled by the rule used to form the node.
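To illustrate how a rule controls which attribute-value pairs a node receives, here is a minimal sketch. It assumes nodes store their attribute-value pairs in a dictionary. The rule functions `np_rule` (NP → Det Noun) and `np_pp_rule` (NP → NP PP), and the helper `make_token`, are hypothetical and are not taken from any particular parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    category: str
    children: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)  # attribute-value pairs for this node


def make_token(category, word, index):
    # A word token spans exactly one token position; its head is the word itself.
    return Node(category, [], {"first_token": index, "last_token": index, "head": word})


def np_rule(det, noun):
    # Hypothetical rule NP -> Det Noun: the noun is the head, with no post-head modifiers yet.
    return Node("NP", [det, noun], {
        "first_token": det.attrs["first_token"],
        "last_token": noun.attrs["last_token"],
        "head": noun.attrs["head"],
        "psmods": [],
    })


def np_pp_rule(np, pp):
    # Hypothetical rule NP -> NP PP: keeps the NP's head and records the PP in psmods.
    return Node("NP", [np, pp], {
        "first_token": np.attrs["first_token"],
        "last_token": pp.attrs["last_token"],
        "head": np.attrs["head"],
        "psmods": np.attrs["psmods"] + [pp],
    })


# Usage: "the dog in the yard", with the PP built elsewhere and spanning tokens 2..4.
the = make_token("Det", "the", 0)
dog = make_token("Noun", "dog", 1)
pp = Node("PP", [], {"first_token": 2, "last_token": 4, "head": "in"})
np = np_pp_rule(np_rule(the, dog), pp)
```

Here each rule alone decides which attributes the new node carries. Adding a new attribute therefore means editing only the rule that forms the node.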
The computational complexity of forming the parse is a function of the number of candidate nodes that are formed. To limit the number of candidate nodes, some systems adopt a minimal attachment strategy that prevents certain candidate nodes from being formed if other candidate nodes have already been formed or are expected to be formed.
Although this minimal attachment strategy reduces the complexity of forming an initial parse structure, it can result in parse trees that are less than optimal. To address this, many parsing systems walk through the initial parse tree to determine if it can be changed to provide a better parse.
One technique for improving a parse is to move a node in the parse tree to a different location within the tree. In the past, such reattachment was performed by executing a set of rules and functions that changed the attribute-value pairs of the nodes affected by the reattachment so that those pairs reflected the new location of the moved node. These reattachment rules and functions were different from the rules and functions used to form the initial parse tree. As a result, when attribute-value pairs are added or altered by a parse rule, or when the use of attribute-value pairs changes, the rules and functions used to reattach nodes must be modified separately. This can introduce errors into the parser system as a whole and increases the cost of improving the parser.
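The maintenance problem can be made concrete with a small sketch of the prior approach. It assumes dictionary-based attribute-value pairs and moves a prepositional phrase onto a noun phrase. It models only the receiving node and ignores the node the phrase is detached from. `np_pp_rule` and `reattach_pp_separately` are hypothetical names. The point is that the separate reattachment function hand-copies the attribute logic of the construction rule.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    category: str
    children: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)  # attribute-value pairs


def np_pp_rule(np, pp):
    # Construction rule NP -> NP PP (hypothetical): sets span, head and psmods.
    return Node("NP", [np, pp], {
        "first_token": np.attrs["first_token"],
        "last_token": pp.attrs["last_token"],
        "head": np.attrs["head"],
        "psmods": np.attrs.get("psmods", []) + [pp],
    })


def reattach_pp_separately(np, pp):
    # Separate reattachment function (hypothetical, prior-approach style): it updates
    # the existing NP in place by re-implementing np_pp_rule's attribute logic by hand.
    np.children.append(pp)
    np.attrs["last_token"] = pp.attrs["last_token"]
    np.attrs.setdefault("psmods", []).append(pp)
    return np


np = Node("NP", [], {"first_token": 0, "last_token": 1, "head": "dog", "psmods": []})
pp = Node("PP", [], {"first_token": 2, "last_token": 4, "head": "in"})

built = np_pp_rule(np, pp)                              # attributes from the parse rule
moved = reattach_pp_separately(copy.deepcopy(np), pp)   # attributes from the reattachment code
```

Today both paths produce the same attribute values. If `np_pp_rule` later sets an additional attribute, however, `reattach_pp_separately` silently omits it until someone updates it by hand. This is the kind of divergence described above.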
As such, a reattachment method is needed that does not require separate rules for reattachment.