For humans and computers to communicate effectively using natural human languages (e.g., English or Japanese), computer systems must be developed that can “understand” natural human languages. A starting point toward developing computers that can truly understand natural human languages is the development of computer systems for analyzing human language text. One type of system for analyzing human language text is the natural language parser. Natural language parsers, or simply “parsers,” analyze human language text to determine its syntax, or grammatical structure.
Parsing a natural language sentence involves several steps. First, the sentence is broken down into tokens, which may be words or punctuation marks. Next, a dictionary is consulted to determine grammatical information about each word, such as its part of speech (e.g., verb or noun). Finally, grammar rules are applied to the words and tokens to join them into larger sentence fragments called constituents. The grammar rules are applied recursively to the constituents until the entire sentence may be formed by joining two constituents. For example, the sentence “The dog barked” is composed of the constituents “The” (adjective phrase), “dog” (noun phrase), and “barked” (verb phrase). Therefore, this sentence may be parsed by first applying a grammar rule that joins an adjective phrase followed by a noun phrase. The application of this rule yields a larger constituent, the noun phrase “The dog.” Another grammar rule may then be applied that joins a noun phrase and a verb phrase to create a verb phrase. The application of this grammar rule joins the noun phrase “The dog” with the verb phrase “barked,” and results in a grammatically correct parse of the entire sentence.
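The three steps described above (tokenizing, dictionary lookup, and rule application) can be illustrated with a minimal sketch for the example sentence. The dictionary contents, the label abbreviations (AJP, NP, VP), the "UNK" fallback label, and the function names are assumptions made for illustration only. The sketch also joins the first matching adjacent pair greedily, which is a simplification and not necessarily how any particular parser proceeds.

```python
import re

# Hypothetical toy dictionary: lowercase word -> constituent label.
LEXICON = {"the": "AJP", "dog": "NP", "barked": "VP"}

# Hypothetical grammar rules: (left label, right label) -> resulting label.
RULES = {
    ("AJP", "NP"): "NP",  # adjective phrase + noun phrase -> noun phrase
    ("NP", "VP"): "VP",   # noun phrase + verb phrase -> verb phrase
}


def tokenize(sentence):
    """Step 1: split the sentence into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def parse(sentence):
    """Steps 2 and 3: look up each token, then join adjacent constituents."""
    # Step 2: consult the dictionary; unknown tokens get the label "UNK".
    constituents = [(LEXICON.get(tok.lower(), "UNK"), tok)
                    for tok in tokenize(sentence)]

    # Step 3: repeatedly join the first adjacent pair that matches a rule,
    # stopping when one constituent remains or no rule applies.
    joined = True
    while len(constituents) > 1 and joined:
        joined = False
        for i in range(len(constituents) - 1):
            (left_label, left_text) = constituents[i]
            (right_label, right_text) = constituents[i + 1]
            result = RULES.get((left_label, right_label))
            if result is not None:
                constituents[i:i + 2] = [(result, left_text + " " + right_text)]
                joined = True
                break
    return constituents


if __name__ == "__main__":
    print(parse("The dog barked"))
```

A parse succeeds in this sketch when the returned list holds a single constituent covering the whole sentence. If several constituents remain, no rule could join them.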
Parsers typically apply all of the available grammar rules to a sentence using a brute-force algorithm. Therefore, the parser itself is relatively simple to create. In contrast, the grammar rules that the parser applies can be very complicated and must be created by a linguist. When creating grammar rules, a linguist typically asks the question “Given a span of text in a sentence, are there grammar rules that can be applied to form that span into a constituent?” If the answer to this question is “Yes,” the linguist may then ask “What rules were applied to form the span into a constituent, and what is the resulting constituent?” If the parse of a sentence fails, and the parser is unable to form a constituent that spans the entire sentence with the available rules, the linguist may ask the question “Where did application of the grammar rules fail, and why?” The linguist may also ask “Are there rules that could be successfully applied?” To answer these and other questions regarding grammar rules, linguists utilize software tools for analyzing and debugging natural language parses.
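One way to picture the brute-force approach and the linguist's questions is a table, often called a chart, that records every constituent formed over every span together with the rule that formed it. The following is a minimal sketch under stated assumptions: the rule names, the label abbreviations, the toy dictionary, and the functions build_chart and explain_span are all hypothetical and are not taken from any particular parser.

```python
from collections import defaultdict

# Hypothetical toy dictionary: lowercase word -> constituent label.
LEXICON = {"the": "AJP", "dog": "NP", "barked": "VP"}

# Hypothetical named rules: name -> (left label, right label, result label).
RULES = {
    "join_ajp_np": ("AJP", "NP", "NP"),
    "join_np_vp": ("NP", "VP", "VP"),
}


def build_chart(words):
    """Try every rule on every split of every span (brute force).

    Returns a mapping (start, end) -> list of (label, rule name, split),
    where `end` is exclusive and `split` is None for dictionary entries.
    """
    n = len(words)
    chart = defaultdict(list)
    for i, word in enumerate(words):
        chart[(i, i + 1)].append((LEXICON[word.lower()], "lexicon", None))
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                left = {c[0] for c in chart[(start, split)]}
                right = {c[0] for c in chart[(split, end)]}
                for name, (l, r, result) in RULES.items():
                    if l in left and r in right:
                        chart[(start, end)].append((result, name, split))
    return chart


def explain_span(chart, start, end):
    """Answer: which rules formed words[start:end] into a constituent?"""
    return list(chart.get((start, end), []))


if __name__ == "__main__":
    words = ["The", "dog", "barked"]
    chart = build_chart(words)
    print(explain_span(chart, 0, 2))           # constituents over "The dog"
    print(explain_span(chart, 0, len(words)))  # constituents over the sentence
```

An empty result from explain_span for a given span corresponds to the answer "No" to the linguist's first question. An empty result for the full sentence span indicates a failed parse, and inspecting the shorter spans shows how far the rules were able to get.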
Previous tools for analyzing and debugging natural language parses have been very difficult to use because of their text-based nature. For instance, some previous tools for analyzing and debugging parses simply display all of the constituents formed during a parse in a text-based list. To determine how a span of text may be joined, a linguist must scan through the entire list of constituents to find all of the constituents that join the span of text. This searching process can be very time-consuming and frustrating for the linguist. To apply a grammar rule to two constituents, the linguist must first scan the list for the two constituents to be joined. Then, the linguist must type in a command to apply a rule to the two constituents. The linguist must know beforehand which rule to apply, and must either memorize or look up all of the available rules. This process is also very counterintuitive for a linguist and can be extremely time-consuming.
Therefore, in light of these problems, there is a need for a method and apparatus for analyzing and debugging natural language parses that permits a linguist to quickly and intuitively analyze and debug the application of grammar rules. There is also a need for a method and apparatus for analyzing and debugging natural language parses that provides quick and easy access to all of the available grammar rules.