Automated sentence parsing has many uses, from translating one language into another to voice recognition. “Parsing” involves dividing a sentence into its constituent phrases: noun phrases, verb phrases, and prepositional phrases. One definition of a phrase is a group of one or more words that form a constituent and so function as a single unit in the syntax of a sentence. A phrase will always include the part of speech for which it is named (a noun phrase includes a noun, for example), and often other words as well. Any phrase can in general include other phrases, i.e., nested phrases.
Phrases may be combined into clauses. One or more clauses may be combined into a sentence. A sentence can be defined in orthographic terms alone, i.e., as anything contained between a capital letter and a full stop (period). A sentence will usually, but not always, have a noun, a verb, and an object; a clause may or may not include all three.
A particular issue in such parsing is resolving ambiguities. Consider the sentence “The old lady hit the thief with her purse.” This sentence could mean “The old lady used her purse to hit the thief.” Or it could mean “The old lady hit the thief that had her purse.” These are very different meanings of course.
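The two readings can be made concrete by writing each one as a nested phrase structure. The following is only an illustrative sketch: the tuple representation, the labels (S, NP, VP, PP), and the helper names `words` and `parent_of_pp` are assumptions made for this example and are not taken from any system described here. Both structures contain exactly the same words; they differ only in which phrase contains the prepositional phrase “with her purse.”

```python
# Each phrase is a tuple: (label, child, child, ...). A leaf is a plain word.
# The labels and this representation are assumptions for illustration only.

# Reading 1: the purse is the instrument, so the PP sits inside the verb phrase.
INSTRUMENT = (
    "S",
    ("NP", "The", "old", "lady"),
    ("VP", "hit",
        ("NP", "the", "thief"),
        ("PP", "with", ("NP", "her", "purse"))),
)

# Reading 2: the thief has the purse, so the PP sits inside the noun phrase.
POSSESSION = (
    "S",
    ("NP", "The", "old", "lady"),
    ("VP", "hit",
        ("NP", "the", "thief",
            ("PP", "with", ("NP", "her", "purse")))),
)


def words(node):
    """Return the words of a phrase in left-to-right order."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node[1:]:  # node[0] is the label, not a word
        out.extend(words(child))
    return out


def parent_of_pp(node, parent_label=None):
    """Return the label of the phrase that directly contains the first PP."""
    if isinstance(node, str):
        return None
    if node[0] == "PP":
        return parent_label
    for child in node[1:]:
        found = parent_of_pp(child, node[0])
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(" ".join(words(INSTRUMENT)))
    print(parent_of_pp(INSTRUMENT), parent_of_pp(POSSESSION))
```

Because the word sequences are identical, nothing in the sentence itself tells a parser which structure to choose; the choice has to come from context outside the sentence.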
Automated parsing cannot, at this time, easily resolve these ambiguities. Indeed, even a human cannot always do so, although by referencing adjacent sentences or even the entire text, a human may often be able to make an educated or even a very accurate guess.
If the sentence preceding the one above had been, for example, “An old lady was walking down the street carrying a baseball bat and her purse,” a human would assume that the old lady hit the thief with the baseball bat. On the other hand, if the following sentence read “The thief fell to the ground and the old lady walked on, never having lost her purse,” then it is safe to assume that she had hit the thief with her purse.
Systems described in the applicant's previous patents use algorithms that produce quite accurate parsing. These patents include:
U.S. Pat. No. 5,802,533 issued Sep. 1, 1998
U.S. Pat. No. 6,279,017 issued Aug. 21, 2001
U.S. Pat. No. 7,036,075 issued Apr. 25, 2006
U.S. Pat. No. 7,765,471 issued Jul. 27, 2010
U.S. Pat. No. 7,861,163 issued Dec. 28, 2010
These algorithms produce hierarchical lists that parse each sentence in the text into its constituent clauses. Where it is important to parse the sentences very accurately, and the final listing can be delayed to allow human input, any errors in these lists can be corrected by direct human editing. If the parsing is part of a system for real-time translation, for example, then human input is not possible.
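As a rough illustration of what a text-based hierarchical list of this kind might look like, the sketch below prints a nested phrase structure as an indented outline. The format, the labels, and the function name `to_outline` are assumptions made for this example; the listing format actually produced by the patented systems is not specified here.

```python
# Hypothetical representation: (label, child, child, ...), with words as leaves.
def to_outline(node, depth=0):
    """Render a nested phrase structure as indented lines, one item per line."""
    indent = "  " * depth
    if isinstance(node, str):
        return [indent + node]
    lines = [indent + node[0]]  # the phrase label heads its own sub-list
    for child in node[1:]:
        lines.extend(to_outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    parse = (
        "S",
        ("NP", "The", "old", "lady"),
        ("VP", "hit", ("NP", "the", "thief")),
    )
    print("\n".join(to_outline(parse)))
```

Even for this five-word sentence the outline runs to ten lines, and the only cue to the structure is the depth of indentation.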
The hierarchical lists may be quite complex. In common with all text-based presentations of information, such lists are easy to misunderstand and difficult to review accurately for errors. A human reviewing such lists will typically lose focus after a time, and either fail to correct an erroneous list accurately or miss the error completely. Accordingly, a system that displays the sentence structures in a way that is more graphical and less textual may well allow more accurate correction of such lists.