Automated sentence parsing has many uses, ranging from translation between languages to voice recognition. “Parsing” involves dividing a sentence into its constituent phrases: noun phrases, verb phrases, and prepositional phrases. One definition of a phrase is a group of one or more words that form a constituent and so function as a single unit in the syntax of a sentence. A phrase will always include the part of speech for which it is named (a noun phrase, for example, always includes a noun), and often other words as well. In general, any phrase can include other phrases, i.e., nested phrases.
Phrases may be combined into clauses. One or more clauses may be combined into a sentence. A sentence can be defined in orthographic terms alone, i.e., as anything which is contained between a capital letter and a period. A clause may or may not include a noun, a verb, and an object, elements usually, but not always, characteristic of a sentence.
A particular issue in such parsing is resolving ambiguities. Consider the sentence “The little old lady angrily hit the thief with her purse.” This sentence could mean “The little old lady used her purse to angrily hit the thief,” or it could mean “The little old lady angrily hit the thief that had her purse.” These of course are very different meanings.
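The two readings above can be made concrete with a minimal sketch. It assumes a hypothetical representation in which each constituent is a nested tuple of a label followed by its children; the labels (S, NP, VP, PP) and the helper names `words` and `parent_of` are illustrative only and are not drawn from any system described herein. Both trees contain the same words in the same order and differ only in where the prepositional phrase attaches.

```python
# Hypothetical constituency trees as nested tuples: (label, child, child, ...).
# Leaves are plain strings. Labels and structure are illustrative assumptions.
PP = ("PP", "with", ("NP", "her", "purse"))
SUBJECT = ("NP", "The", "little", "old", "lady")

# Reading 1: the PP attaches to the verb phrase (the purse is the instrument).
instrument_reading = (
    "S",
    SUBJECT,
    ("VP", "angrily", "hit", ("NP", "the", "thief"), PP),
)

# Reading 2: the PP attaches to the noun phrase (the thief has the purse).
possession_reading = (
    "S",
    SUBJECT,
    ("VP", "angrily", "hit", ("NP", "the", "thief", PP)),
)


def words(tree):
    """Return the leaf words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:
        result.extend(words(child))
    return result


def parent_of(tree, target):
    """Return the label of the node that directly contains `target`, or None."""
    if isinstance(tree, str):
        return None
    for child in tree[1:]:
        if child == target:
            return tree[0]
        found = parent_of(child, target)
        if found is not None:
            return found
    return None


# Same surface sentence, different structure.
print(" ".join(words(instrument_reading)))
print(parent_of(instrument_reading, PP))   # label of the PP's parent in reading 1
print(parent_of(possession_reading, PP))   # label of the PP's parent in reading 2
```

Because the word sequence is identical in both trees, nothing in the sentence itself selects one structure over the other, which is why the surrounding context is needed.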
Automated parsing cannot, at this time, easily resolve these ambiguities. Indeed, even a human cannot always do so, although by referencing adjacent sentences or even the entire text, a human may often be able to make an educated, very accurate guess.
If the sentence preceding the one above had been, for example, “An old lady was walking down the street carrying a baseball bat and her purse,” a human would likely assume that the old lady hit the thief with the baseball bat. On the other hand, if the following sentence read “The thief fell to the ground and the old lady walked on, never having lost her purse,” then it is safe to assume that she had hit the thief with her purse.
Systems described in Applicant's prior patents use algorithms that produce quite accurate parsing. These patents include U.S. Pat. Nos. 5,802,533; 6,279,017; 7,036,075; 7,765,471; and 7,861,163, each of which is incorporated herein by reference in its entirety.
Applicant's disclosed algorithms produce hierarchical lists that parse each sentence in the text into its constituent clauses. In cases where it is important to parse the sentences very accurately, and it is possible to delay the final listing to allow human input to correct any possible errors, these lists can be corrected by direct human editing. If the parsing is part of a system for real-time translation, for example, then human input is not possible.
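For illustration only, a hierarchical list of this general kind might be rendered as an indented text outline. The sketch below is an assumption and does not reproduce the format of Applicant's prior patents: the tuple layout, the labels, and the function name `to_outline` are hypothetical, and the example sentence is simplified by omitting its conjunction.

```python
def to_outline(node, depth=0, indent="  "):
    """Render a (label, child, ...) tuple as a list of indented text lines."""
    pad = indent * depth
    if isinstance(node, str):
        # Leaf: the words of the phrase itself.
        return [pad + node]
    lines = [pad + node[0]]
    for child in node[1:]:
        lines.extend(to_outline(child, depth + 1, indent))
    return lines


# Hypothetical parse of a two-clause sentence (labels are illustrative).
sentence = (
    "SENTENCE",
    ("CLAUSE",
     ("NOUN PHRASE", "The thief"),
     ("VERB PHRASE", "fell",
      ("PREPOSITIONAL PHRASE", "to the ground"))),
    ("CLAUSE",
     ("NOUN PHRASE", "the old lady"),
     ("VERB PHRASE", "walked on")),
)

outline = to_outline(sentence)
print("\n".join(outline))
```

Even for this short sentence the outline spans more than a dozen lines of text, which suggests how quickly such listings grow for longer sentences.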
The hierarchical lists may be quite complex. In common with all text-based presentations of information, such lists are easy to misunderstand and difficult to review accurately for errors. A human reviewing such lists will typically lose focus after a time, and either fail to accurately correct a list having an error, or even miss the error completely. Accordingly, a system that displays the sentence structures in a way that is more graphical and less textual may well allow more accurate correction of such lists. Moreover, such a system may be readily adapted so as to capture and convert human user based interactions with graphical elements into additional machine-readable text and mark-up that may be suitably and advantageously used for other machine-based test processes. Further still, it is contemplated that such a system, or a readily adapted version thereof, in addition to being an especially effective parsed text/editor interface, enables a new reading and document building format, with numerous options for individual human user customizations, multimodal inputs/outputs and text editing, with mediation of the fusion of graphic and prosodic structures for the analysis and representation of syntax.