The present invention relates generally to language parsers and more specifically to a method and apparatus for natural language parsing of sentences in real time that is universally applicable to other languages and is capable of analyzing the full range of grammar and syntax, manipulating sentences to create other structures, and engaging in question/answer and statement/response repartee in real time.
Since Chomsky (Syntactic Structures. The Hague/Paris: Mouton, 1957), linguists have been grappling with the problem of providing a generative theory of syntax to describe the structure of human language. This endeavor has resulted in a proliferation of syntactic theories, each with a slightly different set of assumptions about the correct characterization of syntax. Such known methods are described in publications such as Government and Binding, Chomsky (1981, 1986); Lexical Functional Grammar, Bresnan (1982) and Kaplan and Bresnan (1982); Categorial Grammar, Oehrle, Bach, and Wheeler (1988); Lexicase, Starosta (1988); Generalized Phrase Structure Grammar, Gazdar, Klein, Pullum, and Sag (1985); and Head-driven Phrase Structure Grammar, Pollard (1984, Generalized phrase structure grammars, head grammars, and natural language, Stanford University dissertation; 1985, "Phrase structure grammar without metarules," in J. Goldberg, S. Mackaye, and M. Wescoat, eds., Proceedings of WCCFL 4, Stanford Linguistics Association, 246-261).
However, no theory or method has emerged as a widely accepted formulation that strictly conforms to the requirements of an explicit, generative grammar, nor has any theory or method gained wide acceptance as a truly representative model of the human faculty of language. Furthermore, none of these theories or methods has resulted in a framework that can be fully implemented in a computer programming language to produce a viable natural language parser.
In the 1960's, Noam Chomsky (Aspects of the Theory of Syntax) argued that it was possible to arrive at a scientific description of human language based on empirical investigation and on the same principles as the sciences of chemistry, botany, and so on. To do this, he proposed a theory of syntax based on the structure of English. He further argued that once the basic nature of the syntax of one language had been determined, it would be possible to apply that theory to the other languages of the world and arrive at descriptions as thorough and as scientifically falsifiable as those of the original language. Of course, there would be language-specific variations that would account for the differences we all perceive among the languages of the world, but the underlying theory would be one of a "universal grammar" from which all other grammars were derived.
It was subsequently argued that the theories of syntax proposed by Chomsky and others could be implemented in a computer program to create a "parser," a grammar analysis device, thereby giving computers many (though not all) of the language abilities possessed by people. It is important to point out the meanings of the words "grammar" and "syntax" as used hereinafter. In some cases the two words overlap in that they both refer to the structural relationships of words and parts of words that result in phrases, clauses, and sentences. However, in general, "syntax" refers to formal scientific descriptions of these structural relationships, while "grammar" refers to the more ordinary English textbook-style description of these relationships.
Following Chomsky's lead, since the 1960's and continuing to the present, linguists and computer scientists at major universities and institutions around the world have been struggling to arrive at a theory of syntax that is both capable of fully describing the structures of a language and capable of being implemented in a programming language. To date, the success on both counts has been meager at best. There is currently no single parser or method and no theory of syntax that can adequately accomplish the task. In fact, judging the known methods by the actual parsers that have been developed, one can conclude that they are still in their infancy. Known parsers and methods can handle only the simplest sentences and are applicable to only a very small subset of the syntactic and grammatical functions that comprise even one language of the world. In spite of the millions of dollars and hundreds of thousands of man-hours that have been spent to solve this problem, computers implementing such known methods can respond only to basic commands. Grammar checkers and translation devices built on such methods have only the most rudimentary grammatical abilities. The ability to perform question/answer and statement/response repartee with computers and computer applications is unattainable using known methods. The current inventive method and apparatus, interchangeably referred to as "Attach Alpha" (also written "Attach α") or the theory of Attach Alpha, is a solution to the above-described problem.
Some of the problems that underlie the lack of success of known methods are due to the fact that no theory of syntax has yet been able to claim wide acceptance as a thorough and complete theory. Known theories and methods are either too complex or too poorly or vaguely formulated to be implemented in a programming language. Finally, a significant obstacle to the implementation of a theory of grammar in a computer program is the fact that such known theories typically generate hundreds, thousands, and even hundreds of thousands of possible parses of one sentence, leaving the computer and the user unable to choose a correct analysis of a sentence.
Known parsers are severely limited since they cannot sufficiently limit the number of possible parses that they produce. If the number of possible parses is too great, processing time increases to a point where real-time output is impossible. In many known parsers, ambiguity in the input string results in an exponential increase in the number of possible output sentences. For example, the sentence "john does like to put the book in the garage", although quite simple in comparison with common English sentences, can produce over 3,000 parses or combinations of sentences. This can be computed by finding the number of possible interpretations of each of the items in the string and multiplying those numbers together to count the sentence combinations. In the above-described sentence, "john", "does", and "to" each have two possible meanings, "like" has four possible meanings, and so on. The number of possible parsed sentences quickly increases, rendering known parsers essentially inoperative.
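The counting described above can be illustrated with a short sketch. It assumes that each word's interpretation is chosen independently, so the number of sentence combinations is the product of the per-word counts. The counts for "john", "does", "to", and "like" are those given in the text; the counts for the remaining words are hypothetical placeholders chosen only so the arithmetic can be run, and the function and variable names are likewise illustrative rather than part of any described parser.

```python
from math import prod

# Number of possible interpretations per word.
# "john", "does", "to", "like": values stated in the text.
# All other values are HYPOTHETICAL placeholders, not taken from the source.
INTERPRETATIONS = {
    "john": 2,
    "does": 2,
    "like": 4,
    "to": 2,
    "put": 3,      # hypothetical
    "the": 1,      # hypothetical
    "book": 2,     # hypothetical
    "in": 4,       # hypothetical
    "garage": 2,   # hypothetical
}


def count_combinations(words, table):
    """Multiply the interpretation counts of every word in the string."""
    return prod(table[word] for word in words)


sentence = "john does like to put the book in the garage".split()

# Only the four words whose counts the text states.
stated_only = count_combinations(["john", "does", "like", "to"], INTERPRETATIONS)

# The whole sentence, using the placeholder counts for the other words.
whole_sentence = count_combinations(sentence, INTERPRETATIONS)

print(stated_only, whole_sentence)
```

The four stated words alone already yield 2 x 2 x 4 x 2 = 32 combinations, and each additional ambiguous word multiplies that total again, which is why the count grows so quickly with sentence length. The whole-sentence figure printed here depends entirely on the placeholder counts and is not meant to reproduce the figure of over 3,000 cited above.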
Accordingly, it is an object of the present invention to substantially overcome the above-described problems.
It is another object of the present invention to provide a novel method and apparatus for parsing sentences that is universally applicable to all languages of the world.
It is a further object of the present invention to provide a novel method and apparatus for parsing sentences that is implemented on a known digital computer and operates in real time to analyze and parse complex sentences.
It is also an object of the present invention to provide a novel method and apparatus for parsing sentences that significantly reduces the number of parse combinations analyzed, rendering such analysis feasible on a computer.
It is still another object of the present invention to provide a novel method for parsing sentences that strictly limits all structure to that which can be stated in terms of the attachment of two items, proceeding from the smallest items upward toward the completed sentence.