Examples of natural language forms include written and spoken words, gestures and facial expressions which clarify the meaning of words, intonation patterns indicating whether a sentence is a question, and other auditory and visual contextual aids for conveying meaning. Each of these forms may convey multiple meanings which may appear ambiguous when the forms occur out of context. For instance, there are many dictionary meanings of the word ‘dash’, such as a race or a small amount of seasoning. Since multiple meanings for words are so common, lexicographers refer to a single word's diversity of meanings as polysemy. To disambiguate polysemy, all of the above natural language forms must be correlated in order to infer the context which identifies a specific meaning from a list of polysemous potential meanings.
The prior art detects such inputs on a piecemeal basis, but to provide a comprehensive user interface for computer systems, all of the above inputs must be correlated, as efficiently as a person can correlate them. However, the prior art has failed to correlate even a fraction of such inputs as efficiently as people do. As a result, designers of natural language processing systems have greatly restricted the range of conversation recognized by natural language processing, to increase accuracy at the expense of flexibility.
Thus prior art computer systems correlate small ranges of natural language inputs, but the vast majority of inputs that a human could easily correlate remain useless to such systems. For instance, a raised eyebrow can indicate a question, a smile may indicate agreement, and a wave of a hand might indicate dismissal or a change in subject. All of these gestures are useful and meaningful for people, but the prior art cannot reliably correlate simple gestures with other inputs such as spoken or written words.
The inability of computer systems to correlate a wide variety of inputs has hampered the ability of computers to participate in conversations with people. In particular, this inability to correlate prevents computers from recognizing important contextual shades of meaning which are needed for parsing natural language. In human conversation, contexts shift fluidly and dramatically to constrain the meaning of gestures and other symbols which can have varying meanings depending on precise context. For instance, the word ‘dash’ can serve as at least four different nouns and three different verbs: dash as in a small quantity mixed into something (a dash of salt), dash as in a race (a fifty yard dash), dash as in a car dashboard, dash as in Morse code (a signal longer than a dot), dash as in to mix together, dash as in to ruin something (dash hopes), and dash as in to move quickly. For clarity, more meanings for dash may be needed and added later to a natural language processing system, for instance dash as in to splatter, or dash as in a short sudden movement.
Prior art uses grammar or statistics to disambiguate polysemy. Prior art using grammatical analysis to choose the correct meaning of ‘dash’ can only determine whether a noun or a verb meaning is best. Such a system would not disambiguate whether dash means a Morse code signal or a small quantity of something. Prior art using statistics of usage can only choose the meaning which was chosen most often in a context. Thus statistical methods disable any other meanings and prevent the acquisition of new meanings for a symbol within any context.
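The two limitations described above can be illustrated with a minimal sketch. The sense inventory is taken from the meanings of ‘dash’ listed earlier; the function names, the part-of-speech labels, and the usage history are hypothetical and invented purely for illustration, not drawn from any particular prior art system.

```python
from collections import Counter

# Hypothetical sense inventory for 'dash', using the meanings listed in the text.
DASH_SENSES = {
    "small quantity": "noun",
    "race": "noun",
    "dashboard": "noun",
    "Morse code signal": "noun",
    "mix together": "verb",
    "ruin": "verb",
    "move quickly": "verb",
}


def senses_by_part_of_speech(pos):
    """Grammar-only filter: keep every sense whose part of speech matches."""
    return [sense for sense, sense_pos in DASH_SENSES.items() if sense_pos == pos]


def most_frequent_sense(observed):
    """Statistics-only choice: always return the sense seen most often."""
    return Counter(observed).most_common(1)[0][0]


# Knowing that 'dash' is a noun still leaves several candidate senses.
noun_candidates = senses_by_part_of_speech("noun")
print(noun_candidates)

# Invented usage history: the majority sense wins every time,
# so the rarer sense is never selected in this context.
history = ["race", "race", "small quantity", "race"]
print(most_frequent_sense(history))
```

In this sketch the grammatical filter narrows seven senses to four but cannot choose among them, and the frequency-based choice never returns a minority sense, however appropriate it may be.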
Human linguistic abilities are clearly less limited, in addition to being more accurate, than the prior art. Besides acquiring new meanings for words within a conversational context, humans also create new meaningful contexts through self-organizing linguistic abilities. For instance, as humans converse about subjects, subjects acquire new semantic meanings which evolve into new conversational contexts. In contrast, prior art computer systems only acquire new semantic contexts through laborious programming and data-entry efforts. The encumbrance of such programming and data-entry prevents prior art computer systems from acquiring semantic knowledge on a real-time basis during natural language conversations with humans. Although a large number of programming languages have been created for inputting semantic knowledge, none of them has the flexibility and general utility of a natural language such as English. Languages such as Prolog, SQL or Lisp cannot match the convenience of conversing in plain English or other natural languages.
General computer-implemented methods to process natural language have been based on logic, statistics, or topology. Logic has been the dominant method in the prior art. However, logical ambiguities inherent in natural language have foiled the prior art attempts which rely upon logic as a basis for processing natural language. One of the most important aspects of human conversation is polysemy: the re-use of identical symbols to mean different things, depending upon context. For instance, the word run could be a verb meaning to step quickly, a verb meaning to campaign for office, or a noun meaning a small brook. In any specific conversation, run would signify just one of these meanings, unless a pun was intended, in which case run might signify two such meanings. By sensing natural language contexts to determine which polysemous meanings are most within context, humans recognize which meanings are signified.
Context is impractical to define logically because logic requires enumeration of logical inputs and outputs. The number of contexts which can be defined by a natural language input is limited only by every possible shade of meaning of every symbol which occurs in every possible natural language sentence. Since such a large number of combinations cannot be enumerated, logical natural language processors store a subset of the full set of possible contexts as a logical approximation: each natural language symbol is stored with its own set of logical data, and with rules for combining its logical data with other symbols.
For instance, when the context is politics, to run would mean to campaign for office. However, many of these rules will break when a combination of contexts is pertinent. For instance, if the context is a political appointee who runs for elected office and also runs a government agency, the meaning of run remains logically ambiguous.
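The way such context rules break can be sketched as a simple lookup table. The rule table, the context labels (such as "management"), and the function name below are assumptions made for illustration only; they do not describe any specific prior art system.

```python
# Hypothetical rule table mapping (word, context) to a single meaning.
RULES = {
    ("run", "politics"): "campaign for office",
    ("run", "management"): "direct an organization",
    ("run", "athletics"): "step quickly",
    ("run", "geography"): "small brook",
}


def meanings_for(word, active_contexts):
    """Return every meaning whose rule fires for any of the active contexts."""
    fired = {RULES[(word, c)] for c in active_contexts if (word, c) in RULES}
    return sorted(fired)


# A single context selects exactly one meaning.
print(meanings_for("run", {"politics"}))

# A political appointee who seeks office and heads an agency activates
# two contexts at once, so two rules fire and the result is ambiguous.
print(meanings_for("run", {"politics", "management"}))
```

Resolving the second case within a purely logical scheme would require a further rule for the combined context, and in turn for every other combination of contexts that might co-occur.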
The larger the semantic system, the more frequently contextually defined semantic rules are broken. For vocabularies larger than ten thousand words, the frequency of flaws from broken rules easily overwhelms the accuracy of a natural language processing system, as demonstrated by problems in the CYC project. Even when attempting to define a small static semantic dictionary, logical contradictions emerge during testing which cannot be resolved without creating a new logical category for each possible combination of symbols. The combinatorial complexity of language makes testing these categories generally impractical.
For a semantic dictionary of N symbols, in a language where the maximum number of symbols strung together is M, the number of logical combinations is N to the M power. For a vocabulary of 1,000 words combined in short four word sentences, the number of logical combinations is 1,000 raised to the fourth power, which equals 1,000,000,000,000. If a team of lexicographers attempts to define and test a semantic dictionary of this small size, 100,000,000,000 testing hours would be required if each test takes 1/10th of an hour. If 500 testers each work 2,000 hours a year, the team can work 1,000,000 hours per year, and the testing will be complete in 100,000 years. Long before that time, the dictionary will surely be obsolete and require re-testing. For longer sentences and larger dictionaries, this drawback quickly grows exponentially worse.
Even worse, as phrases are used within new conversations, they immediately acquire new shades of meaning from these new conversations. A natural language processing system must track shifts in overall meaning of phrases to remain accurate. For instance, the meaning of a celebrity's name shifts as that name is used in major news reports, particularly if their fame is new. To logically represent such shifts in meaning, the rules describing how to combine a celebrity's name in various contexts must be extended to handle each new conversational use of the celebrity's name explicitly. Using logical methods, all possible combinations of phrases and contexts must be defined and tested.
Because the testing of logical methods is so impractical for large vocabularies, statistical methods have instead been dominant in natural language processing systems, particularly in speech recognition. In the prior art, statistical probability functions have been used to map from inputs to most likely contexts. Statistics, however, only apply to sets of previously occurring events.
All statistics require the collection of a set of prior events from which to calculate a statistical aggregate. For new events no such set exists and no statistical aggregate can be calculated. Unfortunately, natural language is full of new events, such as newly concatenated phrases, each having a unique contextual shade of meaning. For instance, a person might request “fascinating art—not ugly.” A person would have no trouble combining the definitions of fascinating, art, not and ugly to make some sense of such a request, even if that person had never before heard the phrase ‘fascinating art—not ugly.’ A statistical natural language processing system, on the other hand, would have no statistical event set from which to disambiguate the meaning of a new combination of words such as ‘fascinating art—not ugly.’
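The absence of an event set for a new phrase can be shown with a minimal frequency-count sketch. The toy corpus and the function name are invented for illustration. Practical statistical systems typically add smoothing or back-off to avoid zero estimates, but such techniques assign an estimate without supplying any evidence about what the new combination means.

```python
from collections import Counter

# Invented toy corpus of previously observed phrases (the "prior events").
observed_phrases = [
    "fascinating art",
    "ugly art",
    "fascinating story",
]
phrase_counts = Counter(observed_phrases)


def relative_frequency(phrase):
    """Estimate how often a phrase occurred among the prior events."""
    total = sum(phrase_counts.values())
    return phrase_counts[phrase] / total


# A previously seen phrase has a usable statistic.
print(relative_frequency("fascinating art"))

# A newly concatenated phrase has no prior events, so the estimate is zero
# and gives the system nothing from which to disambiguate its meaning.
print(relative_frequency("fascinating art, not ugly"))
```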
Another problem with statistics is that once an event set has been collected to describe the meanings for a symbol, statistical functions prefer frequently chosen meanings over rarely chosen meanings, rendering the system insensitive to new meanings conveyed by new events. Thus as statistical natural language systems acquire semantic knowledge, their ability to distinguish new information diminishes.
The above drawbacks can be avoided by topological methods for processing natural language.