The present invention relates to a natural language processing system in general, and more particularly relates to a system for initially parsing a plurality of symbols and subsequently parsing a plurality of words, morphemes, or phrases to produce a syntactically or pragmatically correct output sentence.
A system and method for producing synthetic plural word messages for use by people unable to use their own voices is known in the speech synthesizing arts. The system was originally implemented as a linguistic coding system with an associated keyboard, in which the coding system was based on a symbol rather than a word, phoneme or letter. Such a system is disclosed in U.S. Pat. No. 4,661,916 to Baker et al issued Apr. 28, 1987.
In such a system, the keyboard is coupled to a computer which stores a plurality of plural word messages in a memory thereof for selective retrieval by the keyboard. The plural word messages retrieved by the keyboard are fed to a voice synthesizer, which converts them into audible messages through a loudspeaker. The keyboard utilizes polysemic (many-meaning) symbols or icons on the respective keys, and by designating one or more of the keys and their associated polysemic symbols, selected previously recorded plural word messages may be retrieved from the computer memory in a simple transduction manner. The messages in memory are retrieved by actuating a particular sequence of a plurality of keys, to vary the context of the polysemic symbols. Thus, each key may be associated with a plurality of sentences, and the particular plural word message or sentence accessed depends on the other polysemic symbols with which that key's symbol is combined in the sequence.
Since such a communication aid is designed to be adaptable to either people of high intellect and education who are physically unable to speak or people with decreased cognitive abilities or little education, a system which is both easy to understand and operate, as well as quick and efficient, is necessary. Further, it is essential that both the cognitive and physical loads required by the user are reduced as much as possible. However, systems other than Baker '916 for synthetic speech, or typing devices, which have coding systems based on words, phonemes, or letters to be implemented by keyboards with indicia thereon relating to the words, phonemes, or letters are somewhat limited in efficiency of operation.
In utilizing a system based upon letters, for example, a limited number of keys could be utilized (i.e., 26 letters in the alphabet). However, such a system utilizing letters has several drawbacks. One drawback is that spelling is difficult to master for people who are physically unable to speak or who are cognitively impaired. People who cannot articulate the sounds of a language have a limited ability to deal with letters. A further drawback is that, in utilizing letters, one must type a large number of letters in a sequence to form a word, a phrase and especially a sentence. Such a large number of keystrokes is especially cumbersome for someone with decreased cognitive or physical abilities.
In order to combat the problem of the need for a large number of letters in a sequence, single meaning picture or symbol approaches have been developed. In these systems, a symbol or picture can be utilized to represent a single basic concept or word. Because these systems are based upon single concepts or words and not letters, only a few symbols need be utilized in sequence to represent a phrase or sentence. However, these systems have a major drawback of their own, different from that of letter-based systems. Although only a few symbols are necessary to form a sequence, many hundreds of symbols could be necessary to represent enough vocabulary to interact at home, at school or in the workplace. Thus, a user could have to choose from hundreds and even thousands of symbols. These large symbol sets are not only physically difficult (if not impossible) to represent on a keyboard, but also put a severe strain on the cognitive and physical abilities of a user both to choose a symbol from the large symbol set and further to key in the selected symbol.
Various techniques have been developed in an attempt to deal with these deficiencies: the need for a large number of letters to form a sentence in a letter-based system, or the need for a large symbol set to represent all the notions or vocabulary necessary for daily interactions in a single-meaning picture/symbol system. One approach aimed at combating the long sequences of letters necessary in a letter system is to use alphabetic abbreviations. With such systems, however, a user could be unsure as to what each abbreviation stood for. For example, (wk) could stand for "walk" and (wo) could stand for "word", but what would stand for "work"? The user could become confused and think that either (wk) or (wo) stood for "work".
Another attempt to alleviate the large number of keystrokes needed in spelling was word/letter prediction systems. In such systems, a user would type a letter such as "B" and five words starting with "B" would appear on a display. Upon not finding the desired word displayed, the operator would then hit the next letter of the desired word ("O", if the desired word were "Bottle", for example). If the desired word was then displayed on the word list, the operator would note the number next to the desired word and hit that number. Such systems are highly visual, requiring attention directed at two different fields, the keyboard and the screen. These systems also require operators to have strong spelling abilities: if an operator hit the wrong letter, such as "C" when the word "kitten" was desired, prediction would start with five words beginning with "C" and the operator would be lost. Further, such systems are cognitively disorienting because they require the user to key in a letter, read word lists on a display, key in another letter, select a number, and so on.
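The prediction loop just described can be sketched in a few lines. This is only an illustrative sketch and not the implementation of any particular prior-art device; the word list, the function name `predict`, and the default list size of five are assumptions chosen to mirror the example in the text.

```python
# Hypothetical word list (an assumption); a real device would hold far more words.
VOCABULARY = ["baby", "back", "bad", "ball", "bath", "bed", "book", "bottle", "kitten"]


def predict(prefix, vocabulary=VOCABULARY, limit=5):
    """Return up to `limit` words that start with the letters typed so far."""
    prefix = prefix.lower()
    matches = [word for word in vocabulary if word.startswith(prefix)]
    return matches[:limit]


# Typing "B" lists five candidates; "bottle" is not yet among them.
print(predict("b"))
# Typing the next letter, "O", narrows the list so that "bottle" appears.
print(predict("bo"))
# A wrong first letter ("C" when "kitten" is wanted) never reaches the word.
print(predict("c"))
```

Even in this toy form, the operator must alternate between keying a letter and reading a changing list, and a single misspelled first letter leaves the desired word unreachable.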
Levels/locations systems were developed in an attempt to alleviate the large symbol set of single meaning picture/symbol systems. In such systems, a plurality of keyboard overlays were utilized. Each overlay contained a plurality of single-meaning pictures or symbols for a particular activity. For example, there was a "party" overlay, a "going to the zoo" overlay, an A.M. activities overlay, etc. However, although only a limited number of symbols were on a keyboard at one time, the system severely limited a user's vocabulary at all times. For example, if there were 7 overlays, then whenever a user had one on the keyboard, roughly 85% of his vocabulary was unavailable to him, being on the other six overlays. Thus, the abilities of a user were severely limited.
The linguistic coding system of Baker et al, U.S. Pat. No. 4,661,916, solved a great number of these problems by employing a technique called semantic compaction. Semantic compaction utilized a keyboard with polysemic (many-meaning) symbols or icons on the respective keys. These polysemic symbols allowed for a small symbol set (each symbol having many different meanings depending upon other symbols in combination) and further allowed for utilization of only a small number of symbols in a sequence to transduce a previously stored word, phrase, or sentence. Examples of the polysemic symbols of the Baker '916 patent are shown in FIG. 1. Thus, by input of only a limited number of polysemic keys, a sentence or other plural word message can be selectively retrieved. The sentence can then be sent to a voice synthesizer to convert it, through a loudspeaker, to an audible spoken message. This device is a synthetic speech device which allows a user to go directly from thought to speech without the need to record words, phonemes and letter data of individual entities.
The Baker device stores whole sentences and plural word messages for selective retrieval, and not just individual words, phonemes, or letters. By using these polysemic symbols in combination, only a small number of key actuations are necessary to represent a sentence or plural word message. These iconic polysemic symbols on the individual keys, more commonly known as "icons" for short, were made so as to correspond to pictorial illustrations of real life objects, as can be seen by reference to FIG. 1. These icons were utilized because such symbols were more easily memorized and more versatile than alpha-numeric characters. Therefore, a user of low intellect, or one with a disability, would easily be able to access these icons representing real life objects to thus access plural word messages and thereby have these messages synthesized for output speech.
Large repertories of words, sentences and phrases were available and used by operators with a wide range of physical and cognitive disabilities. Many operators handled repertories in excess of 3000 vocabulary units.
However, although the system of Baker et al '916 is a revolutionary breakthrough in augmentative and alternative communication (AAC), there is still room for improvement in an iconic system that produces synthetic word messages. The system of Baker '916 utilizes a simple transduction algorithm wherein a plurality of plural word messages or sentences are initially stored in a memory, each corresponding to a particular sequence of icons. Upon the user activating that particular sequence of icons, the corresponding plural word message or sentence is directly accessed from memory, sent to a voice synthesizer, and converted through a loudspeaker into an audible spoken message. Such a system thus merely performs simple transduction of a previously stored word message, that is, words, phrases or sentences accessed via a particular sequence of polysemic symbols or icons.
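Simple transduction of this kind amounts to an exact-match table lookup keyed on the icon sequence. The sketch below is illustrative only and is not taken from Baker '916; the icon names, the stored messages, and the function name `transduce` are all assumptions.

```python
# Hypothetical icon names and stored messages (assumptions for illustration).
STORED_MESSAGES = {
    ("APPLE", "SUN"): "I would like some breakfast.",
    ("APPLE", "MOON"): "I would like some dinner.",
    ("HOUSE", "SUN"): "I want to go outside.",
}


def transduce(icon_sequence, table=STORED_MESSAGES):
    """Return the message stored for this exact icon sequence, or None."""
    return table.get(tuple(icon_sequence))


# The same APPLE icon contributes to different messages depending on context.
print(transduce(["APPLE", "SUN"]))
print(transduce(["APPLE", "MOON"]))
# A reordered or incomplete sequence retrieves nothing at all.
print(transduce(["SUN", "APPLE"]))
print(transduce(["APPLE"]))
```

The lookup succeeds only when the keyed sequence matches a stored sequence exactly, which is the rigidity that a more intelligent parser is meant to relax.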
The use of symbol parsing technology in Baker '916 was a revolutionary breakthrough but was limited to this simple transduction model. Although a finite-state transducer is easy to program into a small microchip, it is also inflexible and cannot anticipate the user's intentions the way a more intelligent parsing technology can. A system is desired in which users have greater freedom of icon selection, without having to worry about the precise ordering or completeness of an icon sequence, and which requires even fewer key actuations than a system such as Baker '916. Such a system, reducing the requirements of the user as well as reducing the number of necessary icons, can be achieved through intelligent parsing. A further reduction in the number of selections and the field of selections that such intelligent parsing can produce may be a substantial aid not only to those individuals who are cognitively intact and have intact language, but also to individuals who are cognitively impaired or have serious language deficiencies.
A natural language processing system is desired which further reduces both the cognitive and physical loads on the user. If the user is required to remember not only what concepts each icon represents, but also how to assign grammatical functions, morphological inflections, etc., then a smaller number of users will be successful. This is particularly true for potential system operators who have had strokes or who have congenital cognitive impairments. However, if the system can anticipate what is intended (for example, inferring that an action should be uttered as a particular inflection of a verb by taking into account subject-verb agreement), then the cognitive load on the user is reduced, since less information must be remembered and conveyed to complete an utterance.
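As a rough illustration of such anticipation, the sketch below infers the present-tense form of a verb from its subject, so that the user supplies only the uninflected verb. It is a toy under stated assumptions, not the parser of the invention: the lexicon is deliberately tiny, and the names `inflect_present` and `utter` are hypothetical.

```python
# Hypothetical, deliberately tiny lexicon (an assumption for illustration).
IRREGULAR_PRESENT = {
    ("be", "i"): "am",
    ("be", "you"): "are",
    ("be", "we"): "are",
    ("be", "they"): "are",
}
THIRD_SINGULAR_FORMS = {"be": "is", "have": "has", "go": "goes", "want": "wants"}
THIRD_SINGULAR_SUBJECTS = {"he", "she", "it"}


def inflect_present(subject, verb):
    """Pick the present-tense form of `verb` that agrees with `subject`."""
    person = subject.lower()
    if (verb, person) in IRREGULAR_PRESENT:
        return IRREGULAR_PRESENT[(verb, person)]
    if person in THIRD_SINGULAR_SUBJECTS:
        # Naive "+s" fallback for verbs that are missing from the lexicon.
        return THIRD_SINGULAR_FORMS.get(verb, verb + "s")
    return verb


def utter(subject, verb, rest):
    """Assemble a sentence from a subject, an uninflected verb, and the rest."""
    return f"{subject} {inflect_present(subject, verb)} {rest}."


print(utter("She", "want", "an apple"))
print(utter("I", "be", "hungry"))
print(utter("They", "go", "home"))
```

Because agreement is inferred from the subject, no separate key for the verb ending is needed in this sketch.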
To reduce the physical load, entire classes of keys (such as those that inflect verbs and mark verb tense) can be eliminated, which reduces both the number of keys and the number of key actuations required. A reduction in key actuations greatly improves the quality of the user's interaction, especially for users with limited capabilities.
It is desired to combine parsing technology with interface technology to produce a successful device. These two technologies must be in balance so that the user's expectations are not violated. If the user is presented with an elegant interface that has little functionality behind it, he will quickly become disillusioned. Conversely, if the user is presented with a device that has an inadequate interface and excellent functionality, he will rapidly become frustrated. It is an unfortunate failing of many present computer systems that they rely too much on fancy graphics to make up for a lack of real capability. The existing linguistic coding system of Baker et al '916 has an excellent balance between accessibility and functionality which should be preserved.
It is further preferred to design a system which uses particular semantic relationships among polysemic icons (multi-meaning symbols or pictographs) to assign a meaning to a sequence of key actuations made by a user. A sequence of icons may be associated with a particular language item, such as a morpheme, word, phrase or plurality of words, to be output when that particular icon sequence is actuated. Icons can be utilized to access language items and thus do what letters, single meaning pictures, words, and numbers cannot. Clearly, there are certain associations that can be made with both an icon and the word representing that icon. For example, it is easy to make the association with the word "food" when presented with either a picture of an apple or the word "APPLE". However, it is clear that there are certain kinds of association that can be made consistently and unambiguously with icons, although certain exceptions may hold.
For example, the greatest advantage that icons have over numbers, letters and words is that, as pictographs, they each have distinct visual features that can be made easily transparent (translucent) to the user. For example, each icon has a shape and a color, and pictures some object which may have other visual properties as well. Although some symbols have shapes which are readily accessed (for example, O, I, X, A), the abstract shapes of symbols are ambiguous; the more abstract an association, the greater the chance the user will not prefer the intended interpretation. For example, "A" can be associated with a house or a mountain or a tall building, the tip of a pencil, etc. Since the shape of "A" is so abstract, many associations are possible. An icon of "house", however, is not subject to the same ambiguity.
Some systems have attempted to use letter coding to associate letters with concepts; however, this method of encoding is also prey to ambiguous interpretation. For example, a reasonable letter coding for the color "RED" would be the letter "R"; for "BLUE", the coding would be "B". However, what happens with the color "BROWN"? The logical choice would also be "B", but a conflict arises with the code chosen for "BLUE". The same problem arises as in the previous example; since there are literally thousands of words which can be associated with a single letter, a letter-coded system rapidly runs out of coding space and is therefore limited in the number of concepts it can encode unambiguously.
Letter codes can be done in various ways. Two of the most common ways to encode plural word messages are called "salient letter encoding" and "semantic encoding". Salient letter encoding takes the initial letter of two or more fundamental words in the language string to be represented and uses them for the code. For example, "Turn the radio off" can be encoded as "RO" using this method. The problem arises that after many utterances, the same letters "RO" are needed to represent other language strings. For instance, "RO" are the most salient letters for "Turn the radio on". A strategy must then be employed to find other salient letters so that the ambiguity is avoided. Hence, "Turn the radio on" must be encoded using a different code such as "TO" or "TR". However, these letter combinations in turn can represent other common phrases such as "Take it off" or "Turn right". As the language corpus grows larger, the task of finding other unique combinations of salient letters becomes more and more difficult and by necessity must include codes that are less and less salient and more difficult to learn. After 500-1000 units are encoded, the codes become virtually arbitrary.
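The collision problem in salient letter encoding can be made concrete with a short sketch. The phrases and their salient words are taken from the examples above; the function names `salient_code` and `find_collisions` are hypothetical, and the choice of which words count as salient is supplied by hand as an assumption.

```python
from collections import defaultdict

# Each phrase is paired with the words chosen (by hand) as its salient words.
PHRASES = {
    "Turn the radio off": ("radio", "off"),
    "Turn the radio on": ("radio", "on"),
    "Take it off": ("take", "off"),
    "Turn right": ("turn", "right"),
}


def salient_code(salient_words):
    """Build a code from the initial letters of the salient words."""
    return "".join(word[0].upper() for word in salient_words)


def find_collisions(phrases):
    """Group phrases by code and keep only codes shared by several phrases."""
    by_code = defaultdict(list)
    for phrase, words in phrases.items():
        by_code[salient_code(words)].append(phrase)
    return {code: items for code, items in by_code.items() if len(items) > 1}


# "radio off" and "radio on" both reduce to "RO".
print(find_collisions(PHRASES))

# Re-coding "Turn the radio on" as "TO" only moves the clash to "Take it off".
recoded = dict(PHRASES)
recoded["Turn the radio on"] = ("turn", "on")
print(find_collisions(recoded))
```

With only four phrases, resolving one collision already creates another, which suggests why codes drift toward arbitrariness as the corpus grows.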
Semantic encoding associates letters with concepts rather than individual words, so that "F" can be taken to represent food. The plural word message "I would like a hamburger" would then be encoded by "FH". The difficulty here is that "F" can represent many different concepts and must be used not only for "food" but for concepts such as "fast", "friends", etc. If each letter is assigned a single concept, a language corpus represented by the combinations of twenty-six root concepts would indeed be impoverished. If letters are allowed to represent one concept in initializing a sequence, and other concepts as second or third members of a sequence, then disambiguating what concept a letter means across a string of three letters becomes a difficult if not impossible task once the language corpus has grown to five hundred units or more.
A system is necessary which incorporates ideas from research in knowledge representation and natural language processing. The goal of knowledge representation research is to represent acquired knowledge from everyday domains (e.g., task planning, language understanding) in a form that can be used by intelligent computer programs. The goal of natural language processing research is to investigate, in particular, the knowledge that is required for a computer program to understand and produce sentences in English or some other natural language.
It can be said that the intelligence of a system is determined by how much it knows. Before a computer can show an understanding of a linguistically complex utterance, it must have a significant body of knowledge about morphology, word meaning, syntax, etc. In order to support intelligent processing, much linguistic knowledge must be incorporated into the system.
The system may not only combine the symbol parsing of multi-meaning sequence icons with intelligent word parsing, but may further utilize a well-chosen geographic layout which can provide the system with syntactic and semantic information based on the locations of the icons which are selected. This in turn reduces the knowledge and inferencing required by the intelligent word parser.