A system and method for producing synthetic words, phrases, or sentences for use by people unable to use their own voices is known in the speech synthesizing arts. The system was originally implemented as a linguistic coding system with an associated keyboard, in which the coding system was based on using multimeaning icons to represent language rather than using indicia related to a specific word, phoneme, or letter. Such a system is disclosed in U.S. Pat. No. 4,661,916 to Baker et al., issued Apr. 28, 1987, incorporated herein by reference.
In the Baker system, the keyboard is coupled to a computer which stores a plurality of words, phrases or sentences in the memory thereof for selective retrieval by the keyboard. The words, phrases or sentences retrieved by means of the keyboard are fed to a voice synthesizer, which converts them into audible spoken messages output through a loudspeaker. The keyboard utilizes polysemic or polysemous (many-meaning) symbols on a plurality of respective keys, and by designating one or more of the keys and its associated polysemous symbol, previously recorded words, phrases, or sentences may be retrieved from the computer memory in a simple transduction manner. These words, phrases or sentences are retrieved by actuating a particular sequence of a plurality of keys, to vary the context of the polysemous symbols. Thus, a plurality of words, phrases or sentences associated with a symbol sequence may be selectively generated as a function of each polysemous symbol in combination with other symbols to access the word, phrase, or sentence.
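By way of illustration only, the retrieval described above may be sketched as a table lookup keyed on the sequence of actuated keys. The icon names, the stored messages and the function name `retrieve` are hypothetical and are not taken from Baker '916; the sketch merely shows how the same icon can yield different output depending on the icons that accompany it.

```python
# Hypothetical icon names and stored messages; none come from Baker '916.
ICON_SEQUENCES = {
    ("APPLE", "SUN"): "breakfast",
    ("APPLE", "MOON"): "dinner",
    ("HOUSE", "APPLE"): "kitchen",
    ("HOUSE", "MOON"): "I want to go to bed.",
}


def retrieve(sequence):
    """Return the stored message for an icon sequence, or None if unassigned."""
    return ICON_SEQUENCES.get(tuple(sequence))


if __name__ == "__main__":
    # The APPLE icon contributes a different meaning in each context.
    for keys in (["APPLE", "SUN"], ["APPLE", "MOON"], ["HOUSE", "APPLE"]):
        print(keys, "->", retrieve(keys))
```

In this sketch the order of actuation matters, so that the sequence HOUSE, APPLE and the sequence APPLE, HOUSE may be assigned different messages or left unassigned.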
A communication aid designed to be adaptable either to people of high intellect and education who are physically unable to speak or to people with decreased cognitive abilities or little education needs to be easy to understand and operate, as well as quick and efficient. It is essential that both the cognitive and physical loads required of the user be reduced as much as possible. It is essential as well that whatever language representation system is used be capable of representing a large enough vocabulary to be useful in spontaneous, interactive communication in a variety of settings. Synthetic speech and text generation systems other than Baker '916 have been developed with coding systems based on words, phonemes, or letters, implemented by keyboards bearing indicia relating to specific words, phonemes, or letters; these systems are somewhat limited in efficiency of operation.
An advantage in utilizing a system based upon letters is that a limited number of keys can be used (i.e., 26 letters in the alphabet). However, such a system utilizing letters has several drawbacks. One drawback is that in a system for people physically unable to speak or who are cognitively impaired, spelling is difficult to master. People who can't articulate the sounds of a language have a limited ability to deal with letters which represent those sounds. Also, when using letters one must type a large number of letters in a sequence to form a word, phrase and especially a sentence. Such a large number of keystrokes is especially cumbersome for someone with decreased cognitive or physical abilities.
In order to combat the problem of the need for a large number of letters in a sequence, single meaning picture or symbol approaches have been developed. In these systems, a symbol or picture can be utilized to represent a single basic concept or word. Because these systems are based upon single concepts or words and not letters, only a few symbols need be utilized in sequence to represent a phrase or sentence. However, the major drawback of these systems is different from letter based systems. Although only a single symbol or a few symbols can form a sequence to represent a meaningful utterance, many hundreds of symbols are needed in such a system to represent enough vocabulary to spontaneously and appropriately interact at home, at school or in the workplace. Thus, hundreds and sometimes even thousands of symbols are used by operators of these systems. These large symbol sets are not only physically difficult (if not impossible) to represent on a keyboard, but also put a severe strain on the cognitive and physical abilities of a user both to choose a symbol from the large symbol set and further to key in the selected symbol.
Various techniques have been developed in an attempt to deal with the deficiencies of either the need for a large number of letters to form a sentence in a letter-based system, or the need for a large symbol set to represent all the notions or vocabulary necessary for daily interactions in a single-meaning picture/symbol system. One approach aimed at combating the long sequences of letters necessary in a letter system is the use of alphabetic abbreviations. With such systems a user is unsure as to what each abbreviation stands for. For example, (wk) could stand for "walk". However, it could also stand for "weak" or "week". The abbreviation (wo) could stand for "word", but what then would stand for "work"? System operators become confused and need to remember hundreds of special rules, exceptions and frankly arbitrary codes.
Another attempt to alleviate the large number of keystrokes needed in spelling is letter-based word prediction systems. In such a system, a user types a letter such as "B" and a plurality of words starting with "B" appears on a display. Upon not finding the desired word displayed, an operator then hits the next letter "O" of the desired word (if the desired word were "Bottle", for example). If the desired word is then displayed on the word list, the number next to the desired word is noted and then hit. Such systems are highly visual, requiring attention directed at two different fields, the keyboard and the word list. Using such systems to enhance communication rate requires system operators to have strong spelling abilities (if an operator hits the wrong letter, such as "C" when the word "kitten" is desired, prediction starts with a plurality of words beginning with "C" and the user is thus lost). Further, such systems can be cognitively disorienting because they require the operator to key a letter, read a word list on a display, key in another letter, select a number, etc.
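The behavior of such a word prediction system may be sketched as a prefix search over a stored word list. The word list, the display size of three entries and the function name `predict` are assumptions made for illustration; an actual system would typically order its list by frequency or recency of use rather than alphabetically.

```python
# Hypothetical stored vocabulary, kept in alphabetical order for simplicity.
WORDS = ["ball", "bed", "book", "bottle", "boy", "cat", "chair", "cup", "kitten"]


def predict(prefix, limit=3):
    """Return up to `limit` stored words that start with the typed prefix."""
    prefix = prefix.lower()
    return [word for word in WORDS if word.startswith(prefix)][:limit]


if __name__ == "__main__":
    print(predict("B"))   # "bottle" is not yet among the displayed words
    print(predict("Bo"))  # a second letter brings "bottle" onto the list
    print(predict("C"))   # a misspelled start can never lead to "kitten"
```

The last call shows the spelling dependency noted above: once "C" has been keyed, no further letters can bring "kitten" onto the display.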
Levels/locations systems were developed in an attempt to alleviate the problems caused by large symbol sets of single meaning picture/symbol systems. In such systems, a plurality of keyboard overlays is utilized. Each overlay contains a plurality of single-meaning pictures or single concept symbols for a particular activity. For example, there could be a "party" overlay, a "going to the zoo" overlay, an A.M. activities overlay, etc. However, because only a limited number of symbols is on a keyboard at one time, the system severely limits a user's vocabulary at all times. In the case where a user has 7 overlays and an even distribution of vocabulary is assumed for each overlay, approximately 85% (six sevenths) of the vocabulary is unavailable to the user at any given time, because it resides on the other six overlays. Even if the disabled user is physically or electronically able to change overlays, the vast majority of his or her vocabulary is out of sight at all times. Thus, the interactive communicative abilities of a user are severely limited.
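The proportion quoted above follows from simple arithmetic: with seven equally loaded overlays, six sevenths of the vocabulary is off the keyboard at any moment. The following sketch, in which the function name `unavailable_fraction` is hypothetical, computes that share for any number of overlays under the same even-distribution assumption.

```python
from fractions import Fraction


def unavailable_fraction(overlays):
    """Fraction of vocabulary not on the mounted overlay, assuming an even split."""
    if overlays < 1:
        raise ValueError("at least one overlay is required")
    return 1 - Fraction(1, overlays)


share = unavailable_fraction(7)
print(share, f"= {float(share):.1%}")  # 6/7 = 85.7%
```

Under the same assumption the unavailable share grows with every overlay added, so enlarging the vocabulary in this way hides proportionally more of it.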
The linguistic coding system of Baker '916 solved a great number of these problems by employing a technique called semantic compaction. Semantic compaction utilizes a keyboard with polysemous (many-meaning) symbols or icons on the respective keys. These polysemous symbols allow for a small symbol set (each symbol having many different yet obvious meanings depending upon symbol context) and further allow the use of only a small number of symbols in a sequence to transduce a previously stored word, phrase, or sentence. Examples of the polysemous symbols of the Baker '916 patent are shown in FIG. 1. Thus, by input of only a limited number of polysemous keys, a word, phrase or sentence can be selectively retrieved. The sentence can then be sent to a voice synthesizer to convert it, through a loudspeaker, to an audible spoken message. This device is a synthetic speech device which allows a user to go directly from thought to speech without the need to record words, phonemes and letter data of individual entities.
The Baker device stores words, phrases or sentences for selective retrieval, and not just individual words, phonemes, or letters directly represented on the keys of other systems. By using a small set of polysemous symbols in combination, only a small number of key actuations is necessary to represent a word, phrase or sentence. These iconic, polysemous (many-meaning) symbols on the individual keys, or "icons" for short, as they are more commonly known, were made so as to correspond to pictorial illustrations of real life objects, as can be seen by reference to FIG. 1. These icons are utilized for storing large vocabularies because such symbols are more versatile than alpha-numeric characters and are therefore more easily memorized. Large repertories of words, sentences and phrases are available and used by operators with a wide range of physical and cognitive disabilities. Many operators handle repertories in excess of 3000 vocabulary units.
A sequence of icons may be associated with a particular language item, such as a word, phrase or sentence, to be output when that particular icon sequence is actuated. A small total number of icons, in short sequences, can be used to access language items. They do what letters, single meaning pictures, single concept symbols, words and numbers cannot do.
Thus, a significant advantage which icons have over numbers, letters and words is that, as illustrations, they each have distinct visual features which are transparent or can easily be made transparent (translucent) to the user. For example, each icon has a shape and a color, and illustrates some object which may have other visual properties and practical associations as well. Although some symbols have shapes which are readily accessed (for example, O, I, X, A), the abstract shapes of symbols are not unambiguous; the more abstract an association, the greater the chance the user will not prefer or remember the intended interpretation. For example, "A" can be associated with a house or a mountain or a tall building, the tip of a pencil, etc. Since the shape of "A" is so abstract, many associations are possible. An icon of "house", however, is not subject to the same ambiguity.
Some electronic systems have attempted to use letter coding to associate letters with words, phrases and concepts; however, this method of encoding is also prey to ambiguous interpretation. For example, a reasonable letter coding for the color "RED" could be the letter "R"; for "BLUE", the coding could be "B". However, what happens with the color "BROWN"? The logical choice would also be "B", but a conflict arises with the code chosen in "BLUE". The same problem arises as in the previous paragraph; since there are literally thousands of words which can be associated with a single letter, a single letter encoding technique rapidly runs out of coding space. A two letter encoding technique rapidly runs out of coding space as well because there are only 676 possible two letter codes. Further, a large number of these codes are difficult to associate with words, phrases or concepts such as xx, xy, xz, yx, yy, yz, zx, zy and zz, for example.
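The size of the coding space referred to above can be checked directly, as can the conflict between "BLUE" and "BROWN" under initial-letter coding. In the following sketch the function name `codes` and the variable names are hypothetical; only the 26-letter alphabet and the three color words come from the discussion above.

```python
from itertools import product
from string import ascii_uppercase


def codes(length):
    """All letter codes of the given length over the 26-letter alphabet."""
    return ["".join(letters) for letters in product(ascii_uppercase, repeat=length)]


one_letter = codes(1)
two_letter = codes(2)
print(len(one_letter), len(two_letter))  # 26 676

# Group the color words by initial letter and keep letters claimed more than once.
colors = ["RED", "BLUE", "BROWN"]
by_initial = {}
for color in colors:
    by_initial.setdefault(color[0], []).append(color)
conflicts = {letter: words for letter, words in by_initial.items() if len(words) > 1}
print(conflicts)  # {'B': ['BLUE', 'BROWN']}
```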
Letter codes can be done in various ways. Two of the most common ways to encode single and plural word messages are called "salient letter encoding" and "letter category encoding". Salient letter encoding takes the initial letter of two or more fundamental words in the language string to be represented and uses them for the code. Using this method, for example, "Turn the radio off" can be encoded as "RO" (RADIO OFF). The problem arises that after many utterances, the same letters "RO" are needed to represent other language strings. For instance, "RO" are the most salient letters for "Turn the radio on". A strategy must then be employed to find other salient letters so that the ambiguity is avoided. Hence, "Turn the radio on" must be encoded using a different code such as "TO" or "TR". However, these letter combinations in turn can represent other common phrases such as "Take it off" or "Turn right". As the language corpus grows larger, the task of finding other unique combinations of salient letters becomes more and more difficult and by necessity must include codes that are less and less salient and more difficult to learn. After 500-1000 units are encoded, the codes become virtually arbitrary.
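The collision problem of salient letter encoding may be sketched as follows. Because the choice of which words are "salient" is a human judgement, the sketch takes that choice as input rather than attempting to compute it; the salient-word selections, the function names `salient_code` and `collisions`, and the list `MESSAGES` are assumptions made for illustration, with the messages themselves taken from the examples above.

```python
from collections import defaultdict

# Each entry pairs a message with the words an encoder judged most salient.
# The selections are hypothetical apart from the codes quoted in the text.
MESSAGES = [
    ("Turn the radio off", ["radio", "off"]),
    ("Turn the radio on", ["radio", "on"]),
    ("Take it off", ["take", "off"]),
    ("Turn right", ["turn", "right"]),
]


def salient_code(salient_words):
    """Build a code from the initial letter of each salient word."""
    return "".join(word[0].upper() for word in salient_words)


def collisions(messages):
    """Group messages by code and keep only codes claimed more than once."""
    by_code = defaultdict(list)
    for text, salient in messages:
        by_code[salient_code(salient)].append(text)
    return {code: texts for code, texts in by_code.items() if len(texts) > 1}


if __name__ == "__main__":
    print(collisions(MESSAGES))
    # Re-encoding "Turn the radio on" from "turn" and "on" gives "TO",
    # which is already the code for "Take it off".
    print(salient_code(["turn", "on"]), salient_code(["take", "off"]))
```

Each attempt to resolve one collision by choosing different salient words thus risks creating another, which is the difficulty described above as the corpus grows.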
Letter category encoding takes letters to associate with concepts rather than individual words, so that "F" can be taken to represent food. The plural word message "I would like a hamburger" would then be encoded by "FH". The difficulty here is that "F" can represent many different concepts and would be the most memorable selection used not only for "food" but for concepts such as "family", "friends", etc. If each letter is assigned a single concept, a language corpus represented by the combinations of twenty-six root concepts would indeed be impoverished. If letters are allowed to represent one concept in initializing a sequence, and other concepts as second or third members of a sequence, disambiguating which concept a letter means across a string of three letters becomes a difficult if not impossible task once the language corpus has grown to five hundred units or more.
Thus, the semantic compaction encoding technique in Baker '916 is a revolutionary breakthrough in electronic augmentative and alternative communication over alphabetic encoding, levels-location systems and frequency/recency letter-based word prediction techniques. However, several limitations may occur in the Baker '916 type of input unit for augmentative communication and other types of speech synthesis systems. Further, several limitations also exist in the area of text generation. In the Baker '916 device, the alphabetic characters appear on the same keys as the polysemous icons, not only to enhance the general associational power of each key viewed as an associational environment, but also to use space more efficiently and to reduce the total number of keys for scanning input users. Thus, a problem arises when a user desires to enter words or phrases which have not been designated by polysemous symbols (for example, the name of a specific city). In that case, the user must hit a key designating a particular type of spell input mode and then input a plurality of alphabetic characters to spell the name of the particular city. Still further, upon spelling the name of the particular city, the user must then access a symbol or communication mode key to place the unit back into the symbol mode. Specific names of cities, specific names of people and specific technical terms, for example, are prevalent in a scholastic environment and thus may be crucial in an area such as text generation.
As previously mentioned, word prediction systems are utilized in an attempt to alleviate the large number of keystrokes needed in spelling. However, due to the previously mentioned problems of requirements for strong spelling abilities as well as the cognitive disorientation caused by continually shifting attention between keyboard and display, word prediction systems pose several drawbacks in a speech synthesis environment, which were overcome by the semantic compaction system of the Baker '916 patent. The problems involved with word prediction systems are equally present when such a system is utilized in a text generation environment.
In text generation, a plurality of additional words is required which were probably not as essential in speech synthesis. The lexicon used in interactive communication is often smaller than that used in writing. For example, a student in school may be required to generate a paper on "Christopher Columbus". In a scholastic environment, historical names such as "Christopher Columbus" are common. However, in a word prediction system, even if "Christopher Columbus" is present within the system, a user would probably have to enter several letters before the name "Christopher Columbus" ever appeared on a word prediction display. This is because many other common words beginning with "C", "Ch" (cheap, choose, chase, check, chip, chemistry, etc.), and even "Chr" exist, which must be available to a user. The presence of frequently used daily vocabulary hinders the generation of academic vocabulary. Yet even in academic settings these common words need to be readily available to system operators. Thus, word prediction is a slow and otherwise cumbersome system for text generation.
Although the Baker '916 system is the best available system in augmentative and alternative communication, there is still room for improvement in the input system of the type utilized in the '916 patent to Baker, which produced synthetic words, phrases or sentences. With the input icon keys, as well as characters associated with a plurality of the keys, selection between the character mode and the symbol mode is necessary in the '916 Baker patent for the input unit to allow for generation of specific cities or people, or other words not already encoded by icon sequences. A separate spell mode key and icon mode (comm. mode) key (for the symbol mode) have to be accessed by the user in the Baker '916 system in order to select and switch between the character and symbol modes, and switch back again. Such mode selection keys are illustrated in the background FIG. 2 herein, which was previously utilized and developed by Bruce Baker, one of the present applicants.
In this type of system, a plurality of icons or character keys are utilized to access words, phrases or sentences previously stored in memory. However, upon selecting the respective icons associated with an encoded word, phrase or sentence, a user has to switch to, or select, a character mode by activating the spell mode key in order to allow character input. This is somewhat cumbersome to the user, especially in the area of text generation, and could significantly increase the access time necessary to generate a word, phrase, or sentence. Further, upon completing input in the character mode, a user then has to actuate the communication mode key to put the input unit back in, or select, the symbol mode. Again, this is somewhat cumbersome and could affect the input speed of the input system.
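The overhead imposed by explicit mode selection can be estimated by counting key actuations. The following is a minimal sketch under stated assumptions: the unit starts in the symbol mode, each spelled segment costs one actuation of the spell mode key before it and one actuation of the communication mode key after it, and the segment lengths used in the example are hypothetical. The function name `keystrokes` is likewise hypothetical.

```python
def keystrokes(segments):
    """Count key actuations for a message made of ("icon", n) and ("spell", n) segments.

    Assumes the unit starts in symbol mode and that every spelled segment is
    bracketed by one spell mode key actuation and one communication mode key
    actuation.
    """
    total = 0
    for mode, count in segments:
        total += count
        if mode == "spell":
            total += 2  # spell mode key, then communication mode key
        elif mode != "icon":
            raise ValueError(f"unknown mode: {mode!r}")
    return total


if __name__ == "__main__":
    # Two icons, a seven-letter name spelled out, then two more icons.
    message = [("icon", 2), ("spell", 7), ("icon", 2)]
    print(keystrokes(message))  # 13, of which 2 are mode selections
```

Under these assumptions every spelled word adds two actuations that contribute nothing to the content of the message, and the cost recurs each time the user alternates between icons and spelling.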