1. Field of the Invention
This invention relates to speech training systems and, more specifically, to a speech training system which permits a speech-impaired student to type in any word or sentence he or she wants to learn and observe, on a CRT display, the articulatory model movements required to produce that word or sentence in the form of tongue-palate contact patterns. The system is particularly suited to deaf children, who do not receive auditory information and who tend to learn to type at an early age. The invention can also be used to help normal-hearing students learn how to speak foreign languages.
2. Description of Related Art
The most fundamental method for teaching deaf children to speak consists of teachers using their own vocal apparatus for showing the correct vocal gestures. Children can observe the external appearance of the lips, jaw and, to a limited extent, the tongue, as speech is produced by a teacher. Children are sometimes instructed to use tactile feedback from the teacher's vocal organs for comparison to their own. This method has obvious limitations in that many of the articulatory gestures in speech are not observable externally.
In recent years, it has been possible for teachers to demonstrate how speech is produced with the aid of instruments and of computer programs which analyze speech. These instruments and programs permit the observation of many characteristics of speech, including its acoustic manifestation, as the speech is being produced. This approach is best demonstrated in the Computer Integrated Speech Training Aid (CISTA) developed by Matsushita. The CISTA provides multiple-channel data gathered by several transducers:
1. The dynamic palatograph. This instrument, whose use was first reported in 1962 by a Soviet researcher, Y. Kuzmin, indicates contact between the tongue and palate by means of a number of electrodes on an artificial palate that is worn in the mouth. When the tongue touches one of the electrodes, a low-voltage circuit is completed, which is registered by instruments outside the mouth. The presence or absence of contact at each electrode is indicated on a CRT display.
2. Nasal sensor. An electret microphone held to the side of one nostril by headgear or temporarily attached with adhesive tape provides an indication of nasal vibration.
3. Throat sensor. An electret microphone, held at the larynx by means of a flexible collar, provides an indication of glottal vibration.
4. Airflow sensor. Several methods have been used for sensing airflow, using a device held by the child in front of the mouth.
5. A standard microphone provides input for acoustic analysis.
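By way of illustration only, the contact indication produced by the dynamic palatograph described in item 1 above can be sketched as a small grid in which each cell stands for one electrode. The grid size, the function name render_contacts, and the example pattern are assumptions made for this sketch; they are not taken from the CISTA or from any actual palatograph.

```python
# Hypothetical electrode layout: 4 rows x 6 columns (an assumption for this
# sketch, not the layout of any real artificial palate).
ROWS, COLS = 4, 6


def render_contacts(contacts):
    """Return a text picture of the palate: '#' for contact, '.' for none.

    `contacts` is a set of (row, col) pairs naming the electrodes whose
    circuit has been completed by the tongue.
    """
    lines = []
    for r in range(ROWS):
        lines.append(
            "".join("#" if (r, c) in contacts else "." for c in range(COLS))
        )
    return "\n".join(lines)


# Illustrative pattern only: contact along the first row and both side edges.
example = (
    {(0, c) for c in range(COLS)}
    | {(r, 0) for r in range(1, ROWS)}
    | {(r, COLS - 1) for r in range(1, ROWS)}
)

print(render_contacts(example))
```

Each row of the printed grid corresponds to one row of electrodes, so that a change in tongue position appears as a change in which cells are marked.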
The teaching of speech to the deaf is made difficult by the limited time the children have with speech teachers. While a hearing child receives speech input and acoustic feedback as to his own production for many hours a day, a deaf child typically receives such feedback only during training sessions, which may be as infrequent as one session a week.
Combining such speech training devices as the instruments of the CISTA with computers allows children to receive part of their training without a teacher's intervention, greatly expanding the time that training is available. However, speech training devices such as the CISTA, which provide feedback directly to children, are limited to teaching individual sounds or a limited set of preprogrammed utterances. The purpose of the present invention is to permit a child to receive information as to the production of any utterance without the assistance of a teacher.
Other prior art consists of text-to-speech systems, which permit any typed utterance to be synthesized automatically. A device called "DECTalk," produced by Digital Equipment Corporation, is currently the best-known example of text-to-speech for English. All of these text-to-speech systems are limited to producing audible sounds only.
The shape of the human vocal tract determines the resonances that in turn control human speech output. Electronic and computational models of the relationship between the vocal tract shape and the resulting acoustic output have been an important part of speech research for many years. In this work, the shape of the vocal tract was provided by the researchers, and the acoustic output was measured.
More recently, researchers have been developing articulatory synthesis, in which the vocal tract shapes are generated automatically. In this case, the input consists of a string of phonemes. This string is converted to a string of vocal tract shape specifications. The vocal tract shapes are then used in a vocal tract model which produces speech.
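The conversion of a phoneme string to a string of vocal tract shape specifications may be pictured, in a highly simplified form, as a table lookup. The following is a minimal sketch under stated assumptions: the table SHAPES, its two entries, its parameter names, and the function phonemes_to_shapes are hypothetical and are not drawn from any particular articulatory synthesizer. The final step, in which a vocal tract model produces speech from the shapes, is omitted.

```python
# Hypothetical shape table: each phoneme maps to a coarse shape specification.
# The parameter names and values are placeholders chosen for illustration.
SHAPES = {
    "m": {"lips": "closed", "velum": "lowered"},
    "a": {"lips": "open", "velum": "raised"},
}


def phonemes_to_shapes(phonemes):
    """Convert a sequence of phoneme symbols to a list of shape specifications.

    Raises KeyError if any phoneme has no entry in the hypothetical table.
    """
    missing = [p for p in phonemes if p not in SHAPES]
    if missing:
        raise KeyError(f"no shape specification for: {missing}")
    return [SHAPES[p] for p in phonemes]


# The phoneme string "m", "a" yields one shape specification per phoneme.
for phoneme, shape in zip(["m", "a"], phonemes_to_shapes(["m", "a"])):
    print(phoneme, shape)
```

A practical system would also have to account for the transitions between successive shapes, which a lookup of this kind does not address.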
The automatic generation of vocal tract shapes from phonemes in this earlier work did not include the generation of tongue-palate contact patterns. Use of synthesized tongue-palate contact patterns to teach speaking has not been contemplated heretofore.