Many computer systems support a function whereby a human user may exert control over the system through spoken language. These systems often perform speech recognition with reference to a language model that includes a rejection path for utterances that are beyond the scope of the application as designed. The speech recognition component of the application, therefore, either returns the best match within the language model designed for the application, or it rejects the speech signal. A good description of a variety of systems which incorporate such methods can be found in "Readings in Speech Recognition," edited by Alex Waibel and Kai-Fu Lee (1990).
Computer-assisted language learning (CALL) systems for second language instruction have been improved by the introduction of speech recognition. Bernstein & Franco ("Speech Recognition by Computer," Principles of Experimental Phonetics, Ch. 11, pp. 408-434, 1996) and the references therein show some examples. In most cases, the speech recognition component of the CALL system has been used for best-match recognition (with rejection) or for scored performance in testing and skill refinement, either for nonnative speakers of the target language or for hearing-impaired speakers.
Prior laboratory demonstration systems have been designed to offer instruction in reading in the user's native language. Two systems have emulated selected aspects of the interaction of a reading instructor while the human user reads a displayed text aloud. One system based its spoken displays on the running average of poor pronunciations by the reader (see, e.g., WO 94/20952 by Rtischev, Bernstein, and Chen). The other system developed models of common false starts and based its spoken displays on the recognition of these linguistic elements (see J. Mostow et al., "A Prototype Reading Coach that Listens," Proc. 12th Nat. Conf. Artificial Intelligence, AAAI-94, pp. 785-792, 1994).
Expert teachers and other human interlocutors are sensitive not only to the linguistic content of a person's speech, but also to other apparent characteristics of the speaker and the speech signal. The prior art includes systems that respond differentially depending on the linguistic content of speech signals. Prior art systems have also extracted indexical information, such as speaker identity or speaker gender, and have calculated pronunciation scores or speaking rates in reading. However, these extra-linguistic elements of human speech signals have not been used in combination with the linguistic content to estimate the speaking proficiency or other characteristics of a human user. Measuring extra-linguistic aspects of a user's speech along with its linguistic content allows finer estimation of the user's skill state and psychological state. Finer estimation of skills or states in turn allows the operation of the computer system to be controlled more exactly, in a manner appropriate to the user's skill state and current state of readiness. Such control of computer-based graphic and audio displays is useful and desirable because it facilitates fine-grained adaptation to the cognitive, verbal, and physical skill state of the human user.
U.S. Pat. No. 5,870,709 (U.S. application Ser. No. 08/753,580) showed how computer systems that interact with human users via spoken language may be improved by the combined use of linguistic and extra-linguistic information manifest in the speech of the human user. It is also known that an individual's psychological state affects aspects of that individual's speech. For example, it has been determined that mean fundamental frequency and other extra-linguistic speech characteristics can be markers of a speaker's emotions. See, e.g., Stassen H H, Bomben G, Gunther E., "Speech characteristics in depression," Psychopathology, 24:88-105 (1991).
Using such knowledge, and recognizing that other speech characteristics are considered important in the analysis of emotion from speech, others have proposed methods for using these speech characteristics in self-training biofeedback systems. See, e.g., U.S. Pat. No. 5,647,834. However, to date such systems have relied on measures from open speaking to estimate a user's psychological state.