Many domains of knowledge may be represented as a collection of items or facts to be learned. Devices and techniques that are designed to allow learning of the items or facts to occur must present the items. Presentation alone, however, is not enough. Learning involves both knowledge and memory retention, as well as keeping the user's interest at an acceptable level. Boredom and the other moods that may block effective learning and/or inhibit the user from seeing through the course of study in the prescribed manner must be combatted.
There are many kinds of computer-supported and other tools for learning. Most computer-supported tools are based on typed interaction. Those that provide audio are of two types. The first, typified by self-study tapes, provides for audio output but does not accept audio input from the user. The second type, of which there are only a few examples, supports audio output and input.
With respect to self-study tapes, a particularly good example is the set devised by Dr. Paul Pimsleur. As disclosed in a document entitled "Speak & Read Essential Spanish," by Paul Pimsleur, (Heinle & Heinle Enterprises, Inc. Publishers, 1988), the information on Pimsleur's self-study tapes is organized on the basis of two main principles called "Anticipation" and "Graded Interval Recall". Anticipation is intended to avoid the dulling effect of mere repetition by recording information on the tapes in a form that requires the student to anticipate a correct response before it is announced by the tapes, rather than merely to repeat something already announced. Graded Interval Recall is intended to avoid inefficient memorization by recording information on the tapes at graduated intervals that remind the student of information at the times when the student would otherwise forget it. Items that form the bedrock for other items are thereby taught and are repeated intensively at first, until knowledge of them resides in short term memory, and are thereafter repeated at ever greater intervals, until knowledge of them resides in long term memory.
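The idea of repeating an item at ever greater intervals can be illustrated with a minimal sketch. The cited document is not quoted here as giving specific timings, so the starting gap, the growth factor, and the function names below are assumptions chosen only for illustration; a geometric progression is one simple way to produce graduated intervals, not necessarily the schedule used on the tapes.

```python
def graded_intervals(first_gap_s: float = 5.0,
                     growth: float = 5.0,
                     repeats: int = 6) -> list[float]:
    """Return the gaps, in seconds, between successive repetitions of one item.

    All three parameters are hypothetical: each gap is `growth` times the
    previous one, so repetition is dense at first and sparse later.
    """
    if first_gap_s <= 0 or growth <= 1 or repeats < 0:
        raise ValueError("need first_gap_s > 0, growth > 1, repeats >= 0")
    return [first_gap_s * growth ** k for k in range(repeats)]


def presentation_times(intervals: list[float]) -> list[float]:
    """Convert gaps into elapsed times measured from the first presentation."""
    times = []
    elapsed = 0.0
    for gap in intervals:
        elapsed += gap          # each repetition follows the previous one by `gap`
        times.append(elapsed)
    return times


if __name__ == "__main__":
    gaps = graded_intervals()
    for gap, when in zip(gaps, presentation_times(gaps)):
        print(f"repeat after a gap of {gap:>8.0f} s (at t = {when:>8.0f} s)")
```

Because the schedule is computed once and recorded on the tape, it is the same for every user regardless of what that user actually remembers.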
The basic study cycle of self-study tapes, including Pimsleur's, consists of audio, silence, audio, silence. Typically, an instructor says something, either in the foreign language or in the user's native language, that calls for a response from the user, usually in the foreign language. There is a period of silence for the user to provide a spoken response, which may be in a form called for by the principle of anticipation. So that a non-native speaker can be given the correct pronunciation, a native speaker then provides a correct response, and thereafter there is a period of silence in which the user can assess the correctness of his own response, and repeat the correct response.
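The four-phase cycle just described can be written out as a small data structure. This is only an illustrative sketch: the `Phase` type, the `study_cycle` function, and the sample prompt and answer are hypothetical names and values, not taken from any tape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    kind: str      # "audio" (the tape is speaking) or "silence" (the user's turn)
    purpose: str   # what this phase is for


def study_cycle(prompt: str, answer: str) -> list[Phase]:
    """Build one audio, silence, audio, silence cycle for a single item."""
    return [
        Phase("audio", f"instructor says something calling for a response: {prompt}"),
        Phase("silence", "user attempts a spoken response"),
        Phase("audio", f"native speaker provides a correct response: {answer}"),
        Phase("silence", "user assesses own response and repeats the correct one"),
    ]


if __name__ == "__main__":
    # Hypothetical item, for illustration only.
    for phase in study_cycle("How do you say 'good morning'?", "Buenos días"):
        print(f"{phase.kind:>7}: {phase.purpose}")
```

Nothing in the cycle depends on what the user says during the silent phases, since a tape has no means of hearing it.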
Self-study tapes in general have the advantage that they are inexpensive, typically costing only a few hundred dollars for a set, and can be used in a wide variety of situations. However, they are limited in their effectiveness by inflexibility and a lack of feedback. The fundamental problem is that self-study tapes have no way of assessing whether the user has made a correct response, or for that matter, any response at all. As a result, they cannot provide feedback about correctness to the user. They also have no choice but to follow a single fixed pattern of instruction.
The lack of feedback is troublesome in many ways. First, it can be quite hard for users to decide whether or not they are correct. For instance, it can be hard for a user's untrained ear to hear subtle differences. In addition, there is often more than one correct response. Therefore, the mere fact that a user's response was different from the response on the tape does not necessarily mean that the user was wrong.
The lack of flexibility is an even greater difficulty. Self-study tapes must contain significant repetition to be effective. In fact, users typically have to listen to a set of tapes several times for good learning to occur. Unfortunately, since the repetition is fixed, it cannot be varied based on what the user is and is not succeeding in learning. Inevitably, much of the repetition involves things that were easy for the user to learn and is therefore a waste of time. On the other side of the coin, there are inevitably things that the user finds hard to learn that fail to get repeated often enough.
With regard to devices that support audio input as well as output, there are several examples. One voice-recognition based instructional device, the Summit Literacy Tutor, described in a document entitled "Word Rejection For a Literacy Tutor", SB Thesis, (MIT, 1992), by McCandless, is designed for teaching people to read English and includes a speech synthesizer and a speech recognition system called Summit described in a document entitled "The Summit Speech Recognition System; Phonological Modeling and Lexical Access", Proc. ICASSP, (pp. 49-52, Albuquerque N.Mex., April 1990), by Zue, et al. The Summit Literacy Tutor uses Summit to monitor a person who is reading aloud, detects pronunciation errors when they occur, and, using the speech synthesizer, pronounces words and sentences when requested by the user. The Summit Literacy Tutor merely monitors what the user chooses to read, and does so without regard for what may be read next or for whether the user knows or retains in memory what is read.
Another voice-recognition based instructional device, the Summit Language Tutor, described in a document entitled "Language Tutor: An Interactive Aid for Teaching English and Japanese", in Annual Research Summary, Spoken Language Systems Group, MIT Laboratory for Computer Science (Nov 1993), V. Zue (editor), is designed for teaching people to properly pronounce previously written sentences. The Summit Language Tutor includes a speech synthesizer and the Summit voice recognizer. The Summit Language Tutor uses the speech synthesizer to read the sentences aloud, and uses Summit to monitor the user's responses, should the user choose to read the sentences. Like the Summit Literacy Tutor, the Summit Language Tutor merely monitors the pronunciation of the previously written sentences should the user choose to pronounce them but without regard for whether the user knows or has retained in memory the previously written sentences. In one mode, the Summit Language Tutor can select at random a sentence as a test, but beyond that, it has no regard for what is to be read next, which is left to the freedom of the user.
Another voice-recognition based instructional device, the SRI Autograder system, described in a document entitled "Speech Recognition Technology For Language Education", Speech Research & Technology Program note, (SRI International, November 1993), by Bernstein, is designed to allow people to determine how well they have spoken a foreign language utterance. The Autograder system includes a voice recognition unit called Decipher described in a document entitled "The Decipher Speech Recognition System" Proc IEEE ICASSP, (pp. 77-80 1990), by Cohen et al. The Autograder system uses Decipher to assign a score that represents how accurately and how fluently a user has uttered a phrase but it makes no attempt whatsoever to teach any items of knowledge.
Another voice-recognition based instructional device, the SRI VILI system, also described in the document by Bernstein, supra, is designed to enable people to have a dialog in a foreign language and includes a database of stored utterances, a dialog grammar and the Decipher voice recognition unit. The SRI VILI system uses the dialog grammar and the Decipher system to recognize which of the several possible responses allowed at a typical moment by the dialog grammar the user has selected, and varies the course of the dialog according to the selection made. The SRI VILI system only selects that which is to be spoken next in response to which selection was made. Beyond that, it has no regard for whether the user knows or retains in memory what has been said.
The above voice-recognition based instructional devices are deficient in that they are costly laboratory prototypes ill-suited for commercial use. In addition, they are unable to select what is to be presented next with sufficient intelligence to prevent unnecessary and time-consuming repetition, user boredom and the other moods that prevent effective learning, and they do not vary the course of study in dependence on what the user did or did not know. Moreover, they fail to provide for human and voice recognition machine errors.