The present invention relates to the field of teaching aids, and is more particularly directed to a computer-assisted audio/visual teaching system used in the classroom to present tutorial or drill and practice lessons to an individual student.
With the advent of computer-aided instruction (CAI) in the 1960's, largely under the aegis of Professor Patrick Suppes at Stanford University and Dr. Donald Bitzer at the University of Illinois, our nation's schools inexorably crossed the threshold of what some call an ultimate breakthrough in education. Although CAI has not proven to be a "breakthrough" in any sense of the word, it is welcomed by most educators as simply another tool in the educational audio/visual armamentarium. Probably the most significant contribution of some 550,000 classroom computers now in use is their ability to tirelessly present educational material to an individual student in a way that allows the student to interact, one-to-one, with his "electronic teacher" and to progress at his own pace.
Historically, computer-aided instruction has consisted of drill, tutorial, and simulation lessons, although CAI is anathema to many educators and computer cognoscenti who feel the student really isn't learning anything unless he is seated in a computer lab busily writing his own programs in Pascal, BASIC, LOGO, LISP, APL, etc. Whatever use is made of the school computer, the audio in "audio/visual" is usually lacking, except for occasional calliope-sounding tunes, beeps, peeps, and spaceship sound effects which punctuate any given activity on a television monitor. If the classroom computer is to excel as an "electronic teacher" for the individual student, computer hardware and software manufacturers must devote as much attention to sound as has been devoted to graphics. When sound is used, most classroom computers utilize the speaker of a connected television monitor or one within the computer's housing for an audio output. Audio output jacks which accommodate headphones for private listening are conspicuously lacking, thereby turning any classroom or school computer lab into a grand cacophony not unlike that of a neighborhood video arcade.
Some manufacturers are now, however, realizing that more than arcade-like sound effects are needed to truly improve the computer's educational potential as an audio/visual teaching device. Instead of beeps, peeps, and laser gun sounds, human speech now accompanies some of the current educational computer programs. The hardware/software "state-of-the-art" modality for this is speech synthesis, which may or may not sound at all human. Synthetic speech systems which utilize a "canned" speech vocabulary can fairly well approximate human-sounding speech, but lack flexibility in that their vocabularies are limited to pre-encoded words provided on disk or read-only memory (ROM). These systems, which are generally classified as direct waveform coded or linear predictive coded (LPC), require about 48 kilobytes of memory to provide from 20 seconds to 5 minutes of speech. The more memory used, the better the synthetic speech sounds. By contrast, a third type of speech synthesizer using formant synthesis can produce up to one hour of speech with 48K bytes of memory, but the resultant speech sounds for all the world like a gnome or leprechaun with a thick Swedish accent.
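The memory figures above imply very different average storage rates for the three kinds of synthesizer. The following minimal sketch works out that arithmetic. It assumes that "48 kilobytes" means 48 x 1024 bytes, which the text does not state, and the function and variable names are illustrative only.

```python
# Assumption: "48 kilobytes" means 48 * 1024 bytes; the text does not say.
MEMORY_BYTES = 48 * 1024


def bits_per_second(memory_bytes: int, speech_seconds: float) -> float:
    """Average storage rate implied by fitting speech_seconds into memory_bytes."""
    return memory_bytes * 8 / speech_seconds


# Speech durations quoted in the text for 48 kilobytes of memory.
DURATIONS_SECONDS = {
    "waveform/LPC coded, shortest (20 seconds)": 20,
    "waveform/LPC coded, longest (5 minutes)": 5 * 60,
    "formant synthesis (1 hour)": 60 * 60,
}

for label, seconds in DURATIONS_SECONDS.items():
    rate = bits_per_second(MEMORY_BYTES, seconds)
    print(f"{label}: about {rate:.0f} bits per second")
```

Under that assumption the coded systems store speech at roughly 1,300 to 20,000 bits per second, while formant synthesis stores it at roughly 100 bits per second, which is consistent with the loss of naturalness described above.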
Formant synthesis truly represents synthetic speech, since its "vocabulary" consists of elemental speech sounds and variants (phonemes and allophones) rather than words. Using this phoneme-driven synthesizer, speech is produced by stringing together various phonemes and allophones. Since elemental speech sounds can be strung together to produce any word, formant synthesis provides the teacher with unlimited vocabulary--a distinct advantage over direct waveform and LPC synthesizers. The classroom teacher soon discovers, however, that synthetic speech produced by formant synthesis is robot-like and partially unintelligible to him or his students. One source of pride to the user of a speech synthesis system is how well others can understand the voice output of that system. A standard test usually given to the synthesizer requires that five or six words be synthesized for each of 32 major phonemes. If all of the words can be recognized, the synthesizer can be considered to be quite accurate. Although some educators tolerate the vagaries of computer speech synthesis, serious pedagogic problems arise when it is used in the early grades where students are just learning correct word pronunciation.
Current "state-of-the-art" computer-aided instruction can cost some $1,500 to $2,500 for a computer, monitor and disk drive, plus another $300 to $3,000 for a speech synthesizer. But this is just the beginning for a truly "high-tech" CAI classroom: add another $800 to $1,000 for a videodisc player which can be controlled by a computer and another $400 to $800 for a videodisc/computer interface card, and classroom CAI can include photographic-quality still and motion pictures which are all incorporated into an interactive lesson program. In most cases, computer graphics, such as the cursor, can overlay the videodisc image. The above-mentioned speech synthesizer will not usually be needed, since voice narration is also contained on the videodisc. But, as with "canned" synthesizer vocabularies, the instructor will be unable to personally modify the audio or video portions of his lessons unless he wants to spend $2,000 to $3,000 to custom-record a master videodisc using a manufacturer's equipment, or purchase a videodisc recorder/player for yet another $20,000 to $30,000. Much the same expense ($20 to $200 per word) is involved if the teacher wishes to custom-encode vocabulary for an LPC speech synthesizer. It clearly becomes not a matter of the availability of high technology for CAI but a matter of what the schools can afford to pay for that technology.
Probably because of the above-mentioned costs, few school CAI lessons incorporate human speech; most are primarily visual in nature, except for the previously mentioned arcade sound effects. To truly make the classroom computer a useful device, the audio output must be capable of clearly presenting, in an unlimited vocabulary, instructional material--preferably in the teacher's own voice. Additionally, since those class members not involved in CAI should not be distracted by the audio output of a classroom computer, the computer must be provided with headphones for private listening. Rather than utilize any of the several complex schemes for producing synthetic speech, the CAI audio output can simply be that of an ordinary cassette tape which is played back by a standard $20 to $30 cassette player in synchronization with any given computer program. This method is far less expensive than computer speech synthesis and results in a high fidelity speech output. Music can also be incorporated with voice on the cassette tape, and the usual computer arcade-like sound effects can additionally be coupled to the tape output.
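The hardware prices quoted above can be totaled per classroom station to make the cost comparison concrete. The following is a minimal sketch using only the dollar ranges given in the text. The cassette total excludes any hardware needed to couple the player to the computer, since no price for that is given, and all names in the code are illustrative.

```python
# (low, high) dollar ranges quoted in the text.
COMPUTER_SYSTEM = (1500, 2500)      # computer, monitor and disk drive
SPEECH_SYNTHESIZER = (300, 3000)
VIDEODISC_PLAYER = (800, 1000)
VIDEODISC_INTERFACE = (400, 800)
CASSETTE_PLAYER = (20, 30)


def total(*items):
    """Sum the low ends and the high ends of several (low, high) ranges."""
    return (sum(low for low, _ in items), sum(high for _, high in items))


CONFIGURATIONS = {
    "computer + speech synthesizer": total(COMPUTER_SYSTEM, SPEECH_SYNTHESIZER),
    "computer + videodisc player and interface": total(
        COMPUTER_SYSTEM, VIDEODISC_PLAYER, VIDEODISC_INTERFACE
    ),
    # Assumption: no price is given for coupling the player to the computer,
    # so that cost is left out of this total.
    "computer + cassette player": total(COMPUTER_SYSTEM, CASSETTE_PLAYER),
}

for name, (low, high) in CONFIGURATIONS.items():
    print(f"{name}: ${low:,} to ${high:,}")
```

On these figures the audio hardware adds $300 to $3,000 for a synthesizer or $1,200 to $1,800 for a videodisc player and interface, against $20 to $30 for a cassette player. The totals also leave out the custom recording and encoding charges mentioned above.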
The use of cassette tapes, rather than speech synthesis, to provide the audio portion for CAI is well known. One such application utilizes a two-channel cassette tape, with the audio on a first channel and the associated computer program on a second channel. A second such application uses a single-channel cassette tape, with the audio interspersed between related portions of the computer program. A third such application uses a floppy disk in conjunction with a cassette tape, with the disk providing a computer program and the tape providing synchronized instructional narrative. In all three applications, the student listens to recorded instructional narrative while watching written information or graphics on a monitor screen. At predetermined points within a given lesson, the computer program turns off the tape player so that a student can answer a question concerning certain instructional material which has been presented. Following a student response, the computer program turns on the tape player for additional audio and textual instruction.
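The control flow shared by these three applications can be sketched in software terms as follows. This is only an illustration of the stop-and-resume sequence described above, not the program of any particular prior system. The TapePlayer and Segment classes and the run_lesson function are hypothetical names, and the tape player is simulated by an event log rather than by real motor control.

```python
from dataclasses import dataclass


class TapePlayer:
    """Hypothetical stand-in for a cassette player the program can switch."""

    def __init__(self, events):
        self.events = events      # shared log of what happened, in order
        self.running = False

    def start(self):
        self.running = True
        self.events.append("tape on")

    def stop(self):
        self.running = False
        self.events.append("tape off")


@dataclass
class Segment:
    screen_text: str   # text or graphics shown while the narration plays
    question: str      # asked at the predetermined stopping point


def run_lesson(segments, tape, get_response):
    """Play each segment, halting the tape while the student answers."""
    responses = []
    for segment in segments:
        tape.start()                                   # narration resumes
        tape.events.append(f"show: {segment.screen_text}")
        tape.stop()                                    # predetermined stopping point
        tape.events.append(f"ask: {segment.question}")
        responses.append(get_response(segment.question))
    return responses


if __name__ == "__main__":
    log = []
    lesson = [Segment("2 + 2 = ?", "What is 2 + 2?"),
              Segment("3 + 1 = ?", "What is 3 + 1?")]
    # A scripted responder stands in for the student's keyboard entry.
    answers = run_lesson(lesson, TapePlayer(log), lambda question: "4")
    print(log)
    print(answers)
```

In this sketch the tape is switched back on at the top of the next segment, which corresponds to the program restarting the player after the student's response.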
In all of the above cassette applications, the audio channel provides only positive reinforcement when a student response is correct, while all remediation resulting from an incorrect response is presented in written or graphics form on the monitor screen. Ideally, any computer-aided instructional lesson should provide branching, so that remedial instruction, as well as positive reinforcement, can be presented both audibly and visually when the student makes a response.
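The branching called for above can be illustrated with a short sketch in which either outcome of a student response selects a presentation having both an audio part and a visual part. This is a sketch of the idea only. The Presentation and Question classes, the branch function, the audio labels, and the rule for judging an answer correct are all hypothetical, and nothing here describes how a recorded passage would be located on a tape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    audio: str    # hypothetical label for a recorded narration passage
    visual: str   # text or graphics for the monitor screen


@dataclass(frozen=True)
class Question:
    prompt: str
    correct_answer: str
    reinforcement: Presentation   # presented after a correct response
    remediation: Presentation     # presented after an incorrect response


def branch(question: Question, response: str) -> Presentation:
    """Choose the audio and visual material that follows a student response."""
    # Assumption: a response is correct if it matches ignoring case and
    # surrounding spaces; a real lesson might judge answers differently.
    is_correct = response.strip().lower() == question.correct_answer.strip().lower()
    return question.reinforcement if is_correct else question.remediation


if __name__ == "__main__":
    q = Question(
        prompt="Spell the word for a young dog.",
        correct_answer="puppy",
        reinforcement=Presentation("praise passage", "Correct!"),
        remediation=Presentation("review passage", "Listen again: p-u-p-p-y"),
    )
    for reply in ("Puppy", "pupy"):
        chosen = branch(q, reply)
        print(reply, "->", chosen.audio, "/", chosen.visual)
```

The point of the sketch is that the remedial branch carries an audio part just as the reinforcement branch does, rather than being limited to the monitor screen.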