As communications and high-speed transportation continue to make our world seem smaller, knowing a second language becomes more important and valuable. Unfortunately, because of time constraints, traditional classroom language instruction by itself generally does not immerse the student in the second language he or she is studying sufficiently to ensure rapid learning.
While written materials (e.g., textbooks, workbooks, and the like) provide some opportunity for the student to study by himself, written materials cannot effectively assist the student in pronunciation and other aural aspects of language learning. Although some written language study materials are accompanied by prerecorded audio tapes or records allowing the student to listen to the language being spoken, even these prerecorded audio materials have the disadvantage that they cannot provide the student with feedback about his or her pronunciation. In the past, the only way to obtain effective spoken language drills and practice outside of the classroom environment was to hire a language tutor (an expensive proposition) or to spend time with someone who was already fluent in the unfamiliar language.
The concept of using computer hardware/software to provide synthesized or digitized spoken language is generally known. The following is a somewhat representative (but by no means exhaustive) listing of prior publications, prior issued U.S. patents, and published software packages relating to computer-assisted language learning with speech capabilities:
U.S. Pat. No. 4,579,533 to Anderson et al;
U.S. Pat. No. 4,591,929 to Newsom;
U.S. Pat. No. 4,749,353 to Breedlove;
U.S. Pat. No. 4,695,962 to Goudie;
U.S. Pat. No. 4,710,877 to Ahmed;
U.S. Pat. No. 4,769,846 to Simmons;
U.S. Pat. No. 4,403,965 to Hawkins;
U.S. Pat. No. 4,421,487 to Laughon et al;
U.S. Pat. No. 4,457,719 to Dittakavi et al; and
U.S. Pat. No. 4,549,867 to Dittakavi.
Brower, "Word Torture Eases Pain Of Language Learning", 2 MacWEEK n.48, p. 14 (29 Nov. 1988);
Parham, "Computers That Talk", 8 Classroom Computer Learning n.6, pp. 26-36, 63 (March 1988);
Jack, "Worte & Satze: A German Tutor For Kids Or Adults", 2 Color Computer Magazine n.3, p. 20 (May 1984);
Barbour, "Computerized Speech: Talking Its Way Into The Classroom", 6 Electronic Learning n.4, p. 15 (Jan. 1987);
PEAL SOFTWARE (Los Angeles, Calif.), "Representational Play", "Keytalk", and "Exploratory Play" software packages;
"E Z Pilot II Authoring System" software by Hartley Courseware, Inc., Dimondale, Mich.;
"Smoothtalker Version 2.0" software by First Byte Inc.;
"Experlogo-Talker/Prologo" software by Experintelligence, Inc.;
"Voice Master Version 4.0" system by Covox Inc.;
"Basic Language Series--Spatial Concepts" by Science Research Association;
"Talking Text Writer" and "Talking Text Speller" software published by Scholastic Inc., Jefferson City, Mo.;
"Reading Skills Development Program" software available from American Educational Computer, Inc., Oklahoma City, Okla.;
"Writing To Read" by International Business Machines;
"Language Experience" software series from Teacher Support Software, Gainesville, Fla.; and
Houghton Mifflin's "Listen and Learn" series, Houghton Mifflin Educational Software Division, Hanover, N.H.
Additional patents generally relating to learning aids with speech synthesizers include:
The Anderson et al '533 patent cited above discloses a microprocessor based electronic teaching aid which enables the student viewing a display to designate any word or portion of text for vocalization by synthesized speech techniques. The "reading" material provided by the system is stored in a preprogrammed (fixed) source, namely read only memory (ROM). Pointers are used to point to the start addresses for the words. Mass storage devices are avoided in favor of semiconductor ROM. Speech data is stored in the memory as individual words in a dictionary. No facility for inputting digitized student utterances into the system is provided.
U.S. Pat. No. 4,591,929 to Newsom teaches a second language learning system connected to a magnetic tape recorder. An electronic interface controls the tape recorder functions. The last phrase played back by the tape recorder is converted into digital form and stored in an electronic store to permit the student to reproduce the phrase as many times as desired without having to rewind the tape. The student can also record his own voicing of a phrase in a different portion of the electronic store and can then selectively reproduce the teaching phrase or his response--re-recording his voicing until satisfied.
U.S. Pat. No. 4,710,877 to Ahmed discloses a computer-based language learning system including a speech synthesis capability using linear predictive coding. A menu driven student interface is used to step a student through preprogrammed lessons featuring visual and synthesized speech stimuli.
U.S. Pat. No. 4,695,962 to Goudie teaches a system which attempts to increase the naturalness of synthesized speech produced from linear predictive encoded speech data by substituting different data depending upon whether words are reproduced in isolation in a word mode or together with other words in a phrase mode.
The Breedlove '353 patent discloses a hand-held microprocessor based system that converts student utterances into digital form and allows the student to store the digitized utterances in memory associated with student inputted text such as correct word spelling.
The "Word Torture" software program referenced above is another example of a computer-assisted language learning system. This program, published by Hyperglot Software Co. of Knoxville, Tenn., is designed to run on an Apple Macintosh personal computer equipped with "HyperCard", a programmable database which supports digitized and synthesized sound. Foreign language study stacks provide automated vocabulary drills that work from English to a foreign language or vice versa, and permit users to adjust interval times and add new words. The system also provides digitized pronunciations of foreign language alphabets.
Other systems (including the Scholastic Software "Talking Text Writer" program) are essentially talking word processors with speech synthesis capabilities that allow students to hear whatever they type as well as text entered by the teacher.
However, as observed by Parham in his survey article "Computers That Talk" cited above, language arts system developers have in the past had great difficulties providing acceptable, useful systems. Known text-to-speech synthesis algorithms are capable of converting written text into synthesized spoken words by referencing prestored "phonemes" (sets of sounds). The "Smoothtalker", "Experlogo-Talker" and "Talking Text Writer" systems referenced above are examples of systems which use text-to-speech synthesis. While text-to-speech synthesis may be acceptable for talking word processors, user interfaces, or the like, known algorithms cannot produce the range of inflections (stress and intonation) and pronunciations required for language learning.
The digitized speech approach (i.e., in which actual human speech is converted to digital signals using digitizing hardware for later reproduction) is capable of producing speech as realistic as recorded voice--in any language and including accent and inflection. However, the use of digitized speech is extremely memory intensive (a limitation which has proven to be a major roadblock in its use in the past). A single second of digitized speech can occupy 64 Kbytes of storage space (somewhat less if compression algorithms are used). To reduce the amount of memory required, some system developers have reused words by encoding and storing words and phrases individually. This has, however, been a problematic approach for language learning in the past--since it has been shown that students learn best when presented with words in natural context (and the same word or phrase is often pronounced differently depending upon context--see the Goudie '962 patent referenced above).
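To make the memory cost concrete, the arithmetic behind the 64 Kbytes per second figure quoted above can be sketched as follows. This is only an illustration: the function names, the binary Kbyte convention, and the 640 Kbyte capacity used in the example are assumptions and are not taken from any of the cited systems.

```python
KBYTE = 1024  # bytes per Kbyte (assumed binary convention)
RATE_BYTES_PER_SECOND = 64 * KBYTE  # uncompressed figure quoted in the text


def storage_bytes(seconds, rate=RATE_BYTES_PER_SECOND):
    """Bytes needed to hold `seconds` of digitized speech at `rate` bytes/s."""
    return int(seconds * rate)


def seconds_that_fit(capacity_bytes, rate=RATE_BYTES_PER_SECOND):
    """Seconds of digitized speech that fit in `capacity_bytes` at `rate`."""
    return capacity_bytes / rate


if __name__ == "__main__":
    # A five-second phrase at the quoted rate.
    print(storage_bytes(5) // KBYTE, "Kbytes")  # prints: 320 Kbytes
    # Hypothetical 640 Kbyte memory, chosen only for illustration.
    print(seconds_that_fit(640 * KBYTE), "seconds")  # prints: 10.0 seconds
```

At this rate even a short lesson quickly exceeds the main memory of a small computer, which is why the storage strategy matters.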
Most prior digitized speech systems have been limited to playing back prestored digitized speech. However, some prior systems also permitted the student to digitize his own speech for later playback. For example, Covox, Inc. claims that its "Voice Master" speech synthesis system speaks in the user's own voice, in any language, and with any accent. To record speech, a "learn" command is inputted and the student speaks into a microphone. To play back the recorded speech, the student inputs the "speak" command. Up to 64 different words, phrases or other sounds can be in memory at any one time--with additional words being stored on disk and loaded as needed.
See also U.S. Pat. No. 4,591,929 to Newsom discussed above, which teaches: (a) digitizing a phrase spoken by the user and storing the digitized user's phrase in an electronic store along with a digitized teaching phrase (played back from a tape recorder); and (b) permitting the user to selectively reproduce the teaching phrase or his own response. However, Newsom provides only minimal digitized speech storage (e.g., a single teaching phrase) and requires the student to control the functions of a tape recorder in order to select a different teaching phrase. The process of rewinding/fast forwarding a tape recorder is extremely cumbersome. Moreover, Newsom provides no facility for integrating textual material, graphical or other display, or other study aids with his strictly oral lesson.
Hence, although much prior work has been done in the area of computer-assisted language learning, there is room for much further improvement.
For example, no one in the past has successfully developed a truly interactive computer assisted language learning system which integrates visual displays with preprogrammed digitized speech and which also interactively digitizes student speech and permits the student to easily listen to his own pronunciation and compare it with the digitized pronunciation of a model word or phrase he selects. Significantly, the present invention may provide the very first truly interactive computer assisted language learning system which allows a student to select a model phrase from text displayed on an electronic display; record (in digitized form) his own pronunciation of that phrase; and instantly listen to the digitized vocal version of the selected phrase and his own recorded pronunciation for comparison purposes.
Many other significant advantageous features are provided by the present invention, including the following:
SoundSort--A text reconstruction exercise based on aural clues. In accordance with this feature of the invention, the system automatically randomizes the order of plural phrases, provides digitized utterances of the phrases in the randomized order, and requires the student to reconstruct the original order using a visual display interface.
An audio CLIP mode which permits the student to select any (random) portion of displayed text (e.g., a phrase, a small part of a phrase, a single word, a syllable, or a phoneme) using cursor controls and to control the system to play the digitized speech corresponding to that selected portion. This feature allows the student to concentrate on difficult phrases.
Integration of digitized sound in a high-level authoring system (as distinct from an authoring language) is provided. An easy-to-use "WYSIWYG" ("What you see is what you get") user interface reduces or eliminates mistakes and associated frustration and does not require the user to have any programming ability.
An extremely flexible authoring system allows a teacher to link recorded digitized speech with customized on-screen text (which may but need not match the digitized speech). This allows a wide variety of free-form exercises to be created.
The system permits the student to hear his own speech and the correct (model) speech, each at a keystroke, with no delay.
Teacher-composed customized help screens and instructions can be called up by the student with a single keystroke. This feature permits great increases in the number of possible teacher-created lesson formats and also provides great flexibility in customization and ease of use not provided in other systems.
Despite the fact that digitized speech is employed, interrupt driven hardware in conjunction with software operating in the background permits essentially continuous replay of digitized audio data stored on a mass storage device--without pauses due to loading and reloading of memory (for up to 23 hours of continuous speech from a CD ROM mass storage device for example).
The presently preferred exemplary embodiment of the invention provides a system including several functional modules which are implemented in hardware, software or both. A digital speech processor connected to a conventional personal computer is used to convert digitized speech data to audio signals and vice versa under control of a memory resident interrupt driven software module (this module handles all play and record requests for the speech processor). A public domain RAMdisk driver sets aside memory for use as a simulated (virtual) disk drive. In the preferred embodiment, all recorded speech is placed on the virtual disk first, then copied to other mass storage devices (e.g., floppy disk).
The personal computer processor executes program control steps in the preferred embodiment which provide a wide variety of useful functions. These functions may be divided into "teacher" functions (used to create and compose lessons and exercises); and "student" functions (performed by the student for learning purposes). The student functions generally operate on lessons and exercises previously created by the teacher using the teacher functions.
One of the teacher functions is a "Text Writer" word processor permitting the teacher to compose texts. A lesson authoring utility is then used to record segments of sound (phrases) which are linked to phrases in on-screen text(s) composed with the word processor. The teacher may also select a second (page two) textual display format to be presented as instructions or help to the student. After recording the phrases, the teacher selects which of three student functions will be used with the newly created lesson. The teacher may, therefore, create texts and exercises appropriate to any of the three functions.
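The relationship between a lesson, its linked phrases, the optional page-two help text, and the selected student function can be pictured with the minimal data-structure sketch below. The class names, field names, and sound file names are hypothetical and are used only for illustration; the source does not describe how the preferred embodiment actually stores a lesson.

```python
from dataclasses import dataclass, field
from typing import List

# The three student functions named in the text.
STUDENT_FUNCTIONS = ("AudioLab", "SoundSort", "AudioWrite")


@dataclass
class Phrase:
    text: str        # on-screen text linked to the recording (need not match it)
    sound_file: str  # hypothetical name of the file holding the digitized speech


@dataclass
class Lesson:
    title: str
    student_function: str              # which student function uses this lesson
    phrases: List[Phrase] = field(default_factory=list)
    help_text: str = ""                # optional "page two" instructions or help

    def __post_init__(self):
        if self.student_function not in STUDENT_FUNCTIONS:
            raise ValueError(f"unknown student function: {self.student_function}")

    def add_phrase(self, text, sound_file):
        """Link one recorded segment to one phrase of on-screen text."""
        self.phrases.append(Phrase(text, sound_file))

    def full_text(self):
        """Complete on-screen text, phrase by phrase."""
        return " ".join(p.text for p in self.phrases)


if __name__ == "__main__":
    lesson = Lesson("At the zoo", "AudioLab", help_text="Listen, then repeat.")
    lesson.add_phrase("This is an elephant.", "zoo_001.snd")
    lesson.add_phrase("It is very large.", "zoo_002.snd")
    print(lesson.full_text())
```

Keeping the text and the recording as separate fields reflects the point made above that the on-screen text may, but need not, match the digitized speech.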
Three student functions are provided in the preferred embodiment: (a) AudioLab (which provides aural and oral practice and learning); (b) SoundSort (an aural text reconstruction exercise); and (c) AudioWrite (a writing exercise focusing on listening comprehension).
The AudioLab student function in the preferred embodiment provides three modes: (i) PREVIEW, (ii) LAB, and (iii) CLIP.
In the PREVIEW mode, the student can listen to an entire prerecorded lesson with the option to view the corresponding complete text on the personal computer display screen. Thus, the student hears the digitized model speech of a lesson and can also view the displayed corresponding text (generally the text of the speech) as an audio-visual lesson.
In the LAB mode, the student can select individual phrases from the recording. The student may also view the complete text on the display--or only the text corresponding to a phrase selected by the student. The student can also record himself speaking any individual phrase of his choosing, and play back his own speech and the corresponding preprogrammed model digitized speech so as to compare the two.
In the CLIP mode, the student can work with any selected portion of the current phrase (down to 0.1 seconds long in the preferred embodiment). The student can play the entire original phrase or only a portion of the phrase he selects; record himself speaking; and compare his played back speech to the original. Moreover, the student can examine phrases in three different ways in the preferred embodiment: forwards (e.g., "This/is/an/el/e/phant"); backwards (e.g., "phant"--"e/phant"--"el/e/phant"); or middle (e.g., "is/an").
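The three ways of examining a phrase can be illustrated with the short sketch below, which works on a list of text segments as a stand-in for portions of the digitized audio. It rests on two assumptions not stated in the source: that "forwards" means building the phrase up from its beginning (by analogy with the "backwards" example), and that a clip can be represented by segment indices rather than by time offsets.

```python
def forwards(segments):
    """Growing prefixes: the first segment, then the first two, and so on."""
    return [segments[:i] for i in range(1, len(segments) + 1)]


def backwards(segments):
    """Growing suffixes: the last segment, then the last two, and so on."""
    return [segments[-i:] for i in range(1, len(segments) + 1)]


def middle(segments, start, stop):
    """An interior clip covering segments[start:stop]."""
    return segments[start:stop]


if __name__ == "__main__":
    phrase = ["This", "is", "an", "el", "e", "phant"]
    print("/".join(forwards(phrase)[-1]))   # This/is/an/el/e/phant
    for clip in backwards(phrase)[:3]:
        print("/".join(clip))               # phant, e/phant, el/e/phant
    print("/".join(middle(phrase, 1, 3)))   # is/an
```

In an audio implementation each segment would presumably correspond to a time range within the recorded phrase, with the 0.1 second granularity mentioned above setting the smallest selectable range.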
The SoundSort function provides a computer puzzle exercise which randomizes (jumbles) the order of phrases in a lesson text. A column of symbols is displayed representing the phrases in the lesson text. The student must restore the symbols into the correct order by moving the symbols around the display screen (using interactive cursor controls and the like). The only clues provided by the preferred embodiment as to the correct order of the phrases are aural versions of the phrases obtained by listening to selected phrases (as many times as the student desires) and by listening to the complete, original lesson. The text is not shown on the screen in the preferred embodiment--requiring the student to listen to the phrases and reorder them into the proper context.
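The logic of such a puzzle can be sketched in a few lines. The sketch below represents each phrase by its index in the original lesson, jumbles the indices, lets a symbol be moved from one position to another, and checks whether the original order has been restored. All function names are hypothetical, and the guarantee that the jumbled order differs from the original is an assumption added for the example.

```python
import random


def jumble(phrase_count, rng=None):
    """Return phrase indices in a random order that differs from the original."""
    if rng is None:
        rng = random.Random()
    order = list(range(phrase_count))
    if phrase_count < 2:
        return order  # nothing to jumble
    original = list(order)
    while order == original:
        rng.shuffle(order)  # reshuffle until the order actually changes
    return order


def move(order, src, dst):
    """Return a new order with the symbol at position src moved to position dst."""
    updated = list(order)
    updated.insert(dst, updated.pop(src))
    return updated


def is_solved(order):
    """True when the phrases are back in their original sequence."""
    return order == sorted(order)


if __name__ == "__main__":
    order = jumble(4, random.Random(1))
    print("jumbled:", order, "solved:", is_solved(order))
    order = move([2, 0, 1], 0, 2)  # move the first symbol to the end
    print("after move:", order, "solved:", is_solved(order))
```

In the exercise itself, each index would be shown as a symbol on the screen and selecting a symbol would play the corresponding digitized phrase.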
The AudioWrite function of the preferred embodiment provides the digitized speech lesson one phrase at a time, and requires the student to type or reconstruct what he hears (with complete freedom of correction and repetition). The phrase typed in by the student is then compared to the original text, and any differences are flagged as errors. Punctuation, spacing and capitalization are provided by the system in the preferred embodiment and are thus not tested.
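The comparison step can be approximated as follows. Because punctuation, spacing and capitalization are not tested, this sketch strips them from both texts before comparing word by word, and it uses Python's standard `difflib.SequenceMatcher` to locate the differences. The function names are hypothetical, and the word-level granularity is an assumption; the source does not say how the preferred embodiment performs the comparison.

```python
import difflib
import string

# Translation table that deletes ASCII punctuation; punctuation marks
# outside ASCII would need to be added for some languages.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase, drop ASCII punctuation, and split on whitespace."""
    return text.translate(_STRIP_PUNCTUATION).lower().split()


def flag_errors(original, typed):
    """Return (kind, expected_words, typed_words) for each difference."""
    expected = normalize(original)
    actual = normalize(typed)
    matcher = difflib.SequenceMatcher(None, expected, actual)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((tag, expected[i1:i2], actual[j1:j2]))
    return errors


if __name__ == "__main__":
    print(flag_errors("This is an elephant.", "this is an elephant"))
    # []
    print(flag_errors("This is an elephant.", "This is a elephant"))
    # [('replace', ['an'], ['a'])]
```

An empty result means the typed phrase matches the original once the untested features are set aside.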
Thus, the highly integrated speech and visuals provided by the present invention permit a student to:
see, hear, record and compare complete text or dialogue, phrase by phrase (or by selected portions of phrase);
practice listening comprehension; and
instantly, randomly access any part of a recorded selection.
The system also provides teachers with an easy-to-use utility for creating an infinite variety of exercises.