The present invention relates to educational systems generally and more particularly to computerized systems for teaching speech.
In recent years there have been developments in the art of computerized teaching of speech. Speech laboratories in which prompts and cues such as pre-recorded sounds and words are presented to a student and the student's speech productions are recorded or monitored are well known.
The Speech Viewer II, marketed by IBM, is a speech therapy product which provides visual and auditory feedback from a student's sound productions.
Known methods and apparatus for computerized speech recognition are described in the following publications, the disclosures of which are incorporated herein by reference:
Flanagan, J. L., "Computers that talk and listen: Man machine communication by voice", Proc IEEE, Vol. 64, 1976, pp. 405-415;
Itakura, F., "Minimum prediction residual principle applied to speech recognition", IEEE Trans. Acoustics, Speech and Signal Processing, February, 1975 (describes a temporal alignment algorithm and a method for computing a distance metric);
Le Roux, J. and Gueguen, C., "A fixed point computation of partial correlation coefficients", IEEE ASSP, June, 1977;
Peacocke, R. D. and Graf, D. H., "An introduction to speech and speaker recognition", IEEE Computer, Vol. 23(8), August, 1990, pp. 26-33;
Rabiner, L. R. et al, "Speaker-independent recognition of isolated words using clustering techniques", IEEE Trans Acoustics, Speech and Signal Processing, Vol. ASSP-27, No. 4, August, 1979, pp. 336-349;
Rabiner, L. R., Levinson, S. E. and Sondhi, M. M., "On the application of vector quantization and hidden Markov models to speaker-independent, isolated word recognition", Bell Systems Tech J, Vol. 62(4), April, 1983, pp. 1075-1105;
Rabiner, L. R. and Sambur, M. R., "An algorithm for determining the endpoints of isolated utterances", Bell Systems Tech J, February, 1975;
Rabiner, L. R. and Wilpon, J. G., "A simplified, robust training procedure for speaker trained isolated word recognition systems", J Acoustical Society of America, November, 1980.
The disclosures of all the above publications are incorporated herein by reference.
The present invention seeks to provide an improved computerized system for speech and pronunciation teaching in which recorded reference speech specimens are presented to a student and in which a quantification of the similarity between the student's repetitions and the originally presented reference speech specimens is displayed to the user.
The present invention also seeks to provide a speech and pronunciation teaching system which is particularly suited for independent speech study and does not require the presence of a trained human speech and pronunciation expert. Preferably, the system of the present invention includes verbal prompts which guide a user through a teaching system without requiring recourse to a human teacher. Preferably, student performance is monitored and the verbal prompt sequence branches to take student performance into account. For example, predetermined types of student errors, such as repeatedly mispronouncing a particular phoneme, may be extracted from student speech responses and the verbal prompt sequence may branch to take into account the presence or absence of each type of student error.
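The branching of the verbal prompt sequence on extracted error types might be sketched as follows. This is an illustration only: the error labels, the prompt texts, the function name `next_prompt` and the repeat threshold of two are all assumptions made for the example and are not taken from the disclosure.

```python
from collections import Counter
from typing import Iterable

# Hypothetical remedial prompts, keyed by a hypothetical error-type label.
REMEDIAL_PROMPTS = {
    "L/R": "Let us practise the L and R sounds again.",
    "I/EE": "Let us practise 'ship' and 'sheep' again.",
}
DEFAULT_PROMPT = "Well done. Let us move on to the next word."


def next_prompt(error_log: Iterable[str], threshold: int = 2) -> str:
    """Pick the next prompt from the error types seen so far.

    An error type triggers its remedial branch only once it has been
    observed at least `threshold` times (an assumed policy for
    "repeatedly mispronouncing" a sound).
    """
    counts = Counter(error_log)
    # most_common() yields error types from most to least frequent.
    for error_type, count in counts.most_common():
        if count >= threshold and error_type in REMEDIAL_PROMPTS:
            return REMEDIAL_PROMPTS[error_type]
    return DEFAULT_PROMPT
```

A single occurrence of an error leaves the sequence on its default path, whereas a repeated error diverts it to the matching remedial prompt.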
The present invention also seeks to provide a speech and pronunciation teaching system which is particularly suited to teaching preferred pronunciation of a foreign language to a speaker of a native language. Preferably, the system of the present invention includes an initial menu presented in a plurality of languages and a multi-language message prompting the user to select the menu option representing his native language. In response to the user's selection of a native language, the system is preferably operative to present subsequent verbal messages to the user in his own native language, and/or to branch the sequence of verbal messages so as to take into account speech characteristics, such as pronunciation errors, which are known to occur frequently in speakers of the user's native language. For example, when speaking English, native speakers of Japanese typically confuse the L and R sounds, and also the short I and long E sounds, as in the words "ship" and "sheep". Native speakers of Arabic and German do not have either of these problems.
There is thus provided, in accordance with a preferred embodiment of the present invention, apparatus for interactive speech training including an audio specimen generator for playing a pre-recorded reference audio specimen to a user for attempted repetition thereby, and an audio specimen scorer for scoring a user's repetition audio specimen.
Further in accordance with a preferred embodiment of the present invention the audio specimen scorer includes a reference-to-response comparing unit for comparing at least one feature of a user's repetition audio specimen to at least one feature of the reference audio specimen, and a similarity indicator for providing an output indication of the degree of similarity between at least one feature of the repetition audio specimen and at least one feature of the reference audio specimen.
Still further in accordance with a preferred embodiment of the present invention, the apparatus also includes a user response memory to which the reference-to-response comparing unit has access, for storing a user's repetition of a reference audio specimen.
Additionally in accordance with a preferred embodiment of the present invention, the reference-to-response comparing unit includes a volume/duration normalizer for normalizing the volume and duration of the reference and repetition audio specimens.
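By way of illustration only, a volume/duration normalizer could be sketched as below. The sketch assumes that a specimen is held as a list of floating-point samples, that volume is normalized by peak scaling, and that duration is normalized by linear resampling to a fixed number of samples; none of these choices, nor the function names, is specified in the disclosure.

```python
from typing import List, Sequence


def normalize_volume(samples: Sequence[float], peak: float = 1.0) -> List[float]:
    """Scale the specimen so its largest absolute sample equals `peak`."""
    largest = max((abs(s) for s in samples), default=0.0)
    if largest == 0.0:
        return [0.0 for _ in samples]  # silence stays silence
    scale = peak / largest
    return [s * scale for s in samples]


def normalize_duration(samples: Sequence[float], length: int) -> List[float]:
    """Linearly resample the specimen to exactly `length` samples."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not samples:
        return [0.0] * length
    if len(samples) == 1 or length == 1:
        return [float(samples[0])] * length
    step = (len(samples) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Interpolate between the two nearest original samples.
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def normalize(samples: Sequence[float], length: int, peak: float = 1.0) -> List[float]:
    """Apply volume and then duration normalization."""
    return normalize_duration(normalize_volume(samples, peak), length)
```

Applying `normalize` with the same `length` to both the reference and the repetition yields two specimens of equal length and comparable level, which simplifies any later sample-by-sample or frame-by-frame comparison.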
Still further in accordance with a preferred embodiment of the present invention, the reference-to-response comparing unit includes a parameterization unit for extracting audio signal parameters from the reference and repetition audio specimens.
Additionally in accordance with a preferred embodiment of the present invention, the reference-to-response comparing unit also includes apparatus for comparing the reference audio specimen parameters to the repetition audio specimen parameters.
Further in accordance with a preferred embodiment of the present invention, the apparatus for comparing includes a parameter score generator for providing a score representing the degree of similarity between the audio signal parameters of the reference and repetition audio specimens.
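The chain of parameterization, comparison and score generation might be sketched as follows. The sketch substitutes deliberately simple stand-ins: per-frame RMS energy as the audio signal parameter and cosine similarity scaled to 0-100 as the score. Both choices, the frame size and the function names are assumptions made for the example; the disclosure does not prescribe them.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_size: int = 4) -> List[float]:
    """RMS energy of each non-overlapping frame (a stand-in parameterization)."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return energies


def similarity_score(ref_params: Sequence[float], rep_params: Sequence[float]) -> float:
    """Score in the range 0..100 from the cosine similarity of two parameter vectors."""
    if len(ref_params) != len(rep_params):
        raise ValueError("parameter vectors must have equal length")
    dot = sum(a * b for a, b in zip(ref_params, rep_params))
    norm_ref = math.sqrt(sum(a * a for a in ref_params))
    norm_rep = math.sqrt(sum(b * b for b in rep_params))
    if norm_ref == 0.0 or norm_rep == 0.0:
        return 0.0  # an all-silent specimen cannot be matched
    cosine = dot / (norm_ref * norm_rep)
    # Clamp against rounding error before scaling.
    return round(100.0 * max(0.0, min(1.0, cosine)), 1)
```

The equal-length requirement in `similarity_score` presumes that the two specimens have already been brought to a common duration.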
Still further in accordance with a preferred embodiment of the present invention, the output indication includes a display of the score.
In accordance with one alternative embodiment of the present invention, the output indication includes a display of at least one audio waveform.
Further in accordance with a preferred embodiment of the present invention, the interactive speech training apparatus includes a prompt sequencer operative to generate a sequence of prompts to a user.
Still further in accordance with a preferred embodiment of the present invention, the interactive speech training apparatus also includes a reference audio specimen library in which reference audio specimens are stored and to which the audio specimen generator has access.
Additionally in accordance with a preferred embodiment of the present invention, the reference audio specimen library includes a multiplicity of recordings of audio specimens produced by a plurality of speech models.
Still further in accordance with a preferred embodiment of the present invention, the plurality of speech models differ from one another in at least one of the following characteristics: sex, age, and dialect.
There is also provided in accordance with another preferred embodiment of the present invention, apparatus for interactive speech training including a prompt sequencer operative to generate a sequence of prompts to a user, prompting the user to produce a corresponding sequence of audio specimens, and a reference-to-response comparing unit for comparing at least one feature of each of the sequence of audio specimens generated by the user, to a reference.
Further in accordance with a preferred embodiment of the present invention, the reference to which an individual user-generated audio specimen is compared includes a corresponding stored reference audio specimen.
Still further in accordance with a preferred embodiment of the present invention, the sequence of prompts branches in response to user performance.
Additionally in accordance with a preferred embodiment of the present invention, the sequence of prompts is at least partly determined by a user's designation of his native language.
Still further in accordance with a preferred embodiment of the present invention, the prompt sequencer includes a multilanguage prompt sequence library in which a plurality of prompt sequences in a plurality of languages is stored and wherein the prompt sequencer is operative to generate a sequence of prompts in an individual one of the plurality of languages in response to a user's designation of the individual language as his native language.
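A multilanguage prompt sequence library of this kind might be organized as sketched below. The language codes, the prompt identifiers (which stand for pre-recorded prompts), the drill names and the function `build_session` are hypothetical; only the pairing of Japanese with the L/R and "ship"/"sheep" confusions, and their absence for Arabic and German, follows the example given earlier in this description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LanguageProfile:
    """Prompts and targeted drills stored for one native language."""
    welcome_prompt: str             # identifier of a pre-recorded prompt
    drills: Tuple[str, ...] = ()    # drills for errors typical of this language


# Hypothetical library contents.
PROMPT_LIBRARY: Dict[str, LanguageProfile] = {
    "ja": LanguageProfile("ja/welcome", ("l-r", "ship-sheep")),
    "ar": LanguageProfile("ar/welcome"),
    "de": LanguageProfile("de/welcome"),
}


def build_session(native_language: str) -> List[str]:
    """Return the prompt sequence for the language the user selected."""
    try:
        profile = PROMPT_LIBRARY[native_language]
    except KeyError:
        raise ValueError(f"no prompt sequence stored for {native_language!r}")
    session = [profile.welcome_prompt]
    # Every prompt, including drill prompts, is in the user's own language.
    session.extend(f"{native_language}/drill/{drill}" for drill in profile.drills)
    return session
```

Keeping the drills inside the per-language profile means that selecting a native language determines both the language of the prompts and which pronunciation errors the sequence branches to address.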
There is also provided, in accordance with another preferred embodiment of the present invention, apparatus for interactive speech training including an audio specimen recorder for recording audio specimens generated by a user, and a reference-to-response comparing unit for comparing at least one feature of a user-generated audio specimen to a reference, the comparing unit including an audio specimen segmenter for segmenting a user-generated audio specimen into a plurality of segments, and a segment comparing unit for comparing at least one feature of at least one of the plurality of segments to a reference.
Still further in accordance with a preferred embodiment of the present invention, the audio specimen segmenter includes a phonetic segmenter for segmenting a user-generated audio specimen into a plurality of phonetic segments.
Additionally in accordance with a preferred embodiment of the present invention, at least one of the phonetic segments includes a phoneme such as a vowel or consonant.
In accordance with one alternative embodiment of the present invention, at least one of the phonetic segments may include a syllable.
There is also provided in accordance with yet a further preferred embodiment of the present invention, apparatus for interactive speech training including an audio specimen recorder for recording audio specimens generated by a user, and a speaker-independent audio specimen scorer for scoring a user-generated audio specimen based on at least one speaker-independent parameter.
Further in accordance with a preferred embodiment of the present invention, at least one speaker-independent parameter includes a threshold value for the amount of energy at a predetermined frequency.
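One way to test the amount of energy at a predetermined frequency against a threshold is sketched below, using a single-bin discrete Fourier transform. The choice of this estimator, the division by the number of samples, and the names `energy_at_frequency` and `meets_energy_threshold` are assumptions made for the example; the disclosure does not state how the energy is measured.

```python
import cmath
import math
from typing import Sequence


def energy_at_frequency(samples: Sequence[float], sample_rate: float,
                        frequency: float) -> float:
    """Energy of the specimen at one frequency via a single-bin DFT."""
    n = len(samples)
    if n == 0:
        return 0.0
    acc = sum(
        s * cmath.exp(-2j * math.pi * frequency * i / sample_rate)
        for i, s in enumerate(samples)
    )
    # Squared magnitude, divided by the number of samples.
    return abs(acc) ** 2 / n


def meets_energy_threshold(samples: Sequence[float], sample_rate: float,
                           frequency: float, threshold: float) -> bool:
    """True if the energy at `frequency` reaches the given threshold."""
    return energy_at_frequency(samples, sample_rate, frequency) >= threshold
```

Because the threshold is fixed in advance rather than derived from a particular speaker's recording, a test of this form does not depend on a stored reference specimen.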
Still further in accordance with a preferred embodiment of the present invention, the apparatus also includes a conventional personal computer.