The invention concerns the generation of speech images, wherein the sounds of phonemes are plotted with the aid of a speech input card and associated software. The invention has particular application as a speech training aid for the deaf; as a tool in the study of languages of other species (e.g., porpoises); as a preprocessing transformation in auditory prostheses; and as a phoneme perception mechanism in speech recognition systems.
Numerous devices have been proposed for displaying and analyzing speech signals with the intent of interpreting the speech as a string of symbols corresponding to the distinctive speech sounds of the language (the phonemes) that convey the spoken message. With such devices, phoneme recognition accuracy falls in the 50-80% range. Human listeners, by comparison, typically achieve 90% accuracy in phoneme recognition.
A first type of prior art device utilizes zero crossing detectors for determining when a speech waveform crosses a predetermined amplitude. Zero crossing detectors have a tendency to respond only to the frequency component having the highest amplitude. Thus, important information contained in frequency components having lower amplitudes than the peak component is ignored, resulting in a substantial loss of information. Accordingly, zero crossing detectors are not well suited for analyzing the speech waveforms of speakers having widely differing glottal or fundamental frequencies, as exist between men, women, and children.
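The tendency of a zero crossing detector to follow only the strongest component can be illustrated with a minimal sketch. The function name, the sample rate, and the two test tones (a strong 200 Hz component and a weaker 1000 Hz component) are assumptions chosen for illustration and are not taken from any of the cited devices.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate a frequency from the number of sign changes in the waveform."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:])
        if (a < 0) != (b < 0)
    )
    duration = (len(samples) - 1) / sample_rate
    # A sinusoid crosses zero twice per cycle.
    return crossings / (2 * duration)

SAMPLE_RATE = 8000  # Hz, illustrative value
N = 8000            # one second of samples

def tone(freq, amplitude):
    return [amplitude * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(N)]

strong = tone(200, 1.0)   # dominant component
weak = tone(1000, 0.1)    # lower-amplitude component
mixed = [s + w for s, w in zip(strong, weak)]

estimate_strong_only = zero_crossing_frequency(strong, SAMPLE_RATE)
estimate_mixed = zero_crossing_frequency(mixed, SAMPLE_RATE)
print(estimate_strong_only, estimate_mixed)
```

In this sketch both estimates land close to 200 Hz, so the 1000 Hz component leaves no trace in the detector output, which is the loss of information described above.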
A second type of speech analyzer utilizes a bank of parallel bandpass filters, each filter passing a relatively narrow band to an associated amplitude detector. A DC signal is derived which indicates the phoneme amplitude. In parallel bandpass filter analyzers, however, the amount of information derived is often so great that difficulties arise in coding the resultant phoneme.
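The filter bank arrangement can be approximated in software as follows. This is a sketch only: it substitutes the Goertzel algorithm for each analog bandpass filter and amplitude detector pair, and the channel center frequencies, sample rate, and test signal are hypothetical values not found in the prior art described here.

```python
import math

def goertzel_amplitude(samples, sample_rate, freq):
    """Amplitude of the component near `freq`, standing in for one
    narrow bandpass filter followed by an amplitude detector."""
    n = len(samples)
    k = round(n * freq / sample_rate)       # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2 * math.sqrt(max(power, 0.0)) / n

SAMPLE_RATE = 8000                      # Hz, illustrative value
N = 800                                 # 100 ms frame, 10 Hz bin spacing
CHANNELS = [500, 1000, 1500, 2000]      # hypothetical center frequencies

signal = [
    1.0 * math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.3 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
    for t in range(N)
]

# One level per channel per frame, analogous to the derived DC signals.
levels = [goertzel_amplitude(signal, SAMPLE_RATE, f) for f in CHANNELS]
print(levels)
```

Unlike the zero crossing approach, the weaker 1500 Hz component is retained here. The cost is one value per channel per frame, so a bank with many narrow channels produces a large amount of data to be coded for each phoneme.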
A third type of known speech analyzer is capable of learning the characteristics of different speakers as taught by Moshier in U.S. Pat. No. 4,227,177. Such systems, however, are not usually adaptable for analyzing the speech of a wide variety of speakers whose patterns have not yet been programmed in the analyzer's memory.
U.S. Pat. No. 4,401,851 to Nitta et al. teaches a speech recognition circuit, wherein a vowel segment is determined according to the acoustic power spectrum data, and a vowel and a consonant are recognized according to the respective acoustic power spectrum data inside and outside the vowel segment. Lokerson's U.S. Pat. No. 4,039,754 discloses a speech analyzer for accurately indicating the phoneme utterances of speakers having widely varying speech characteristics. The phoneme utterance is divided into three formants, wherein the frequency content of one formant is normalized against another. The first and third formants are normalized relative to the second formant frequency, by taking the ratio of the first to the second formant and of the third to the second formant, such that compensation is provided for the shift in fundamental frequencies of different speakers.
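The formant ratio normalization attributed to Lokerson can be sketched as below. The function name and the formant frequencies are illustrative assumptions, and the second speaker is modeled, as a simplification, by scaling all three formants of the first speaker by a common factor.

```python
import math

def normalize_formants(f1, f2, f3):
    """Return (F1/F2, F3/F2): first and third formants relative to the second."""
    return f1 / f2, f3 / f2

# Hypothetical formant frequencies in Hz for one vowel.
speaker_a = (730.0, 1090.0, 2440.0)

# Simplifying assumption: a second speaker whose formants are all
# shifted upward by the same factor.
SCALE = 1.2
speaker_b = tuple(SCALE * f for f in speaker_a)

ratios_a = normalize_formants(*speaker_a)
ratios_b = normalize_formants(*speaker_b)
print(ratios_a, ratios_b)
```

Because a common scale factor cancels in each ratio, the two speakers yield the same normalized pair even though their absolute formant frequencies differ. Real speakers do not differ by an exactly uniform factor, so the compensation is approximate in practice.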