The spectrograph is the time-honored tool in the study of speech perception for visually displaying frequency patterns segregated in space. Recently, however, evidence has shown that many differences exist between a speech spectrogram and the neural output of the cochlea. For example, the frequency scale of the cochlear output is linear up to about 1,000 Hz and logarithmic above that, whereas the scale of the spectrograph is either logarithmic or linear throughout. Thus, there is serious question whether the temporal processing of the acoustic components of speech by the ear can be demonstrated by a speech spectrogram. This is not a trivial shortcoming of the speech spectrogram, given that the fundamental frequency and the first formant information of speech are coded, at least to some extent, by temporal information.
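The mixed frequency scale attributed to the cochlea above can be illustrated with a short sketch. The function name `cochlear_scale`, the unit-free output, and the exact functional form are assumptions made only for illustration; the text states just that the scale is linear up to about 1,000 Hz and logarithmic above that.

```python
import math


def cochlear_scale(freq_hz, knee_hz=1000.0):
    """Map a frequency to an illustrative position on a mixed scale.

    Hypothetical helper: linear below the knee and logarithmic above it.
    The 1,000 Hz knee comes from the text; the functional form is an
    assumption chosen so the two pieces meet with equal value and slope.
    """
    if freq_hz <= knee_hz:
        return freq_hz / knee_hz
    return 1.0 + math.log(freq_hz / knee_hz)


if __name__ == "__main__":
    # Below the knee, equal steps in Hz give equal steps in position.
    # Above the knee, equal ratios (octaves) give equal steps in position.
    for f in (250, 500, 1000, 2000, 4000):
        print(f, round(cochlear_scale(f), 3))
```

Under this assumed mapping, a purely linear or purely logarithmic display, such as that of the spectrograph, can match the scale on only one side of the knee.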
Although it can now be shown that there is a basic problem with the use of the speech spectrogram for voice analysis, the primary problem appears to be that the exclusive use of the spectrograph for speech analysis confines the concept of the speech signal to a spatio-temporal display. Such a spatio-temporal display is now realized to be quite unlike the signal as it exists in space. An acoustic speech signal does have spatial properties, but they are not necessarily related to the spatial separation of frequencies manifested in the speech spectrogram. The acoustic speech signal is a frequency-integrated, complex waveform whose actual spatial properties are indicated by radiation and reflection, not by frequency. The acoustic world we live in is not fully represented by a speech spectrogram.
Heretofore, reliance on the speech spectrograph as the sole source of information about the speech signal has restricted observations to an analog of the neural output of the cochlea. As a result, the hypothesized higher order free integration stage of perceptual analysis has been overlooked. It is now understood that a more complete analysis of a speech signal requires a study of the speech waveform in addition to the spectrogram. The spectrogram provides the spatial components of the signal, and the waveform shows how those components are integrated in time and space.
A feature of the present invention is to provide apparatus for analyzing the speech waveform for kinds of information not readily observable from the spectrogram. Further, by the apparatus of the present invention, information available from the speech waveform can be applied to practical problems in speech science. Specifically, by examining the speech waveform, there has been developed a real-time, speaker-independent pitch extractor with a voice pitch display. Further, an examination of the speech waveform has been applied to the invention described and claimed in the co-pending patent application entitled "Tactile Aid", Ser. No. 048,237, filed June 13, 1979, which is directed to apparatus serving as a speech reception aid for the deaf. A tactile aid to speech reception as described in this co-pending application is a device which presents information about the speech signal to the observer through the skin.
In accordance with the present invention, there is provided apparatus for integrating redundant acoustic information in time. Such apparatus provides a technique for duplicating the operation of the perceptual system in integrating the periodicity of several spectral channels to aid in the perception of pitch frequencies. The apparatus of the present invention utilizes the basic principle that no matter how the speech signal is filtered, it will always have the periodicity of the fundamental frequency. This is because all the resonances of the vocal tract are excited by the same source function and therefore must have the same periodicity. The system of the present invention evaluates true spectral pitch, that is, the periodicity of the fundamental component of a speech signal, as an aid in the perception of speech.
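The principle that every filtered version of the signal retains the periodicity of the fundamental can be checked numerically. The following is a minimal sketch, not the circuitry of the invention: each band-pass channel is idealized as a sum of adjacent harmonics of one fundamental, and the period is located with a plain autocorrelation. The function names, the 8,000 Hz sample rate, the 100 Hz fundamental, and the choice of harmonics are all assumptions made for illustration. Note that neither band contains the fundamental itself.

```python
import math


def harmonic_band(f0, harmonics, sample_rate, n_samples):
    """Idealized band-pass channel: a sum of the listed harmonics of f0.

    Stand-in for a filtered speech signal (assumption, not a real filter).
    """
    return [
        sum(math.cos(2 * math.pi * f0 * h * n / sample_rate) for h in harmonics)
        for n in range(n_samples)
    ]


def autocorr_period(signal, min_lag, max_lag):
    """Return the lag in samples, within [min_lag, max_lag], that maximizes
    the autocorrelation computed over a fixed-length window."""
    window = len(signal) - max_lag
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(signal[n] * signal[n + lag] for n in range(window))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


if __name__ == "__main__":
    sample_rate, f0 = 8000, 100.0  # assumed values; one period is 80 samples
    low_band = harmonic_band(f0, [3, 4, 5], sample_rate, 760)    # 300-500 Hz
    high_band = harmonic_band(f0, [8, 9, 10], sample_rate, 760)  # 800-1000 Hz
    for name, band in (("low", low_band), ("high", high_band)):
        lag = autocorr_period(band, 40, 120)
        print(name, "band period:", lag, "samples, pitch:", sample_rate / lag, "Hz")
```

Both bands report the same period even though they share no frequency components, which is the redundancy the apparatus integrates. The sketch relies on each band holding at least two adjacent harmonics; a band containing a single harmonic would also repeat at shorter lags within the search range.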
Apparatus of the present invention operates on the premise that the common periodicities seen across spectral channels in response to a vowel sound are reflected in neural responses. A feature of the apparatus of the present invention is that it makes use of the observation that the amplitudes of the signals across the basilar membrane will be greatest at the beginning of each fundamental period of a speech signal, and that the perceptual system takes advantage of this redundancy about the pitch frequency of a sound signal. The system of the present invention adds waveforms in low frequency channels to take advantage of the redundancy about the pitch frequency.
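The effect of adding waveforms across low frequency channels can be sketched as follows. This is an illustrative model under stated assumptions, not the disclosed apparatus: each channel is a sum of cosine-phase harmonics so that every channel peaks at the start of each fundamental period, the channels are assumed to be time-aligned (real filters introduce delays that would need compensation), and the names `channel`, `sum_channels`, `peak_indices`, and `pitch_from_peaks`, the 8,000 Hz sample rate, and the 80 percent threshold are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # Hz, assumed for illustration


def channel(f0, harmonics, n_samples):
    """Idealized channel output: cosine-phase harmonics of f0, so the
    amplitude is greatest at the beginning of each fundamental period."""
    return [
        sum(math.cos(2 * math.pi * f0 * h * n / SAMPLE_RATE) for h in harmonics)
        for n in range(n_samples)
    ]


def sum_channels(channels):
    """Add the channel waveforms sample by sample."""
    return [sum(samples) for samples in zip(*channels)]


def peak_indices(signal, threshold):
    """Indices of interior local maxima that exceed the threshold."""
    return [
        n for n in range(1, len(signal) - 1)
        if signal[n] > threshold
        and signal[n] >= signal[n - 1]
        and signal[n] > signal[n + 1]
    ]


def pitch_from_peaks(peaks):
    """Estimate pitch in Hz from the mean spacing between successive peaks."""
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    return SAMPLE_RATE / (sum(intervals) / len(intervals))


if __name__ == "__main__":
    f0 = 100.0  # assumed fundamental; one period is 80 samples
    channels = [channel(f0, hs, 400) for hs in ([1, 2], [3, 4], [5, 6])]
    summed = sum_channels(channels)
    peaks = peak_indices(summed, 0.8 * max(summed))
    print("single-channel peak:", max(channels[0]), "summed peak:", max(summed))
    print("peaks at samples:", peaks, "pitch:", pitch_from_peaks(peaks), "Hz")
```

Because the channels peak together once per fundamental period, the summed waveform has a much taller and more isolated peak at each period onset than any single channel, so a simple threshold suffices to mark the periods.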