This invention relates to the determination of speech parameters for use in speech processing systems. More particularly, this invention relates to the real-time determination of pitch period and speech energy, the voiced-unvoiced decision, and the detection of silence.
Speech analysis to determine speech parameters, such as pitch period or fundamental frequency, has become important in a number of situations. For example, in bandwidth compression communications systems, such as vocoders and linear predictive coding systems, speech parameters are encoded and transmitted in place of an electrical facsimile of the speech signal. In such a system, the original speech signal is synthesized from these parameters at the receiving station. Additionally, it has been found that the deaf can be trained to speak intelligibly by a system which visually displays the speech parameters of an instructor or recording in conjunction with the speech parameters of the handicapped person as he attempts to enunciate the same phrase. See, e.g., "Speech Processing Aids for the Deaf -- An Overview," H. Levitt, IEEE Transactions on Audio and Electroacoustics, Vol. AU-21, No. 3, pp. 269-273, June, 1973. Further, systems have been proposed for speaker identification or speaker verification which identify or compare speech characteristics, rather than the more complex frequency pattern associated with speech. See, for example, "New Techniques for Automatic Speaker Verification," A. Rosenberg and M. Sambur, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-23, No. 2, Apr., 1975.
Since pitch period is one of the most important characteristics of speech, a number of speech analysis systems have been proposed for automatically measuring and presenting the pitch characteristics in an electrical format. Two such proposals which are relevant to the instant invention are described in M. M. Sondhi, U.S. Pat. No. 3,381,091, and E. E. David, Jr., et al, U.S. Pat. No. 3,405,237. In the Sondhi and David et al pitch analyzing systems, the resonances or formants which had previously prevented accurate determination of pitch information are suppressed by spectrally flattening the speech waveform and autocorrelating the spectrally flattened signal, following which the pitch signal is determined from the peaks in the autocorrelation function. In the David et al system, spectral flattening is achieved by dividing the speech into frequency bands and adjusting the signal amplitude within each band by automatic gain control or infinite clipping. In the Sondhi system, the formants are suppressed by so-called center-clipping in which oscillations that fall below a certain level are eliminated from the speech waveform.
Although each of these prior art systems often performs adequately, each system exhibits certain characteristics which limit its usage. One substantial limitation of both systems is that a large number of computational operations are necessary. Accordingly, both systems are generally realized by complex implementations which often include programmed digital computers. This computational and structural complexity has generally prevented the real-time determination of speech characteristics, thus usually precluding the use of such systems in on-line applications, such as speaker verification, real-time communications systems, and speech instruction equipment. Additionally, in the case of the David et al system, there are, in fact, certain cases in which the disclosed spectral flattening produces undesirable results. These cases occur when no pitch harmonic is contained within one of the apparatus' frequency bands, resulting in a low-level output from the bandpass filters associated with such frequency bands. This low-level signal tends to degrade rather than enhance the pitch detection process. In the Sondhi system, the clipping level is set at a predetermined percentage of the maximum absolute value of the waveform within a specific time interval. Since it is necessary to retain low-level voiced information, it has generally been necessary to set the clipping level at a rather low percentage, with 30 percent often being used. Setting the clipping level at such a low value, however, does not provide the most advantageous degree of spectral flattening and can result in erroneous pitch indications.
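By way of illustration only, the prior art approach described above (center clipping at a fixed percentage of the peak amplitude, followed by autocorrelation and selection of the largest peak) may be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from the Sondhi patent: the function names, the particular clipper shape (samples beyond the level are shifted toward zero by the level), the lag search range, and the synthetic test frame are all hypothetical choices made for the example. Only the 30 percent clipping level comes from the discussion above.

```python
import math


def center_clip(frame, fraction=0.3):
    """Center-clip a frame at `fraction` of its peak absolute value.

    Assumed clipper shape: samples within +/- level become zero and the
    remaining samples are shifted toward zero by the level.
    """
    level = fraction * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > level:
            clipped.append(x - level)
        elif x < -level:
            clipped.append(x + level)
        else:
            clipped.append(0.0)
    return clipped


def autocorrelation(frame, lag):
    """Unnormalized short-time autocorrelation of `frame` at one lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_pitch_period(frame, min_lag, max_lag, fraction=0.3):
    """Return the lag (in samples) with the largest autocorrelation
    of the center-clipped frame, searched over [min_lag, max_lag]."""
    clipped = center_clip(frame, fraction)
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(clipped, lag))


def synthetic_frame(period, length):
    """Hypothetical test signal: a fundamental plus a weaker second
    harmonic, periodic in `period` samples."""
    return [math.sin(2 * math.pi * n / period)
            + 0.5 * math.sin(4 * math.pi * n / period)
            for n in range(length)]


if __name__ == "__main__":
    frame = synthetic_frame(period=80, length=400)
    print(estimate_pitch_period(frame, min_lag=20, max_lag=160))
```

Note that each call evaluates one sum of products per candidate lag, so the operation count grows with the product of the frame length and the number of lags searched. This reflects the computational burden attributed to the prior art systems above.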
Accordingly, it is an object of this invention to realize a speech analysis system which includes pitch detection and operates in real-time.
It is a further object of this invention to realize a real-time pitch detector which additionally supplies a signal indicative of whether the applied speech signal is voiced or unvoiced, a signal indicative of whether a voice signal or silence is present, and a signal which indicates the total energy of the incident speech signal.