1. Field of the Invention
The present invention relates to a formant detecting device for detecting a formant from an input speech signal and more particularly to a speech processing apparatus for enhancing frequency components in important frequency bands selected from a plurality of frequency bands included in the input speech signal.
2. Description of the Related Art
Normally, voiced speech contains a plurality of phonemes. In the spectrum analysis of a speech wave, each phoneme is characterized by several frequency bands in which energy is concentrated. In the power spectrum of a speech signal, a frequency band of spectral peaks will be called a formant hereinafter in this specification. In the human auditory system, a frequency analysis of speech is performed in the cochlea and auditory nerve of the internal ear to obtain a distribution of formants, which is used as a clue for specifying a phoneme. However, hearing-impaired listeners have a reduced ability, compared with normal listeners, to distinguish one utterance from another when simultaneously hearing a plurality of utterances with different frequencies (a decline of frequency selectivity), and therefore often have difficulty in perceiving a formant. Also, when noise obscures speech, even the frequency selectivity of normal listeners is reduced due to the masking effect caused by the noise.
A formant enhancing device is known as a device which improves the articulation of speech for the above-mentioned listeners whose frequency selectivity is reduced.
Acta Otolaryngol 1990; Suppl. 469: pp. 101-107 discloses a conventional formant enhancing device.
FIG. 7 shows a construction of such a formant enhancing device, which has a frequency analyzing unit 10, a contrast enhancing unit 20 and an inverse transformation unit 30. The frequency analyzing unit 10 calculates a power spectrum and the phase of the input speech signal in each frequency band. This processing is realized by a fast Fourier transform (FFT), for instance. The contrast enhancing unit 20 enhances contrasts between peaks and valleys in the power spectrum obtained by the frequency analyzing unit 10; that is, it enhances the difference in energy between spectral valleys and spectral peaks in the power spectrum of the input speech signal. In this specification, a power spectrum obtained in this way will be called a contrast-enhanced power spectrum, hereinafter. One available method for enhancing contrast is to convolve the power spectrum with a function of lateral inhibition combined with an error function, using an engineering model for lateral inhibition (Equation 1). ##EQU1## where ke > ki and de < di.
There are other methods, such as raising each frequency component of the power spectrum to a power, and multiplying the power spectrum by a smoothed power spectrum obtained by cepstral analysis.
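By way of illustration only, the method of powering each frequency component may be sketched as follows. The function names, the exponent value and the sample spectrum are assumptions introduced for this example and do not appear in the cited references. Raising a power spectrum to an exponent greater than 1 multiplies the peak-to-valley difference, expressed in decibels, by that exponent.

```python
import numpy as np

def enhance_by_powering(power_spectrum, exponent=1.5):
    """Raise every bin of a power spectrum to `exponent` (hypothetical helper).

    An exponent greater than 1 widens the ratio between peaks and valleys.
    """
    p = np.asarray(power_spectrum, dtype=float)
    return p ** exponent

def peak_to_valley_db(power_spectrum):
    """Ratio of the largest to the smallest bin, in decibels."""
    p = np.asarray(power_spectrum, dtype=float)
    return 10.0 * np.log10(p.max() / p.min())

# Made-up power spectrum with two peaks (100 and 25) and valleys at 1.
spectrum = np.array([1.0, 4.0, 100.0, 4.0, 1.0, 25.0, 1.0])
enhanced = enhance_by_powering(spectrum, exponent=1.5)

contrast_before = peak_to_valley_db(spectrum)
contrast_after = peak_to_valley_db(enhanced)
```

In this sketch the total energy of the enhanced spectrum also differs from that of the original, since no level normalization is applied.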
The inverse transformation unit 30 performs inverse transformation of the contrast-enhanced power spectrum, with its contrasts enhanced by the contrast enhancing unit 20, and the phase obtained by the frequency analyzing unit 10 into a speech signal as a function of time. For example, the inverse transformation unit 30 conducts inverse FFT so as to obtain a speech signal. In this case, in order to improve the naturalness of the speech, the frequency analyzing unit 10 performs a frequency analysis at intervals shorter than one frame of FFT, and the inverse transformation unit 30 generally performs an overlap-addition, i.e., a weighted-summation of immediately neighboring frames.
Hereinafter, the operation of a conventional formant enhancing device employing the above-mentioned construction will be explained. The frequency analyzing unit 10 calculates the power spectrum and the phase of the input speech signal. The contrast enhancing unit 20 increases frequency components of spectral peaks in the power spectrum and decreases frequency components of spectral valleys in the power spectrum. The frequency band of spectral peaks corresponds to a formant. The inverse transformation unit 30 performs inverse transformation of the contrast-enhanced power spectrum and the phase of the input speech signal into a speech signal in time sequence. Thus, a speech signal easily audible even to hearing-impaired listeners can be obtained.
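By way of illustration only, the operation of the construction of FIG. 7 may be sketched as follows. The frame length, the hop size, the Hann window and the function names are assumptions chosen for this example; the cited reference does not specify them. Each frame is analyzed by FFT into a power spectrum and a phase, the power spectrum is passed through a contrast-enhancing function, and the result is inverse-transformed and overlap-added at intervals of half a frame.

```python
import numpy as np

def process_frames(signal, frame_len=256, hop=128, enhance=None):
    """Analyze, optionally enhance, and resynthesize a signal by overlap-add.

    `enhance` is a hypothetical callable mapping a power spectrum to a
    contrast-enhanced power spectrum; None leaves the spectrum unchanged.
    """
    signal = np.asarray(signal, dtype=float)
    n = np.arange(frame_len)
    # Periodic Hann window: copies shifted by frame_len / 2 sum to 1.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spec = np.fft.rfft(frame)          # frequency analysis
        power = np.abs(spec) ** 2          # power spectrum
        phase = np.angle(spec)             # phase, kept unchanged
        if enhance is not None:
            power = enhance(power)         # contrast enhancement
        new_spec = np.sqrt(power) * np.exp(1j * phase)
        # Inverse transformation and weighted summation of neighboring frames.
        out[start:start + frame_len] += np.fft.irfft(new_spec, n=frame_len)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
y_identity = process_frames(x)                          # no enhancement
y_enhanced = process_frames(x, enhance=lambda p: p ** 1.2)
```

With no enhancement, the sketch reproduces the input away from the first and last frames, which serves as a check that the analysis and overlap-add stages are consistent with each other.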
IEEE Trans. SP vol. 39, No. 9, pp. 1943-1954 discloses other conventional formant enhancing devices.
FIG. 8 shows a construction of such a formant enhancing device. In FIG. 8, the same components as those in FIG. 7 are denoted by the same reference numerals as those in FIG. 7, and the description thereof is omitted. In a divider 110, the contrast-enhanced power spectrum, obtained by the contrast enhancing unit 20, is divided by the power spectrum obtained by the frequency analyzing unit 10. In this way, the power spectrum is normalized, and a value of gain for each frequency band (referred to as a gain value hereinafter) is determined. A frequency characteristics variable filter 120 varies the frequency characteristics of the input speech signal in accordance with the gain value determined by the divider 110. In the case where the frequency analyzing unit 10 calculates a power spectrum only once every several sampling intervals, the output of the divider 110 is subjected to interpolation, and the naturalness of speech is thereby improved.
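By way of illustration only, the division performed by the divider 110 and the interpolation of its output may be sketched as follows. The function names, the small floor that guards against division by zero, and the choice of linear interpolation are assumptions made for this example; the cited reference does not state how the interpolation is performed. The frequency characteristics variable filter 120 itself is not modeled here.

```python
import numpy as np

def gain_values(power, enhanced_power, floor=1e-12):
    """Per-band gain: contrast-enhanced power divided by original power.

    `floor` is an assumed safeguard so that empty bands do not divide by zero.
    """
    power = np.asarray(power, dtype=float)
    enhanced_power = np.asarray(enhanced_power, dtype=float)
    return enhanced_power / np.maximum(power, floor)

def interpolate_gains(gains_a, gains_b, steps):
    """Linearly interpolate per-band gains between two analysis instants.

    Returns an array of shape (steps, bands); row 0 equals gains_a and the
    last row equals gains_b.
    """
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * np.asarray(gains_a) + t * np.asarray(gains_b)

power = np.array([1.0, 4.0, 100.0])
gains = gain_values(power, power ** 1.5)      # enhancement by powering
ramp = interpolate_gains(np.array([1.0, 2.0, 10.0]),
                         np.array([3.0, 2.0, 0.0]), steps=3)
```

The gain computed here is a ratio of powers; a filter operating on signal amplitude would use its square root, which is a design choice left open in this sketch.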
A speech signal audible even to hearing-impaired listeners can be obtained also by formant enhancing devices according to the above-mentioned construction.
However, the formant enhancing devices shown in FIGS. 7 and 8 have a problem that the naturalness of speech is reduced, since the relationship of energy levels among frequency components of spectral peaks in the contrast-enhanced power spectrum changes greatly from that in the power spectrum of the original speech signal.
Also, in a case where the engineering model for lateral inhibition is applied to the formant enhancing devices shown in FIGS. 7 and 8 so as to enhance contrasts, the level of the output speech signal from the formant enhancing device depends on the function of lateral inhibition that is convolved with the power spectrum of the input speech signal, and thus becomes excessively high or low. Accordingly, an output signal having a proper level cannot be obtained.
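By way of illustration only, the dependence of the output level on the convolved function may be sketched as follows. The kernel used here is a generic center-surround shape (a narrow excitatory Gaussian minus a wide inhibitory Gaussian) assumed for this example; it is not Equation 1, and the parameter names ke, ki, de and di are merely borrowed from the constraint stated with that equation. For a flat power spectrum, the output level in the interior equals the sum of the kernel coefficients, so two kernels with different sums yield different output levels for the same input.

```python
import numpy as np

def center_surround_kernel(ke, ki, de, di, half_width=20):
    """Hypothetical lateral-inhibition kernel (NOT Equation 1 of the source).

    Narrow excitatory Gaussian (gain ke, width de) minus a wide inhibitory
    Gaussian (gain ki, width di), sampled on integer frequency bins.
    """
    x = np.arange(-half_width, half_width + 1, dtype=float)
    return ke * np.exp(-(x / de) ** 2) - ki * np.exp(-(x / di) ** 2)

def convolve_spectrum(power, kernel):
    """Convolve a power spectrum with a kernel, clipping negative results."""
    out = np.convolve(np.asarray(power, dtype=float), kernel, mode="same")
    return np.maximum(out, 0.0)

flat = np.ones(129)  # flat power spectrum of unit level

kernel_a = center_surround_kernel(ke=2.0, ki=0.2, de=2.0, di=6.0)
kernel_b = center_surround_kernel(ke=1.0, ki=0.1, de=2.0, di=6.0)

level_a = convolve_spectrum(flat, kernel_a)[64]  # interior bin
level_b = convolve_spectrum(flat, kernel_b)[64]
```

In this sketch, halving both kernel gains halves the output level even though the input is unchanged, which is the level dependence described above.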
Further, in the formant enhancing devices shown in FIGS. 7 and 8, the function of lateral inhibition itself must be changed in order to adjust the extent to which a contrast is enhanced. This makes the adjustment difficult. In the case where the extent of enhancement is adjusted to obtain a high contrast, if a speech signal with superimposed background noise is input, the contrast between peaks and valleys in the power spectrum of the noise is also enhanced. In this way, the noise is modulated, reducing the naturalness of speech as a result.