Conventional speech coding techniques enable high-quality speech communication when the speech contains no noise. When the speech includes noise or the like, however, grating noise specific to digital communications occurs and the speech quality deteriorates.
Speech enhancement techniques for suppressing such noise include the spectral subtraction method and the comb filtering method.
The spectral subtraction method focuses on noise information. It estimates the characteristics of the noise in a non-speech interval, and then either subtracts the short-term power spectrum of the noise from the short-term power spectrum of the speech signal including the noise, or multiplies the latter by an attenuation coefficient. The power spectrum of the speech signal is thereby estimated and the noise is suppressed. Examples of the spectral subtraction method are described in “S. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-27, pp. 113-120, 1979”, “R. J. McAulay, M. L. Malpass, Speech enhancement using a soft-decision noise suppression filter, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-28, pp. 137-145, 1980”, Patent 2714656, and Japanese Patent Application HEI9-518820.
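As a rough illustration of the subtraction variant described above, the following Python sketch subtracts an estimated noise power spectrum from the noisy power spectrum bin by bin. The function name `spectral_subtract`, the `floor` parameter, and the sample values are assumptions made for illustration and are not taken from the cited documents.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.0):
    """Estimate the speech power spectrum by per-bin subtraction.

    noisy_power: short-term power spectrum of the speech signal including noise
    noise_power: noise power spectrum estimated in a non-speech interval
    floor: hypothetical lower bound that keeps the estimate non-negative
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    return [max(s - n, floor) for s, n in zip(noisy_power, noise_power)]


# Illustrative values only (not measured data)
noisy = [4.0, 9.0, 1.0, 16.0]
noise = [1.0, 1.5, 2.0, 1.0]
estimate = spectral_subtract(noisy, noise)
```

In the third bin the noise estimate exceeds the observed power, so the result is clamped to the floor. Bins of this kind are where residual noise tends to remain after subtraction.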
Meanwhile, the comb filtering method attenuates noise by applying a comb filter matched to the pitch of the speech spectrum. An example of the comb filtering is described in”.
A comb filter has comb-shaped attenuation characteristics: it either attenuates or passes the input signal on a per-frequency-region basis and outputs the result. When the comb filtering method is implemented in digital data processing, attenuation data is generated for each frequency region from the attenuation characteristics of the comb filter, and the speech spectrum is multiplied by that data at each frequency, which suppresses the noise.
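The digital form of comb filtering described above can be sketched as follows. This is a minimal illustration, not the conventional apparatus itself: the function names, the rectangular passband of width `half_width_hz` around each pitch harmonic, and the numeric values are all assumptions.

```python
def comb_gains(num_bins, bin_hz, pitch_hz, half_width_hz, attenuation):
    """Build per-bin gains with comb-shaped attenuation characteristics.

    Bins within half_width_hz of a pitch harmonic pass unchanged (gain 1.0);
    all other bins are scaled by `attenuation` (assumed 0.0 <= attenuation < 1.0).
    """
    gains = []
    for k in range(num_bins):
        freq = k * bin_hz
        harmonic = max(1, round(freq / pitch_hz))  # nearest harmonic number, at least 1
        distance = abs(freq - harmonic * pitch_hz)
        gains.append(1.0 if distance <= half_width_hz else attenuation)
    return gains


def apply_gains(spectrum, gains):
    """Multiply the speech spectrum by the attenuation data for each bin."""
    return [x * g for x, g in zip(spectrum, gains)]


# Hypothetical setup: 50 Hz bin spacing, 100 Hz pitch, flat input spectrum
gains = comb_gains(num_bins=9, bin_hz=50, pitch_hz=100,
                   half_width_hz=10, attenuation=0.25)
filtered = apply_gains([4.0] * 9, gains)
```

With these values, the bins at 100, 200, 300, and 400 Hz pass unchanged and every other bin is scaled by 0.25.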
FIG. 1 is a diagram illustrating an example of a speech processing apparatus using a conventional comb filtering method. In FIG. 1, switch 11 passes the input signal through unchanged as the output of the apparatus when the input signal includes a speech component without quasi-periodicity (for example, a consonant), and routes the input signal to comb filter 12 when the input signal includes a speech component with quasi-periodicity. Comb filter 12 attenuates the noise portion of the input signal on a per-frequency-region basis, with attenuation characteristics based on the speech pitch period information, and outputs the resultant signal.
FIG. 2 is a graph showing attenuation characteristics of a comb filter. The vertical axis represents the attenuation characteristics of a signal, and the horizontal axis represents frequency. As shown in FIG. 2, the comb filter has frequency regions in which a signal is attenuated and other frequency regions in which a signal is not attenuated.
In the comb filtering method, applying the comb filter to an input signal leaves the signal unattenuated in frequency regions in which a speech component exists and attenuates it in frequency regions in which no speech component exists, thereby suppressing the noise and enhancing the speech.
However, the conventional speech processing methods have problems to be solved, as described below. First, the spectral subtraction (SS) method as described in document 1 focuses only on the noise information, assumes that the short-term noise characteristics are stationary, and uniformly subtracts a noise base (the spectral characteristics of the estimated noise) without distinguishing between speech and noise. Speech information (for example, the pitch of the speech) is not used. Since the noise characteristics are not actually stationary, the residual noise remaining after the subtraction, in particular the residual noise between speech pitches, is considered to cause an unnaturally distorted noise, so-called “musical noise”, that is characteristic of the processing method.
As an improvement on the foregoing, a method has been proposed that attenuates noise by multiplying by an attenuation coefficient based on the ratio of speech power to noise power (SNR); examples are described in Patent 2714656 and Japanese Patent Application HEI9-518820. Because this method uses different attenuation coefficients for frequency bands dominated by speech (large SNR) and frequency bands dominated by noise (small SNR), the musical noise is suppressed and the speech quality is improved. However, in the methods described in Patent 2714656 and Japanese Patent Application HEI9-518820, the number of frequency channels to be processed (16 channels) is not adequate even though part of the speech information (the SNR) is used, so it is difficult to separate and extract speech pitch information from the noise. Further, since the attenuation coefficient is applied in both speech and noise frequency bands, the two affect each other and the attenuation coefficient cannot be increased. In other words, an increased attenuation coefficient may generate speech distortion due to erroneous SNR estimation. As a result, the attenuation of noise is not sufficient.
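The idea of choosing an attenuation coefficient per frequency channel from the SNR can be sketched as below. The two-level mapping, the threshold, the gain values, and the function name `snr_gains` are hypothetical; the actual mappings used in the cited patent documents are not given here.

```python
def snr_gains(speech_power, noise_power, snr_threshold=2.0,
              speech_gain=1.0, noise_gain=0.5):
    """Pick a per-channel gain from the speech-to-noise power ratio.

    Channels with SNR >= snr_threshold are treated as speech-dominated and
    get speech_gain; the rest are treated as noise-dominated and get
    noise_gain. All default values are illustrative assumptions.
    """
    gains = []
    for s, n in zip(speech_power, noise_power):
        snr = s / n if n > 0 else float("inf")
        gains.append(speech_gain if snr >= snr_threshold else noise_gain)
    return gains


# Illustrative per-channel power estimates (not measured data)
speech = [8.0, 1.0, 6.0, 0.5]
noise = [1.0, 2.0, 2.0, 1.0]
channel_gains = snr_gains(speech, noise)
```

If the SNR of a speech channel is underestimated, that channel receives `noise_gain`. Making `noise_gain` much smaller therefore increases the risk of speech distortion, which is the limitation noted above.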
Further, in the conventional comb filtering method, when the pitch, which is the fundamental frequency, has an estimation error, the error is magnified at its harmonics, which increases the possibility that the original harmonics fall outside the passband. Furthermore, since it is necessary to determine whether or not the speech has quasi-periodicity, the method has problems with practicability.
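The magnification of a pitch estimation error at the harmonics can be checked with simple arithmetic: an error of Δ in the fundamental becomes an error of kΔ at the k-th harmonic. The sketch below assumes a hypothetical passband of half-width `half_width_hz` centered on each estimated harmonic; the function names and numbers are illustrative only.

```python
def harmonic_errors(true_pitch_hz, estimated_pitch_hz, num_harmonics):
    """Return the frequency error at harmonics 1..num_harmonics."""
    delta = abs(estimated_pitch_hz - true_pitch_hz)
    return [k * delta for k in range(1, num_harmonics + 1)]


def first_missed_harmonic(true_pitch_hz, estimated_pitch_hz,
                          half_width_hz, num_harmonics):
    """Return the lowest harmonic number whose true frequency falls outside
    the passband centered on the estimated harmonic, or None if none does."""
    errors = harmonic_errors(true_pitch_hz, estimated_pitch_hz, num_harmonics)
    for k, err in enumerate(errors, start=1):
        if err > half_width_hz:
            return k
    return None


# Hypothetical case: true pitch 100 Hz, estimate 102 Hz, passband +/-10 Hz
errors = harmonic_errors(100, 102, 10)
missed = first_missed_harmonic(100, 102, 10, 10)
```

Here a 2 Hz error in the fundamental grows to 12 Hz at the sixth harmonic, so from that harmonic upward the original speech component lies outside the assumed passband.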