This invention relates to a method and an apparatus for processing a sound signal such as speech or music so that subjectively undesirable components included in the signal, such as quantization noise generated by an encoding/decoding process or distortion introduced by signal processing such as noise suppression, become subjectively imperceptible.
The higher the compression ratio used when encoding an information source such as speech or music, the more quantization noise is generated as distortion in the encoding process. Furthermore, the quantization noise itself becomes distorted, which can make the reproduced sound subjectively unbearable. For example, with speech encoding methods that faithfully represent the speech signal itself, such as PCM (Pulse Code Modulation) and ADPCM (Adaptive Differential Pulse Code Modulation), the quantization noise appears random, and reproduced sound containing such noise is not particularly unpleasant. However, as the compression ratio increases and the encoding method becomes more complex, the quantization noise sometimes takes on a spectral characteristic peculiar to the encoding method, which subjectively degrades the reproduced sound. In particular, in a signal period where background noise is dominant, the speech model used by a high-compression speech encoding method does not match the signal, so the reproduced sound becomes extremely unpleasant.
In another case, when noise suppression such as a spectral subtraction method is performed, the noise estimation error remains as distortion in the processed signal. This estimation error has characteristics quite different from those of the original signal, which may lower the subjective evaluation of the reproduced sound.
Conventional methods for suppressing the degradation of the subjective evaluation of the reproduced sound due to quantization noise or distortion are disclosed in Japanese Unexamined Patent Publications No. HEI 8-130513, No. HEI 8-146998, No. HEI 7-160296, No. HEI 6-326670, and No. HEI 7-248793, and in S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction" (IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 2, pp. 113-120, April 1979) (this document is referred to as "document 1" hereinafter).
Japanese Unexamined Patent Publication No. HEI 8-130513 aims to improve the quality of the reproduced sound within the background noise period. Each period is checked to determine whether it includes only background noise. When a period is detected to include only background noise, the sound signal is encoded and decoded in a manner dedicated to such periods. When the encoded signal of a period including only background noise is decoded, the characteristics of a synthetic filter are controlled so as to obtain a perceptually natural reproduced sound.
In Japanese Unexamined Patent Publication No. HEI 8-146998, white noise or previously stored background noise is added to the decoded speech so that the white noise does not turn into a harsh, grating noise in the reproduced sound as a result of encoding or decoding.
Japanese Unexamined Patent Publication No. HEI 7-160296 aims to perceptually reduce the quantization noise by postfiltering. The filtering coefficient is obtained from a perceptual masking threshold value that corresponds either to the decoded speech or to an index concerning a spectral parameter received by the speech decoding unit.
In a conventional code transmission system where the transmission of the code is suspended during non-speech periods to control communication power, the decoding side generates and outputs pseudo background noise while the code transmission is suspended. Japanese Unexamined Patent Publication No. HEI 6-326670 aims to reduce the incongruity between the actual background noise included in the speech period and the pseudo background noise generated for the non-speech period. In this method, the pseudo background noise is overlaid onto the sound signal of the speech period as well as the non-speech period.
Japanese Unexamined Patent Publication No. HEI 7-248793 aims to perceptually reduce the distortion generated by noise suppression. First, the encoding side checks whether the current period is a noise period or a speech period. In the noise period, the noise spectrum is transmitted. In the speech period, the spectrum of the noise-suppressed speech is transmitted. In the noise period, the decoding side generates and outputs a synthetic sound using the received noise spectrum. In the speech period, the decoding side generates a synthetic sound from the received spectrum of the noise-suppressed speech, adds to it the synthetic sound generated from the noise spectrum received in the noise period multiplied by an overlay factor, and outputs the sum.
Document 1 aims to perceptually reduce the distortion due to noise suppression by smoothing the amplitude spectrum of the noise-suppressed output speech with that of the previous and subsequent periods and, further, by suppressing the amplitude only in the background noise period.
The above conventional methods have the following problems.
In Japanese Unexamined Patent Publication No. HEI 8-130513, the characteristic may change suddenly at the border between the noise period and the speech period because encoding and decoding are switched completely according to the period check result. In particular, if the noise period is frequently misjudged to be a speech period, the reproduced sound of the noise period, which should generally be relatively stable, changes unsteadily. This may degrade the reproduced sound of the noise period. When the check result for the noise period is transmitted, additional information must be transmitted. This information may be corrupted on the channel, which causes a further problem of unnecessary degradation. Moreover, for specific kinds of noise the reproduced sound cannot be improved effectively, because the quantization noise generated by encoding the sound source cannot be reduced merely by controlling the characteristic of a synthetic filter.
Japanese Unexamined Patent Publication No. HEI 8-146998 has a problem that the characteristic of the background noise actually being encoded may be lost because a prepared noise is added. To make the degraded sound imperceptible, the added noise must have a higher level than the degraded sound. This causes another problem: the reproduced background noise becomes loud.
In Japanese Unexamined Patent Publication No. HEI 7-160296, a perceptual masking threshold value is obtained based on a spectral parameter, and spectral postfiltering is performed based on this threshold value. For background noise with a relatively flat spectrum, few components are masked, so the postfiltering may have no effect on the reproduced sound. In addition, the unmasked main component is not changed much, so distortion included in the main component may remain unchanged.
In Japanese Unexamined Patent Publication No. HEI 6-326670, the pseudo background noise is generated regardless of the actual background noise, so the characteristic of the actual background noise may be lost.
In Japanese Unexamined Patent Publication No. HEI 7-248793, encoding and decoding are switched completely according to the period check result, so when a noise period and a speech period are mistaken for each other, the reproduced sound may be greatly degraded. Namely, when a part of the noise period is mistaken for the speech period, the quality of the reproduced sound within the noise period varies discontinuously and the reproduced sound becomes unpleasant to hear. Conversely, when the speech period is mistaken for the noise period, the overall quality of the reproduced sound is degraded because speech components may be mixed into the synthetic sound of the noise period, which is generated from a mean noise spectrum, and into the synthetic sound that is generated from that noise spectrum and overlaid in the speech period. Further, to make the degraded sound imperceptible within the speech period, noise at a fairly high level must be overlaid.
In the method according to Document 1, a processing delay of half a period (about 10 ms to 20 ms) may occur because of the smoothing process. In addition, when a part of the noise period is mistaken for the speech period, the quality of the reproduced sound within the noise period varies discontinuously and the reproduced sound becomes unpleasant to hear.
The present invention aims to solve the above problems. It is an object of the invention to provide a method and an apparatus for processing a sound signal in which the reproduced sound is not greatly degraded by errors in the period check, the dependency on the kind of noise or on the spectral shape is small, no large delay is required, the characteristic of the actual background noise is retained, the background noise level need not be raised excessively, no new information needs to be added for transmission, and the degraded components caused by encoding the sound source are efficiently suppressed.
A method for processing a sound signal includes generating a first processed signal by processing an input sound signal, calculating a predetermined evaluation value by analyzing the input sound signal, performing a weighted addition of the input sound signal and the first processed signal based on the predetermined evaluation value to generate a second processed signal, and outputting the second processed signal.
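The weighted addition described above can be illustrated with a minimal sketch. The function name weighted_addition, the convention that the evaluation value lies in [0, 1], and the rule that a larger value favors the first processed signal are assumptions made only for this illustration; the text does not fix the mapping from the evaluation value to the weights.

```python
def weighted_addition(input_signal, processed_signal, evaluation_value):
    """Blend the input sound signal with the first processed signal.

    Assumption of this sketch: evaluation_value is in [0, 1], where 0
    keeps the input unchanged and 1 outputs only the processed signal.
    """
    if len(input_signal) != len(processed_signal):
        raise ValueError("signals must have the same length")
    w = min(max(evaluation_value, 0.0), 1.0)  # clamp the weight to [0, 1]
    # Second processed signal: sample-by-sample weighted sum.
    return [(1.0 - w) * x + w * p
            for x, p in zip(input_signal, processed_signal)]
```

Because the two signals are mixed continuously rather than switched, a small error in the evaluation value changes the output only slightly.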
In the above method, the step of generating the first processed signal further includes calculating a spectral component for each frequency by performing a Fourier transformation on the input sound signal, performing a predetermined transformation on the spectral component calculated for each frequency, and generating the first processed signal by performing an inverse Fourier transformation on the spectral component after the predetermined transformation.
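The three steps above (Fourier transformation, transformation of each spectral component, inverse Fourier transformation) can be sketched as follows. This is an illustration only: it uses a naive discrete Fourier transform from the Python standard library for one frame, and the names dft, idft, and process_frame are hypothetical. Framing, windowing, and overlap are not covered.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; only the real part is kept for output."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def process_frame(frame, transform):
    """Fourier transformation, per-frequency transformation, inverse."""
    spectrum = dft(frame)
    return idft(transform(spectrum))
```

Here transform is any function that maps a list of complex spectral components to a list of the same length; passing the identity function returns the input frame up to rounding error.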
Further, in the above method, the weighted addition is performed in the spectral domain.
Further, in the above method, the weighted addition is controlled individually for each frequency component.
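As a minimal sketch of a weighted addition carried out on spectral components with a separate weight per frequency, the following assumes that both spectra are lists of complex values of equal length and that each weight lies in [0, 1]. The function name and the weight convention are assumptions of this illustration.

```python
def spectral_weighted_addition(input_spectrum, processed_spectrum, weights):
    """Mix two spectra with one weight per frequency component.

    Assumption of this sketch: weights[k] in [0, 1], where 0 keeps the
    input component and 1 keeps the processed component at frequency k.
    """
    if not (len(input_spectrum) == len(processed_spectrum) == len(weights)):
        raise ValueError("all arguments must have the same length")
    return [(1.0 - w) * a + w * b
            for a, b, w in zip(input_spectrum, processed_spectrum, weights)]
```

The mixed spectrum would then be converted back to a time signal by an inverse Fourier transformation.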
Further, in the above method, the predetermined transformation on the spectral component for each frequency includes a smoothing process of an amplitude spectral component.
Further, in the above method, the predetermined transformation on the spectral component for each frequency includes a disturbing process of a phase spectral component.
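The amplitude smoothing and phase disturbing processes can be sketched as below. The text does not specify how either is carried out, so the following are assumptions made for illustration: smoothing is a three-point average across neighboring frequency bins, disturbing adds a uniformly distributed random offset to each phase, and a strength parameter in [0, 1] scales both. The function names are hypothetical.

```python
import math
import random


def smooth_amplitude(amplitudes, strength):
    """Pull each amplitude toward the mean of its frequency neighbors.

    strength = 0 leaves the amplitudes unchanged; strength = 1 replaces
    each one by the local mean (assumed form of smoothing).
    """
    n = len(amplitudes)
    smoothed = []
    for k in range(n):
        lo, hi = max(0, k - 1), min(n - 1, k + 1)
        neighborhood = amplitudes[lo:hi + 1]
        mean = sum(neighborhood) / len(neighborhood)
        smoothed.append((1.0 - strength) * amplitudes[k] + strength * mean)
    return smoothed


def disturb_phase(phases, strength, rng):
    """Add a random offset in [-strength*pi, +strength*pi] to each phase."""
    return [p + strength * math.pi * rng.uniform(-1.0, 1.0) for p in phases]
```

For example, smooth_amplitude([0.0, 3.0, 0.0], 1.0) flattens the isolated peak, and disturb_phase(phases, 0.5, random.Random(0)) shifts each phase by at most a quarter turn.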
Further, in the above method, the smoothing process controls the smoothing strength based on the magnitude of the amplitude spectral component of the input sound signal.
Further, in the above method, the disturbing process controls the disturbing strength based on the magnitude of the amplitude spectral component of the input sound signal.
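One way to derive a per-frequency strength from the amplitude spectral component is sketched below. The text states only that the strength depends on the amplitude; the direction chosen here (weaker processing for larger amplitudes, relative to the largest amplitude in the frame) and the name strength_from_amplitude are assumptions of this illustration.

```python
def strength_from_amplitude(amplitudes, max_strength=1.0):
    """Return one strength per frequency, larger for smaller amplitudes.

    Assumes non-negative amplitudes. The linear mapping relative to the
    frame peak is a hypothetical choice, not taken from the text.
    """
    peak = max(amplitudes)
    if peak <= 0.0:
        # Silent frame: apply the maximum strength everywhere.
        return [max_strength for _ in amplitudes]
    return [max_strength * (1.0 - a / peak) for a in amplitudes]
```

The resulting list can be used as a per-frequency strength for either the smoothing process or the disturbing process.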
Further, in the above method, the smoothing process controls the smoothing strength based on the degree of temporal continuity of the spectral component of the input sound signal.
Further, in the above method, the disturbing process controls the disturbing strength based on the degree of temporal continuity of the spectral component of the input sound signal.
Further, in the above method, a perceptually weighted input sound signal is used for the input sound signal.
Further, in the above method, the smoothing process controls the smoothing strength based on the degree of temporal variability of the evaluation value.
Further, in the above method, the disturbing process controls the disturbing strength based on the degree of temporal variability of the evaluation value.
Further, in the above method, a degree of background noise likeness calculated by analyzing the input sound signal is used as the predetermined evaluation value.
Further, in the above method, a degree of frictional noise likeness calculated by analyzing the input sound signal is used as the predetermined evaluation value.
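As an illustration of an evaluation value of the background noise likeness kind, the sketch below compares the power of the current frame with an estimate of the background noise power. The text does not say how the likeness is calculated; this power-ratio measure, the externally supplied noise_power_estimate, and the function names are all assumptions of this sketch.

```python
def frame_power(frame):
    """Mean squared sample value of one frame."""
    return sum(s * s for s in frame) / len(frame)


def noise_likeness(frame, noise_power_estimate):
    """Return a value in [0, 1]; 1 means the frame looks like pure noise.

    Hypothetical measure: the ratio of the estimated noise power to the
    frame power, capped at 1 when the frame is no louder than the noise.
    """
    power = frame_power(frame)
    if power <= noise_power_estimate:
        return 1.0
    return noise_power_estimate / power
```

A value obtained this way varies continuously between speech-like and noise-like frames, so it can serve directly as the weight in a weighted addition.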
Further, in the above method, decoded speech obtained by decoding a speech code generated by a speech encoding process is used as the input sound signal.
According to the present invention, a method for processing a sound signal includes decoding the speech code generated by the speech encoding process as the input sound signal to obtain a first decoded speech, generating a second decoded speech by postfiltering the first decoded speech, generating a first processed speech by processing the first decoded speech, calculating a predetermined evaluation value by analyzing any of the decoded speeches, performing a weighted addition of the second decoded speech and the first processed speech based on the evaluation value to obtain a second processed speech, and outputting the second processed speech as an output speech.
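The data flow of this variant, in which the postfiltered speech rather than the raw decoded speech enters the weighted addition, can be sketched as follows. The postfilter, the processing step, and the evaluation are passed in as hypothetical callables because the text leaves their internals open; the [0, 1] weight convention and the choice to evaluate the first decoded speech are also assumptions of this sketch.

```python
def process_decoded_speech(first_decoded, postfilter, process, evaluate):
    """Combine postfiltered speech and processed speech for one frame.

    postfilter, process: callables mapping a frame to a frame.
    evaluate: callable mapping a frame to an evaluation value, assumed
    here to be a weight in [0, 1] for the first processed speech.
    """
    second_decoded = postfilter(first_decoded)   # second decoded speech
    first_processed = process(first_decoded)     # first processed speech
    w = min(max(evaluate(first_decoded), 0.0), 1.0)
    # Second processed speech, output as the output speech.
    return [(1.0 - w) * d + w * p
            for d, p in zip(second_decoded, first_processed)]
```

Note that the processing step takes the first decoded speech, not the postfiltered one, as its input, which follows the order of steps given in the text.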
According to the present invention, an apparatus for processing a sound signal includes a first processed signal generator processing an input sound signal to generate a first processed signal, an evaluation value calculator calculating a predetermined evaluation value by analyzing the input sound signal, and a second processed signal generator performing a weighted addition of the input sound signal and the first processed signal based on the evaluation value calculated by the evaluation value calculator and outputting a result of the weighted addition as a second processed signal.
Further, in the above apparatus, the first processed signal generator calculates a spectral component for each frequency by performing a Fourier transformation on the input sound signal, smooths an amplitude spectral component included in the spectral component calculated for each frequency, and generates the first processed signal by performing an inverse Fourier transformation on the spectral component after smoothing the amplitude spectral component.
Further, in the above apparatus, the first processed signal generator calculates a spectral component for each frequency by performing a Fourier transformation on the input sound signal, disturbs a phase spectral component included in the spectral component calculated for each frequency, and generates the first processed signal by performing an inverse Fourier transformation on the spectral component after disturbing the phase spectral component.