Voice recognition processing and hands-free telephone conversation share a problem: noise superposed on the voice degrades voice recognition performance and articulation. To solve the problem, various noise removal methods have been proposed. The most common of these is the spectral subtraction algorithm (referred to as "SS algorithm" from now on). The SS algorithm estimates a noise spectrum from a non-voice section of the voice signal, that is, a section where no voice is present, and removes noise by subtracting the estimated noise spectrum from the spectrum of any given frame of the voice signal. However, when the estimated noise spectrum differs from the actual noise spectrum superposed on the voice signal, over-subtraction or under-subtraction can occur depending on the frequency. Over-subtracted components are backfilled by flooring processing, but under-subtracted components remain as they are. The under-subtracted components are perceived as artificial sounds called musical noise, which deteriorates the recognition performance and articulation.
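The SS algorithm described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular system: the function name spectral_subtract, the parameters coeff and floor, the use of a power spectrogram, and the choice of flooring to a fraction of the noisy input power are all assumptions made for the example.

```python
import numpy as np

def spectral_subtract(power_spec, noise_frames, coeff=1.0, floor=0.01):
    """Sketch of spectral subtraction with flooring (all names are hypothetical).

    power_spec   : array of shape (frames, bins), power spectrum per frame
    noise_frames : slice or index array selecting the non-voice frames
    coeff        : subtraction coefficient applied to the noise estimate
    floor        : flooring factor, as a fraction of the noisy input power
    """
    # Estimate the noise spectrum by averaging the non-voice frames.
    noise_est = power_spec[noise_frames].mean(axis=0)
    # Subtract the estimated noise spectrum from every frame.
    diff = power_spec - coeff * noise_est
    # Flooring: backfill over-subtracted bins so that no bin falls below
    # a small fraction of its original power.
    cleaned = np.maximum(diff, floor * power_spec)
    return cleaned, noise_est
```

Bins where the noise estimate is too small pass through the subtraction with residual power; the flooring step does not touch them, which is why the under-subtracted components remain.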
To reduce the musical noise, the following three measures can be conceived.
(1) Reducing the under-subtraction component by increasing the subtraction coefficient.
(2) Improving the estimation accuracy of the noise spectrum to reduce the subtraction residual error.
(3) Estimating and suppressing the under-subtraction component after subtraction.
As for the foregoing measure (1), because the noise is subtracted heavily even in a voice section, the voice spectrum is distorted, which adversely affects the voice recognition performance. As for the foregoing measure (2), although various methods have been proposed, the noise superposed on a given frame is fundamentally unknown, so the estimation error cannot be reduced to zero. As for the foregoing measure (3), a conventional method is known that calculates a power ratio between regions near a point of interest on the time-frequency plane and eliminates the musical noise component (see Non-Patent Document 1, for example). More specifically, the method calculates the cumulative power A of a region enclosed by a distance N from the point of interest on the time-frequency plane and the cumulative power B of a region enclosed by a distance M (N<M). When (A−B)×α<B, it regards the region enclosed by the distance N from the point of interest as a musical noise component and eliminates that component by setting its power to zero.
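The cumulative powers A and B used by the conventional method can be computed as in the following sketch. The text does not specify the shape of a region enclosed by a given distance, so a square window (all points whose frame offset and frequency-bin offset are both at most the distance) is assumed here, clipped at the edges of the plane. The function name region_power and its arguments are hypothetical.

```python
import numpy as np

def region_power(power_tf, t, f, dist):
    """Cumulative power of the region enclosed by `dist` from point (t, f).

    power_tf : array of shape (frames, bins), power on the time-frequency plane
    t, f     : frame index and frequency-bin index of the point of interest
    dist     : distance in frames/bins (square window assumed, clipped at edges)
    """
    t0, t1 = max(t - dist, 0), min(t + dist + 1, power_tf.shape[0])
    f0, f1 = max(f - dist, 0), min(f + dist + 1, power_tf.shape[1])
    return float(power_tf[t0:t1, f0:f1].sum())
```

With this helper, A would be obtained as region_power(power_tf, t, f, N) and B as region_power(power_tf, t, f, M), after which the comparison involving α can be evaluated for each point of interest.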