With the proliferation of voice input devices that can be used in various environments, such as vehicle-mounted hands-free phones and mobile phones, voice communication and voice recognition are increasingly conducted in noisy environments, for example inside vehicles or outdoors. In such environments, background noise, such as the noise of a running vehicle, is picked up by the microphone together with the speaker's voice, and this may reduce the intelligibility of the speaker's voice at the remote end or the accuracy of voice recognition. To address this, voice processing techniques are used that analyze the frequency content of the captured voice signal, estimate the noise components it contains, and eliminate or reduce those components. In such techniques, the voice signal is divided into overlapping frames, each frame is multiplied by a windowing function such as a Hanning window, and an orthogonal transform is applied to the windowed frame to obtain its frequency spectrum. Signal processing such as noise elimination is then applied to the frequency spectrum to obtain a corrected frequency spectrum. Subsequently, an inverse orthogonal transform is applied to the corrected frequency spectrum to obtain a frame-by-frame corrected voice signal, and the corrected frames are sequentially added together in overlapping fashion to obtain the final corrected voice signal.
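The frame-by-frame processing described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular device: it assumes a 50% frame overlap, a periodic Hanning window, and a plain discrete Fourier transform as the orthogonal transform. The names `process` and `modify` are hypothetical; `modify` stands in for whatever noise elimination is applied to the frequency spectrum.

```python
import cmath
import math


def hann(n):
    """Periodic Hanning window; shifted copies sum to 1 at 50% overlap."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(x):
    """Naive discrete Fourier transform (the orthogonal transform here)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse transform; only the real part is kept for a real signal."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def process(signal, frame_len=16, modify=lambda spec: spec):
    """Window, transform, modify, inverse-transform, and overlap-add.

    `modify` is a hypothetical placeholder for the spectral correction
    (for example noise suppression); the default leaves the spectrum as is.
    """
    hop = frame_len // 2                      # assumed 50% overlap
    win = hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * win[i] for i in range(frame_len)]
        corrected = idft(modify(dft(frame)))  # corrected frame
        for i in range(frame_len):            # add frames in overlapping fashion
            out[start + i] += corrected[i]
    return out
```

With the default identity `modify`, every sample covered by two frames is reproduced unchanged, because the two overlapping window values sum to one. The first and last half frames are covered by only one window and are therefore attenuated.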
However, the frame-by-frame corrected voice signal obtained by applying the inverse orthogonal transform to the corrected frequency spectrum may not have a value of zero at the frame ends, so the corrected voice signal may become discontinuous when successive frames are added together. If this happens, periodic noise with a period proportional to the frame length is superimposed on the corrected voice signal, which can degrade voice communication quality or the accuracy of voice recognition. To address this problem, a technique has been proposed in which, each time the amount of overlap between successive frames is increased, the degree of similarity between the filtered signal and an arbitrary signal is computed, and the amount of overlap is set based on that degree of similarity (for example, refer to Japanese Laid-open Patent Publication No. 2013-117639).
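The discontinuity can be reproduced with a small numerical sketch. The setup is an assumption chosen so the result can be checked by hand: a constant input of 1, a frame length of 16 with 50% overlap, a periodic Hanning window, and a hypothetical correction that halves two spectral bins in a single frame (the frame starting at sample 16). It does not reproduce the technique of the cited publication.

```python
import cmath
import math

N = 16         # frame length (illustrative assumption)
HOP = N // 2   # 50% overlap


def dft(x):
    """Naive discrete Fourier transform."""
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / N) for t in range(N))
            for k in range(N)]


def idft(spec):
    """Inverse transform; only the real part is kept."""
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / N)
                 for k in range(N)) / N).real
            for t in range(N)]


window = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
signal = [1.0] * 48
out = [0.0] * len(signal)
edge_values = {}   # first sample of each corrected frame, keyed by frame start

for start in range(0, len(signal) - N + 1, HOP):
    spec = dft([signal[start + i] * window[i] for i in range(N)])
    if start == 16:
        # Hypothetical correction: attenuate bin 1 and its mirror bin
        # in this frame only, as a frame-varying gain might do.
        spec[1] *= 0.5
        spec[N - 1] *= 0.5
    corrected = idft(spec)
    edge_values[start] = corrected[0]
    for i in range(N):
        out[start + i] += corrected[i]

print(edge_values[8], edge_values[16])   # unmodified edge vs. modified edge
print(out[15], out[16])                  # samples on either side of the boundary
```

In this setup the windowed frame is 0.5 - 0.5·cos(2πt/16), so halving the two bins gives 0.5 - 0.25·cos(2πt/16), whose value at the frame start is 0.25 rather than 0. The summed output therefore jumps from 1 at sample 15 to 1.25 at sample 16, which is the kind of frame-boundary step that recurs periodically when the correction differs from frame to frame.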