1. Technical Field
The present invention relates to a technique for processing a speech signal and outputting a speech feature by using phase information contained in the speech signal.
2. Related Art
Although the robustness of speech recognition systems against noise is constantly being improved, recognition accuracy under harsh conditions is still not high enough. For example, it is known that recognition rates are very low under very low signal-to-noise ratio (SNR) conditions, such as in an automobile that is running at high speed or with the air conditioning on, or in an environment where non-stationary noise such as music or street noise is present. Many approaches have been considered with the aim of improving speech recognition in noisy environments. One such approach is to use a feature that is robust against noise.
Features such as cepstra, derived from the spectrum intensity of speech, have been mainly used in conventional speech recognition. Phase information contained in speech signals is discarded in such speech recognition.
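As an illustration of what is discarded when only the spectrum intensity is kept, the following minimal sketch splits the discrete Fourier transform of a toy frame into magnitude and phase. The frame length, the two sinusoidal components, and the helper names `dft` and `magnitude_and_phase` are assumptions made for this example and do not come from the cited works. A circularly shifted copy of the frame has the same magnitude spectrum, so any feature derived only from the magnitude cannot distinguish the two frames, while the phase differs.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame (illustrative only)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def magnitude_and_phase(frame):
    """Split the spectrum of a frame into magnitude and phase (in radians)."""
    spectrum = dft(frame)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


# Hypothetical toy frame: two sinusoids that fall exactly on DFT bins 4 and 9.
N = 64
frame = [
    math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 9 * t / N + 0.3)
    for t in range(N)
]

# Circular shift by 5 samples: same magnitude spectrum, different phase spectrum.
SHIFT = 5
shifted = frame[SHIFT:] + frame[:SHIFT]

mag_a, phase_a = magnitude_and_phase(frame)
mag_b, phase_b = magnitude_and_phase(shifted)

# Largest difference between the two magnitude spectra (numerically close to zero).
max_mag_diff = max(abs(a - b) for a, b in zip(mag_a, mag_b))

# Phase difference at bin 4, which the shift theorem predicts as 2*pi*4*SHIFT/N.
phase_diff_bin4 = phase_b[4] - phase_a[4]
expected_phase_diff = 2 * math.pi * 4 * SHIFT / N
```

A magnitude-only feature computed from `mag_a` and `mag_b` would be identical for both frames; the difference between them is carried entirely by `phase_a` and `phase_b`.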
Eiichi Sueyoshi et al.; “Utilization of Long-Term Phase Spectrum for Speech Recognition”, Proceedings of the meeting of the Acoustical Society of Japan, March 2009, pp. 161-164 discloses a method that uses conventionally discarded phase information in speech recognition. More specifically, Sueyoshi et al. discloses a method that uses a phase spectrum obtained by analyzing phase information over a long period of time as a feature in order to improve the performance of speech recognition.
Japanese Patent No. 3744934 discloses a technique that evaluates the continuity of an acoustic feature to determine a speech segment. More specifically, Japanese Patent No. 3744934 discloses a method in which the continuity of a harmonic structure is evaluated by using the value of the inter-frame correlation of the spectrum intensity shape to determine a speech segment.
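The general idea of scoring inter-frame correlation in spectrum intensity shape can be sketched as follows. This is not the procedure of Japanese Patent No. 3744934, whose details are not given here; the use of the Pearson correlation coefficient, the synthetic three-harmonic frames, the 0.8 threshold, and all function names are assumptions chosen only for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the DFT for bins 0..n/2 of a real-valued frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum has no shape to correlate
    return cov / math.sqrt(var_a * var_b)


def harmonic_frame(f0_bin, phase=0.0, n=64):
    """Hypothetical voiced frame: three harmonics of a fundamental at DFT bin f0_bin."""
    return [
        sum(math.sin(2 * math.pi * h * f0_bin * t / n + phase) / h for h in (1, 2, 3))
        for t in range(n)
    ]


def is_continuous(frame_a, frame_b, threshold=0.8):
    """Assumed decision rule: high spectral-shape correlation means continuity."""
    corr = pearson(magnitude_spectrum(frame_a), magnitude_spectrum(frame_b))
    return corr >= threshold


# Two consecutive frames with the same harmonic structure (only the phase differs).
corr_same = pearson(
    magnitude_spectrum(harmonic_frame(4)),
    magnitude_spectrum(harmonic_frame(4, phase=1.0)),
)

# A frame whose harmonics sit at different bins has a different intensity shape.
corr_diff = pearson(
    magnitude_spectrum(harmonic_frame(4)),
    magnitude_spectrum(harmonic_frame(5)),
)
```

In this toy setting `corr_same` is close to 1 and `corr_diff` is low, so `is_continuous` accepts the first pair and rejects the second. Because the score is computed from magnitudes alone, it is unaffected by the phase change between the first two frames.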
In Sueyoshi et al., an experiment shows that the proposed phase spectrum has speech recognition capability. However, Sueyoshi et al. shows only that the proposed phase spectrum feature, even in conjunction with mel-frequency cepstral coefficients, is slightly effective for speaker recognition in a noisy environment with a rather high SNR of 20 dB.
Japanese Patent No. 3744934, on the other hand, discloses the use of the harmonic structure's property of being continuous across temporally consecutive frames for speech recognition. However, the technique disclosed in Japanese Patent No. 3744934 evaluates the inter-frame correlation of power spectrum components in which only the harmonic structure is left, in order to determine whether a segment is a vowel segment and thereby extract a voiced segment with high accuracy. A power spectrum consisting only of the harmonic structure does not contain phase information, and the fine shapes of power spectrum components are in general susceptible to noise.