The change in fundamental frequency per unit time is one element of the prosodic information of speech. Various information, such as accent, intonation, and the voiced/voiceless distinction, can be acquired from the fundamental frequency change. Accordingly, the fundamental frequency change is used in speech recognition apparatuses and speaker identification apparatuses. To acquire the fundamental frequency change, a fundamental frequency is extracted from each frame (each period), and the difference in fundamental frequency between two frames adjacent along the temporal direction is calculated. This difference represents the fundamental frequency change.
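The frame-to-frame difference described above can be sketched as follows. This is a minimal illustration, not an implementation taken from any apparatus: the function name `f0_change`, the use of `None` to mark frames with no extracted fundamental frequency, and the sample values in hertz are all assumptions made for the example.

```python
def f0_change(f0_per_frame):
    """Difference in fundamental frequency between adjacent frames.

    f0_per_frame holds one value per frame (Hz), or None where no
    fundamental frequency was extracted (assumed convention).
    """
    changes = []
    for prev, curr in zip(f0_per_frame, f0_per_frame[1:]):
        if prev is None or curr is None:
            # No difference can be formed if either frame lacks a value.
            changes.append(None)
        else:
            changes.append(curr - prev)
    return changes


# Hypothetical per-frame fundamental frequencies in Hz.
f0_track = [120.0, 124.0, 130.0, None, 128.0]
delta = f0_change(f0_track)
print(delta)
```

Because each difference depends directly on two extracted values, any error in either value carries straight into the computed change.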
However, in this approach the fundamental frequency is often extracted erroneously, and as a result the fundamental frequency change is also calculated erroneously. Recently, a method has been proposed for acquiring a fundamental frequency change that is less affected by extraction errors of the fundamental frequency. For example, this method is disclosed in Japanese Patent No. 2940835 ( . . . Reference 1). In this method, a crosscorrelation function between the autocorrelation function of a predicted residual at one timing (a frame) and the autocorrelation function of a predicted residual at another timing (another frame) is calculated, and a peak value of the crosscorrelation function is extracted. Because the peak value is used without extracting a pitch, the acquired fundamental frequency change is free of errors caused by fundamental frequency extraction.
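The general idea of crosscorrelating two autocorrelation functions and reading off the peak can be sketched as below. This is not the implementation of Reference 1. The predicted residuals are replaced by synthetic impulse trains (the linear prediction step is omitted), the function names are hypothetical, and the lag window of 30 to 60 samples is an assumed choice that contains only the first autocorrelation peak.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def peak_shift(acf_a, acf_b, max_shift):
    """Return the shift s that maximizes sum_k acf_a[k] * acf_b[k + s]."""
    best_shift, best_value = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        value = 0.0
        for k, a in enumerate(acf_a):
            j = k + s
            if 0 <= j < len(acf_b):
                value += a * acf_b[j]
        if value > best_value:
            best_shift, best_value = s, value
    return best_shift


def impulse_train(period, length):
    """Synthetic stand-in for a predicted residual: one pulse per pitch period."""
    return [1.0 if i % period == 0 else 0.0 for i in range(length)]


# Two frames whose pitch period grows from 40 to 44 samples (assumed values).
acf_a = autocorrelation(impulse_train(40, 200), 60)
acf_b = autocorrelation(impulse_train(44, 200), 60)

# Assumed lag window of 30..60 samples, chosen to contain only the first peak.
shift = peak_shift(acf_a[30:61], acf_b[30:61], 10)
print(shift)
```

For these noise-free inputs the peak of the crosscorrelation lies at a shift of 4 samples, matching the change in pitch period, and no pitch value is extracted for either frame along the way.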
However, this method acquires the fundamental frequency change from the predicted residual of the speech. Accordingly, under the influence of background noise, the shift amount at which the crosscorrelation value is maximized differs from the fundamental frequency change, and the fundamental frequency change is not correctly acquired.
Furthermore, the autocorrelation function of the predicted residual has peaks at positions that are integer multiples of the position corresponding to the fundamental frequency. The shift amount of a peak at such an integer-multiple position is the same integer multiple of the shift amount of the fundamental frequency. To acquire the fundamental frequency change correctly, the range of the autocorrelation function of the predicted residual used to calculate the crosscorrelation function must therefore be set to match the correct fundamental frequency. Accordingly, either the fundamental frequency must be acquired in advance, or the range of the fundamental frequency must be suitably set based on the pitch of the speaker's voice. However, the range of the fundamental frequency cannot be suitably set. It is therefore desired to acquire a fundamental frequency change with a reduced influence of background noise, without limiting the range of the fundamental frequency.
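The integer-multiple relationship can be checked with simple arithmetic. The sketch below assumes that peak positions are measured as lags in samples and that the first peak moves from a hypothetical 40 samples to 44 samples between two frames; the function name `acf_peak_lags` is introduced only for this illustration.

```python
def acf_peak_lags(first_peak_lag, count):
    """Lags of the first `count` autocorrelation peaks (integer multiples)."""
    return [k * first_peak_lag for k in range(1, count + 1)]


# Hypothetical first-peak lags for two frames, in samples.
peaks_before = acf_peak_lags(40, 3)
peaks_after = acf_peak_lags(44, 3)

# Shift of each peak between the two frames.
shifts = [b - a for a, b in zip(peaks_before, peaks_after)]
print(shifts)
```

The k-th peak moves k times as far as the first peak, so a crosscorrelation computed over a range that covers a higher-order peak reports a multiple of the true shift. This is why the range has to be matched to the correct fundamental frequency.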