Users of hearing aids increasingly expect a hearing aid to provide a “conversation emphasizing function” that lets the user hear the voice of a conversation partner with emphasis. As one method for determining, when emphasizing the voice of the conversation partner, whether a voice comes from the conversation partner, Patent Document 1 discloses a method for detecting the voice of a person in conversation with the person wearing the hearing aid according to the consistency of utterance timing. In addition to emphasizing the voice of the conversation partner, the hearing aid is required to enable the person wearing it to respond to a call from the surroundings and to recognize sounds produced by environmental events.
In a device worn in close contact with an object, such as a hearing aid, the frequency characteristic of an input voice varies with the arrival direction (angle) of the sound because of the position of the microphone and the shape of its surroundings. For example, when the hearing aid detects a call, it must recognize the call voice from an input whose frequency characteristic differs depending on the arrival direction (angle) of the sound. As a result, the frequency characteristic of the sound to be recognized deviates from the frequency characteristic of the voice data used offline when learning the audio standard pattern used for verification, and the recognition precision of the hearing aid deteriorates.
As a method for correcting the frequency characteristic of the audio standard pattern used for verification, Non-patent Document 1 discloses cepstral mean normalization, also known as cepstral mean subtraction (CMS). The cepstral mean normalization method (hereinafter referred to as the “CMS method”) estimates the difference in the frequency characteristic of the input voice from the mean of the cepstrum of the input voice, and applies that difference to the input voice for correction. Because the CMS method needs the cepstral mean of the input voice, which can be obtained only after phonation has been completed, real-time processing is not possible.
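The batch nature of the CMS method can be illustrated with a minimal sketch. It assumes that each frame is already represented as a list of cepstral coefficients; the function names `cepstral_mean` and `cms` and the toy values are hypothetical and are not taken from Non-patent Document 1. A fixed frequency characteristic appears as a constant offset in the cepstral domain, so subtracting the mean over the whole utterance removes it.

```python
def cepstral_mean(frames):
    """Return the per-coefficient mean over all frames of one utterance."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[k] for frame in frames) / n for k in range(dim)]


def cms(frames):
    """Subtract the utterance-level cepstral mean from every frame.

    The mean needs every frame, so this can only run after the
    utterance has ended (no real-time processing).
    """
    mean = cepstral_mean(frames)
    return [[frame[k] - mean[k] for k in range(len(mean))] for frame in frames]


# Toy data (assumed values): three frames of two cepstral coefficients.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]

# A direction-dependent frequency characteristic modelled as a constant
# cepstral offset added to every frame (an assumption for illustration).
offset = [0.5, -1.0]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

normalized_clean = cms(clean)
normalized_shifted = cms(shifted)
```

In this toy case `normalized_clean` and `normalized_shifted` coincide, which is the sense in which the difference in frequency characteristic is corrected.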
As a method of realizing real-time processing with the CMS method, Non-patent Document 2 proposes a MAP-CMS method. In this method, the cepstral mean of the input voice is estimated through MAP estimation from two quantities: the cepstral mean from the start of the input voice to the present frame, and the cepstral mean of the voice data used offline when learning the audio standard pattern used for verification in voice recognition. The input voice is then normalized with the estimated mean.
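A frame-by-frame version can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation of Non-patent Document 2: it uses a commonly cited form of the MAP estimate, (tau * prior_mean + sum of frames so far) / (tau + t), where the weight `tau`, the function name `map_cms`, and the toy values are all hypothetical.

```python
def map_cms(frames, prior_mean, tau=10.0):
    """Normalize each frame as it arrives using a MAP-style running mean.

    prior_mean: cepstral mean of the training data (computed offline).
    tau: assumed weight on the prior; larger values trust the prior longer.
    """
    dim = len(prior_mean)
    running_sum = [0.0] * dim
    normalized = []
    for t, frame in enumerate(frames, start=1):
        # Accumulate the frames seen from the start up to the present frame.
        for k in range(dim):
            running_sum[k] += frame[k]
        # Blend the offline prior mean with the mean observed so far.
        estimate = [
            (tau * prior_mean[k] + running_sum[k]) / (tau + t)
            for k in range(dim)
        ]
        normalized.append([frame[k] - estimate[k] for k in range(dim)])
    return normalized


# Toy data (assumed values): two frames of one cepstral coefficient.
frames = [[2.0], [4.0]]
result = map_cms(frames, prior_mean=[0.0], tau=2.0)
```

Early in the utterance the estimate stays close to the prior mean, and as more frames arrive it moves toward the mean of the input itself, so each frame can be normalized without waiting for the utterance to end.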