Subjective estimation and objective estimation are known methods of estimating the quality of an audio signal.
One known objective estimation method, such as PESQ (Perceptual Evaluation of Speech Quality), compares a noise-free original voice with the voice to be estimated and calculates an objective estimation value. Another known method determines a relational expression between a subjective estimation value and an objective estimation value. Here, the subjective estimation value (MOS value: Mean Opinion Score value) is obtained by subjectively estimating a noise-contaminated voice using a sample voice, and the objective estimation value is obtained by objectively estimating the same noise-contaminated voice with PESQ. These techniques are disclosed, for example, in Japanese Laid-open Patent Publication No. 2001-309483, Japanese Laid-open Patent Publication No. 7-84596, or Japanese Laid-open Patent Publication No. 2008-15443.
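The form of the relational expression is not specified here. As a minimal sketch, assuming a linear relation MOS ≈ a × (objective estimation value) + b fitted by ordinary least squares to paired scores from the sample voices, the mapping could be computed as follows. The linear form and the function names `fit_relational_expression` and `estimate_mos` are hypothetical illustrations, not the method of any cited publication.

```python
def fit_relational_expression(objective, subjective):
    """Fit MOS ~= a * objective + b by ordinary least squares.

    Hypothetical linear form; `objective` holds objective estimation values
    (e.g. PESQ scores) and `subjective` holds the paired MOS values.
    """
    n = len(objective)
    if n != len(subjective) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_x = sum(objective) / n
    mean_y = sum(subjective) / n
    # Sum of squared deviations of the objective values.
    sxx = sum((x - mean_x) ** 2 for x in objective)
    if sxx == 0:
        raise ValueError("objective scores must not all be equal")
    # Sum of cross-deviations between objective and subjective values.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(objective, subjective))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def estimate_mos(objective_value, a, b):
    """Map a new objective estimation value to an estimated MOS value."""
    return a * objective_value + b
```

Because the coefficients are fitted only to the noise conditions present in the sample voices, the estimate is reliable only for voices contaminated with similar noise.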
With the audio quality estimation methods described above, the amount of distortion in a noise-contaminated voice cannot be determined. Furthermore, the method of determining the relational expression between the subjective estimation value and the objective estimation value has the following problem: the estimation precision is high for a voice contaminated with noise similar to that of the sample voice, but low for a voice contaminated with noise that differs greatly from that of the sample voice.
Furthermore, when audio signal processing such as directional sound reception processing or noise suppression processing is executed on a noise-contaminated audio signal, distortion occurs in both the noise section and the voice section of the processed audio signal. In the noise section, the signal processing reduces the power, which makes it difficult to measure the distortion amount accurately. In the voice section, on the other hand, it is difficult to obtain an estimation result close to that of the subjective estimation.