1. Field of the Invention
The present invention relates to a speech quality evaluation system that outputs a predicted value of a subjective opinion score for an evaluated speech, and more particularly to a speech quality evaluation system that conducts a speech quality evaluation of a phone.
2. Description of the Related Art
The speech quality of a phone is generally evaluated through psychological experiments involving plural evaluators. In a typical experiment, one speech sample is presented to the evaluators, and each evaluator then selects one category, from a scale of about 5 to 9 levels, as the speech quality of that sample. For example, with the categories disclosed in ITU-T Recommendation P.800 (“Methods for subjective determination of transmission quality”), one category is selected from five: Excellent (5 points), Good (4 points), Fair (3 points), Poor (2 points), and Bad (1 point).
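As an illustration of how the selected categories become a single score, the following minimal sketch averages the points chosen by the evaluators for one speech sample, which is the usual way a mean opinion score is formed. The function name and the example selections are hypothetical and are not taken from the Recommendation.

```python
from statistics import mean

# Five-category scale listed above (ITU-T Recommendation P.800).
CATEGORY_POINTS = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}


def subjective_opinion_score(selections):
    """Average the points of the categories selected by the evaluators.

    `selections` holds one category name per evaluator for a single
    speech sample (hypothetical input format).
    """
    if not selections:
        raise ValueError("at least one evaluator selection is required")
    return mean(CATEGORY_POINTS[name] for name in selections)


# Hypothetical selections from five evaluators for one speech sample.
example_selections = ["Good", "Fair", "Good", "Excellent", "Poor"]
example_score = subjective_opinion_score(example_selections)
```

In practice a large number of evaluators is needed before such an average is stable.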
However, because evaluation by psychological experiments requires a large number of evaluators, it is time-consuming and costly. To address this problem, techniques for predicting the subjective opinion score from speech data have been developed.
ITU-T Recommendation P.862 (“Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs”) and ITU-T Recommendation P.861 (“Objective quality measurement of telephone band (300-3400 Hz) speech codecs”) disclose a technique by which a reference signal of the evaluation speech (hereinafter referred to as “reference speech”) and the speech heard through the phone (hereinafter referred to as “far-end speech”) are compared with each other to predict the subjective opinion score of the phone speech quality.
ETSI EG 202 396-3 V1.2.1 (“Speech Processing, Transmission and Quality Aspects (STQ); Speech Quality performance in the presence of background noise, Part 3: Background noise transmission - Objective test methods,” (2009-01)) discloses a technique by which a predicted value of the subjective opinion score is output by using the speech input to the phone on the speaker side (hereinafter referred to as “near-end speech”) in addition to the reference speech and the far-end speech. In this method, in order to predict the speech quality of the phone speech and the quality of the noise individually, a mean opinion score of the speech quality (SMOS) and a mean opinion score of the noise (NMOS) are calculated, and a general mean opinion score (GMOS) is further calculated. The expression for calculating the mean opinion score of the speech quality uses the reduction in the amount of noise between the near-end speech and the far-end speech. Also, in K. Genuit (“Objective evaluation of acoustic quality based on a relative approach,” InterNoise '96 (1996)), which is cited in the above ETSI EG 202 396-3 V1.2.1, the prediction of the subjective opinion score uses not only the power of the speech in each frequency band but also the temporal variation of that power, calculated every 2 msec.
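The idea of tracking the temporal variation of power every 2 msec can be sketched as follows. This is only a simplified illustration and not the relative approach of the cited reference: it assumes a single band (no frequency band splitting), non-overlapping 2-msec frames, and a plain frame-to-frame difference as the measure of variation. All function names and the example signal are hypothetical.

```python
def frame_powers(samples, sample_rate_hz, frame_ms=2.0):
    """Mean power of consecutive non-overlapping frames of `frame_ms` msec."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    powers = []
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def temporal_variation(powers):
    """Frame-to-frame difference of the power (assumed variation measure)."""
    return [later - earlier for earlier, later in zip(powers, powers[1:])]


# Hypothetical signal at 8 kHz: 2 msec corresponds to 16 samples per frame.
example_signal = [1.0] * 16 + [2.0] * 16 + [0.0] * 16
example_powers = frame_powers(example_signal, sample_rate_hz=8000)
example_variation = temporal_variation(example_powers)
```

For a per-band version, the same two functions would be applied to the output of each band filter; the filter bank itself is outside the scope of this sketch.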
Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 2004-514327 discloses a method of subtracting a physical quantity of echo from a physical quantity of the evaluation speech, so that the influence of echo occurring in the phone is taken into account in predicting the subjective opinion score.