The present invention relates generally to speech processing systems, and more particularly to techniques for determining speech quality in such systems.
The most accurate known techniques for evaluating the performance of speech coding systems are subjective speech quality assessment tests such as the well-known mean opinion score (MOS) test. However, these subjective tests are generally costly and time-consuming, and also difficult to reproduce. It is therefore desirable to replace the subjective tests with an objective test for evaluating speech coding performance.
As a result, considerable effort has been devoted to attempting to find a suitable objective distortion measure that will correlate well with subjective MOS measurements. One such objective distortion measure is known as the perceptual speech-quality measure (PSQM), and is described in J. G. Beerends and J. A. Stemerdink, “A perceptual speech-quality measure based on psychoacoustic sound representation,” J. Audio Eng. Soc., Vol. 42, pp. 115-123, March 1994, which is incorporated by reference herein. The PSQM measure has been adopted as the ITU-T standard recommendation P.861 for telephone band speech. See ITU-T Recommendation P.861, Objective Quality Measurement of Telephone-Band (300-3400 Hz) Speech Codecs, Geneva, 1996, which is incorporated by reference herein.
Nonetheless, a number of significant problems remain with PSQM and other conventional objective distortion measures. For example, it has not been determined whether or how such measures can be mapped onto the subjective MOS scale in a database-independent manner. In addition, conventional objective measures are in some cases unable to accurately assess the quality of processed speech when the source has been corrupted by environmental noise.
A need therefore exists for improved techniques for predicting the quality of speech and other audio signals, such that a subjective MOS measure or other type of subjective quality measure can be determined accurately and efficiently from a corresponding objective distortion measure, in a manner that is robust in the presence of environmental noise.
The invention provides methods and apparatus for estimating subjective measures of audio signal quality using objective distortion measures. In accordance with the invention, a mapping function is generated between subjective measures of audio signal quality, e.g., mean opinion score (MOS) measures, degradation MOS (DMOS) measures or other measures, and corresponding objective distortion measures, e.g., auditory speech quality measures (ASQMs), perceptual speech quality measures (PSQMs) or other objective distortion measures, for known audio signals. The audio signals may be speech signals or any other type of audio signals.
The subjective measures and corresponding objective distortion measures are determined in accordance with, e.g., modulated noise reference unit (MNRU) conditions or other suitable distortion conditions placed on the audio signals, and a regression analysis is applied to the results to generate the mapping function. The mapping function may then be utilized, e.g., to evaluate speech quality of additional source speech from a particular speech coding system. In this case, the objective distortion measure is generated using the additional source speech, and the resulting objective measure is applied as an input to the mapping function to generate an estimate of the value of the subjective measure.
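By way of illustration only, the generation and use of such a mapping function can be sketched as follows. The sketch assumes a simple first-order (straight-line) least-squares regression, which is only one possible form of regression analysis and is not stated above to be the form used. The function name fit_linear_mapping, the variable names, and the paired values are hypothetical: the values are synthetic placeholders lying exactly on a line, not measured MOS or distortion data. Clamping the estimate to the 1 to 5 range of the MOS scale is likewise an assumption of the sketch.

```python
from typing import Callable, Sequence


def fit_linear_mapping(objective: Sequence[float],
                       subjective: Sequence[float]) -> Callable[[float], float]:
    """Fit subjective ~ intercept + slope * objective by ordinary least squares.

    Returns a mapping function that turns an objective distortion value
    into an estimate of the subjective measure.
    """
    n = len(objective)
    if n != len(subjective) or n < 2:
        raise ValueError("need at least two paired measurements")

    mean_x = sum(objective) / n
    mean_y = sum(subjective) / n
    sxx = sum((x - mean_x) ** 2 for x in objective)
    if sxx == 0:
        raise ValueError("objective measures must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(objective, subjective))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def mapping(distortion: float) -> float:
        # Assumption: keep the estimate inside the 1..5 MOS scale.
        return min(5.0, max(1.0, intercept + slope * distortion))

    return mapping


# Synthetic placeholder pairs, one per hypothetical MNRU condition.
# They lie exactly on the line 4.5 - 0.5 * x and are NOT measured data.
mnru_objective = [0.5, 1.0, 2.0, 3.0, 4.0]
mnru_subjective = [4.25, 4.0, 3.5, 3.0, 2.5]

estimate_mos = fit_linear_mapping(mnru_objective, mnru_subjective)

# Objective distortion measured on additional source speech (hypothetical value).
print(estimate_mos(2.5))
```

In this sketch the fitting step corresponds to generating the mapping function from the known audio signals, and the final call corresponds to applying an objective measure from additional source speech as an input to that function. A higher-order or nonlinear regression could be substituted without changing the overall flow.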
Advantageously, the invention allows an objective distortion measure to be mapped in a database-independent manner to a subjective measure, e.g., a MOS or DMOS scale. The mapping function is database independent in that it can be used to generate accurate estimates of subjective measures of speech quality for speech databases unrelated to those used in generating the mapping function. In addition, the mapping from objective distortion measure to subjective quality measure in an illustrative embodiment of the invention provides more accurate prediction than conventional techniques in the presence of environmental noise. The invention may be implemented in numerous and diverse speech and audio signal processing applications, and considerably improves the accuracy of quality prediction in such applications. These and other features and advantages of the present invention will become more apparent from the accompanying drawings and the following detailed description.