1. Field of the Invention
This invention relates to methods of evaluating the quality of speech, and, in particular, to methods of evaluating the quality of speech by means of an objective automatic system.
2. General Background
Speech quality judgments in the past were determined in various ways. Subjective, speech quality estimation was made by surveys conducted with human respondents. Some investigators attempted to evaluate speech quality objectively by using a variety of spectral distance measures, noise measurements, and parametric distance measures. Both the subjective techniques and the prior objective techniques were widely used, but each has its own unique set of disadvantages.
The purpose of speech quality estimation is to predict listener satisfaction. Hence, speech quality estimation obtained through the use of human respondents (subjective speech quality estimates) is the procedure of choice when other factors permit. Disadvantageously, the problems with conducting subjective speech quality studies often either preclude speech quality assessment or dilute the interpretation and generalization of the results of such studies.
First and foremost, subjective speech quality estimation is an expensive procedure due to the professional time and effort required to conduct subjective studies. Subjective studies require careful planning and design prior to the execution. They require supervision during execution and sophisticated statistical analyses are often needed to properly interpret the data. In addition to the cost of professional time, human respondents require recruitment and pay for the time they spend in the study. Such costs can mount very quickly and are often perceived as exceeding the value of speech quality assessment.
Due to the expense of the human costs involved in subjective speech quality assessment, subjective estimates have often been obtained in studies that have compromised statistical and scientific rigor in an effort to reduce such costs. Procedural compromises invoked in the name of cost have seriously diluted the quality of the data with regard to their generalization and interpretation. When subjective estimates are not generalized beyond the sample of people recruited to participate in the study, or even when the estimates are not generalized beyond some subpopulation within the larger population of interest, the estimation study has little real value. Similarly, when cost priorities result in a study that is incomplete from a statistical perspective (due to inadequate controlled conditions, unbalanced listening conditions, etc.), the interpretation of the results may be misleading. Disadvantageously, inadequately designed studies have been used on many occasions to guide decisions about the value of speech transmission techniques and signal processing systems.
Because cost and statistical factors are so common in subjective speech quality estimates, some investigators have searched for objective methods to replace the subjective methods. If a process could be developed that did not require human listeners as speech quality judges, that process would be of substantial utility to the voice communication industry and the professional speech community. Such a process would enable speech scientists, engineers, and product customers to quickly evaluate the utility of speech systems and quality of voice communication systems with minimal cost. There have been a number of efforts directed at designing an objective speech quality assessment process.
The prior processes that have been investigated have serious deficiencies. For example, an objective speech quality assessment process should correlate well with subjective estimates of speech quality and ideally achieve high correlations across many different types of speech distortions. The primary purpose for estimating speech quality is to predict listener satisfaction with some population of potential listeners. Assuming that subjective measures of speech quality correlate well with population satisfaction (and they should, if assessment is conducted properly), objective measures that correlate well with subjective estimates will also correlate well with population satisfaction levels. Further, it is often true that any real speech processing or voice transmission system introduces a variety of distortion types. Unless the objective speech quality process can correlate well with subjective estimates across a variety of distortion types, the utility of the process will be limited. No objective speech quality process previously reported in the professional literature correlated well with subjective measures. The best correlations obtained were for limited set of distortions.