This invention relates to methods and systems for evaluating the quality of speech, and, in particular, to methods and systems for objectively evaluating the quality of speech.
Assessing the quality of speech communication systems is of great importance in the field of speech processing. Speech quality measurements are used to optimize the design of speech transmission algorithms and equipment, and to aid in selecting speech coding algorithms for standardization. Speech quality is also an important factor in purchasing speech systems and services and in predicting listener satisfaction. Traditionally, speech quality has been determined using subjective measures based on human listener rating schemes such as, for example, the Mean Opinion Score (MOS), which ranges from 1 to 5, representing unacceptable, poor, fair, good, and excellent, respectively, or the Diagnostic Acceptability Measure (DAM), which ranges from 1 to 100.
Since different people have different preferences, there is often significant variation between individual quality scores. To do the subjective testing correctly requires listener crews who are carefully selected and constantly calibrated in order to determine any drift in the individual performance. Also, statistical test design for repeatable results requires listeners to hear many combinations of test conditions using appropriate laboratory facilities. This makes the subjective measures quite expensive and suggests that “objective” measures could be used to aid the quality estimation task. The term “objective” refers to mathematical expressions that attempt to estimate or predict subjective speech quality.
Many known algorithms base quality estimates on input-to-output measures. That is, speech quality is estimated by measuring the distortion between an “input” and an “output” speech record, and using regression to map the distortion values into estimated quality. However, in a realistic environment, access to a clean/uncorrupted input signal is not possible. Therefore, objective measures should be based only on the available corrupted output signal. Output-based measures are useful in applications where only the received speech record is known and the source speech record is unavailable, for example, in monitoring cellular telephone connections to ensure they maintain adequate performance.
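By way of illustration only, an input-to-output measure of the kind described above may be sketched as follows. The choice of segmental signal-to-noise ratio as the distortion measure, the use of a straight-line least-squares fit as the regression, and the function names `segmental_snr`, `fit_line`, and `estimate_mos` are assumptions made for this sketch and are not taken from any particular known algorithm.

```python
import math

def segmental_snr(clean, degraded, frame_len=4):
    """Mean per-frame SNR in dB between a clean input and a degraded output."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        d = degraded[start:start + frame_len]
        signal = sum(x * x for x in c)
        noise = sum((x - y) ** 2 for x, y in zip(c, d))
        if signal == 0 or noise == 0:
            continue  # skip silent or perfectly reproduced frames
        snrs.append(10.0 * math.log10(signal / noise))
    return sum(snrs) / len(snrs) if snrs else float("inf")

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def estimate_mos(distortion, a, b):
    """Map a distortion value onto the 1-to-5 MOS scale, clipped to that range."""
    return max(1.0, min(5.0, a * distortion + b))
```

In such a sketch, `fit_line` would be fitted on pairs of measured distortion and subjective score collected from listening tests, and `estimate_mos` would then be applied to new recordings. Note that `segmental_snr` requires the clean input record, which is precisely the limitation discussed above.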
Several known output-based measures have been proposed. These methods, however, either fail to utilize more than one distortion measure for determining the quality of speech or use linear or very simple non-linear models to predict the score of a generally accepted subjective quality rating scheme.
It is thus a general object of the present invention to provide a new and improved method and system for objectively measuring speech quality based on an output speech signal only.
It is another object of the present invention to provide an output-based objective measure that correlates highly with subjective scores over all possible distortions and noise types so as to accurately predict listener preference.
In carrying out the above objects and other objects, features and advantages, of the present invention, a method is provided for objectively measuring the quality of speech. The method includes providing a plurality of speech reference vectors and receiving a corrupted speech signal. The method also includes determining a plurality of distortions of the corrupted speech signal derived from a plurality of distortion measures based on the plurality of speech reference vectors. Finally, the method includes generating a score based on the plurality of distortions.
In further carrying out the above objects and other objects, features and advantages, of the present invention, a system is also provided for carrying out the above described method. The system includes means for providing a plurality of speech reference vectors and means for receiving a corrupted speech signal. The system also includes means for determining a plurality of distortions of the corrupted speech signal based on the plurality of speech reference vectors. Still further, the system includes a non-linear model responsive to the plurality of distortions to generate a score based on the plurality of distortions.
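By way of illustration only, the method and system summarized above may be sketched as follows. The sketch assumes that the corrupted speech signal has already been converted into per-frame feature vectors, that each distortion is taken as the average distance from a frame to its nearest speech reference vector, and that the non-linear model is a small one-hidden-layer network. The two distance functions, the network form, and all function names are hypothetical choices for this sketch, and any weights supplied to the network are placeholders rather than trained values.

```python
import math

def squared_euclidean(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def city_block(u, v):
    """City-block (L1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def nearest_distortion(frame, references, measure):
    """Smallest distortion between one frame vector and any reference vector."""
    return min(measure(frame, ref) for ref in references)

def distortion_features(frames, references, measures):
    """Average nearest-reference distortion over all frames, one value per measure."""
    return [
        sum(nearest_distortion(f, references, m) for f in frames) / len(frames)
        for m in measures
    ]

def nonlinear_score(features, hidden_weights, hidden_biases, out_weights, out_bias):
    """One-hidden-layer tanh network whose output is squashed onto the 1-to-5 range."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(ws, features)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    z = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 + 4.0 / (1.0 + math.exp(-z))
```

In this sketch no clean input signal is needed: only the corrupted frames and the stored reference vectors enter `distortion_features`, and the resulting plurality of distortions is passed to `nonlinear_score` to generate the score.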
The above objects and other objects, features and advantages of the present invention are readily apparent from the following detailed description of the best mode for carrying out the invention when taken in connection with the accompanying drawings.