1. Field of the Invention
The present invention relates to speech quality assessment.
2. Description of Related Art
As modern telecommunication networks become more complex and evolve from circuit-switched networks to packet-based networks such as voice over internet protocol (VoIP), new types of distortion affecting perceived speech quality are being encountered. Thus, maintaining and improving the quality of service (QoS) of in-service networks continues to be an important issue. In the current art, subjective speech quality assessment is the most reliable and commonly accepted way of evaluating the quality of speech. In subjective speech quality assessment, human listeners are used to rate the speech quality of processed speech, wherein processed speech is a transmitted speech signal which has been processed, e.g., decoded, at the receiver. This technique is subjective because it is based on the perception of the individual human. However, subjective speech quality assessment is an expensive and time-consuming technique because a sufficiently large number of speech samples and listeners is necessary to obtain statistically reliable results. The subjective results, for example, ratings of speech quality on a scale of 1 to 5, are averaged to obtain a mean opinion score (MOS).
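By way of illustration only, the averaging of listener ratings into a MOS can be sketched as follows. The function name mean_opinion_score and the listener ratings are hypothetical and are not taken from any actual listening test; the sketch assumes only the 1-to-5 scale mentioned above.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings (each on the 1-to-5 scale) into a MOS."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-to-5 scale")
    return mean(ratings)


# Hypothetical ratings from eight listeners for one processed speech sample.
ratings = [4, 3, 4, 5, 3, 4, 4, 3]
mos = mean_opinion_score(ratings)
print(mos)  # 30 / 8 = 3.75
```

With only a handful of listeners, the average can shift noticeably when a single rating changes, which is one reason a large number of samples and listeners is needed for statistically reliable results.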
Objective speech quality assessment is another technique for assessing speech quality. Unlike subjective speech quality assessment, objective speech quality assessment is not based on the perception of the individual human. Objective speech quality assessment may be one of two types. The first type of objective speech quality assessment is based on known source speech, and is often referred to as an intrusive assessment. In this first type of objective speech quality assessment, for example, a mobile station transmits a speech signal derived, e.g., encoded, from known source speech. The transmitted speech signal is received, processed and subsequently recorded. The recorded processed speech signal is compared to the known source speech using well-known speech evaluation techniques, such as Perceptual Evaluation of Speech Quality (PESQ), to determine speech quality. If the source speech signal is not known or the transmitted speech signal was not derived from known source speech, then this first type of objective speech quality assessment cannot be utilized.
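The structure of an intrusive comparison can be illustrated with a deliberately simplified sketch. The sketch below does not implement PESQ, which relies on an elaborate perceptual model; it substitutes a plain signal-to-noise ratio merely to show that both the known source speech and the recorded processed speech are required as inputs. The function name snr_db and the sample values are hypothetical, and the two signals are assumed to be time-aligned and of equal length.

```python
import math


def snr_db(source, processed):
    """Signal-to-noise ratio in dB of processed speech against known source speech.

    A crude stand-in for a perceptual measure such as PESQ; it only
    demonstrates that an intrusive measure needs both signals.
    """
    if len(source) != len(processed):
        raise ValueError("signals must be time-aligned and of equal length")
    signal_energy = sum(s * s for s in source)
    noise_energy = sum((s - p) ** 2 for s, p in zip(source, processed))
    if noise_energy == 0:
        return math.inf  # processed speech is identical to the source
    if signal_energy == 0:
        raise ValueError("source signal has zero energy")
    return 10 * math.log10(signal_energy / noise_energy)


# Hypothetical samples: the processed signal is the source attenuated by 10%.
source = [1.0, -1.0, 1.0, -1.0]
processed = [0.9, -0.9, 0.9, -0.9]
print(round(snr_db(source, processed), 2))
```

If the source argument cannot be supplied, no such comparison can be computed, which is the limitation of the first type of objective assessment noted above.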
The second type of objective speech quality assessment is not based on known source speech and is referred to as non-intrusive, single-ended or output-based. Most embodiments of this second type of objective speech quality assessment involve estimating source speech from processed speech, and then comparing the estimated source speech to the processed speech using well-known speech evaluation techniques. Non-intrusive methods have great potential in practical applications, e.g., monitoring the speech quality of in-service networks, where the source speech signal is not available. Some attempts have been made to build non-intrusive measurement systems by measuring the deviation of feature vectors of the degraded speech signal from a set of codewords derived from un-degraded source speech databases, or by the parameterization of a vocal tract model which is sensitive to telecommunication network distortions. Recently in the ITU-T, a standardization activity called P.SEAM (Single-Ended Assessment Models) was created to standardize an algorithm for non-intrusive estimation of speech quality. Several models were proposed and one of them was adopted as standard Recommendation P.563. However, the ITU-T P.563 model shows very limited performance even for the known MOS data used in the development of the model, with an average correlation of about 0.88 between subjective and objective scores across 24 MOS tests.
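For illustration, the correlation between subjective and objective scores for a single MOS test can be computed as sketched below. The sketch assumes that the correlation in question is the Pearson correlation coefficient, which is not stated above; the function name pearson_correlation and the score values are hypothetical and are not P.563 results.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five test conditions in one MOS test.
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.4]   # from human listeners
objective_mos = [2.1, 2.4, 3.4, 3.6, 4.2]    # predicted by an objective model
r = pearson_correlation(subjective_mos, objective_mos)
print(round(r, 3))
```

Under this assumption, a figure averaged over several MOS tests would be obtained by computing one such coefficient per test and taking the mean of the coefficients.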