The present invention relates generally to speech quality measurement and, more particularly, to speech quality measurement of voice transmitted over a packet network.
Perceived speech quality assessment has traditionally been performed using subjective testing, which involves considerable time, effort and resources. In a subjective test, a number of listeners listen to a set of speech files and rate each file on a subjective scale. Objective speech quality metrics instead try to estimate the perceived speech quality by comparing the original and distorted speech signals.
Traditional objective measures such as Signal to Noise Ratio (SNR) do not provide a good estimate of subjective quality, especially when sophisticated low bit rate speech coding techniques are used. An auditory model can be used to perceptually weight the distortion between the original and the test signals, to compute the perceptually significant distortion.
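For reference, the conventional SNR measure mentioned above can be sketched as follows. This is a minimal illustration, assuming time-aligned original and test signals of equal length; the function name `snr_db` is hypothetical and does not come from the specification.

```python
import math

def snr_db(original, test):
    """Signal-to-noise ratio in dB between an original and a test signal.

    Assumes the two sequences are time-aligned and of equal length.
    """
    if len(original) != len(test):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, test))
    if noise_energy == 0:
        return math.inf  # identical signals: no measurable distortion
    if signal_energy == 0:
        return -math.inf  # silent original with a non-silent error
    return 10.0 * math.log10(signal_energy / noise_energy)
```

Because every sample error is weighted equally, this measure takes no account of whether the error is audible, which is why it tends to track subjective quality poorly for low bit rate coders.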
Other methods using a perceptual model compute a weighted average of the frame based perceptually weighted distortion measure to compute the objective quality score. One such method is PSQM (Perceptual Speech Quality Measure) which is used in ITU-T standard P.861. This method uses a perceptual model to map the original and test speech signals onto a psychophysical representation to compute a "noise disturbance" for each frame of speech. The PSQM score is computed as a weighted average of the "noise disturbance" where silence frames and speech frames are given different weights. The "noise disturbance" of PSQM is an example of a frame based perceptual distortion.
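The weighted averaging described above can be illustrated with a short sketch. It assumes the per-frame noise disturbance values and a speech/silence flag for each frame are already available. The function name and the default weights are hypothetical placeholders, not the values specified in P.861.

```python
def weighted_disturbance_score(disturbance, is_speech,
                               w_speech=1.0, w_silence=0.2):
    """Weighted average of per-frame disturbance values.

    disturbance: per-frame noise disturbance values
    is_speech:   per-frame booleans, True for speech, False for silence
    w_speech, w_silence: illustrative weights (assumed, not from P.861)
    """
    if len(disturbance) != len(is_speech):
        raise ValueError("one speech/silence flag is required per frame")
    weights = [w_speech if s else w_silence for s in is_speech]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # no frames, or all weights are zero
    return sum(w * d for w, d in zip(weights, disturbance)) / total_weight
```

With equal weights the score reduces to the plain mean of the frame disturbances; lowering the silence weight makes the score depend more on distortion in speech frames.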
A PSQM test system 100 is shown in FIG. 1. A sound source 10 generates a series of sound sample frames x[n] which are input to a signal processor 20. The signal processor 20 processes the sound sample frames x[n] and outputs a series of test or coded sound frames y[n]. The series of sound sample frames x[n] and the series of coded sound frames y[n] are then input to PSQM processor 30 which processes the two series and generates PSQM parameters which evaluate the quality of the coding performed by the signal processor 20.
FIG. 2 is a block diagram which describes the PSQM algorithm performed by the PSQM processor 30. Within PSQM, the physical signals constituting the source and test speech, x[n] and y[n] respectively, are mapped onto psychophysical representations that match the internal representations of the speech signals (i.e. the representations inside our heads) as closely as possible. These internal representations make use of the psychophysical equivalents of frequency (critical band rates) and intensity (Compressed Sone). Masking is modeled in a simple way: masking is taken into account only when two time-frequency components coincide in both the time and frequency domains.
Within the PSQM approach, the quality of the test speech is judged on the basis of differences in the internal representation. This difference is used to calculate the noise disturbance as a function of time and frequency. In PSQM, the average noise disturbance is directly related to the quality of test speech. The PSQM approach is discussed in detail in ITU Recommendation P.861 "Methods for Objective and Subjective Assessment of Quality".
A sound quality evaluation processor, according to the present invention, includes a comparator and a sequence processor. The comparator has first and second inputs and an output. The first input is configured to receive a sequence of sound sample frames and the second input is configured to receive a sequence of test sound frames. The comparator is configured to compare each frame of the sequence of test sound frames to a corresponding one of the sequence of sound sample frames in order to generate a sequence of distortion measure values at the output of the comparator. The sequence processor has first and second inputs and a first output. The first input is configured to receive the sequence of distortion measure values from the comparator and the second input is configured to receive a temporal outlier distortion threshold value. The sequence processor detects temporal-outlier sequences (TOSs) in the distortion measure values that are greater than the temporal outlier distortion threshold value. An average TOS length is then computed for output at the first output of the sequence processor.
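The sequence processor's computation can be sketched as follows. This is only an illustration under a stated assumption: a temporal-outlier sequence is taken to be a maximal run of consecutive frames whose distortion measure value exceeds the temporal outlier distortion threshold. The summary above does not spell out that definition, and the function name `average_tos_length` is hypothetical.

```python
def average_tos_length(distortions, threshold):
    """Average length, in frames, of temporal-outlier sequences (TOSs).

    Assumption: a TOS is a maximal run of consecutive frames whose
    distortion value is strictly greater than the threshold.
    """
    run_lengths = []
    current = 0
    for d in distortions:
        if d > threshold:
            current += 1          # extend the current run
        elif current:
            run_lengths.append(current)  # run ended on this frame
            current = 0
    if current:
        run_lengths.append(current)      # run reaching the last frame
    if not run_lengths:
        return 0.0                       # no TOS detected
    return sum(run_lengths) / len(run_lengths)
```

For example, with a threshold of 0.5 the sequence 0.1, 0.9, 0.8, 0.2, 0.7, 0.1 contains two runs, of lengths 2 and 1, so the average TOS length is 1.5 frames.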
The sound quality evaluation processor, according to the present invention, can also include an outlier processor having a first input configured to receive the sequence of distortion measure values from the comparator and a second input being configured to receive a perceptual outlier distortion threshold value. The outlier processor detects each perceptual outlier frame having a distortion measure value greater than the perceptual outlier distortion threshold value. The number of perceptual outlier frames is divided by the number of distortion measure values to obtain a percent of perceptual outliers output at the first output of the outlier processor.
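The outlier processor's calculation can likewise be sketched in a few lines. The function name is hypothetical, and scaling the ratio by 100 to express it as a percentage is an assumption; the text states only that the outlier count is divided by the number of distortion measure values.

```python
def percent_perceptual_outliers(distortions, threshold):
    """Percentage of frames whose distortion exceeds the threshold.

    A frame counts as a perceptual outlier when its distortion value is
    strictly greater than the perceptual outlier distortion threshold.
    """
    if not distortions:
        return 0.0  # no frames to evaluate
    outliers = sum(1 for d in distortions if d > threshold)
    return 100.0 * outliers / len(distortions)
```

For instance, with a threshold of 0.5 the values 0.1, 0.9, 0.8, 0.2 contain two outlier frames out of four, giving 50 percent.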
The features and advantages of the invention will become more readily apparent from the following detailed description of a preferred embodiment of the invention which proceeds with reference to the accompanying drawings.