Double-ended, or intrusive, algorithms are used in speech quality assessment to provide an objective estimate of quality. Such algorithms require access to both a clean reference signal and a degraded, i.e. processed, signal. The degraded signal is derived from the reference signal by introducing distortions, e.g. distortions caused by real or emulated transmission or compression systems. Objective quality estimation algorithms are intended to replace subjective listening tests, thereby reducing cost, saving time, and enabling continuous monitoring of network quality.
The subjective quality level of a communication system is typically expressed as the Mean Opinion Score (MOS) obtained from a pool of listeners. The most widely used scale in subjective quality assessment is the five-grade scale {1-5}, where 1 corresponds to “bad quality” and 5 corresponds to “excellent quality” [1].
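As an illustration only, the MOS can be computed as the arithmetic mean of the individual listener votes on the five-grade scale. In the minimal sketch below, the function name `mean_opinion_score` is hypothetical, and the labels for grades 2 to 4 ("poor", "fair", "good") follow the usual absolute category rating wording, which is an assumption here since the text above names only the end points.

```python
# Five-grade scale; only grades 1 ("bad") and 5 ("excellent") are named in the text,
# the intermediate labels are assumed from common ACR wording.
ACR_LABELS = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}


def mean_opinion_score(votes):
    """Return the MOS, i.e. the mean of integer listener votes on the five-grade scale."""
    if not votes:
        raise ValueError("at least one vote is required")
    for v in votes:
        if v not in ACR_LABELS:
            raise ValueError(f"vote {v!r} is outside the five-grade scale")
    return sum(votes) / len(votes)


# Five hypothetical listeners rating one condition.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```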
Objective quality estimation is typically performed by standardized algorithms [2]. These algorithms take a reference signal and a degraded signal as input and output an estimated objective quality level Qobj on an internal objective scale, which does not exactly match the subjective scale Qsubj, see FIG. 1. Depending on the implementation, the internal objective scale may be e.g. {0-100} or {−0.5-4.5}. The objective quality level can be mapped to the subjective scale by a conventional mapping in a post-processing step. The purpose of this mapping is to produce an output of the quality assessment algorithm on the scale {1-5}, which is commonly used in subjective tests and is therefore well known.
The block “objective quality estimator” 104 in FIG. 1 represents a state-of-the-art speech quality assessment algorithm [2]. An estimate of speech quality is obtained in the following steps: I) pre-filtering, e.g. with an IRS filter; II) time and gain alignment of the reference and degraded signals; III) transformation to a perceptual domain, e.g. to the Bark scale followed by a compression law; and IV) relating the difference between the transformed reference and degraded signals to an objective quality level.
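Step II can be illustrated with a deliberately simple technique: the delay is estimated as the lag that maximizes the cross-correlation between the two signals, and the gain is set so that the aligned degraded signal has the same RMS level as the reference. This is a minimal sketch under that assumption, not the alignment procedure of any standardized algorithm, which is considerably more elaborate; the function names `estimate_delay`, `rms` and `align` and the `max_lag` parameter are hypothetical.

```python
import math


def estimate_delay(reference, degraded, max_lag):
    """Return the lag in [-max_lag, max_lag] maximizing sum_n reference[n] * degraded[n + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(degraded):
                score += r * degraded[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def rms(x):
    """Root-mean-square level of a sequence (0.0 for an empty sequence)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def align(reference, degraded, max_lag):
    """Time- and gain-align the degraded signal to the reference (simplified sketch)."""
    lag = estimate_delay(reference, degraded, max_lag)
    if lag >= 0:
        # Degraded signal is delayed: drop its first `lag` samples.
        shifted = list(degraded[lag:])
    else:
        # Degraded signal is early: prepend zeros to delay it.
        shifted = [0.0] * (-lag) + list(degraded)
    # Trim or zero-pad to the reference length.
    shifted = shifted[:len(reference)]
    shifted += [0.0] * (len(reference) - len(shifted))
    level = rms(shifted)
    gain = rms(reference) / level if level > 0 else 1.0
    return lag, [gain * v for v in shifted]


# Hypothetical test signal: the degraded copy is delayed by 3 samples and attenuated by half.
ref = [1.0, -2.0, 3.0, 0.5, -1.0, 2.5, 0.0, -0.5]
deg = [0.0] * 3 + [0.5 * v for v in ref]
lag, aligned = align(ref, deg, max_lag=5)
print(lag)  # 3
```

Exhaustive search over lags is quadratic in signal length; it is used here only because it makes the alignment criterion explicit.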
The test object, e.g. illustrated as the block “communication system” 102 in FIG. 1, can be e.g. a PC-simulated speech codec, a noise suppressor, etc., or a real network with multiple transcodings, channel errors, and even unknown types of degradation. A typical system under test operates on signals of a certain bandwidth, e.g. NB (narrowband), WB (wideband), or SWB (super-wideband). When a system under test operates on signals of a certain bandwidth, this bandwidth is also the expected bandwidth of the degraded signal output by the system. The meaning of SWB is not always clear, since the term is sometimes used for 14 kHz, sometimes for 16 kHz, and sometimes for the “entire frequency spectrum”, i.e. full band. Herein, SWB is assumed to denote a bandwidth of 14 kHz.
The block “mapping to subjective scale” 106 in FIG. 1 maps the internal objective scale to a five-grade subjective scale, e.g. {0-100}->{1-5} or {−0.5-4.5}->{1-5}. This mapping is linear.
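A linear mapping of this kind can be sketched as follows, assuming (this is not stated above) that the end points of the internal scale map onto the end points 1 and 5 of the subjective scale. The function name `linear_map` and its parameters are hypothetical.

```python
def linear_map(q_obj, src_lo, src_hi, dst_lo=1.0, dst_hi=5.0):
    """Map q_obj linearly from [src_lo, src_hi] onto [dst_lo, dst_hi].

    Assumes the interval end points correspond to each other; no clipping is applied.
    """
    if src_hi == src_lo:
        raise ValueError("source interval must have non-zero width")
    return dst_lo + (q_obj - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)


print(linear_map(50, 0, 100))       # 3.0, {0-100} -> {1-5}
print(linear_map(2.0, -0.5, 4.5))   # 3.0, {-0.5-4.5} -> {1-5}
```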
An objective speech quality estimator may allow a test object or system to be evaluated against a reference signal whose bandwidth differs from that of the degraded signal, and/or against a reference signal having the same bandwidth as the degraded signal. The practice of using reference signals of different bandwidths stems from the evolution of speech communication systems, in which ever wider bandwidths are supported by e.g. speech codecs, while legacy narrowband systems and equipment remain in use.
The quality score of a system under test depends on the bandwidth of the reference signal used. One can distinguish between the NB scale, i.e. a reference signal of bandwidth 3.5 kHz; the WB scale, i.e. a reference signal of bandwidth 7 kHz; and the SWB scale, i.e. a reference signal of bandwidth 14 kHz. This is problematic, e.g. when the speech quality of different systems or components is to be compared, since the results of the respective assessments may be provided on different bandwidth scales.
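To make the comparison problem concrete, each score can be tagged with the bandwidth scale implied by its reference signal, so that scores obtained on different scales are not compared at face value. This is a hypothetical bookkeeping sketch: `QualityScore`, `score_on_scale`, `directly_comparable` and the example scores are illustrative assumptions, and only the bandwidth-to-scale correspondence (3.5, 7 and 14 kHz) comes from the text.

```python
from dataclasses import dataclass

# Reference bandwidth (kHz) -> bandwidth scale, as defined in the text above.
SCALE_BY_BANDWIDTH_KHZ = {3.5: "NB", 7.0: "WB", 14.0: "SWB"}


@dataclass(frozen=True)
class QualityScore:
    mos: float
    scale: str  # "NB", "WB" or "SWB": scale implied by the reference bandwidth


def score_on_scale(mos, reference_bandwidth_khz):
    """Tag a quality score with the scale of the reference signal it was obtained with."""
    try:
        scale = SCALE_BY_BANDWIDTH_KHZ[reference_bandwidth_khz]
    except KeyError:
        raise ValueError(f"no scale defined for a {reference_bandwidth_khz} kHz reference") from None
    return QualityScore(mos, scale)


def directly_comparable(a, b):
    """Scores can be compared at face value only if they are on the same scale."""
    return a.scale == b.scale


# Hypothetical results for two systems tested against references of different bandwidths.
system_a = score_on_scale(3.8, 7.0)   # WB scale
system_b = score_on_scale(3.2, 14.0)  # SWB scale
print(directly_comparable(system_a, system_b))  # False
```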