1. Field of the Invention
The invention lies in the area of quality measurement of sound signals, such as audio, speech and voice signals. In particular, it relates to a method and a device for determining, according to an objective measurement technique, the speech quality of an output signal as received from a speech signal processing system, with respect to a reference signal.
2. Description of the Prior Art
Methods and devices of this type are known, e.g., from References [1, ..., 5] (for more bibliographic details on the References, see below under C. References). Methods and devices which follow ITU-T Recommendation P.861 or its successor, Recommendation P.862 (see References [6] and [7]), are also of this type. According to the presently known technique, an output signal from a speech signal processing and/or transporting system, such as a wireless telecommunications system, a Voice over Internet Protocol transmission system, or a speech codec, and a reference signal are mapped onto representation signals according to a psycho-physical perception model of human hearing. The output signal is generally a degraded signal, and it is the signal whose quality is to be determined. As in the cited references, the input signal that was applied to the system to obtain the output signal may be used as the reference signal. Subsequently, a differential signal is determined from the representation signals; according to the perception model used, this signal is representative of a disturbance sustained in the system and present in the output signal. The differential or disturbance signal constitutes an expression of the extent to which, according to the representation model, the output signal deviates from the reference signal. Then, the disturbance signal is processed in accordance with a cognitive model, in which certain properties of human test subjects have been modelled, in order to obtain a time-independent quality signal, which is a measure of the quality of the auditory perception of the output signal.
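The three stages described above (mapping onto representation signals, forming a disturbance signal, and collapsing it into a time-independent value) can be illustrated with a minimal sketch. This is not the algorithm of Recommendation P.861 or P.862: the log-compressed frame power stands in for the perception model, the time average of the absolute disturbance stands in for the cognitive model, and all function names and the frame length are hypothetical choices made only for illustration.

```python
import math

FRAME_LEN = 4  # samples per frame; hypothetical value chosen for illustration


def to_representation(signal, frame_len=FRAME_LEN):
    """Toy stand-in for the perception model: log-compressed power per frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [math.log10(1.0 + sum(x * x for x in frame) / frame_len)
            for frame in frames]


def disturbance_signal(ref_repr, deg_repr):
    """Frame-by-frame difference between degraded and reference representations."""
    return [d - r for r, d in zip(ref_repr, deg_repr)]


def aggregate_disturbance(disturbance):
    """Toy stand-in for the cognitive model: time average of |disturbance|.

    Returns one time-independent number; 0.0 means no measured deviation.
    Mapping this number onto a quality scale is deliberately left out.
    """
    if not disturbance:
        return 0.0
    return sum(abs(d) for d in disturbance) / len(disturbance)


# Reference signal: four identical frames of a simple alternating pattern.
reference = [0.0, 1.0, 0.0, -1.0] * 4
# Degraded signal: the third frame has been replaced by silence.
degraded = reference[:8] + [0.0] * 4 + reference[12:]

score = aggregate_disturbance(
    disturbance_signal(to_representation(reference), to_representation(degraded))
)
print(score)
```

In this sketch a larger value indicates a larger deviation of the output signal from the reference signal; an identical output signal yields zero.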
The known technique, and more particularly methods and devices which follow Recommendation P.862, has, however, the disadvantage that severe distortions, caused by extremely weak or silent portions in the degraded signal at positions where the reference signal contains speech, may result in a quality signal that correlates poorly with subjectively determined quality measurements, such as mean opinion scores (MOS) of human test subjects. Such distortions may occur as a consequence of time clipping, i.e., the replacement of short portions of the speech or audio signal by silence, e.g., in the case of lost packets in packet-switched systems. In such cases, the predicted quality is significantly higher than the subjectively perceived quality.
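Time clipping as described above can be simulated with a short sketch. The function name `time_clip`, the fixed packet length, and the list of lost packet indices are hypothetical and serve only to show how silent portions arise in the degraded signal at positions where the reference signal still contains speech.

```python
def time_clip(signal, packet_len, lost_packets):
    """Return a copy of `signal` in which each listed packet is replaced by silence.

    `packet_len` is the number of samples per packet and `lost_packets` holds
    zero-based packet indices; both are illustrative assumptions.
    """
    clipped = list(signal)
    for packet in lost_packets:
        start = packet * packet_len
        end = min(start + packet_len, len(clipped))
        for i in range(start, end):
            clipped[i] = 0.0  # silence replaces the lost samples
    return clipped


# Reference signal: eight samples, i.e. four packets of two samples each.
reference = [0.5, -0.5, 0.25, -0.25, 0.5, -0.5, 0.25, -0.25]
# Degraded signal: the second and fourth packets are lost.
degraded = time_clip(reference, packet_len=2, lost_packets=[1, 3])
print(degraded)
```

The degraded signal keeps its original length, so the silent portions remain time-aligned with the speech in the reference signal.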