The invention relates to a procedure for determining a measure of quality of an audio signal. Furthermore, the invention relates to a device for implementing this procedure as well as a noise suppression module and an interruption detection and interpolation module for use in such a device.
Assessing the quality of a telecommunications network is an important instrument for achieving and maintaining the required service quality. One method of assessing the service quality of a telecommunications network involves determining the quality of a signal transmitted via the network. In the case of audio signals, and voice signals in particular, various intrusive procedures are known for this purpose. As the name suggests, such procedures intervene in the system to be tested: a transmission channel is allocated and a reference signal is transmitted along it. The quality is then assessed subjectively, for example by one or several test persons comparing the known reference signal with the received signal. This procedure is, however, laborious and therefore expensive.
A further intrusive procedure for machine-assisted quality assessment of an audio signal is described in EP 0 980 064, where a spectral similarity value of the known source signal and the received signal is determined for the purpose of assessing the transmission quality. This similarity value is based on calculating the covariance of the spectra of the source signal and of the received signal and dividing the covariance by the standard deviations of both spectra.
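The similarity value described above (covariance of the two spectra divided by both standard deviations) can be illustrated with a minimal sketch. The framing, the naive DFT and the use of magnitude spectra are assumptions made for illustration only; the function names are hypothetical and the exact procedure of EP 0 980 064 is not reproduced here.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitude spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_similarity(source_frame, received_frame):
    """Covariance of the two spectra divided by both standard deviations."""
    a = magnitude_spectrum(source_frame)
    b = magnitude_spectrum(received_frame)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / len(a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / len(b))
    if std_a == 0.0 or std_b == 0.0:
        return 0.0  # a flat spectrum carries no shape information (assumed convention)
    return cov / (std_a * std_b)
```

A value close to 1 indicates that the two spectra have nearly the same shape. Because of the division by the standard deviations, the value is insensitive to a pure change in signal level.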
Intrusive methods, however, generally have the disadvantage that, as already mentioned, it is necessary to intervene in the system to be tested. To determine the signal quality, at least one transmission channel must be occupied and a reference signal transmitted on it. This transmission channel cannot be used for data transfer during this period of time. In a broadcasting system such as a radio service, it is in principle possible to use the signal source for transmitting test signals. However, since all channels would then be occupied and the test signal would be transmitted to all receivers, this procedure is extremely impractical. Intrusive procedures are likewise unsuitable for simultaneously monitoring the quality of a large number of transmission channels.
The task of the invention is to provide a procedure of the above-specified type that avoids the disadvantages of the prior art and, in particular, makes it possible to assess the quality of a signal transmitted via a telecommunications network without knowledge of the originally transmitted signal.
The solution to this task is defined by the features of Patent Claim 1. In the inventive procedure for machine-assisted determination of a measure of quality of an audio signal, a reference signal is first determined from the audio signal. By comparing the determined reference signal with the audio signal, a quality value is determined that is then used for determining the measure of quality.
The inventive procedure therefore permits assessment of the quality of an audio signal at any connection of the telecommunications network. It also permits quality assessment of many transmission channels simultaneously, so that even simultaneous assessment of all channels would be possible. The quality is assessed on the basis of the properties of the received signal, i.e. without knowledge of the source signal or of the signal source.
The invention therefore enables not only monitoring of the transmission quality of the telecommunications network but also, for example, quality-based billing/accounting, quality-based routing in the network, coverage testing in mobile radio networks, quality of service (QoS) control of network nodes, or quality comparison within a network as well as globally throughout the network.
In addition to the required signal information, an audio signal transmitted via a telecommunications network characteristically also exhibits undesirable components such as various noise components that did not exist in the original source signal.
The best possible estimate of the originally transmitted signal is necessary in order to be able to assess the quality most effectively. Various methods can be used for the purpose of reconstructing this reference signal. One option involves estimating the characteristics of the transmission channel and calculating backwards starting from the received signal. A further option entails a direct estimate of the reference signal based on the known information relating to the received signal and the transmission channel.
In this particular method, the reference signal is determined by estimating the interference signal components contained in the received signal and then removing them from the received signal. By removing the noise components from the audio signal, a de-noised audio signal is first determined, which is preferably used as the reference signal for assessing the transmission quality.
There are various methods of removing noise components from the received audio signal. For example, the audio signal could be routed via corresponding filters. In a preferred method for removing the noise components from the audio signal, a neural network is used for this purpose.
The audio signal, however, is not used directly as the input signal. First, the audio signal is subjected to a discrete wavelet transformation (DWT). This transformation produces a number of DWT coefficients of the audio signal that are fed to the neural network as the input signal. The neural network makes available a number of corrected DWT coefficients at its output, from which the reference signal is derived by inverse DWT. This signal corresponds to the de-noised (noise-free) version of the audio signal.
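The chain of DWT, coefficient correction and inverse DWT can be sketched as follows. The single-level Haar wavelet is an assumption chosen for brevity, since the text does not name a wavelet. The function placeholder_network is a hypothetical stand-in (simple soft thresholding of the detail coefficients) for the trained neural network, which is not specified here.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_dwt(signal):
    """One-level Haar DWT of an even-length signal: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * _S for a, b in pairs]
    detail = [(a - b) * _S for a, b in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the time-domain signal."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def placeholder_network(approx, detail, threshold=0.1):
    """Hypothetical stand-in for the trained network: shrink small detail coefficients."""
    def shrink(c):
        return math.copysign(max(abs(c) - threshold, 0.0), c)
    return approx, [shrink(d) for d in detail]


def estimate_reference(audio, corrector=placeholder_network):
    """DWT -> corrected coefficients -> inverse DWT gives the reference estimate."""
    approx, detail = haar_dwt(audio)
    approx, detail = corrector(approx, detail)
    return haar_idwt(approx, detail)
```

With an identity corrector the chain reproduces the input exactly, so any difference between the audio signal and the reference estimate stems from the correction step alone.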
In order to achieve this, the coefficients of the neural network must be set in such a way that it produces the DWT coefficients of the corresponding de-noised input signal in response to the DWT coefficients of a noise-laden input signal. To ensure that the neural network supplies the required coefficients, it must first be trained with a set of corresponding noise-laden and de-noised signal pairs.
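Training on pairs of noise-laden and de-noised coefficient vectors can be illustrated with a deliberately small example. The single linear layer, the learning rate and the number of epochs are assumptions for illustration; the architecture and training method of the actual network are not specified in the text, and a practical network would contain non-linear layers.

```python
def apply_layer(weights, inputs):
    """Single linear layer: one output per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def train_layer(pairs, epochs=500, rate=0.1):
    """Fit the layer to (noisy, clean) coefficient pairs by stochastic gradient descent."""
    size = len(pairs[0][0])
    weights = [[0.0] * size for _ in range(size)]
    for _ in range(epochs):
        for noisy, clean in pairs:
            predicted = apply_layer(weights, noisy)
            for j in range(size):
                error = predicted[j] - clean[j]
                for i in range(size):
                    # Gradient of the squared error with respect to weights[j][i].
                    weights[j][i] -= rate * error * noisy[i]
    return weights
```

After training, apply_layer(weights, coefficients) maps the DWT coefficients of a noise-laden input to an estimate of the de-noised coefficients.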
In this way, both stationary noise such as white, thermal, vehicle or road noise and impulse noise can be suppressed. Echoes and interference can also be suppressed or eliminated with the neural network.
In addition to the quality value that is determined by comparing the received audio signal with the established reference signal, any other information can be taken into consideration when determining the measure of quality. This may be information contained in the audio signal as well as information relating to the transmission channel or the telecommunications network itself.
When determining the measure of quality, it is advantageous to use information that can be derived from the received audio signal itself using suitable means. For instance, the quality of the received audio signal is influenced by the codecs (coder-decoders) through which the signal passes during transmission. Such signal degradation is difficult to determine, since part of the original signal information is lost if the codec bit rates are too low. On the other hand, low codec bit rates result in a change in the fundamental frequency (pitch) of the audio signal, which is why the progression and the dynamics of the fundamental frequency in the audio signal are advantageously examined. Since such changes can be examined most easily in audio signal sections with vocals, signal components with vocals are first detected in the audio signal and then examined for pitch variations.
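Examining the progression of the fundamental frequency can be sketched with a simple autocorrelation pitch estimator applied to frames already classified as containing vocals. The autocorrelation method, the search range of 60 to 400 Hz and the variation statistic are assumptions for illustration; the text does not prescribe a particular pitch estimator or statistic.

```python
import math


def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a frame via autocorrelation."""
    min_lag = math.ceil(sample_rate / fmax)
    max_lag = min(math.floor(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # 0.0 signals that no periodicity was found in the search range.
    return sample_rate / best_lag if best_lag else 0.0


def pitch_variation(pitch_track):
    """Mean absolute frame-to-frame change of a sequence of pitch estimates."""
    if len(pitch_track) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(pitch_track, pitch_track[1:])]
    return sum(steps) / len(steps)
```

Applying estimate_pitch to consecutive frames yields a pitch track, and pitch_variation condenses its dynamics into a single number that could serve as one input to the measure of quality.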
Let us return to determining the reference signal from the received audio signal. This signal may not only exhibit undesirable signal components; required information may also be lost during transmission. Consequently, the received audio signal may exhibit signal interruptions to a greater or lesser extent.
However, the closer the reference signal generated from the audio signal is to the original source signal, the more precise the assessment of the transmission quality. This is the reason for replacing signal interruptions by suitable signals. Suitable noise signals as well as signal sections already transmitted may be used for this purpose.
In order to obtain the most accurate possible estimate of the reference signal, however, it is advantageous to first detect such signal interruptions in the audio signal and then to replace the missing signal sections by estimates obtained as accurately as possible by interpolation. The type of interpolation of the lost signal sections depends on the length of the signal interruption. In the case of short interruptions, i.e. interruptions of up to a few sampling values in the audio signal, polynomial interpolation is preferably used, and in the case of medium-long interruptions, i.e. from a few to several dozen sampling values, model-based interpolation is preferably used.
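Detection of interruptions and length-dependent treatment can be sketched as follows. The assumption that an interruption appears as a run of (near-)zero samples, the length thresholds of 4 and 48 samples, and the Lagrange polynomial through two neighbouring samples on each side are hypothetical choices for illustration. Model-based interpolation of medium-long interruptions is not shown.

```python
def find_interruptions(signal, eps=1e-9, min_run=2):
    """Return (start, length) for each run of at least min_run near-zero samples."""
    gaps, start = [], None
    for i, x in enumerate(signal):
        if abs(x) <= eps:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                gaps.append((start, i - start))
            start = None
    if start is not None and len(signal) - start >= min_run:
        gaps.append((start, len(signal) - start))
    return gaps


def classify(length, short_max=4, medium_max=48):
    """Map an interruption length (in samples) to 'short', 'medium' or 'long'."""
    if length <= short_max:
        return "short"
    if length <= medium_max:
        return "medium"
    return "long"


def lagrange(points, x):
    """Evaluate the polynomial through the given (x, y) points at position x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_short_gaps(signal, neighbours=2):
    """Replace short interruptions by polynomial interpolation; leave others untouched."""
    out = list(signal)
    for start, length in find_interruptions(signal):
        if classify(length) != "short":
            continue
        end = start + length
        support = list(range(max(0, start - neighbours), start))
        support += list(range(end, min(len(signal), end + neighbours)))
        if len(support) < 2:
            continue  # not enough surrounding samples to fit a polynomial
        points = [(i, signal[i]) for i in support]
        for i in range(start, end):
            out[i] = lagrange(points, i)
    return out
```

The list returned by find_interruptions can be retained after interpolation, so that the number and length of the interruptions remain available for later use.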
Longer signal interruptions, however, i.e. interruptions of several dozen sampling values or more, can scarcely be reconstructed in a meaningful way. Instead of treating this information as superfluous and discarding it, this information and, in part, also information relating to the short and medium signal interruptions is taken into consideration in the assessment of the transmission quality. It is used in the calculations for determining the measure of quality.
The received audio signal can comprise various types of audio signals. For instance, it can contain voice, music, noise and silent (idle state) signal components. The quality can, of course, be assessed on the basis of all or part of these signal components. In a preferred variant of the invention, however, assessment of the signal quality is confined to the voice signal components. Consequently, the voice signal components are first extracted from the audio signal using an audio discriminator, and only these voice signal components are then used for determining the measure of quality, i.e. for establishing the reference signal. To determine the quality in this case, the determined reference signal is, of course, not compared with the received audio signal but only with the voice signal component extracted from it.
The device according to the invention for machine-assisted determination of a measure of quality of an audio signal comprises first means for determining a reference signal from the audio signal, second means for determining a quality value by comparing the determined reference signal with the audio signal, and third means for determining the measure of quality while taking the quality value into consideration.
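The interplay of the first, second and third means can be pictured as three pluggable components. The class and field names below are hypothetical, and the concrete functions passed in are left open, since the text does not fix how the comparison or the final combination is computed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Signal = Sequence[float]


@dataclass
class QualityAssessor:
    estimate_reference: Callable[[Signal], Signal]  # first means
    compare: Callable[[Signal, Signal], float]      # second means
    combine: Callable[[float], float]               # third means

    def measure(self, audio: Signal) -> float:
        """Derive the reference from the audio itself, compare, then combine."""
        reference = self.estimate_reference(audio)
        quality_value = self.compare(reference, audio)
        return self.combine(quality_value)
```

No source signal enters measure at any point: the only input is the received audio signal, which is the defining property of the non-intrusive approach.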
The first means for determining a reference signal from the audio signal can comprise several modules. Preferably, a noise suppression module and/or an interruption detection and interpolation module is provided.
The noise suppression module is used to suppress noise signal components in the received audio signal. It contains the means for implementing the wavelet transformations already described as well as the neural network for determining the new DWT coefficients. The interruption detection and interpolation module features the means required, on the one hand, for detecting signal interruptions in the audio signal and, on the other hand, for polynomial interpolation of short signal interruptions as well as for model-based interpolation of medium-long signal interruptions. The reference signal determined in this way therefore corresponds to a de-noised version of the received audio signal and characteristically exhibits only the longer signal interruptions.
The information relating to the signal interruptions of the audio signal, however, is not only used for establishing a better reference signal but it can also be used for determining a better measure of quality. The third means for determining the measure of quality are therefore preferably designed in such a way that information relating to signal interruptions in the audio signal can be taken into consideration.
The more information on the audio signal that is used in determining the measure of quality, the more accurate the quality assessment. The device therefore advantageously features fourth means for determining information on codec-related signal distortions. These means comprise, for example, a vocal detection module that can be used to detect signal components with vocals in the audio signal. These vocal signal components are routed to an evaluation module which, based on these signal components, determines information on codec-related signal distortions, which is also used for the purpose of determining the signal quality. The third means are correspondingly designed in such a way that this information on the codec-related signal distortions can be taken into consideration in determining the measure of quality.
Advantageously, however, only the voice signal components of the audio signal rather than the entire audio signal are used for assessing the quality. Corresponding to the procedure already described, the device therefore features in particular fifth means for extracting the voice signal components from the audio signal. Correspondingly, the audio signal itself is not used for determining the reference signal; rather, only its voice signal component is de-noised and examined with regard to interruptions. Likewise, it is, of course, not the audio signal itself that is compared with the reference signal but only its voice signal component. Consequently, the measure of quality is determined only on the basis of the information in the voice signal component, while the information from the remaining signal components is not taken into consideration.
Further advantageous variants and feature combinations of the invention arise from the following detailed description and the patent claims in their entirety.