The present invention relates generally to improving the perceived sound quality of decoded acoustic signals. More particularly, the invention relates to a method of producing a wide-band acoustic signal on the basis of a narrow-band acoustic signal according to the preamble of claim 1 and a signal decoder according to the preamble of claim 24. The invention also relates to a computer program according to claim 22 and a computer readable medium according to claim 23.
Today's public switched telephony networks (PSTNs) generally low-pass filter any speech or other acoustic signal that they transport. The low-pass (or, in fact, band-pass) filtering characteristic is caused by the networks' limited channel bandwidth, which typically has a range from 0.3 kHz to 3.4 kHz. Such a band-pass filtered acoustic signal is normally perceived by a human listener to have a relatively poor sound quality. For instance, a reconstructed voice signal is often reported to sound muffled and/or remote from the listener.
The trend in fixed and mobile telephony as well as in video-conferencing is, however, towards an improved quality of the acoustic source signal that is reconstructed at the receiver end. This trend reflects the customer expectation that said systems provide a sound quality that is much closer to the acoustic source signal than what today's PSTNs can offer.
One way to meet this expectation is, of course, to broaden the frequency band for the acoustic source signal and thus convey more of the information contained in the source signal to the receiver. For instance, if a 0-8 kHz acoustic signal (sampled at 16 kHz) were transmitted to the receiver, the naturalness of a human voice signal, which is otherwise lost in a standard phone call, would indeed be better preserved. However, increasing the bandwidth for each channel by more than a factor of two would either reduce the transmission capacity to less than half or imply enormous costs for the network operators in order to expand the transmission resources by a corresponding factor. Hence, this solution is not attractive from a commercial point of view.
Instead, a much more appealing alternative is to recover, at the receiver end, wide-band frequency components outside the bandwidth of a regular PSTN channel, based on the narrow-band signal that has passed through the PSTN. The recovered wide-band frequency components may lie both in a low-band below the narrow-band (e.g. in a range 0.1-0.3 kHz) and in a high-band above the narrow-band (e.g. in a range 3.4-8.0 kHz).
Although the majority of the energy in a speech signal is spectrally located between 0 kHz and 4 kHz, a substantial amount of the energy is also distributed in the frequency band from 4 kHz to 8 kHz. The frequency resolution of human hearing decreases rapidly with increasing frequency. The frequency components between 4 kHz and 8 kHz therefore require comparatively small amounts of data to model with sufficient accuracy.
It is possible to extend the bandwidth of the narrow-band acoustic signal with a perceptually satisfying result, since the signal is presumed to be generated by a physical source, for instance, a human speaker. Thus, given a particular shape of the narrow-band, there are constraints on the signal properties with respect to the wide-band shape. That is, only certain combinations of narrow-band shapes and wide-band shapes are conceivable.
However, modelling a wide-band signal from a particular narrow-band signal is still far from trivial. The existing methods for extending the bandwidth of the acoustic signal with a high-band above the current narrow-band spectrum basically include two different components, namely: estimation of the high-band spectral envelope from information pertaining to the narrow-band, and recovery of an excitation for the high-band from a narrow-band excitation.
All the known methods, in one way or another, model dependencies between the high-band envelope and various features describing the narrow-band signal. For instance, a Gaussian mixture model (GMM), a hidden Markov model (HMM) or vector quantisation (VQ) may be utilised for accomplishing this modelling. A minimum mean square error (MMSE) estimate of the high-band spectral envelope is then obtained from the chosen model of dependencies, given the features that have been derived from the narrow-band signal. Typically, the features include a spectral envelope, a spectral temporal variation and a degree of voicing.
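As an illustration of the GMM-based estimation described above, the following minimal sketch computes an MMSE estimate as the posterior-weighted sum of the per-component conditional means. It is not taken from any of the known methods; it assumes a single scalar narrow-band feature x and a single scalar high-band envelope parameter y, and the mixture parameters in COMPONENTS as well as the names gaussian_pdf and mmse_estimate are hypothetical.

```python
import math

# Hypothetical joint GMM over (x, y):
#   x = one narrow-band feature, y = one high-band envelope parameter.
# All numerical values are illustrative assumptions, not trained values.
COMPONENTS = [
    {"w": 0.5, "mx": -1.0, "my": 2.0, "vx": 1.0, "cxy": 0.5},
    {"w": 0.5, "mx": 1.0, "my": -2.0, "vx": 1.0, "cxy": -0.3},
]


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def mmse_estimate(x, components=COMPONENTS):
    """Return E[y | x] under the joint GMM (the MMSE estimate of y)."""
    # Weighted likelihood of x under each component's marginal over x.
    likelihoods = [c["w"] * gaussian_pdf(x, c["mx"], c["vx"]) for c in components]
    total = sum(likelihoods)
    estimate = 0.0
    for c, lik in zip(components, likelihoods):
        posterior = lik / total  # p(component | x)
        # Conditional mean of y given x within this component.
        cond_mean = c["my"] + c["cxy"] / c["vx"] * (x - c["mx"])
        estimate += posterior * cond_mean
    return estimate


estimate_at_zero = mmse_estimate(0.0)
```

In a practical system x and y would be vectors and the covariances matrices, but the structure of the estimate would remain the same.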
The narrow-band excitation is used for recovering a corresponding high-band excitation. This can be carried out by simply up-sampling the narrow-band excitation, without any following low-pass filtering. This, in turn, creates a spectral-folded version of the narrow-band excitation around the upper bandwidth limit for the original excitation. Alternatively, the recovery of the high-band excitation may involve techniques that are otherwise used in speech coding, such as multi-band excitation (MBE). The latter makes use of the fundamental frequency and the degree of voicing when modelling an excitation.
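The spectral folding obtained by up-sampling without low-pass filtering can be illustrated with the following minimal sketch. It assumes a narrow-band sampling rate of 8 kHz, an up-sampling factor of two and a 1 kHz test tone in place of a real excitation; the function names are hypothetical. Inserting zeros between the samples leaves the original component at 1 kHz and creates a mirror image at 7 kHz, i.e. folded around the 4 kHz upper bandwidth limit.

```python
import cmath
import math


def upsample_zero_insert(x, factor=2):
    """Insert factor-1 zeros after every sample; no low-pass filter follows."""
    y = [0.0] * (len(x) * factor)
    y[::factor] = x
    return y


def dft_magnitude(x, freq_hz, fs_hz):
    """Magnitude of the discrete Fourier transform of x at one frequency."""
    w = -2j * math.pi * freq_hz / fs_hz
    return abs(sum(v * cmath.exp(w * n) for n, v in enumerate(x)))


FS_NARROW = 8000   # assumed narrow-band sampling rate (Hz)
FS_WIDE = 16000    # sampling rate after up-sampling by two (Hz)
TONE_HZ = 1000     # test tone standing in for the excitation
N = 160            # 20 ms at 8 kHz

narrow = [math.cos(2 * math.pi * TONE_HZ * n / FS_NARROW) for n in range(N)]
wide = upsample_zero_insert(narrow, 2)

mag_original = dft_magnitude(wide, TONE_HZ, FS_WIDE)            # 1 kHz
mag_image = dft_magnitude(wide, FS_NARROW - TONE_HZ, FS_WIDE)   # 7 kHz mirror
mag_elsewhere = dft_magnitude(wide, 5500, FS_WIDE)              # no component

print(mag_original, mag_image, mag_elsewhere)
```

The original component and its mirror image have equal magnitude, whereas a frequency that is neither of the two carries essentially no energy.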
Irrespective of how the high-band excitation is derived, the estimated high-band spectral envelope is used for obtaining a desired shape of the recovered high-band excitation. The result thereof in turn forms a basis for an estimate of the high-band acoustic signal. This signal is subsequently high-pass filtered and added to an up-sampled and low-pass filtered version of the narrow-band acoustic signal to form a wide-band acoustic signal estimate.
Normally, the bandwidth extension scheme operates on a 20-ms frame-by-frame basis, with a certain degree of overlap between adjacent frames. The overlap is intended to reduce any undesired transition effects between consecutive frames.
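The frame-by-frame operation with overlap can be sketched as follows. The 20-ms frame length is taken from the description above; the 8 kHz sampling rate, the 50 % overlap and the Hann window are assumptions made only for this illustration, and the function names are hypothetical. With these choices the windows of two adjacent frames sum to one, so that overlap-adding the frames gives a smooth transition between consecutive frames.

```python
import math

FRAME_LEN = 160        # 20 ms at an assumed 8 kHz sampling rate
HOP = FRAME_LEN // 2   # assumed 50 % overlap between adjacent frames

# Periodic Hann window; two copies shifted by HOP sum to exactly one.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / FRAME_LEN)
          for n in range(FRAME_LEN)]


def split_frames(signal):
    """Cut the signal into windowed frames that overlap by FRAME_LEN - HOP."""
    return [
        [signal[start + n] * WINDOW[n] for n in range(FRAME_LEN)]
        for start in range(0, len(signal) - FRAME_LEN + 1, HOP)
    ]


def overlap_add(frames, length):
    """Sum the frames back at their original positions."""
    out = [0.0] * length
    for i, frame in enumerate(frames):
        start = i * HOP
        for n, value in enumerate(frame):
            out[start + n] += value
    return out


signal = [math.sin(0.05 * n) for n in range(800)]
frames = split_frames(signal)   # per-frame processing would take place here
rebuilt = overlap_add(frames, len(signal))
```

Apart from the first and last half frame, which are covered by only one window, the rebuilt signal equals the input when the frames are left unprocessed.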
Unfortunately, the above-described methods all have one undesired characteristic in common, namely that they introduce artefacts in the extended wide-band acoustic signals. Furthermore, it is not unusual that these artefacts are so annoying and deteriorate the perceived sound quality to such an extent that a human listener generally prefers the original narrow-band acoustic signal to the thus extended wide-band acoustic signal.