Audio signals, such as speech or music, may be encoded to enable efficient transmission or storage of the audio signals.
Audio signals may be limited to a bandwidth which is typically determined by the available capacity of the transmission system or storage medium. However, in some instances it may be desirable to perceive the decoded audio signal at a higher bandwidth than the bandwidth at which the audio signal was originally encoded. In these instances artificial bandwidth extension may be deployed at the decoder, whereby the bandwidth of the decoded audio signal may be extended by using information solely determined from the decoded audio signal itself.
One example of the application of artificial bandwidth extension may lie in the area of mobile telecommunications. Typically in a mobile communication system such as the Global System for Mobile Communications (GSM), the speech signal may be limited to a bandwidth of less than 4 kHz, in other words a narrowband speech signal. However, naturally occurring speech may contain significant frequency components up to 10 kHz. The additional higher frequencies may contribute to the overall quality and intelligibility of the speech signal, resulting in a crisper and brighter sound when compared to the equivalent narrowband signal.
Existing methods for improving the quality and intelligibility of narrowband speech by artificial bandwidth extension may deploy a codebook to generate the additional high frequency components. The codebook may comprise frequency vectors of different spectral characteristics, all of which cover the range of frequencies of interest. The frequency range may be extended, on a frame-by-frame basis, by selecting the optimal vector and adding to it spectral components from the received decoded signal.
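As a rough illustration only, the codebook approach described above might be sketched as follows. The function name `extend_frame`, the use of magnitude spectra, and the squared-error match over the low band are all assumptions made for this sketch; the text does not specify how the "optimal" vector is chosen or how the received components are combined with it.

```python
import numpy as np

def extend_frame(narrow_mag, codebook):
    """Hypothetical per-frame codebook extension.

    narrow_mag: magnitude spectrum of the decoded narrowband frame (k_low bins).
    codebook:   array of shape (n_entries, k_full), k_full > k_low, where each
                row is a wideband magnitude vector covering the full range.
    """
    narrow_mag = np.asarray(narrow_mag, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    k_low = len(narrow_mag)

    # Assumed selection rule: smallest squared error between the received
    # low band and the low-band part of each codebook entry.
    errors = np.sum((codebook[:, :k_low] - narrow_mag) ** 2, axis=1)
    best = int(np.argmin(errors))

    # Assumed combination rule: keep the received components in the low band
    # and take the high band from the selected codebook vector.
    extended = codebook[best].copy()
    extended[:k_low] = narrow_mag
    return extended, best
```

Calling `extend_frame` once per decoded frame would correspond to the frame-by-frame operation mentioned above.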
Additionally, artificial bandwidth extension methods may deploy upsampling in order to create aliased copies of the received signal at higher frequencies. The magnitude or energy levels of the aliased frequency components may then be adjusted in order to create the representative higher frequencies of the speech signal.
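The aliasing step can be illustrated with a minimal sketch, assuming upsampling by a factor of two through zero insertion with no anti-imaging filter. The function name `upsample_with_alias` and the single scalar `gain` are hypothetical; a practical system would more likely shape the upper band with a frequency-dependent envelope.

```python
import numpy as np

def upsample_with_alias(x, gain=0.5):
    """Upsample by 2 via zero insertion and scale the aliased upper band.

    x:    one frame of the decoded narrowband signal (real-valued).
    gain: assumed scalar adjustment applied to the aliased components.
    """
    x = np.asarray(x, dtype=float)

    # Zero insertion doubles the sampling rate and, without a low-pass
    # filter, leaves a mirrored copy of the spectrum above the old Nyquist.
    y = np.zeros(2 * len(x))
    y[::2] = x

    spectrum = np.fft.rfft(y)
    old_nyquist_bin = len(y) // 4  # bin index of the original Nyquist frequency

    # Adjust the level of the aliased (upper band) components only.
    spectrum[old_nyquist_bin + 1:] *= gain
    return np.fft.irfft(spectrum, n=len(y))
```

For a 1 kHz tone sampled at 8 kHz, for instance, the output sampled at 16 kHz contains the original tone plus a mirrored component at 7 kHz whose level is set by `gain`.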
However, existing methods of artificial bandwidth extension can suffer from poor quality and inefficiency.
For example, some methods of artificial bandwidth extension can adopt a system classifying the incoming speech frames by their phonetic content in order to determine an upper band envelope. The envelope can then be used to shape the frequency spectrum created by the aliasing of the lower frequencies.
However, upper bands generated using this approach may not always sound natural. This may partly be attributed to the fact that transitions between different phonemes are naturally smooth in a speech signal, whereas a system that classifies the phonemes may introduce discontinuities at decision boundaries.
Other factors can also contribute to an unnatural sound using the above artificial bandwidth extension approach, such as incorrect classification of the incoming speech frames and inaccurate estimation of the high band spectral shape.