Conventional speech/audio coding is performed by a core codec, where a codec comprises an encoder and a decoder. The core codec is adapted to encode/decode a core band of the signal frequency band. The core band includes the essential frequencies of a signal up to a cut-off frequency, which is, for instance, 3400 Hz in the case of narrowband speech. The core codec can be combined with bandwidth extension (BWE), which handles the high frequencies above the core band, i.e. beyond the cut-off frequency. BWE refers to a class of methods that extend the bandwidth of the signal at the receiver beyond the core bandwidth. The benefit of BWE is that it can usually be achieved with little or no extra bit rate in addition to the core codec bit rate. The frequency point marking the border between the core band and the high frequencies handled by bandwidth extension is referred to in this specification as the cross-over frequency or the cut-off frequency.
Overclocking is a method, available e.g. in the Adaptive Multi-Rate Wideband+ (AMR-WB+) audio codec (3GPP TS 26.290, Extended Adaptive Multi-Rate-Wideband (AMR-WB+) codec; Transcoding functions), that allows the codec to be operated at a modified internal sampling frequency, even though it was originally designed for a fixed internal sampling frequency of 25.6 kHz. Changing the internal sampling frequency scales the bit rate, bandwidth and complexity with the overclocking factor, as explained below. The codec can thus be operated very flexibly, depending on the requirements on bit rate, bandwidth and complexity. For example, if a very low bit rate is needed, a low overclocking factor (i.e. underclocking) can be used, which at the same time means that the encoded audio bandwidth and the complexity are reduced. On the other hand, if very high quality encoding is desired, a high overclocking factor is used, which allows a large audio bandwidth to be encoded at the expense of increased bit rate and complexity.
Overclocking on the encoder side is realized by a flexible resampler in the encoder front end, which converts the original audio sampling rate of the input signal (e.g. 44.1 kHz) to an arbitrary internal sampling frequency that deviates from the nominal internal sampling frequency by an overclocking factor. The actual coding algorithm always operates on a fixed signal frame (containing a pre-defined number of samples) sampled at the internal sampling frequency; hence it is in principle unaware of any overclocking. However, various codec attributes, such as bit rate, complexity, bandwidth, and cross-over frequency, are scaled by the overclocking factor.
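The scaling described above can be illustrated with a minimal sketch. Only the nominal internal sampling frequency of 25.6 kHz is taken from the text; the function name, the frame size of 2048 samples, the 1920 bits per frame and the nominal cross-over frequency of 6400 Hz are hypothetical values chosen for illustration. The sketch assumes that the number of samples and the number of bits per frame stay fixed, so that every per-second quantity scales linearly with the overclocking factor.

```python
from dataclasses import dataclass

# From the text: nominal internal sampling frequency of the codec.
NOMINAL_INTERNAL_FS_HZ = 25_600.0


@dataclass(frozen=True)
class CodecAttributes:
    internal_fs_hz: float
    frame_duration_ms: float
    bit_rate_bps: float
    crossover_hz: float
    relative_complexity: float


def scale_attributes(
    factor: float,
    frame_samples: int = 2048,              # assumed, illustrative only
    bits_per_frame: int = 1920,             # assumed, illustrative only
    nominal_crossover_hz: float = 6400.0,   # assumed, illustrative only
) -> CodecAttributes:
    """Scale codec attributes by an overclocking factor.

    The frame always holds the same number of samples and bits, so a
    higher internal sampling frequency means more frames per second.
    """
    if factor <= 0:
        raise ValueError("overclocking factor must be positive")

    internal_fs = NOMINAL_INTERNAL_FS_HZ * factor
    return CodecAttributes(
        internal_fs_hz=internal_fs,
        # Fixed sample count per frame: duration shrinks as the rate grows.
        frame_duration_ms=1000.0 * frame_samples / internal_fs,
        # Fixed bit count per frame: bit rate grows with the frame rate.
        bit_rate_bps=bits_per_frame * internal_fs / frame_samples,
        # Frequencies seen by the codec map to scaled real frequencies.
        crossover_hz=nominal_crossover_hz * factor,
        # Fixed work per frame: operations per second follow the frame rate.
        relative_complexity=factor,
    )


if __name__ == "__main__":
    for f in (0.5, 1.0, 1.5):
        print(f, scale_attributes(f))
```

Under these assumptions, a factor below 1 (underclocking) lowers bit rate, bandwidth and complexity together, while a factor above 1 raises all three.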
It would be desirable to use the above-mentioned overclocking method in order to achieve an increased coding efficiency. This would lead to improved signal quality at the same bit rate, or to a lower bit rate at the same quality level.
U.S. Pat. No. 7,050,972 describes a method for an audio coding system that adaptively adjusts, over time, the cross-over frequency between a core codec for coding a lower frequency band and a high frequency regeneration system (also referred to as bandwidth extension in this specification) for a higher frequency band. It is further described that the adaptation can be made in response to the capability of the core codec to properly encode the low frequency band.
However, U.S. Pat. No. 7,050,972 does not provide means for improving the coding efficiency of the core codec itself, namely by operating it at a lower sampling frequency. The method merely aims at improving the efficiency of the total coding system by adapting the bandwidth to be encoded by the core codec such that the core codec can properly encode its band. Hence, the purpose is to achieve an optimum performance trade-off between the core band and the bandwidth extension band, rather than to make the core codec itself more efficient.
Patent application WO-2005096508 describes another method comprising a band extending module, a re-sampling module and a core codec, the core codec comprising a psychoacoustic analysis module, a time-frequency mapping module, a quantizing module and an entropy coding module. The band extending module analyzes the original input audio signals over the whole bandwidth and extracts the spectral envelope of the high-frequency part and the parameters characterizing the dependency between the lower and higher parts of the spectrum. The re-sampling module re-samples the input audio signals, changes the sampling rate, and outputs them to the core codec.
However, patent application WO-2005096508 does not contain provisions that would allow the operation of the re-sampling module to be adapted in dependence on an analysis of the input signal. Also, no adaptive segmentation means for the original input signal are foreseen, which would allow an input segment, after adaptive re-sampling, to be mapped onto an input frame of a subsequent core codec, the input frame containing a pre-defined number of samples. The consequence is that it cannot be ensured that the core codec operates on the lowest possible signal sampling rate, and hence the efficiency of the overall coding system is not as high as would be desirable.
The publication C. Shahabi et al., "A comparison of different haptic compression techniques", ICME 2002, describes an adaptive sampling system for haptic data operating on data frames, which periodically identifies the Nyquist frequency for the data window and subsequently resamples the data at this frequency. For practical reasons, the sampling frequency is chosen according to a cut-off frequency beyond which the signal energy can be neglected.
The problem with the solution described in the above-mentioned publication by C. Shahabi et al. is that it provides no gain in the context of speech and audio coding. For sampling of haptic data, a criterion related to the relative energy content beyond the cut-off frequency (e.g. 1%) may be appropriate, since it aims to retain an accurate representation of the data at the lowest possible sampling rate. However, in the context of speech and audio coding, there are usually fixed constraints on the input or output sampling frequency, implying that the original signal is first low-pass filtered with a fixed cut-off frequency and subsequently downsampled to the required sampling rate of e.g. 8, 16, 32, 44.1, or 48 kHz. Hence, the bandwidth of the speech or audio signal is already artificially limited to a fixed cut-off frequency. A subsequent adaptation of the sampling frequency according to the method of this publication would generally not work, as the artificially fixed cut-off frequency would only lead to a fixed rather than an adaptive sampling frequency.
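The energy criterion discussed above can be illustrated with a minimal sketch. It is not the algorithm of the cited publication: the function names, the naive DFT, the two-tone test signal and the 8 kHz sampling rate are all hypothetical choices made for illustration. Only the idea of neglecting a small relative energy fraction (e.g. 1%) beyond the cut-off frequency comes from the text.

```python
import cmath
import math


def power_spectrum(samples):
    """One-sided power spectrum via a naive O(n^2) DFT (fine for short windows)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def energy_cutoff_hz(samples, fs_hz, neglect_fraction=0.01):
    """Frequency of the highest bin that cannot be discarded.

    The energy strictly above the returned frequency is at most
    `neglect_fraction` of the total energy in the window.
    """
    power = power_spectrum(samples)
    total = sum(power)
    if total == 0.0:
        return 0.0
    bin_hz = fs_hz / len(samples)
    tail = 0.0
    # Walk down from the top bin, accumulating energy that would be discarded.
    for k in range(len(power) - 1, -1, -1):
        if tail + power[k] > neglect_fraction * total:
            return k * bin_hz
        tail += power[k]
    return 0.0


def two_tone(f1_hz, a1, f2_hz, a2, fs_hz=8000.0, n=64):
    """Hypothetical test window: the sum of two sinusoids."""
    return [
        a1 * math.sin(2 * math.pi * f1_hz * i / fs_hz)
        + a2 * math.sin(2 * math.pi * f2_hz * i / fs_hz)
        for i in range(n)
    ]


if __name__ == "__main__":
    weak_high = two_tone(1000.0, 1.0, 3000.0, 0.05)   # high tone carries little energy
    strong_high = two_tone(1000.0, 1.0, 3000.0, 0.5)  # high tone carries much energy
    print(energy_cutoff_hz(weak_high, 8000.0))
    print(energy_cutoff_hz(strong_high, 8000.0))
```

A real system would choose a sampling rate somewhat above twice the returned cut-off frequency. The criterion is purely energy based, so it reflects only how the signal energy is distributed below any fixed band limit that was applied beforehand.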
However, even in the case where the bandwidth is artificially limited, the impact of the fixed bandwidth limitation is not always perceived in the same way; it depends on the local (in time) perceptual properties of the audio signal. For certain parts (segments) of the signal in which high frequencies are hardly perceivable, e.g. due to masking by dominant low frequency content, a more aggressive low-pass filtering and sampling with a correspondingly lower sampling frequency would be possible. Hence, conventional speech and audio coding systems locally operate at a higher sampling frequency than is perceptually motivated, and thus compromise coding efficiency.