Digital processing of audio signals has become increasingly prevalent and is now part of many practical applications. Indeed, it is currently performed in many everyday consumer electronic devices and services, including for example mobile phones, music distribution and rendering systems, and televisions.
In order to provide improved, new, or more flexible processing, audio processing algorithms tend to become increasingly complex, and in many scenarios the signal processing is limited by the available computational resource. An example is speech signal processing for speech communication devices, where speech coding together with speech enhancement typically consumes a very substantial part of the computational resource. Therefore, there is a general desire to improve the computational efficiency of audio processing algorithms.
In some applications and scenarios, processing of a digital audio signal may advantageously be performed in parallel subbands. As the subbands have a reduced bandwidth, such processing may be performed on decimated subband signals, i.e. the sample frequency can be reduced. For example, an audio signal may be divided into two equal subbands with the subband signals being decimated by a factor of two before being individually processed.
As a specific example, a speech signal may be divided into two separate components corresponding respectively to a lower frequency band and a higher frequency band. The encoding may then be performed individually in each band, i.e. it may be performed by applying individual and separate audio processing to the two subband signals. As another example, an echo cancellation process may be performed individually in different subbands.
Following the processing of the individual subbands, these may be combined again to generate a single full band processed audio signal with the same sampling frequency as for the input signal.
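The split, decimate, process and recombine chain described above can be sketched as follows. This is a minimal illustration only: it assumes the two-tap Haar filter pair, which is not taken from the text and has very poor frequency selectivity, and the function names `analyze` and `synthesize` are hypothetical. It shows only that two subband signals at half the sample rate can be recombined into a full band signal at the original rate.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar normalisation factor (assumed filter pair)


def analyze(x):
    """Split an even-length signal into low and high subbands, each decimated by two."""
    pairs = range(len(x) // 2)
    low = [_S * (x[2 * i] + x[2 * i + 1]) for i in pairs]   # lowpass branch
    high = [_S * (x[2 * i] - x[2 * i + 1]) for i in pairs]  # highpass branch
    return low, high


def synthesize(low, high):
    """Recombine two decimated subbands into one signal at the original rate."""
    y = []
    for l, h in zip(low, high):
        y.append(_S * (l + h))  # even-indexed output sample
        y.append(_S * (l - h))  # odd-indexed output sample
    return y


signal = [0.5, -1.0, 2.0, 3.5, -0.25, 0.0, 1.0, 1.0]
low_band, high_band = analyze(signal)
# Separate processing of low_band and high_band would take place here.
restored = synthesize(low_band, high_band)
```

With the subband signals left unmodified, `restored` matches `signal` to within floating-point rounding, and each subband holds half as many samples as the input.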
Processing an audio signal by dividing it into subbands and applying processing individually in the different subbands may provide substantial benefits in many scenarios.
For example, for many processing algorithms the computational resource usage does not scale linearly with the frequency bandwidth or sampling frequency. Indeed, for many processing algorithms, the computational requirement may increase e.g. with the square of frequency bandwidth/sampling frequency.
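The effect of such super-linear scaling can be made concrete with a small calculation. The sketch below assumes a cost model in which the computational requirement is proportional to the sample rate raised to some exponent, with equal-width subbands each decimated by the number of subbands; the function name `relative_cost` is hypothetical, and the cost of the band-splitting filters themselves is ignored.

```python
def relative_cost(num_bands, exponent=2.0):
    """Cost of processing num_bands equal subbands, relative to full band processing.

    Assumes cost is proportional to sample_rate ** exponent and that each
    subband runs at 1 / num_bands of the original sample rate.
    """
    per_band = (1.0 / num_bands) ** exponent  # cost of one decimated subband
    return num_bands * per_band


two_band_quadratic = relative_cost(2)     # quadratic scaling, two subbands
three_band_quadratic = relative_cost(3)   # quadratic scaling, three subbands
two_band_linear = relative_cost(2, 1.0)   # linear scaling: no saving
```

Under these assumptions, two subbands halve the cost of a quadratically scaling algorithm and three subbands reduce it to one third, whereas an algorithm that scales linearly gains nothing from the split alone.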
Another advantage of subband processing is that it may allow the processing to be more closely adapted to the different characteristics of the audio signal. For example, a speech signal has very different properties in the frequency range up to, say, 4 kHz than it has in the frequency range above 4 kHz. Therefore, improved speech coding may often be achieved by the encoding algorithm being targeted at the specific characteristics in the different frequency bands, and thus a different encoding for a subband below 4 kHz than for a subband above 4 kHz may be applied. For example, a different speech model may be used.
Also, by operating in different subbands it may be possible to optimize the computational efficiency by adapting the processing to the different characteristics. For example, reverberation is known to last much longer for lower frequencies than for higher frequencies. Therefore, a reverberation estimation filter (as e.g. used in an echo canceller) for low frequencies needs to have enough coefficients (for a FIR filter) to provide an impulse response sufficient to model a long reverberation effect. However, by splitting the audio signals into e.g. a low frequency and a high frequency band, the long filter need only be applied to the low frequency band (at a decimated sample rate), whereas a much shorter filter (reflecting the short high frequency reverberation) can be applied in the high frequency band (at the decimated sample rate). The overall computational resource usage can in this way be substantially reduced in comparison to filtering the full bandwidth signal at the full sample rate using the long reverberation filter.
There is currently a trend towards increasing the bandwidth of audio signals (e.g. for speech or music audio) and this tends to result in a substantially increased computational resource usage due to the increased sample rate. The importance of using subband processing may increase with increasing bandwidth of the audio signal, and indeed it may in many cases even allow audio processing to be performed which, due to resource constraints in the device, cannot be performed for a full rate, higher bandwidth signal.
For example, the bandwidth of (hands-free) speech communication devices is rapidly increasing. Narrowband (4 kHz bandwidth) and wideband (8 kHz) systems are extensively used, but super wideband (16 kHz) and even full band (24 kHz) systems are entering the market (especially for VoIP applications).
As a specific example, speech enhancement algorithms have to cope with this increase in bandwidth. Using the same speech enhancement algorithm for the whole frequency band poses some challenges. The speech enhancement problems to solve are different for the high and low frequencies. Take for example a super wideband algorithm, with a bandwidth of 16 kHz. The speech signal in the range from 0 to 8 kHz is quite different from the speech signal in the range from 8 to 16 kHz. Vowels with their important first three formants predominantly exist in the lower band, whereas some consonants extend significantly beyond 8 kHz. Also, the frequency selectivity of human hearing is much higher at the lower frequencies.
As another example, the acoustics of a room normally change with frequency, mostly due to an increase of the air absorption with increasing frequency. As a result, the reverberation time will be lower for the higher frequencies. As a consequence, de-reverberation is especially important for the lower frequencies. The adaptive filter length for e.g. acoustic echo cancellation can accordingly be shorter for the higher frequencies, as reverberation is typically much shorter for higher frequencies.
For example, for acoustic echo cancellation, extending the bandwidth, and thus the sample frequency, by a factor of two and then applying the same algorithm leads to an increase of the adaptive filter length by a factor of two in order to realize the same echo compensation for the low frequency band.
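The factor of two follows from the filter having to span the same echo duration at the higher sample rate. A minimal sketch of this arithmetic is given below; the 128 ms echo span and the function name `taps_for_span` are illustrative assumptions and do not come from the text.

```python
def taps_for_span(span_ms, sample_rate_hz):
    """Number of FIR coefficients needed to cover span_ms at sample_rate_hz."""
    return span_ms * sample_rate_hz // 1000


ECHO_SPAN_MS = 128  # hypothetical echo path duration to be modelled

wideband_taps = taps_for_span(ECHO_SPAN_MS, 16_000)        # 8 kHz bandwidth
super_wideband_taps = taps_for_span(ECHO_SPAN_MS, 32_000)  # 16 kHz bandwidth
```

Doubling the sample rate while keeping the modelled echo duration fixed doubles the coefficient count, here from 2048 to 4096.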
For super wideband speech echo cancellation, filter lengths of 4096 or more are typically needed. The adaptive filters need good de-correlation properties to allow fast adaptation. In essence, this means that the update term of the adaptive filter has to be de-convolved with the autocorrelation of the input signal (loudspeaker signal). Due to the low levels of speech in the high frequency bands, this autocorrelation has a long support in the time domain and leads to non-perfect decorrelation and thus lower adaptation speeds for the high frequencies.
An attractive solution for such applications is to split the signals into separate frequency bands, by applying a filter bank. In such a filter bank, the signals can be divided into e.g. two (for super wideband) or three (for fullband) subbands which are subsequently downsampled (decimated), and then processed separately. After the separate processing, the resulting processed signals are upsampled and recombined.
As mentioned, the split into distinct bands offers the advantage that each band can be processed independently reflecting the specific characteristics in each band. E.g., the processing of the band from 0 to 8 kHz can be exactly the same as for the wideband case, and for the higher frequencies different processing is possible. In particular, for acoustic echo cancellation an adaptive filter of typically 2048 coefficients for the band from 0 to 8 kHz can still be used, whereas for example 1024 coefficients can be used for the band from 8 to 16 kHz. This can be compared to a single band solution typically employing 4096 coefficients for super wideband or even 6144 coefficients for full band.
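The saving implied by these coefficient counts can be estimated with a rough operation count. The sketch below assumes a time-domain FIR filter costing one multiply-accumulate per coefficient per sample, a sample rate of twice the bandwidth, and critically decimated subbands; it ignores the coefficient update of the adaptive filter and the cost of the filter bank itself. The function name `macs_per_second` is hypothetical.

```python
def macs_per_second(num_coefficients, sample_rate_hz):
    """Multiply-accumulates per second for a direct-form FIR filter."""
    return num_coefficients * sample_rate_hz


# Single band super wideband: 4096 coefficients at 32 kHz sampling.
full_band = macs_per_second(4096, 32_000)

# Two subbands, each decimated to 16 kHz sampling.
low_band = macs_per_second(2048, 16_000)   # band from 0 to 8 kHz
high_band = macs_per_second(1024, 16_000)  # band from 8 to 16 kHz
subband_total = low_band + high_band

ratio = subband_total / full_band
```

Under these assumptions the two-band filtering needs about 49 million multiply-accumulates per second against about 131 million for the single band filter, a ratio of 0.375.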
However, a problem with such subband processing is that the audio signal in principle should be divided into subbands using ideal filters (i.e. non-overlapping filters with infinitely sharp transitions). As this is not possible, some overlap between filters typically results, leading to some signal frequencies of the original signal being present in two neighbouring subband signals.
A particular problem of the non-ideal filtering is that aliasing may typically occur as part of the decimation. The decimated frequency is preferably as low as possible, and typically it is set to correspond to the original sampling frequency divided by the number of subbands. However, when using non-ideal filters in such situations, some aliasing of frequency components for one subband into another subband is unavoidable.
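The aliasing mechanism can be illustrated with a single tone that lies just outside a subband and is not fully removed by the band-splitting filter. The sketch below uses hypothetical values (an 8 kHz input rate, a 3 kHz tone, decimation by two) and hypothetical function names; it omits the band-splitting filter entirely, which corresponds to the worst case of a component passing through unattenuated.

```python
import math


def aliased_frequency(f_hz, fs_hz):
    """Frequency at which a real tone at f_hz appears when sampled at fs_hz."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f


def tone(f_hz, fs_hz, n):
    """n samples of a unit-amplitude cosine at f_hz, sampled at fs_hz."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]


original = tone(3000, 8000, 64)  # 3 kHz tone at the full sample rate
decimated = original[::2]        # keep every second sample: rate is now 4 kHz
alias = tone(aliased_frequency(3000, 4000), 4000, 32)  # 1 kHz tone at 4 kHz
```

After decimation the 3 kHz tone lies above the new 2 kHz Nyquist limit, and `decimated` is sample-for-sample equal (to within rounding) to a 1 kHz tone at the lower rate, so the two can no longer be told apart within the subband.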
However, this aliasing can be addressed by the synthesis unit (generating the output digital audio signal from the processed subband signals) including complementary filters that cancel out the alias components. Therefore, the aliasing resulting from non-ideal filters is conventionally addressed by choosing filters in the synthesis functionality for generating the full bandwidth signal which result in the aliasing being cancelled.
However, despite such compensation, the inventors have realized that there is still a desire to provide improved audio signal processing. In particular, audio processing allowing increased flexibility, reduced complexity, reduced computational resource use, and/or improved performance would be advantageous.
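For the two-band case, the conventional cancellation condition can be written out as follows. The notation is an assumption introduced here for illustration and does not appear in the text: $H_0, H_1$ denote the analysis filters, $G_0, G_1$ the synthesis filters, $X$ the input and $\hat{X}$ the reconstructed output, with each subband decimated and then upsampled by two and the subband signals passed through unmodified.

```latex
\hat{X}(z) = \tfrac{1}{2}\bigl[H_0(z)\,G_0(z) + H_1(z)\,G_1(z)\bigr]\,X(z)
           + \tfrac{1}{2}\bigl[H_0(-z)\,G_0(z) + H_1(-z)\,G_1(z)\bigr]\,X(-z)

% The term in X(-z) is the alias component. It vanishes when
H_0(-z)\,G_0(z) + H_1(-z)\,G_1(z) = 0,

% which is satisfied, for example, by the choice
G_0(z) = H_1(-z), \qquad G_1(z) = -H_0(-z).
```

With this choice the alias term becomes $H_0(-z)H_1(-z) - H_1(-z)H_0(-z) = 0$ regardless of how much the analysis filters overlap, leaving only the first term to determine the overall response.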