Current speech coders are being designed for ever increasing bandwidths. Extension of the range supported by a speech coder into higher frequencies may improve intelligibility. For example, the information that differentiates fricatives such as ‘s’ and ‘f’ is largely in the high frequencies. Highband extension may also improve other qualities of speech, such as presence. For example, even a voiced vowel may have spectral energy far above the PSTN limit.
One approach to wideband speech coding involves scaling a narrowband speech coding technique to cover the wideband spectrum. For example, a speech signal may be sampled at a higher rate to include components at high frequencies, and a narrowband coding technique may be reconfigured to use more filter coefficients to represent this wideband signal. Narrowband coding techniques such as CELP (codebook excited linear prediction) are computationally intensive, however, and a wideband CELP coder may consume too many processing cycles to be practical for many mobile and other embedded applications. Encoding the entire spectrum of a wideband signal to a desired quality using such a technique may also lead to an unacceptably large increase in bandwidth. Moreover, transcoding of such an encoded signal would be required before even its narrowband portion could be transmitted into and/or decoded by a system that only supports narrowband coding.
To address these issues, it has been proposed that the encoder divide a wideband speech signal into a lowband signal (or narrowband signal) and a highband signal, and then encode each signal separately. Such an encoder is described in United States Patent Application Publication 2008/0126086, entitled SYSTEMS, METHODS, AND APPARATUS FOR GAIN CODING, and incorporated by reference herein.
FIG. 1 shows a block diagram of a prior art wideband speech encoder 100. Filter bank 101 is configured to filter a wideband speech signal to produce a lowband signal at a lower bandwidth and a highband signal. Narrowband encoder 102 is configured to encode the lowband signal to produce narrowband filter parameters and a narrowband residual signal. Narrowband encoder 102 is typically configured to produce the narrowband filter parameters and an encoded narrowband excitation signal as codebook indices or in another quantized form. Highband encoder 103 is configured to encode the highband signal according to information in the encoded narrowband excitation signal to produce highband coding parameters. Highband encoder 103 is typically configured to produce the highband coding parameters as codebook indices or in another quantized form. One particular example of wideband speech encoder 100 is configured to encode the wideband speech signal at a rate of about 8.55 kbps (kilobits per second), with about 7.55 kbps being used for the narrowband filter parameters and the encoded narrowband excitation signal, and about 1 kbps being used for the highband coding parameters.
In a typical implementation, filter bank 101 comprises a low pass filter and a high pass filter. FIG. 2 and FIG. 3 show the relative bandwidths of a wideband speech signal, a lowband signal, and a highband signal in two different implementation examples. In both of these particular examples, the wideband speech signal has a sampling rate of 32 kHz (representing frequency components within the range of 0 to 16 kHz), and the lowband signal has a sampling rate of 16 kHz (representing frequency components within the range of 0 to 8 kHz).
In the example of FIG. 2, there is no significant overlap between the two sub-bands. A highband signal as shown in this example may be obtained using a high pass filter with a passband of 8 to 16 kHz. In such a case, it may be desirable to reduce the sampling rate to 16 kHz by downsampling the filtered signal by a factor of two. Such an operation, which may be expected to significantly reduce the computational complexity of further processing operations on the signal, involves moving the passband energy down to the range of 0 to 8 kHz to prevent loss of information.
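The movement of passband energy described above can be illustrated with a minimal sketch. It assumes a single 10 kHz test tone that has already passed the 8 to 16 kHz high pass filter. The tone frequency, the helper name `tone`, and the sample counts are hypothetical choices made for illustration and are not taken from the figures. Keeping every second sample of the 32 kHz sequence yields exactly the samples of a 6 kHz tone at 16 kHz, so the energy lands inside the 0 to 8 kHz range, mirrored about 8 kHz.

```python
import math

FS_IN = 32000.0        # wideband sampling rate from the FIG. 2 example
FS_OUT = FS_IN / 2.0   # rate after downsampling by a factor of two
TONE_HZ = 10000.0      # hypothetical test tone inside the 8-16 kHz passband


def tone(freq_hz, fs_hz, count):
    """Return `count` samples of a unit-amplitude cosine at freq_hz."""
    return [math.cos(2.0 * math.pi * freq_hz * n / fs_hz) for n in range(count)]


# Highband tone at the input rate (assumed to be already high pass filtered).
highband = tone(TONE_HZ, FS_IN, 64)

# Downsample by two: keep every second sample.
decimated = highband[::2]

# The same samples are produced by a tone at FS_OUT - TONE_HZ = 6 kHz
# sampled at 16 kHz, i.e. the passband energy now sits within 0-8 kHz.
alias = tone(FS_OUT - TONE_HZ, FS_OUT, len(decimated))

max_error = max(abs(a - b) for a, b in zip(decimated, alias))
print(max_error)
```

Because a 10 kHz input appears at 6 kHz, while a 15 kHz input would appear at 1 kHz, the downsampled band is frequency-reversed relative to the original highband.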
In the alternative example of FIG. 3, the upper and lower sub-bands have an appreciable overlap, such that the region of 7 to 8 kHz is described by both sub-band signals. Such an overlap may be expected to account for non-ideal filtering during the recombination of the upper and lower sub-bands after decoding of the lowband and highband parameters.
Consider an implementation according to FIG. 2 in which the wideband input is sampled at 32 kHz and carries a super wideband signal (50 Hz to 14.0 kHz). If the lowband component is sampled at 12.8 kHz, representing frequencies from 0 to 6.4 kHz, then a critically sampled signal with a bandwidth of 8 kHz would be suitable to reproduce the highband component.
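The arithmetic behind these figures can be checked with a short sketch. It assumes that the highband begins exactly at the upper edge of the lowband (no overlap, as in FIG. 2) and that the 12.8 kHz lowband rate is obtained from the 32 kHz input by a rate change of 2/5. The variable names are hypothetical.

```python
from fractions import Fraction

WIDEBAND_RATE_HZ = 32000          # input sampling rate
LOWBAND_RATIO = Fraction(2, 5)    # assumed rate change, 32 kHz -> 12.8 kHz
HIGHBAND_BANDWIDTH_HZ = 8000      # bandwidth of the highband component

# Lowband: a 12.8 kHz sampling rate represents 0 to 6.4 kHz.
lowband_rate_hz = WIDEBAND_RATE_HZ * LOWBAND_RATIO
lowband_edge_hz = lowband_rate_hz / 2

# Highband: critical sampling uses twice the bandwidth of the band.
highband_rate_hz = 2 * HIGHBAND_BANDWIDTH_HZ

# Assumption: the highband starts where the lowband ends (no overlap).
highband_upper_edge_hz = lowband_edge_hz + HIGHBAND_BANDWIDTH_HZ

print(lowband_rate_hz, lowband_edge_hz, highband_rate_hz, highband_upper_edge_hz)
```

Under these assumptions the highband spans 6.4 to 14.4 kHz, which covers the 14.0 kHz upper limit of the super wideband signal.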
FIG. 4 shows a block diagram of a prior-art implementation of filter bank 101 that performs a functional equivalent of highpass filtering and downsampling operations using a series of interpolation, resampling, decimation, and other operations. In FIG. 4, lowpass filter 401 and downsampler 402 serve to generate the lowband speech signal, while interpolator 403, resampler 404, decimator 405, spectral reversal circuitry 406, decimator 407, and spectral shaping circuitry 408 serve to generate the highband speech signal.
Such an implementation may be easier to design and/or may allow reuse of functional blocks of logic and/or code. For example, the same functional block may be used to perform the operations of decimation by 2/5 to 12.8 kHz (402) and decimation by 5/11 to 16 kHz (407) as shown in FIG. 4. The spectral reversal operation may be implemented by multiplying the signal with the function e^(jnπ) or the sequence (−1)^n, whose values alternate between +1 and −1. The spectral shaping operation may be implemented as a lowpass filter configured to shape the signal to obtain a desired overall filter response.
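The spectral reversal operation can be illustrated with a minimal sketch. It assumes a 16 kHz sampling rate and a 1 kHz test tone, and uses a naive discrete Fourier transform to locate the spectral peak. The function names, the test tone, and the frame length are hypothetical choices and do not describe circuitry 406 itself. Multiplying sample n by (−1)^n moves a component at frequency f to fs/2 − f.

```python
import cmath
import math

FS = 16000.0    # assumed sampling rate of the signal being reversed
N = 64          # frame length; DFT bin spacing is FS / N = 250 Hz
TONE_HZ = 1000.0  # hypothetical test tone, falls exactly on bin 4


def spectral_reverse(samples):
    """Multiply sample n by (-1)^n, mirroring the spectrum about FS/4."""
    return [s if n % 2 == 0 else -s for n, s in enumerate(samples)]


def peak_frequency(samples, fs_hz):
    """Return the frequency of the largest DFT bin in 0..fs/2 (naive DFT)."""
    count = len(samples)
    mags = []
    for k in range(count // 2 + 1):
        acc = sum(
            samples[n] * cmath.exp(-2j * math.pi * k * n / count)
            for n in range(count)
        )
        mags.append(abs(acc))
    best_bin = max(range(len(mags)), key=mags.__getitem__)
    return best_bin * fs_hz / count


signal = [math.cos(2.0 * math.pi * TONE_HZ * n / FS) for n in range(N)]
reversed_signal = spectral_reverse(signal)

original_peak = peak_frequency(signal, FS)           # 1 kHz
reversed_peak = peak_frequency(reversed_signal, FS)  # FS/2 - 1 kHz = 7 kHz
print(original_peak, reversed_peak)
```

Applying `spectral_reverse` a second time restores the original samples, since (−1)^n multiplied by itself is 1 for every n.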
It is noted that, as a consequence of the spectral reversal operation, the spectrum of the highband signal is reversed. Subsequent operations in the encoder and corresponding decoder may be configured accordingly. For example, a highband excitation generator as described herein may be configured to produce a highband excitation signal that also has a spectrally reversed form.
It will be observed that the highest sample rate in the above implementation is 64 kHz and that six processing steps are required to obtain a critically sampled version of the highband speech signal, indicating a relatively high degree of complexity before encoding may commence. Furthermore, the flexibility of this approach is limited because of the need to achieve a critically sampled version of the highband speech signal, i.e., a sample rate which corresponds to precisely twice the upper frequency of the band to be coded. In this case the required sampling rate is 28.8 kHz to code the highband with an upper frequency of 14.4 kHz. Therefore a need exists for a method and apparatus for encoding signals that reduces the complexity associated with the above-described encoder and enhances the flexibility to code different highband configurations.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions and/or relative positioning of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present invention. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments of the present invention. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. Those skilled in the art will further recognize that references to specific implementation embodiments such as “circuitry” may equally be accomplished via either general purpose computing apparatus (e.g., CPU) or specialized processing apparatus (e.g., DSP) executing software instructions stored in non-transitory computer-readable memory. It will also be understood that the terms and expressions used herein have the ordinary technical meaning as is accorded to such terms and expressions by persons skilled in the technical field as set forth above except where different specific meanings have otherwise been set forth herein.