There is considerable interest among those in the fields of audio- and video-signal processing in minimizing the amount of information required to represent a signal without perceptible loss in signal quality. By reducing information requirements, signals impose lower information capacity requirements upon communication channels and storage media; however, there are limits to the reduction in information requirements which can be realized without degrading the perceived signal quality.
Digital signals encoded with fewer bits impose lower information capacity requirements, but decreasing the number of bits used to quantize information increases the quantizing inaccuracies or quantizing errors. In many applications, quantizing errors are manifested as quantizing noise. If the errors are large enough, the noise will be perceptible and degrade the perceived quality of the encoded signal.
Various "split-band" coding techniques attempt to reduce information requirements without producing any perceptible degradation by exploiting various psycho-perceptual effects. In audio applications, for example, the human auditory system displays frequency-analysis properties resembling those of highly asymmetrical tuned filters having variable center frequencies and bandwidths which vary as a function of the center frequency. The ability of the human auditory system to detect distinct tones generally increases as the difference in frequency between the tones increases; however, the resolving ability of the human auditory system remains substantially constant for frequency differences less than the bandwidth of the above-mentioned filters. As a result, the frequency-resolving ability of the human auditory system varies according to the bandwidth of these filters throughout the audio spectrum. Frequency bands having a bandwidth commensurate with the bandwidths of these auditory filters are referred to as "critical bands" and the widths of these bands are referred to as "critical bandwidths." A dominant signal within a critical band is more likely to mask the audibility of other signals anywhere within that critical band than it is to mask signals at frequencies outside that critical band.
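To make the dependence of critical bandwidth on center frequency concrete, the sketch below evaluates a widely cited empirical curve fit due to Zwicker and Terhardt. The formulas are an outside assumption, not taken from this document, and other published fits give somewhat different values; the function names are illustrative.

```python
import math


def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth in Hz at center frequency f_hz.

    Zwicker-Terhardt curve fit (an outside assumption, not from this text):
    CB(f) = 25 + 75 * (1 + 1.4 * (f / 1 kHz)^2) ** 0.69
    """
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def critical_band_rate_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) at frequency f_hz, same source."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


for f in (100.0, 1000.0, 10000.0):
    print(f"{f:8.0f} Hz: bandwidth ~{critical_bandwidth_hz(f):7.1f} Hz, "
          f"rate ~{critical_band_rate_bark(f):5.2f} Bark")
```

With this fit the critical bandwidth is roughly 100 Hz at a 100 Hz center frequency, about 160 Hz at 1 kHz and over 2 kHz at 10 kHz, so subbands that track the critical bands must be narrow at low frequencies and much wider at high frequencies.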
A dominant signal may mask other signals which occur not only at the same time as the masking signal but also before and after it. The duration of pre- and postmasking effects within a critical band depends upon the magnitude of the masking signal, but premasking effects are usually of much shorter duration than postmasking effects. See generally, the Audio Engineering Handbook, K. Blair Benson ed., McGraw-Hill, San Francisco, 1988, pages 1.40-1.42 and 4.8-4.10.
In audio applications, for example, split-band coding techniques which divide the useful signal bandwidth into frequency bands with bandwidths approximating the critical bandwidths of the human auditory system can better exploit psychoacoustic effects than wider-band techniques. Such digital split-band coding techniques comprise dividing an input signal into "subbands," quantizing the signal passed by each subband filter using just enough bits to render quantizing noise inaudible, and reconstructing a replica of the original signal. Two such techniques are subband coding and transform coding. Without degrading the subjective quality of the encoded signal, subband and transform coding can reduce transmitted information requirements in particular frequency subbands where the resulting quantizing noise is psychoacoustically masked by neighboring spectral components.
Subband coders may incorporate a filter bank implemented by any of a variety of techniques including Finite Impulse Response (FIR) filters, Infinite Impulse Response (IIR) filters, and discrete transforms. In such coders, an input signal comprising signal samples is passed through a bank of digital bandpass filters and each "subband signal" passed by a respective bandpass filter in the filter bank is downsampled according to the bandwidth of that subband's filter. Each subband signal comprises samples which represent a portion of the input signal spectrum.
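As a minimal illustration of a subband filter bank with downsampling, the sketch below splits a signal into two subbands with the 2-tap Haar filter pair and reconstructs it. The Haar pair is an assumption chosen for brevity; its transition bands are very wide, so it says nothing about the selectivity a practical coder needs. The function names are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def analyze(x):
    """Two-band analysis: split x into low and high subband signals.

    Each output sample is the scaled sum (low band) or difference (high band)
    of one pair of adjacent input samples, i.e. the 2-tap Haar filter pair
    followed by downsampling by 2.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    low = [(a + b) * SCALE for a, b in pairs]
    high = [(a - b) * SCALE for a, b in pairs]
    return low, high


def synthesize(low, high):
    """Inverse filter bank: upsample both subbands and recombine them."""
    y = []
    for lo, hi in zip(low, high):
        y.append((lo + hi) * SCALE)
        y.append((lo - hi) * SCALE)
    return y


x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
low, high = analyze(x)
print(len(low), len(high))   # 4 4
print(synthesize(low, high))  # x again, up to floating-point rounding
```

Each subband signal carries half as many samples as the input, so the total sample count is unchanged; without quantization, the synthesis stage reproduces the input to within floating-point rounding.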
Transform coders may implement a bank of digital filters with so-called time-domain to frequency-domain transforms such as the Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), and Discrete Hadamard Transform (DHT). In such coders, an input signal comprising signal samples is segmented into "signal sample blocks" prior to transformation. Each coefficient obtained from the transform represents a portion of the input signal spectrum for a respective signal sample block. Individual coefficients, or two or more adjacent coefficients grouped together, define "subbands" having effective bandwidths which are sums of individual coefficient bandwidths.
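The block segmentation and transform described above can be sketched as follows, assuming an orthonormal DCT-II computed directly from its definition, a block length of 16 samples and a hypothetical grouping of coefficients into four subbands. A practical coder would use a fast transform and, usually, overlapping windowed blocks.

```python
import math


def segment(x, block_len):
    """Split x into non-overlapping signal sample blocks; a partial tail is dropped."""
    return [x[i:i + block_len] for i in range(0, len(x) - block_len + 1, block_len)]


def dct_ii(block):
    """Orthonormal DCT-II of one block, computed directly in O(N^2) time."""
    n = len(block)
    coeffs = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(block))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def group_subbands(coeffs, edges):
    """Group adjacent coefficients into subbands.

    edges lists coefficient indices, e.g. [0, 2, 4, 8, 16]; this layout is a
    hypothetical one whose subbands widen with frequency.
    """
    return [coeffs[a:b] for a, b in zip(edges[:-1], edges[1:])]


signal = [math.sin(2.0 * math.pi * 3.0 * t / 16.0) for t in range(40)]
blocks = segment(signal, 16)               # 2 full blocks; the last 8 samples are dropped
spectra = [dct_ii(b) for b in blocks]      # one subband information block per signal sample block
bands = group_subbands(spectra[0], [0, 2, 4, 8, 16])
print([len(b) for b in bands])             # [2, 2, 4, 8]
```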
Throughout the following discussion, the term "split-band coder" shall refer to subband coders, transform coders, and other coding techniques which operate upon portions of the useful signal bandwidth. The term "subband" shall refer to these portions of the useful signal bandwidth, whether implemented by a true subband coder, a transform coder, or other split-band technique.
The term "signal sample block" shall refer to a group or block of signal samples within a given interval of time. The term pertains to transform coders having block transforms which operate upon blocks of signal samples, and it also pertains to other split-band coders such as subband coders which segment samples into blocks to facilitate various block coding methods.
The term "subband information" shall refer to the split-band filtered representation of the spectral energy in one or more subbands. The term "subband information block" shall refer to the subband information for all subbands across the useful signal bandwidth for a given signal sample block. For subband coders implemented by a digital filter bank, a subband information block comprises the set of samples for all subband signals over a given time interval. For transform coders, a subband information block comprises the set of all transform coefficients corresponding to a signal sample block.
For ease of discussion, more particular mention is made of audio coding throughout this disclosure but the principles, problems and solutions apply generally to other coding applications such as video coding.
In concept, many split-band audio coders utilizing psychoacoustic masking effects provide high-quality coding at low bit rates by applying a filter bank to an input signal to generate subband information, quantizing each element of subband information using a number of bits allocated to that element such that resulting quantizing noise is inaudible due to various psychoacoustic masking effects, and assembling the quantized information into a form suitable for transmission or storage.
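The allocation step can be sketched with a common rule of thumb: each added bit lowers uniform quantizing noise by about 6.02 dB, so a subband whose signal exceeds its masking threshold by SMR dB needs roughly ceil(SMR / 6.02) bits. This is an illustrative assumption, not the allocation method of any particular coder. The per-subband masking thresholds are taken as given inputs (the psychoacoustic model that would compute them is not shown), the 16-bit cap is hypothetical, and fitting the total within a fixed bit rate is ignored.

```python
import math

DB_PER_BIT = 6.02  # each added bit lowers uniform quantizing noise by about 6.02 dB


def allocate_bits(signal_db, mask_db, max_bits=16):
    """Bits per subband so quantizing noise falls below the masking threshold.

    signal_db and mask_db hold one level per subband; mask_db is assumed to
    come from a separate psychoacoustic model (not shown).
    """
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m  # signal-to-mask ratio in dB
        b = 0 if smr <= 0 else math.ceil(smr / DB_PER_BIT)
        bits.append(min(b, max_bits))
    return bits


print(allocate_bits([60.0, 45.0, 30.0], [40.0, 44.0, 35.0]))  # [4, 1, 0]
```

A subband whose signal already lies below its masking threshold receives no bits at all, which is where the savings of a split-band coder come from.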
A complementary split-band decoder recovers a replica of the original input signal by extracting quantized information from an encoded signal, dequantizing the quantized information to obtain subband information, and applying an inverse filter bank to the subband information to generate the replica of the original input signal.
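The quantizing step at the encoder and the matching dequantizing step at the decoder can be sketched with a uniform mid-tread quantizer. The full-scale value of 1.0, the symmetric code range and the function names are assumptions for illustration; the decoder is assumed to know how many bits each element received.

```python
def quantize(value, bits, full_scale=1.0):
    """Encoder side: map value to an integer code with a uniform mid-tread quantizer.

    Codes run symmetrically from -levels to +levels; with fewer than 2 bits only
    the zero code exists in this scheme, so the element is effectively dropped.
    """
    if bits < 2:
        return 0
    levels = 2 ** (bits - 1) - 1
    code = round(value * levels / full_scale)
    return max(-levels, min(levels, code))  # clip values beyond full scale


def dequantize(code, bits, full_scale=1.0):
    """Decoder side: recover an approximation of the subband value from its code."""
    if bits < 2:
        return 0.0
    levels = 2 ** (bits - 1) - 1
    return code * full_scale / levels


received = [quantize(v, 4) for v in (0.3, -0.37, 0.99)]
print(received)                                # [2, -3, 7]
print([dequantize(c, 4) for c in received])
```

With 4 bits the quantizing step is 1/7 of full scale, so the reconstruction error of any in-range value is at most half a step, 1/14 of full scale.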
The ability of a split-band coding system to exploit psychoacoustic masking effects depends upon the selectivity of bandpass filters in the filter banks implemented in the encoder and decoder. Filter "selectivity," as that term is used here, refers to two characteristics of subband bandpass filters. The first is the bandwidth of the regions between the filter passband and stopbands (the width of the transition bands). The second is the attenuation level in the stopbands. Thus, filter selectivity refers to the steepness of the filter response curve within the transition bands (steepness of transition band rolloff), and the level of attenuation in the stopbands (depth of stopband rejection).
Filter selectivity is directly affected by numerous factors including filter temporal resolution. In a general sense, filter selectivity or frequency resolution increases as filter temporal resolution decreases. The temporal resolution of an IIR filter is inversely related to the filter's time constant. The temporal resolution of FIR filters and discrete transforms is inversely related to the filter and transform length. The length of an FIR filter is determined by the number of filter taps or filter coefficients. The length of a transform-based filter is defined herein to be the "signal sample block length" or the number of samples in a block of samples which are transformed together into a subband information block. With other factors constant, as filter length increases, temporal resolution decreases and frequency selectivity increases.
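For a transform-based filter, the trade-off between temporal and frequency resolution reduces to simple arithmetic on the block length. The sketch assumes a 48 kHz sample rate and DFT-style bins spaced fs/N apart; both are illustrative assumptions.

```python
def block_resolution(block_len, sample_rate_hz):
    """Return (block duration in ms, transform bin spacing in Hz).

    Assumes an N-point DFT-style transform whose bins are spaced
    sample_rate / N apart (a simplifying assumption for illustration).
    """
    duration_ms = 1000.0 * block_len / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / block_len
    return duration_ms, bin_spacing_hz


for n in (128, 512, 2048):
    d, df = block_resolution(n, 48000.0)
    # d * df is always 1000 ms*Hz: finer time resolution costs frequency resolution.
    print(f"N={n:5d}: {d:6.2f} ms per block, {df:7.2f} Hz per bin")
```

At 48 kHz a 128-sample block spans about 2.7 ms with bins 375 Hz apart, while a 2048-sample block spans about 42.7 ms with bins under 24 Hz apart.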
It is common for the number of coefficients generated by a transform filter bank, or the "transform length," to be equal to the signal sample block length, but this is not necessary. For example, the overlapping-block transform used in one embodiment of the present invention discussed more fully below is sometimes described in the art as a transform of length N that transforms signal sample blocks with 2N samples. But this transform can also be described as a transform of length 2N which generates only N unique coefficients. Because all the transforms discussed herein can be thought of as having lengths equal to the signal sample block length, the two terms are used synonymously.
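The modified discrete cosine transform (MDCT) is one widely used overlapping-block transform that takes 2N samples and produces N coefficients; consecutive blocks typically overlap by N samples. It is offered here as an assumption to illustrate the length convention, not as the transform of the embodiment referred to above. The direct formula is used and no window is applied.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT: a block of 2N samples yields only N coefficients.

    X[k] = sum_{n=0}^{2N-1} x[n] * cos(pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    No analysis window is applied in this sketch.
    """
    if len(block) % 2:
        raise ValueError("block length must be even (2N samples)")
    n_coeffs = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_coeffs)
    ]


coeffs = mdct([math.sin(0.3 * t) for t in range(16)])
print(len(coeffs))  # 8 coefficients from a 16-sample block
```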
On the one hand, it is desirable for encoders to use filter banks with longer filters because higher frequency selectivity reduces the amount of energy which leaks from one bandpass filter band into another. By reducing leakage within the filter bank, encoders can more accurately measure the spectral shape of an input signal, can make more accurate bit allocation decisions, and can more reliably render quantization inaudible within the constraints of a given bit rate.
On the other hand, it is desirable for encoders to use shorter filters because higher temporal resolution decreases the time interval over which quantization errors are spread. For example, in a transform encoder/decoder system, quantization errors in the frequency components of a sampled signal are "smeared" across the full length of the signal sample block. Distortion artifacts in the signal recovered by the decoder may be audible for large changes in signal amplitude which occur during a short time interval. Such amplitude changes are referred to here as "transients." These artifacts, which can occur in both transform and true subband coding systems, manifest themselves as pre- and post-transient ringing. If filter temporal resolution is sufficiently high, such artifacts can be confined to the pre-masking and post-masking intervals of the transient, thereby increasing the likelihood that they will be masked by the transient itself.
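The smearing of quantization error across a block can be demonstrated numerically. The sketch below builds a block that is silent until a step transient, rounds its DCT-II coefficients coarsely, and measures the error that appears in the reconstruction before the transient. The orthonormal DCT-II, the step signal, the 64-sample block and the quantizing step are all assumptions for illustration, not the behavior of any particular coder.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(x):
    """Orthonormal DCT-II (forward transform of one signal sample block)."""
    n = len(x)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(x))
            for k in range(n)]


def idct_ii(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * ck * math.cos(math.pi * (i + 0.5) * k / n)
                for k, ck in enumerate(c))
            for i in range(n)]


def pre_transient_error(block_len, onset, step=0.5):
    """Peak error in the reconstructed samples that precede a transient.

    The block is silent up to `onset` and then steps to 1.0; every transform
    coefficient is rounded to a multiple of `step` as a stand-in for coarse
    quantization.
    """
    x = [0.0] * onset + [1.0] * (block_len - onset)
    quantized = [step * round(c / step) for c in dct_ii(x)]
    y = idct_ii(quantized)
    return max(abs(v) for v in y[:onset])  # the original is exactly 0 here


print(pre_transient_error(64, 48))  # nonzero: noise spread into the silent part
```

Because the inverse transform spreads each coefficient's error over the whole block, the error can reach back to the first sample of the block; a shorter block bounds how far before the transient this pre-transient ringing can extend.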
Coding systems which use fixed-length filters must use a compromise length, chosen a priori, that trades off temporal resolution against frequency resolution. A short length will degrade subband filter selectivity. A long length may improve filter selectivity but will reduce temporal resolution, which may result in coding artifacts which are audible because they occur outside the temporal psychoacoustic masking interval of the human auditory system.
A transform coding method disclosed in European Patent Office publication EP 0 251 028 attempts to eliminate pre-transient artifacts by effectively eliminating the transient. The encoding method high-pass filters an input signal to improve transient detection, boosts the amplitude of signal samples in a signal sample block prior to a transient, applies a transform to the modified signal sample block and quantizes the resulting transform coefficients. The position of the transient is passed as side information to the receiver/decoder which applies an inverse transform to the received transform coefficients and attenuates recovered signal samples in a signal sample block prior to a transient by a corresponding amount.
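The encode and decode steps summarized above can be roughly sketched as follows. The first-difference high-pass filter, the threshold detector, the gain value and all function names are hypothetical choices, not details taken from EP 0 251 028, and the transform and quantization that would sit between the two stages are omitted.

```python
def highpass(x):
    """First-difference high-pass filter; the sample before x[0] is taken as 0."""
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]


def find_transient(x, threshold):
    """Index of the first sample whose high-passed magnitude exceeds threshold, or None."""
    for i, v in enumerate(highpass(x)):
        if abs(v) > threshold:
            return i
    return None


def encode_boost(block, threshold, gain):
    """Encoder: boost the samples preceding a transient before the transform.

    Returns the modified block and the transient position to send as side information.
    """
    pos = find_transient(block, threshold)
    if pos is None:
        return list(block), None
    return [v * gain for v in block[:pos]] + list(block[pos:]), pos


def decode_attenuate(block, pos, gain):
    """Decoder: after the inverse transform, attenuate the same samples by 1/gain."""
    if pos is None:
        return list(block)
    return [v / gain for v in block[:pos]] + list(block[pos:])


block = [0.01, -0.02, 0.015, 0.0, 0.9, 0.8, 0.7, 0.6]
boosted, pos = encode_boost(block, threshold=0.5, gain=4.0)
print(pos)  # 4
restored = decode_attenuate(boosted, pos, gain=4.0)
```

Without quantization the attenuation exactly undoes the boost; the disadvantages discussed next arise once the boosted block is transformed and quantized.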
This coding method has several disadvantages, two of which are mentioned here. First, the pre-transient boost distorts the spectral shape of the sample block and thereby distorts coding decisions based on this spectral shape. This adversely affects the ability to exploit psychoacoustic masking. Furthermore, in coders adaptively allocating a limited number of bits, the boost of pre-transient signal samples tends to increase quantizing errors of the transient. This increase in quantizing error results from the boost amplifying spectral components other than those of the transient. Adaptive bit allocation based upon psychoacoustic principles will allocate more bits to these amplified spectral components than would otherwise be allocated without boost. This reduces the number of bits remaining to encode the transient's spectral components; therefore, transient quantizing noise may increase.
Second, large-amplitude signal samples that are amplified by the pre-transient boost may exceed the encoder's capacity to represent them (exceed the encoder's dynamic range). If the encoder's dynamic range is increased to handle the amplified components, the number of bits required to encode the signal also increases. This condition is more likely for large-amplitude low-frequency spectral components. Because they are low in frequency, these large-amplitude components will be blocked by the high-pass filter and will not inform the transient detection process. The EP 0 251 028 publication suggests applying a frequency-selective boost, boosting only those spectral components which make up the transient; however, because the pre-transient boost occurs in the time domain prior to transform filtering, this requires one or more filtering steps in addition to those needed to perform the transform.
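The dynamic-range cost of a boost is easy to quantify. The sketch assumes 16-bit signed PCM and illustrative sample and gain values; each doubling of gain requires one more bit, or about 6 dB more dynamic range.

```python
import math


def fits_in_pcm(sample, gain, bits=16):
    """True if sample * gain still fits a signed `bits`-bit integer range."""
    limit = 2 ** (bits - 1)
    return -limit <= sample * gain <= limit - 1


def extra_bits_for_gain(gain):
    """Additional bits needed to absorb a boost of `gain` (gain >= 1) at full scale."""
    return math.ceil(math.log2(gain))


print(fits_in_pcm(20000, 2.0))    # False: 40000 exceeds 32767
print(extra_bits_for_gain(4.0))   # 2, i.e. about 12 dB more dynamic range
```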
Another transform coding method, described in WIPO publication WO 91/16769, reduces the signal sample block length and adapts the transform function in response to input signal transients. Although this method avoids the problems recited above, it requires a significant amount of processing. As a result, the cost of implementation may be too high for many applications.
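The decision to shorten the block in response to a transient can be sketched with a crude energy-ratio detector. This is not the method of WO 91/16769: the block lengths, the ratio threshold and the function names are hypothetical, and the adaptation of the transform function that the publication describes is not shown.

```python
def segment_energies(block, seg_len):
    """Energy of each consecutive seg_len-sample segment of block."""
    return [sum(v * v for v in block[i:i + seg_len]) for i in range(0, len(block), seg_len)]


def choose_block_length(block, long_len=512, short_len=64, ratio=10.0, eps=1e-12):
    """Return short_len if any segment's energy jumps by more than `ratio`
    over the preceding segment (a crude transient detector), else long_len."""
    energies = segment_energies(block, short_len)
    for prev, cur in zip(energies, energies[1:]):
        if cur > ratio * (prev + eps):
            return short_len
    return long_len


steady = [0.1] * 512
attack = [0.0] * 256 + [1.0] * 256
print(choose_block_length(steady), choose_block_length(attack))  # 512 64
```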