In accordance with many conventional audio encoding methods, audio data undergoes quantization (e.g., to compress the audio data during perceptual audio coding). For example, encoding of audio data in accordance with the formats known as AC-3 and Enhanced AC-3 (or “E-AC-3”) includes such a quantization step. Dolby Laboratories provides proprietary implementations of AC-3 and E-AC-3 known as Dolby Digital and Dolby Digital Plus, respectively. Dolby, Dolby Digital, and Dolby Digital Plus are trademarks of Dolby Laboratories Licensing Corporation.
Although some embodiments of the present invention are useful to filter audio content of a decoded version of an encoded bitstream having AC-3 (or E-AC-3) format, it is contemplated that other embodiments of the invention are useful to filter audio content of decoded versions of encoded bitstreams having other formats (provided that the encoding includes a quantization step).
Next, with reference to FIG. 1, we describe aspects of conventional AC-3 encoding of audio data, as an example of an encoding method which includes mantissa bit allocation and mantissa value quantization steps.
An encoded bitstream having AC-3 format comprises one to six channels of audio content, and metadata indicative of at least one characteristic of the audio content. The audio content is audio data that has been compressed using perceptual audio coding.
In encoding of an AC-3 audio bitstream, blocks of input audio samples to be encoded undergo time-to-frequency domain transformation resulting in blocks of frequency domain data, commonly referred to as transform coefficients, frequency coefficients, or frequency components, located in uniformly spaced frequency bins. The frequency coefficient in each bin is then converted (e.g., in BFPE stage 7 of the FIG. 1 system) into a floating point format comprising an exponent and a mantissa.
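By way of illustration only, the conversion of a single frequency coefficient into an exponent and a mantissa may be sketched as follows. The sketch assumes that each coefficient has magnitude less than one and that exponents are limited to the range 0 to 24; the function name `to_exponent_mantissa` and the constant `MAX_EXPONENT` are hypothetical and are not taken from any AC-3 implementation.

```python
import math

MAX_EXPONENT = 24  # assumption: exponents are limited to the range 0 to 24


def to_exponent_mantissa(coefficient):
    """Split a coefficient (|coefficient| < 1 assumed) into (exponent, mantissa)
    such that coefficient == mantissa * 2 ** -exponent."""
    if coefficient == 0.0:
        # Zero has no leading significant bit; use the largest exponent.
        return MAX_EXPONENT, 0.0
    # math.frexp returns (m, e) with coefficient == m * 2 ** e and 0.5 <= |m| < 1.
    _, e = math.frexp(coefficient)
    # The exponent counts the left shifts needed to normalize the value,
    # clamped to the permitted range.
    exponent = min(max(-e, 0), MAX_EXPONENT)
    # For very small coefficients the clamp leaves |mantissa| below 0.5.
    mantissa = coefficient * (2.0 ** exponent)
    return exponent, mantissa
```

For example, `to_exponent_mantissa(0.01)` yields an exponent of 6 and a mantissa of approximately 0.64, since 0.64 × 2⁻⁶ = 0.01.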
Typical embodiments of AC-3 (and E-AC-3) encoders (and other audio data encoders) implement a psychoacoustic model to analyze the frequency domain data on a banded basis (typically using 50 nonuniform bands approximating the frequency bands of the well-known psychoacoustic scale known as the Bark scale) to determine an optimal allocation of bits to each mantissa. The mantissa data is then quantized (e.g., in quantizer 6 of the FIG. 1 system) to a number of bits corresponding to the determined bit allocation. The quantized mantissa data is then formatted (e.g., in formatter 8 of the FIG. 1 system) into an encoded output bitstream. The mantissa bit assignment is based on the difference between a fine-grain signal spectrum (represented by a power spectral density (“PSD”) value for each frequency bin) and a coarse-grain masking curve (represented by a mask value for each frequency band determined by the psychoacoustic model).
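By way of illustration only, a mantissa bit assignment driven by the difference between a per-bin PSD value and a per-band mask value may be sketched as follows. This is a simplified sketch and not the AC-3 allocation procedure: it assumes that PSD and mask values are expressed in decibels, that each mantissa bit buys about 6 dB of signal-to-noise ratio, and that a hypothetical list `band_of_bin` maps each bin to its band in the psychoacoustic model.

```python
import math

DB_PER_BIT = 6.02  # assumption: roughly 6 dB of signal-to-noise ratio per bit
MAX_BITS = 16      # assumption: illustrative upper limit on mantissa size


def allocate_bits(psd_db, mask_db, band_of_bin):
    """Return a mantissa bit count for each bin.

    psd_db      -- fine-grain spectrum, one PSD value per frequency bin
    mask_db     -- coarse-grain masking curve, one value per band
    band_of_bin -- index of the band that contains each bin
    """
    bits = []
    for bin_index, psd in enumerate(psd_db):
        excess = psd - mask_db[band_of_bin[bin_index]]
        if excess <= 0:
            # The component lies at or below the mask: no bits are spent on it.
            bits.append(0)
        else:
            bits.append(min(MAX_BITS, math.ceil(excess / DB_PER_BIT)))
    return bits
```

For example, `allocate_bits([60, 30, 10, 45], [40, 20], [0, 0, 1, 1])` returns `[4, 0, 0, 5]`: only the two bins whose PSD exceeds the mask of their band receive mantissa bits.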
To perform AC-3 encoding of an audio program, a number, N (e.g., N=1, N=2, or N=4), of quantized mantissa values (one for each of N consecutive frequency bins) which will share the same exponent value is chosen. Each such set of N consecutive frequency bins may also (and herein will) be referred to as a frequency “band” (each band comprising N bins). Thus, one bit allocation value for each frequency band of an encoded audio program (where the bit allocation value is indicative of the number of bits of the mantissa for one bin of the band) suffices to indicate the number of bits of each mantissa of each audio sample in the band. In this context, the frequency bands of the encoded audio program are typically not the same frequency bands assumed by the psychoacoustic model which is employed to determine the number of bits of each quantized mantissa of the encoded program.
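By way of illustration only, the sharing of one exponent among the N bins of a band may be sketched as follows. The sketch assumes that the shared exponent is the smallest exponent in the band (corresponding to the largest coefficient), so that no mantissa in the band overflows; the function name `share_exponents` is hypothetical.

```python
def share_exponents(exponents, group_size):
    """Replace each run of `group_size` consecutive exponents with a single
    shared value, assumed here to be the minimum exponent of the run."""
    shared = []
    for start in range(0, len(exponents), group_size):
        group = exponents[start:start + group_size]
        # A smaller exponent denotes a larger magnitude, so the minimum is
        # the only value that can represent every coefficient in the group.
        shared.extend([min(group)] * len(group))
    return shared
```

For example, with N=2 the exponents `[3, 5, 4, 4, 7, 6, 8, 9]` become `[3, 3, 4, 4, 6, 6, 8, 8]`, and with N=4 they become `[3, 3, 3, 3, 6, 6, 6, 6]`.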
FIG. 1 is an encoder configured to perform AC-3 (or Enhanced AC-3) encoding on time-domain input audio data 1. Analysis filter bank 2 converts the time-domain input audio data 1 into frequency domain audio data 3 (samples in a set of frequency bins), and block floating point encoding (BFPE) stage 7 generates a floating point representation of each frequency component of data 3, comprising an exponent and mantissa for each frequency bin. The frequency-domain data output from stage 7 will sometimes also be referred to herein as frequency domain audio data 3. The frequency domain audio data output from stage 7 is then encoded, including by quantization of its mantissas in quantizer 6, tenting of its exponents (in tenting stage 10), and encoding (in exponent coding stage 11) of the tented exponents generated in stage 10. Formatter 8 generates an AC-3 (or Enhanced AC-3) encoded bitstream 9 in response to the quantized data output from quantizer 6 and the coded differential exponent data output from stage 11.
Quantizer 6 performs bit allocation and quantization based upon control data (including masking data) generated by controller 4. The masking data (determining a masking curve) is generated from the frequency domain data 3, on the basis of a psychoacoustic model (implemented by controller 4) of human hearing and aural perception. The psychoacoustic modeling takes into account the frequency-dependent thresholds of human hearing, and a psychoacoustic phenomenon referred to as masking, whereby a strong frequency component close to one or more weaker frequency components tends to mask the weaker components, rendering them inaudible to a human listener. This makes it possible to omit the weaker frequency components when encoding audio data, and thereby achieve a higher degree of compression, without adversely affecting the perceived quality of the encoded audio data (bitstream 9). The masking data comprises a masking curve value for each frequency band (determined by the psychoacoustic model) of the frequency domain audio data 3. These masking curve values represent the level of signal masked by the human ear in each frequency band. Quantizer 6 uses this information to decide how best to use the available number of data bits to represent the frequency domain data of each frequency band of the input audio signal.
Controller 4 may implement a conventional low frequency compensation process (sometimes referred to herein as “lowcomp” compensation) to generate lowcomp parameter values for correcting the masking curve values for the low frequency bands. The corrected masking curve values are used to generate the signal-to-mask ratio value for each frequency component of the frequency-domain audio data 3. Low frequency compensation is a feature of the psychoacoustic model typically implemented during AC-3 (and E-AC-3) encoding of audio data. Lowcomp compensation improves the encoding of highly tonal low-frequency components (of the input audio data to be encoded) by preferentially reducing the mask in the relevant frequency region, and in consequence allocating more bits to the code words employed to encode such components.
In AC-3 and E-AC-3 encoding, each component of the frequency-domain audio data 3 (i.e., the contents of each transform bin) has a floating point representation comprising a mantissa and an exponent. To simplify the calculation of the masking curve, the Dolby Digital family of coders uses only the exponents to derive the masking curve. Stated alternatively, the masking curve depends on the transform coefficient exponent values but is independent of the transform coefficient mantissa values. Because the range of exponents is rather limited (generally, integer values from 0 to 24), the exponent values are mapped onto a PSD scale with a larger range (generally, integer values from 0 to 3072) for the purposes of computing the masking curve. Thus, the loudest frequency components are mapped to a PSD value of 3072, while the softest frequency-domain data components are mapped to a PSD value of 0.
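By way of illustration only, a linear mapping consistent with the ranges noted above (exponent 0 mapped to PSD 3072, and exponent 24 mapped to PSD 0) may be sketched as follows. The step of 128 PSD units per exponent step follows from those two end points (3072 / 24 = 128); the names `exponent_to_psd`, `MAX_PSD`, and `PSD_STEP` are hypothetical.

```python
MAX_PSD = 3072   # PSD value of the loudest components (exponent 0)
PSD_STEP = 128   # 3072 / 24: PSD units per exponent step


def exponent_to_psd(exponent):
    """Map an exponent in the range 0 to 24 onto the PSD scale 0 to 3072.

    A larger exponent denotes a softer component, so the PSD value falls
    as the exponent rises."""
    if not 0 <= exponent <= 24:
        raise ValueError("exponent must be in the range 0 to 24")
    return MAX_PSD - PSD_STEP * exponent
```

For example, `exponent_to_psd(0)` returns 3072, `exponent_to_psd(1)` returns 2944, and `exponent_to_psd(24)` returns 0.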
In conventional Dolby Digital (or Dolby Digital Plus) encoding, differential exponents (i.e., the difference between consecutive exponents) are coded instead of absolute exponents. The differential exponents can only take on one of five values: 2, 1, 0, −1, and −2. If a differential exponent outside this range is found, one of the exponents being subtracted is modified so that the differential exponent (after the modification) is within the noted range (this conventional method is known as “exponent tenting” or “tenting”). Tenting stage 10 of the FIG. 1 encoder generates tented exponents in response to the raw exponents asserted thereto, by performing such a tenting operation.
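By way of illustration only, a tenting operation that confines every differential exponent to the range −2 to 2 may be sketched as follows. The sketch assumes that, whenever a differential exponent falls outside that range, the larger of the two exponents is lowered (so that the corresponding coefficient is represented with a smaller mantissa rather than clipped); the function name `tent_exponents` is hypothetical, and the sketch is not asserted to match the operation of tenting stage 10 in every detail.

```python
MAX_DELTA = 2  # differential exponents are confined to -2 .. 2


def tent_exponents(exponents):
    """Return a copy of `exponents` in which consecutive values differ by at
    most MAX_DELTA, obtained only by lowering exponents (an assumption)."""
    tented = list(exponents)
    # Backward pass: no exponent may exceed its right neighbor by more than 2.
    for i in range(len(tented) - 2, -1, -1):
        tented[i] = min(tented[i], tented[i + 1] + MAX_DELTA)
    # Forward pass: no exponent may exceed its left neighbor by more than 2.
    for i in range(1, len(tented)):
        tented[i] = min(tented[i], tented[i - 1] + MAX_DELTA)
    return tented
```

For example, the raw exponents `[0, 5, 1, 8, 8]` become `[0, 2, 1, 3, 5]`, whose differential exponents are 2, −1, 2, and 2.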
Spectral domain coding systems (e.g., conventional encoders of the type described with reference to FIG. 1) code pseudo-stationary audio signals extremely well. However, at low data rates these systems can introduce audible pre-echo artifacts when coding transient signals. Conventional coding methods such as Temporal Noise Shaping (TNS) and Gain Control provide improvements for the coding of transient material by temporally flattening the audio signal prior to quantization (and performance of other encoding steps) and then reapplying the original temporal envelope at the decoder. Thus, the noise introduced by quantization is shifted away from quiet segments of the audio to louder segments of the audio in the time domain. The temporal flattening is performed by applying a filter in the encoder, and the inverse of this filter is then applied in the decoder (after delivery of the encoded signal to the decoder).
Typically, the encoder applies the filter in the frequency domain (i.e., to frequency components generated by applying a time domain-to-frequency domain transform on the audio data to be encoded), and the inverse filter is also applied (by the decoder) in the frequency domain (i.e., during or after decoding of frequency-domain encoded audio data, but before application of a frequency domain-to-time domain transform on the decoded audio data).
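By way of illustration only, an encoder filter applied across frequency components and the corresponding inverse filter applied in a decoder may be sketched as follows. The sketch assumes a linear prediction filter of the general kind used in TNS, with hypothetical predictor coefficients supplied by the caller; it omits quantization and the derivation of the predictor, and shows only that the decoder filter inverts the encoder filter.

```python
def encoder_filter(coeffs, predictor):
    """Encoder side: replace each frequency component with its prediction
    residual, where the prediction is formed from preceding components."""
    residual = []
    for k, value in enumerate(coeffs):
        prediction = sum(
            p * coeffs[k - 1 - j]
            for j, p in enumerate(predictor)
            if k - 1 - j >= 0
        )
        residual.append(value - prediction)
    return residual


def decoder_filter(residual, predictor):
    """Decoder side: invert encoder_filter by adding back the prediction,
    which is formed from the components restored so far."""
    restored = []
    for k, value in enumerate(residual):
        prediction = sum(
            p * restored[k - 1 - j]
            for j, p in enumerate(predictor)
            if k - 1 - j >= 0
        )
        restored.append(value + prediction)
    return restored
```

In the absence of quantization, `decoder_filter(encoder_filter(x, a), a)` reproduces `x` to within floating point rounding for any predictor `a`. A decoder that does not apply `decoder_filter` would output the residual rather than the original components, which is the limitation of this type of filtering.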
Herein, we use the term “quantization noise filter” to denote a filter designed to reduce audible noise (e.g., pre-echo noise) due to quantization during encoding of audio data. Herein, it is contemplated that a quantization noise filter may be applied by an encoder (i.e., during encoding of the audio data), or in a decoder (or a post-filtering system coupled and configured to filter the output of a decoder) during or after decoding of encoded audio data.
An example of a quantization noise filter implemented in an encoder (rather than in a decoder) is described in US Patent Application Publication No. 2010/0094637 A1, published Apr. 15, 2010, and assigned to the assignee of the present invention. The named inventor of US Patent Application Publication No. 2010/0094637 A1 is the same individual as the inventor of the present invention.
It is also contemplated herein that a quantization noise filter may be applied partially by an encoder and partially by a decoder (or a post-filtering system coupled and configured to filter the output of a decoder), for example, by applying a first filter stage in the encoder and a second filter stage in the decoder (or post-filtering system) after delivery of the encoded signal to the decoder. Examples of this latter type of quantization noise filter are those applied by the conventional TNS and Gain Control methods mentioned above. This type of conventional quantization noise filtering has limitations and disadvantages, such as the need for the decoder to apply the inverse of the filter stage (“encoder filter”) applied by the encoder, which prevents use of a decoder that is not specially configured to apply the inverse of the encoder filter.
The present inventor has recognized that it would be desirable to implement a quantization noise filter in a decoder (or a post-filter coupled to a decoder), so that a decoder (or post-filter) configured to apply the quantization noise filter can perform quantization noise filtering on audio content, and so that a conventional decoder (or a conventional decoder and conventional post-filter coupled thereto) not configured to apply the quantization noise filter can decode (and optionally also perform post-filtering on) audio content without performing quantization noise filtering on the audio content. In the latter case, the conventionally decoded audio content could usefully be rendered (i.e., the resulting sound could have acceptable quality, although the sound quality might suffer from audible noise due to quantization).