Appendix A, which is part of the present disclosure, contains assembly code for a digital signal processor for implementing one embodiment of this invention as described more completely below.
A portion of the present disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
Audio Coding-3, also known as “AC-3” or “Dolby Digital,” is a digital audio standard described in “Digital Audio Compression (AC-3)” by the United States Advanced Television Systems Committee, which is hereby incorporated by reference in its entirety.
A prior art AC-3 bitstream is illustrated in FIG. 1A, which is reproduced from FIG. 5.1 of “Digital Audio Compression (AC-3)” referenced above. The AC-3 bitstream is made up of a sequence of synchronization frames. Each synchronization frame contains 6 coded audio blocks (“AB”), each of which represents 256 new audio samples. A synchronization information (“SI”) header at the beginning of each frame contains information needed to acquire and maintain synchronization. A bitstream information (“BSI”) header follows the SI header, and contains parameters describing the coded audio service. An auxiliary data (“AUX”) field may follow the coded audio blocks. At the end of each frame is an error check field that includes a cyclical redundancy check (“CRC”) word for error detection. An optional CRC word is also located in the SI header.
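The frame arithmetic implied by this layout can be sketched as follows. This is only an illustration: the names `FRAME_FIELDS`, `samples_per_frame`, and `frame_duration_ms` are hypothetical, and the 48 kHz sample rate used in the usage note is an assumption, not something stated above.

```python
# Field order of one synchronization frame as described in the text.
# The AUX field is optional; names here are illustrative only.
FRAME_FIELDS = ("SI", "BSI", "AB0", "AB1", "AB2", "AB3", "AB4", "AB5", "AUX", "CRC")

BLOCKS_PER_FRAME = 6      # coded audio blocks per synchronization frame
SAMPLES_PER_BLOCK = 256   # new audio samples represented by each block


def samples_per_frame() -> int:
    """Number of new audio samples (per channel) carried by one frame."""
    return BLOCKS_PER_FRAME * SAMPLES_PER_BLOCK


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playback time covered by one frame at the given sample rate."""
    return 1000.0 * samples_per_frame() / sample_rate_hz
```

Under the assumed 48 kHz sample rate, `frame_duration_ms(48000)` gives the time span that one set of frame-level parameters covers.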
AC-3 provides a dynamic range control system that allows program providers, such as movie studios, to control the dynamic range of their audio programs. Dynamic range refers to the range of the relative sound levels in an audio program. For example, dialogue is usually used as the reference level: loud sounds are a certain number of decibels above the dialogue sound level, while soft sounds are a certain number of decibels below it.
Program providers can encode dynamic range gain words (“dynrng”) (e.g., 8 bits) in the audio blocks to alter the gain of the audio blocks. The dynrng values typically indicate decibel (“dB”) gain reduction during the loudest signal passages, and dB gain increases during the quiet passages. AC-3 provides that AC-3 decoders shall implement the compression characteristics indicated by the dynrng values encoded in the audio blocks. AC-3 further provides that AC-3 decoders may optionally allow listener control over the use of the dynrng values so that the listener may select full dynamic range reproduction by ignoring the dynrng values or partial dynamic range reproduction by using some fraction of the dynrng values.
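As an illustration of how a decoder might turn a dynrng word into a gain and honor a listener-selected fraction, consider the sketch below. The bit layout is an assumption based on a reading of the ATSC document (three signed exponent bits followed by five mantissa bits) and should be checked against the standard. The function names and the `fraction` parameter are hypothetical.

```python
import math


def dynrng_to_linear(word: int) -> float:
    """Convert an 8-bit dynrng word to a linear gain.

    Assumed layout: the top 3 bits are a signed integer X (-4..3) and the
    low 5 bits are an unsigned Y, giving gain = 2**(X + 1) * (32 + Y) / 64.
    """
    if not 0 <= word <= 0xFF:
        raise ValueError("dynrng must fit in 8 bits")
    x = word >> 5
    if x >= 4:              # sign-extend the 3-bit exponent field
        x -= 8
    y = word & 0x1F
    return 2.0 ** (x + 1) * (32 + y) / 64.0


def linear_to_db(gain: float) -> float:
    """Express a linear gain in decibels."""
    return 20.0 * math.log10(gain)


def scaled_gain(word: int, fraction: float) -> float:
    """Apply only `fraction` (0.0 to 1.0) of the encoded dB gain change.

    Scaling the dB value by `fraction` is the same as raising the linear
    gain to that power. A fraction of 0.0 ignores dynrng entirely (full
    dynamic range); 1.0 applies it in full.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return dynrng_to_linear(word) ** fraction
```

With this layout, a word of 0x00 decodes to unity gain, so a block that needs no compression leaves the samples unchanged regardless of the fraction chosen.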
Program providers can also encode compression gain words (“compr”) (e.g., 8 bits) in the BSI header to alter the gain of the audio frames. The compr values provide larger dynamic range reductions (also known as “heavy compression”) than the dynrng values. The compr values have twice the control range of the dynrng values (±48 dB vs. ±24 dB) with half the resolution (0.5 dB vs. 0.25 dB).
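The trade-off between range and resolution follows from how the 8 bits are split between exponent and mantissa. The sketch below assumes a 4-bit signed exponent for compr and a 3-bit signed exponent for dynrng, per a reading of the ATSC document. The helper names are hypothetical, and the layout should be verified against the standard before use.

```python
import math


def gain_word_to_linear(word: int, int_bits: int) -> float:
    """Decode an 8-bit gain word that has `int_bits` signed exponent bits.

    Assumed layouts: int_bits=3 for dynrng, int_bits=4 for compr. The
    remaining low bits form a mantissa in the range [0.5, 1.0).
    """
    if not 0 <= word <= 0xFF:
        raise ValueError("gain word must fit in 8 bits")
    frac_bits = 8 - int_bits
    x = word >> frac_bits
    if x >= 1 << (int_bits - 1):          # sign-extend the exponent field
        x -= 1 << int_bits
    y = word & ((1 << frac_bits) - 1)
    mantissa = 0.5 + y / float(1 << (frac_bits + 1))
    return 2.0 ** (x + 1) * mantissa


def compr_to_linear(word: int) -> float:
    """Decode a compr word under the assumed 4-bit exponent layout."""
    return gain_word_to_linear(word, 4)


def to_db(gain: float) -> float:
    """Express a linear gain in decibels."""
    return 20.0 * math.log10(gain)


def coarsest_step_db(int_bits: int) -> float:
    """Largest dB step between adjacent mantissa codes for a layout."""
    return to_db(gain_word_to_linear(1, int_bits) / gain_word_to_linear(0, int_bits))
```

Evaluating `to_db(compr_to_linear(0x7F))` and `to_db(compr_to_linear(0x80))` gives the two ends of the compr range, and comparing `coarsest_step_db(4)` with `coarsest_step_db(3)` shows the roughly two-to-one difference in step size.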
AC-3 decoders may provide both 2-channel and m-channel outputs (m greater than 0; e.g., m=6). In some applications, consumers may desire independent dynamic range control for the 2-channel and the m-channel outputs. To provide independent dynamic range control for the 2-channel and the m-channel outputs, AC-3 decoders can (1) execute the decoding algorithm with one set of dynamic range gain words (or compression gain words) for the 2-channel output, and (2) execute the decoding algorithm again with another set of dynamic range gain words (or compression gain words) for the m-channel output. This method is inefficient because computationally and/or memory intensive actions are repeated. Thus, what is needed is a method that provides independent dynamic range control for 2-channel and m-channel outputs while minimizing the repetition of computationally and/or memory intensive actions.
In accordance with one aspect of the invention, independent dynamic range control is provided for 2-channel and m-channel outputs without repeating computationally and/or memory intensive actions, including the inverse transform and the windowing of audio samples.
In one embodiment, m-channel dynamic range control is conventionally applied to the m-channel audio samples in the frequency domain to form m-channel frequency samples. The m-channel frequency samples are inverse transformed to generate audio samples in the time domain (“m-channel time samples”) and windowed to generate windowed time samples (i.e., the m-channel output).
2-channel dynamic range control is applied to the m-channel audio samples in the time domain after windowing instead of the m-channel audio samples in the frequency domain prior to the inverse transform, thereby avoiding repeating the inverse transform and the windowing of the m-channel audio samples. To do this, the m-channel output is divided into groups and each group is multiplied with a corresponding 2-channel dynamic range scale factor. The 2-channel dynamic range scale factors at least partially remove the effects of the m-channel dynamic range control applied in the frequency domain and the windowing in the time domain, and readjust the dynamic range of the m-channel output for 2-channel output. These audio samples are then downmixed to form the 2-channel output.
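A minimal sketch of this post-windowing path is given below. It takes the 2-channel scale factors as inputs, because their derivation is not given in this passage. It also assumes that one list of scale factors is shared by all channels and that the downmix is a plain weighted sum. All function and parameter names are hypothetical.

```python
def apply_group_scales(channel, scales, group_size):
    """Multiply consecutive groups of windowed samples by their scale factors."""
    if len(channel) != len(scales) * group_size:
        raise ValueError("samples must divide evenly into the given groups")
    return [s * scales[i // group_size] for i, s in enumerate(channel)]


def downmix_to_stereo(channels, left_coefs, right_coefs):
    """Weighted sum of m channels into a left and a right channel."""
    if not (len(channels) == len(left_coefs) == len(right_coefs)):
        raise ValueError("need one left and one right coefficient per channel")
    n = len(channels[0])
    left = [sum(c * ch[i] for c, ch in zip(left_coefs, channels)) for i in range(n)]
    right = [sum(c * ch[i] for c, ch in zip(right_coefs, channels)) for i in range(n)]
    return left, right


def two_channel_from_m_channel_output(m_output, scales, group_size,
                                      left_coefs, right_coefs):
    """Rescale the already-windowed m-channel output, then downmix it.

    No inverse transform or windowing is performed here; the m-channel
    output is reused as-is.
    """
    rescaled = [apply_group_scales(ch, scales, group_size) for ch in m_output]
    return downmix_to_stereo(rescaled, left_coefs, right_coefs)
```

For example, with three channels ordered left, right, center, coefficients such as `[1.0, 0.0, 0.5]` and `[0.0, 1.0, 0.5]` would fold half of the center channel into each output; those values are purely illustrative.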
In another embodiment, the 2-channel and m-channel outputs are generated without repeating the inverse transform. The m-channel dynamic range control is conventionally applied to the m-channel frequency samples. The m-channel frequency samples are inverse transformed to generate the m-channel time samples.
The m-channel time samples are duplicated into two sets. In the first set, the m-channel time samples are conventionally windowed to generate windowed time samples (i.e., the m-channel output). In the second set, the m-channel time samples are multiplied by a 2-channel dynamic range final scale. The 2-channel final scale at least partially removes the effects of the m-channel dynamic range control applied in the frequency domain and readjusts the dynamic range of the m-channel time samples for 2-channel output. The second set is then windowed to generate windowed time samples and downmixed to form the 2-channel output.
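The reason a single inverse transform suffices is that the transform is linear: scaling the time samples by the ratio of the two gains gives the same result as applying the 2-channel gain in the frequency domain. The sketch below illustrates this under several assumptions: one gain per block shared by all channels, a generic inverse MDCT and sine window standing in for the actual AC-3 transform and window, and no overlap-add with the previous block. All names are hypothetical.

```python
import math


def imdct(coefs):
    """Naive O(N^2) inverse MDCT (stand-in transform): N/2 coefs -> N samples."""
    half = len(coefs)
    return [
        sum(c * math.cos(math.pi / half * (t + 0.5 + half / 2.0) * (k + 0.5))
            for k, c in enumerate(coefs))
        for t in range(2 * half)
    ]


def window(samples):
    """Sine window (stand-in for the decoder's real window)."""
    n = len(samples)
    return [s * math.sin(math.pi * (t + 0.5) / n) for t, s in enumerate(samples)]


def downmix(channels, coefs):
    """Weighted sum of m channels into one output channel."""
    n = len(channels[0])
    return [sum(c * ch[i] for c, ch in zip(coefs, channels)) for i in range(n)]


def decode_both(freq_channels, gain_m, gain_2, left_coefs, right_coefs):
    """Produce the m-channel and 2-channel outputs from one inverse transform."""
    if gain_m == 0.0:
        raise ValueError("gain_m must be nonzero to be undone later")
    # m-channel dynamic range control in the frequency domain, then a
    # single inverse transform per channel.
    time_m = [imdct([gain_m * c for c in ch]) for ch in freq_channels]
    # First set: window as usual to get the m-channel output.
    m_output = [window(ch) for ch in time_m]
    # Second set: the final scale undoes gain_m and applies gain_2 instead.
    final_scale = gain_2 / gain_m
    windowed_2 = [window([final_scale * s for s in ch]) for ch in time_m]
    stereo = (downmix(windowed_2, left_coefs), downmix(windowed_2, right_coefs))
    return m_output, stereo
```

In this simplified setting the final scale removes the m-channel gain exactly. In a real decoder, where gains can vary between blocks and channels, the removal may be only partial, which is consistent with the "at least partially" wording above.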