This patent application is related to U.S. patent application Ser. No. 09/595,387 entitled “A FAST CODE LENGTH SEARCH METHOD FOR MPEG AUDIO ENCODING” filed Jun. 14, 2000; and is related to U.S. patent application Ser. No. 09/595,391, entitled “A FAST LOOP ITERATION AND BITSTREAM FORMATTING METHOD FOR MPEG AUDIO ENCODING” filed Jun. 14, 2000; the disclosures of which are herein incorporated by reference.
1. Field of the Invention
The present invention relates generally to the field of audio encoding, and more particularly to a fast codebook search method for finding an optimal Huffman codebook from among a group of Huffman codebooks, wherein the method is especially suited for MPEG-compliant audio encoding.
2. Description of the Related Art
In general, an audio encoder processes a digital audio signal and produces a compressed bit stream suitable for storage. A standard method for audio encoding and decoding is specified by “CODING OF MOVING PICTURES AND ASSOCIATED AUDIO FOR DIGITAL STORAGE MEDIA AT UP TO ABOUT 1.5 MBIT/s, Part 3 Audio” (3-11171 rev 1), submitted for approval to ISO-IEC/JTC1 SC29, and prepared by SC29/WG11, also known as MPEG (Moving Picture Experts Group). This draft version was adopted with some modifications as ISO/IEC 11172-3:1993(E) (hereinafter “MPEG-1 Audio Encoding”). The disclosure of the MPEG-1 Audio Encoding standard specification is herein incorporated by reference. This standard is also often referred to as “MP3” or “MP3 audio encoding.” The exact encoder algorithm is not standardized, and a compliant system may use various means for encoding such as estimation of the auditory masking threshold, quantization, and scaling. However, the encoder output must be such that a decoder conforming to the MPEG-1 standard will produce audio suitable for an intended application.
As shown in FIG. 1, input audio samples are fed into the encoder 2. The mapping stage 4 creates a filtered and sub-sampled representation of the input audio stream. The mapped samples may be called either sub-band samples (as in Layer I, see below) or transformed sub-band samples (as in Layer III). A psychoacoustic model 10 creates a set of data to control the quantizer and coding block 6. The data supplied by the psychoacoustic model 10 may vary depending on the actual coder implementation 6. One possibility is to use an estimation of a masking threshold to do this quantizer control. The quantizer and coding block 6 creates a set of coding symbols from the mapped input samples. Again, the actual implementation of the quantizer and coder block 6 can depend on the encoding system. The frame packing block 8 assembles the actual bit stream from the output data of the other blocks, and adds other information (e.g. error correction) if necessary.
In general, as shown in FIG. 3, each quantized data frame 30 contains 576 data samples. Each frame 30 is divided into three sub-regions 32, 34, 36, with each region containing an even number of data samples, and with at least one region further divided into sub-regions. Adjacent data samples 38, or “data pairs,” are used as X, Y coordinates into a Huffman codebook, which provides a single code value for each data pair, as illustrated in FIG. 4. A codebook is a table containing, for each data pair, the bit code used to encode that pair and the length of that code. For certain regions, the data may be encoded in groups of four data samples (quadruples) instead of pairs. The MPEG-1 standard uses 32 different codebooks, of which two or three are candidates for each sub-region, depending on the maximum data value in each sub-region. The “optimal” codebook for each sub-region is the single codebook from among the candidate codebooks that uses the fewest total bits to code the entire sub-region.
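The definition of the optimal codebook given above suggests a straightforward search: total the code lengths of every data pair under each candidate codebook and keep the smallest total. The following Python sketch illustrates that exhaustive search only. The table contents, the names TOY_CODE_LEN, total_bits and pick_codebook_exhaustive, and the restriction to non-negative sample magnitudes are all assumptions made for illustration; the values are not the MPEG-1 Huffman tables.

```python
# Toy code-length tables: TOY_CODE_LEN[book][x][y] is the number of bits
# needed to code the data pair (x, y). The values are invented for this
# illustration and are NOT the MPEG-1 Huffman tables.
TOY_CODE_LEN = {
    "A": [[1, 3, 6], [3, 4, 7], [6, 7, 9]],
    "B": [[3, 3, 4], [3, 4, 5], [4, 5, 6]],
}


def total_bits(samples, code_len):
    """Sum the code lengths of consecutive (x, y) pairs in a sub-region.

    Assumes the samples are non-negative magnitudes small enough to index
    the table directly.
    """
    if len(samples) % 2:
        raise ValueError("a sub-region holds an even number of samples")
    return sum(code_len[x][y] for x, y in zip(samples[0::2], samples[1::2]))


def pick_codebook_exhaustive(samples, candidates, tables=TOY_CODE_LEN):
    """Return (best_name, totals): one full summation per candidate codebook."""
    totals = {name: total_bits(samples, tables[name]) for name in candidates}
    best = min(candidates, key=lambda name: totals[name])
    return best, totals


if __name__ == "__main__":
    quiet = [0, 0, 1, 0, 0, 1, 0, 0]   # mostly small values
    loud = [2, 2, 1, 2, 2, 1, 2, 2]    # mostly large values
    print(pick_codebook_exhaustive(quiet, ["A", "B"]))
    print(pick_codebook_exhaustive(loud, ["A", "B"]))
```

In this sketch every data pair costs one table lookup and one addition per candidate codebook, which is the per-pair work that a faster search would aim to reduce.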
Depending on the application, different layers of the coding system having increasing encoder complexity and performance can be used. An ISO MPEG Audio Layer N decoder is able to decode bit stream data that has been encoded in Layer N and all layers below N, as described below:
Layer I
This layer contains the basic mapping of the digital audio input into 32 sub-bands, fixed segmentation to format the data into blocks, a psychoacoustic model to determine the adaptive bit allocation, and quantization using block companding and formatting.
Layer II
This layer provides additional coding of bit allocation, scale factors and samples, and a different framing is used.
Layer III
This layer introduces increased frequency resolution based on a hybrid filter bank. It adds a different (non-uniform) quantizer, adaptive segmentation and entropy coding of the quantized values.
Joint stereo coding can be added as an additional feature to any of the layers.
A decoder 12 accepts the compressed audio bit stream, decodes the data elements, and uses the information to produce digital audio output, as shown in FIG. 2. The bit stream data is fed into the decoder 12. Then, the bit stream unpacking and decoding block 14 performs error detection, if error-checking has been applied by the encoder 2. The bit stream data is unpacked to recover the various pieces of information. The reconstruction block 16 reconstructs the quantized version of the set of mapped samples. The inverse mapping block 18 transforms these mapped samples back into uniform PCM (pulse code modulation).
As originally envisioned by the drafters of the MPEG audio encoder specification, the encoder would be implemented in hardware. Hardware implementations provide dedicated processing, but generally have limited available memory. For software MPEG encoding and decoding implementations such as software programs running on Intel Pentium™ class microprocessors, however, the need for greater processing efficiency has arisen, while the memory restrictions are less critical. Specifically, in prior art solutions, the processing time associated with selecting an optimal codebook from among a group of candidate codebooks is much too long.
The present invention is a fast codebook search method for finding an optimal Huffman codebook from a group of Huffman codebooks, wherein the method is especially suited for MPEG-compliant audio encoding. In order to select an optimal codebook from among candidate codebooks for a given sub-region, a bit difference table is created, which for any given data pair (or quadruple) contains a bit difference value. The bit difference value is the difference between the number of bits needed for a given data pair (or quadruple) in a first candidate codebook and a second candidate codebook [N bits − M bits]. By summing all such bit difference values for the data samples in a given sub-region, a quick determination can be made as to which codebook would encode the sub-region using the fewest bits (based on the size and/or sign of the sum(s)). For sub-regions having three candidate codebooks, two bit difference sums are calculated. For an implementation of the MPEG-1 Layer III Audio Encoding standard, only 20 bit difference tables are required in order to cover every possible combination of codebook candidates.
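The bit difference approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the code-length tables LEN_A, LEN_B and LEN_C hold invented values rather than MPEG-1 data, all function names are hypothetical, samples are assumed to be non-negative magnitudes, and the choice to reference both three-candidate sums to codebook A is one possible arrangement.

```python
# Toy code-length tables indexed [x][y]; invented values, NOT MPEG-1 tables.
LEN_A = [[1, 3, 6], [3, 4, 7], [6, 7, 9]]
LEN_B = [[3, 3, 4], [3, 4, 5], [4, 5, 6]]
LEN_C = [[4, 4, 4], [4, 4, 4], [4, 4, 5]]


def build_diff_table(len_n, len_m):
    """Precompute diff[x][y] = (bits in codebook N) - (bits in codebook M)."""
    return [[n - m for n, m in zip(row_n, row_m)]
            for row_n, row_m in zip(len_n, len_m)]


# Built once, ahead of time, for each pairing of candidate codebooks.
DIFF_AB = build_diff_table(LEN_A, LEN_B)
DIFF_AC = build_diff_table(LEN_A, LEN_C)


def diff_sum(samples, diff):
    """Sum the bit difference values over the data pairs of a sub-region."""
    return sum(diff[x][y] for x, y in zip(samples[0::2], samples[1::2]))


def pick_of_two(samples):
    """Two candidates: the sign of a single sum decides (ties go to A)."""
    return "A" if diff_sum(samples, DIFF_AB) <= 0 else "B"


def pick_of_three(samples):
    """Three candidates: two sums, both taken relative to codebook A."""
    d_ab = diff_sum(samples, DIFF_AB)   # total(A) - total(B)
    d_ac = diff_sum(samples, DIFF_AC)   # total(A) - total(C)
    # Bit cost of each codebook relative to A; the smallest value wins.
    relative = {"A": 0, "B": -d_ab, "C": -d_ac}
    return min(("A", "B", "C"), key=lambda name: relative[name])


if __name__ == "__main__":
    print(pick_of_two([0, 0, 1, 0, 0, 1, 0, 0]))
    print(pick_of_three([0, 1, 0, 2]))
```

Note that the sketch never computes the absolute bit total of any codebook. Only the sign of the sum (two candidates) or the relative size of the two sums (three candidates) is needed to identify the winner.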
Thus, the present invention determines the optimal codebook for each sub-region by merely summing the bit difference values from the appropriate bit difference table. This allows for a quicker determination, with far fewer calculations than required by the prior art approach. Since the procedure is performed within an “inner loop” iteration, the present invention reduces the required computation time by about 50% for two codebooks in the group, and approximately 33% if there are three codebooks.
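One plausible reading of these percentages, offered here as an assumption rather than as a statement of the disclosure, is a count of table lookups per data pair: summing code lengths separately for k candidate codebooks costs k lookups per pair, whereas summing bit difference values costs k − 1. The short Python sketch below works through that arithmetic; the function names and the pair count of 100 are hypothetical.

```python
def table_lookups(num_pairs, num_candidates, use_diff_tables):
    """Count table lookups under the assumed cost model.

    Assumption: one lookup per pair per candidate codebook when totals are
    summed separately, and one lookup per pair per bit difference table
    (num_candidates - 1 tables) otherwise.
    """
    per_pair = num_candidates - 1 if use_diff_tables else num_candidates
    return num_pairs * per_pair


def reduction(num_candidates, num_pairs=100):
    """Fraction of lookups saved by summing bit differences instead."""
    before = table_lookups(num_pairs, num_candidates, use_diff_tables=False)
    after = table_lookups(num_pairs, num_candidates, use_diff_tables=True)
    return 1 - after / before


if __name__ == "__main__":
    for k in (2, 3):
        print(f"{k} candidates: {reduction(k):.0%} fewer lookups")
```

Under this cost model the saving is 1/k, which matches the stated figures of about 50% for two candidates and about 33% for three.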