Throughout this disclosure (including in the claims) the term “comprises” denotes “is” or “includes,” and the expression “in a manner equivalent to” denotes either “by” or “in a manner not identical to but equivalent to.”
Throughout this disclosure (including in the claims) the term “transcoding” denotes decoding encoded data (that have been previously encoded in a first encoding format) and re-encoding the decoded data in a second encoding format. Typically, the decoding step of a transcoding operation includes the step of performing decompression on compressed data (that have previously been encoded in a first compression format), and the re-encoding step of a transcoding operation includes the step of performing a data compression operation to generate transcoded data in a second compression format.
In recent years consumer electronic devices employing audio compression have achieved tremendous commercial success. The most popular category of these devices includes the so-called MP3 players and portable media players. Such a player can store a number of user-selected songs in compressed format on a storage medium present in the player, and includes electronic circuitry that decodes and decompresses the compressed songs in real time. With the proliferation of audio compression formats (e.g., MPEG1-Layers I, II, and III, MPEG2-AAC, WMA, and AC3), the need to transcode audio between different compression formats is becoming commonplace.
Audio data transcoding is required when audio data received or stored in one format (e.g., one compressed format) need to be encoded into another format (e.g., a different compressed format). Transcoding audio data from a first format to a second format is undesirable unless the second format is lossless, because a second lossy encoding of the audio data introduces additional distortion. In practice the need for transcoding usually arises when different parts of an audio processing chain require different audio codecs. The producer of compressed audio content may choose to encode the content in one preferred format, and yet it may be desired to play back the encoded content using a device whose only (or final-stage) processing circuitry is designed for content encoded in a different format. The reasons for using different audio codecs in different parts of the audio chain include differences in industry standards, desired bit rate, quality, decoding complexity, and channel characteristics.
In order for a consumer electronic device to be interoperable across industry standards, it is often necessary for the device to perform transcoding on audio data. For example, such devices may include components (or subsystems) that receive and decode only audio data having one of a small number of mandatory compressed formats (e.g., only audio data having one such format), and thus need to include at least one additional transcoding component or subsystem in order to support at least one audio format other than the mandatory formats.
Since the introduction of the first portable audio players in the market in 1997, the MPEG1-Layer III (or “MP3”) audio format has become the de facto standard for portable media players. The format has been so successful that the term MP3 is sometimes used as a synonym for compressed audio, and the expression MP3 player is sometimes used to denote any portable audio player. In typical usage the listener keeps the MP3 player in a pocket or attaches it to a belt, and earbuds or headphones worn by the listener are often connected to the player by a jack and wires. With the introduction of the wireless Bluetooth protocol and the standardization of audio transport over Bluetooth links, wireless headphones are becoming popular. In a typical wireless headphone usage scenario, an MP3 player is equipped with a Bluetooth transmitter and a wireless headphone is equipped with a Bluetooth receiver.
The Bluetooth Advanced Audio Distribution Profile (A2DP) specification supports various audio compression formats, including linear PCM, Sub Band Coding (“SBC”), MPEG1-Layer III, and others. SBC is specified as a mandatory codec and is guaranteed to be supported by all Bluetooth-compliant wireless headphones. Implementing a portable audio player to transmit audio over a wireless link in MP3 or another non-SBC format is undesirable where there is no assurance that readily available wireless headphones will be able to decode the transmitted audio. On the other hand, even when a portable audio player is implemented to transmit audio data in SBC format over a Bluetooth link, it will typically be undesirable to store the audio content in the player in SBC format, for at least two reasons: first, storing the content in SBC format rather than MP3 format would require more memory for the same quality, because SBC codecs are less efficient than MP3 codecs; and second, all legacy content would need to be re-encoded in SBC format. Therefore, in wireless headphone applications, there is a definite need to transcode MP3 format audio data (e.g., audio data in MP3 format stored in a portable audio player) to SBC format audio data (for transmission over a wireless Bluetooth link).
Audio compression in accordance with most formats in use today (including the MP3 and SBC formats) employs perceptual transform coding. In perceptual transform coding, time-domain samples of input audio are first converted into frequency-domain coefficients using an analysis filterbank. The frequency-domain coefficients at the output of the analysis filterbank are then quantized using perceptual criteria in order to achieve the highest audio quality at the desired bit rate. At the decoder, the frequency-domain coefficients are reconstructed by inverse quantization of the quantized coefficients. The reconstructed frequency-domain coefficients are then transformed back to time-domain audio samples using a synthesis filterbank.
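The encode/decode path just described can be illustrated with a minimal sketch. It is not the MP3 or SBC algorithm: it assumes an orthonormal block DCT-II as a stand-in for the analysis and synthesis filterbanks, and a single fixed quantizer step in place of perceptual bit allocation. All function names are hypothetical.

```python
import math


def dct_ii(block):
    """Orthonormal DCT-II of one block: a toy stand-in for an analysis filterbank."""
    n = len(block)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x * math.cos(math.pi * (m + 0.5) * k / n) for m, x in enumerate(block))
            for k in range(n)]


def idct_ii(coeffs):
    """Inverse of dct_ii (orthonormal DCT-III): a toy stand-in for a synthesis filterbank."""
    n = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                * math.cos(math.pi * (m + 0.5) * k / n) for k, c in enumerate(coeffs))
            for m in range(n)]


def quantize(coeffs, step):
    """Uniform quantizer. ASSUMPTION: one fixed step for every coefficient; a real
    perceptual coder would choose per-band steps from a psychoacoustic model."""
    return [round(c / step) for c in coeffs]


def inverse_quantize(indices, step):
    """Reconstruct coefficients from quantization indices."""
    return [i * step for i in indices]


def encode(samples, block_size, step):
    """Analysis transform + quantization, block by block.
    ASSUMPTION: len(samples) is a multiple of block_size."""
    return [quantize(dct_ii(samples[i:i + block_size]), step)
            for i in range(0, len(samples), block_size)]


def decode(frames, step):
    """Inverse quantization + synthesis transform, then concatenate the blocks."""
    out = []
    for indices in frames:
        out.extend(idct_ii(inverse_quantize(indices, step)))
    return out


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
    y = decode(encode(x, block_size=16, step=0.05), step=0.05)
    print("max abs error:", max(abs(a - b) for a, b in zip(x, y)))
```

Because the toy transform is orthonormal, the time-domain error in each block has the same energy as the quantization error of its coefficients, so it is bounded by block_size × (step/2)².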
A conventional, straight-forward approach to transcoding input audio data in a first encoding format (where the input audio data comprise frequency-domain coefficients that have undergone quantization using perceptual criteria) is to:
(a) decode the input audio data by:
    (i) demultiplexing and decoding the incoming encoded bit-stream (which is encoded in the first encoding format) and producing quantized frequency-domain coefficients,
    (ii) generating reconstructed frequency-domain coefficients using inverse quantization, and then
    (iii) transforming the reconstructed frequency-domain coefficients to time-domain audio samples using a synthesis filterbank; and
(b) after step (a), re-encode the time-domain audio samples in accordance with a second encoding algorithm to generate transcoded audio data comprising frequency-domain coefficients having a second encoding format. Typically, step (b) includes the steps of generating additional frequency-domain coefficients by transforming the time-domain audio samples generated in step (a)(iii) using an analysis filterbank, performing quantization on the additional frequency-domain coefficients using perceptual criteria, and then multiplexing the quantized coefficient indices into a bit-stream in the second encoding format.
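Steps (a) and (b) can be sketched as a tandem decode/re-encode, with toy block-DCT filterbanks standing in for real codecs. The assumptions: a hypothetical 16-band “first format” with quantizer step 0.05, a hypothetical 8-band “second format” with step 0.1, fixed (non-perceptual) quantizer steps, and no bit-stream demultiplexing or multiplexing. Function names are hypothetical.

```python
import math


def basis(k, m, n):
    """Orthonormal DCT-II basis value c_k(m) for an n-band toy filterbank."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n) * math.cos(math.pi * (m + 0.5) * k / n)


def analysis(samples, n):
    """Toy n-band analysis filterbank: block DCT-II, one coefficient list per block."""
    return [[sum(basis(k, m, n) * samples[b + m] for m in range(n)) for k in range(n)]
            for b in range(0, len(samples), n)]


def synthesis(blocks):
    """Toy synthesis filterbank matching analysis(): inverse block DCT."""
    out = []
    for coeffs in blocks:
        n = len(coeffs)
        out.extend(sum(basis(k, m, n) * coeffs[k] for k in range(n)) for m in range(n))
    return out


def quantize(blocks, step):
    """Uniform quantization (ASSUMPTION: fixed step instead of perceptual criteria)."""
    return [[round(c / step) for c in blk] for blk in blocks]


def inverse_quantize(blocks, step):
    """Reconstruct coefficients from quantization indices."""
    return [[i * step for i in blk] for blk in blocks]


def transcode(indices_a, step_a, bands_b, step_b):
    """Conventional tandem transcoding from toy format A to toy format B."""
    # Step (a)(ii): inverse quantization of the format-A coefficients.
    coeffs_a = inverse_quantize(indices_a, step_a)
    # Step (a)(iii): synthesis filterbank of format A gives time-domain samples.
    pcm = synthesis(coeffs_a)
    # Step (b): analysis filterbank of format B, then quantization for format B.
    return quantize(analysis(pcm, bands_b), step_b)


if __name__ == "__main__":
    x = [math.sin(0.2 * t) + 0.3 * math.cos(1.3 * t) for t in range(64)]
    format_a = quantize(analysis(x, 16), 0.05)    # 16-band "first format"
    format_b = transcode(format_a, 0.05, 8, 0.1)  # 8-band "second format"
    y = synthesis(inverse_quantize(format_b, 0.1))
    print("transcoded blocks:", len(format_b), "x", len(format_b[0]))
```

The format-B quantizer acts on data that already carry format-A quantization error, so the bound on the decoded error is the sum of the bounds for the two lossy stages.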
The steps of bitstream demultiplexing (step (a)(i)) and multiplexing (the last operation in step (b)) as described above will be omitted in the following discussion because their details are not relevant to the invention, but they are typically performed by both conventional transcoding systems and transcoding systems that embody the present invention.
FIG. 1 is a block diagram of a system performing this conventional transcoding operation, using a first perceptual transform audio codec to perform step (a) and a second perceptual transform audio codec to perform step (b). The system of FIG. 1 performs MP3 encoding of audio data (using analysis filterbank 2 and quantization circuits Q), transcodes the resulting MP3 format audio data (using inverse quantization circuits IQ, synthesis filterbank 4, analysis filterbank 6, and quantization circuits Q′, connected as shown) to generate transcoded audio data having SBC format, and performs SBC decoding on the transcoded audio data (using inverse quantization circuits IQ′ and synthesis filterbank 8) to generate time-domain samples of decoded audio data.
MPEG1-Layers I, II, and III all use a pseudo perfect-reconstruction quadrature mirror filterbank (QMF) for time-domain to frequency-domain transformation during encoding. Such an analysis filterbank decomposes the time-domain signal to be encoded into 32 streams of frequency coefficients (also referred to as 32 “frequency band signals” or 32 streams of “frequency-band coefficients”), each corresponding to a different one of 32 frequency bands. The MPEG1-Layer III (“MP3”) encoding method further decomposes each of these 32 frequency band signals, using a modified discrete cosine transform, into 18 streams of frequency-domain coefficients. These are also “frequency band signals,” each corresponding to a different one of 18 frequency sub-bands of one of the 32 frequency bands, and are sometimes referred to herein as “frequency sub-band signals” or streams of “frequency sub-band coefficients.” Thus a 576-band (32 × 18) analysis filterbank can be used to convert time-domain samples of input audio into 576 streams of frequency sub-band coefficients (which are then quantized) to implement MP3 encoding.
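The 32 × 18 = 576 arithmetic above can be made concrete with a small sketch. It assumes MP3 long blocks, an idealized layout in which the 576 sub-bands split 0 to fs/2 into equal-width bands in ascending order, and a 44.1 kHz sampling rate; real filterbank responses overlap. The helper names are hypothetical.

```python
NUM_BANDS = 32          # polyphase QMF bands (MPEG1 Layers I, II and III)
LINES_PER_BAND = 18     # MDCT sub-bands per QMF band (MP3 long blocks)
NUM_LINES = NUM_BANDS * LINES_PER_BAND   # 576 streams of sub-band coefficients


def line_index(band, sub):
    """Hypothetical flat index of sub-band `sub` (0..17) of QMF band `band` (0..31)."""
    if not (0 <= band < NUM_BANDS and 0 <= sub < LINES_PER_BAND):
        raise ValueError("band or sub-band index out of range")
    return band * LINES_PER_BAND + sub


def nominal_range_hz(index, num_bands, fs_hz):
    """Idealized [low, high) frequency range of band `index` when 0..fs/2 is split
    into `num_bands` equal-width bands (ignores filter overlap and aliasing)."""
    width = (fs_hz / 2) / num_bands
    return index * width, (index + 1) * width


if __name__ == "__main__":
    fs = 44_100  # ASSUMPTION: a common MP3 sampling rate
    print(NUM_LINES)                                          # 576
    print(nominal_range_hz(1, NUM_BANDS, fs))                 # second QMF band
    print(nominal_range_hz(line_index(1, 0), NUM_LINES, fs))  # its first sub-band
```

Under these assumptions each of the 32 bands is nominally 689.0625 Hz wide and each of the 576 sub-bands is nominally 38.28125 Hz wide.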
The SBC algorithm also uses a pseudo perfect-reconstruction QMF for time-domain to frequency-domain transformation during SBC encoding. Such an analysis filterbank decomposes the time-domain signal to be encoded into 4 or 8 frequency bands. Thus, a four-band (or eight-band) analysis filterbank can be used to convert time-domain samples of input audio into 4 (or 8) streams of frequency-domain coefficients (which then undergo quantization) to implement SBC encoding.
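The SBC band streams are critically sampled: with M bands, each stream carries one coefficient for every M input samples. The sketch below does that bookkeeping. The block counts per frame (4, 8, 12 or 16) come from the A2DP SBC specification rather than from the text above, the band widths are idealized, and the function name is hypothetical.

```python
VALID_SUBBANDS = (4, 8)          # SBC band counts, as stated above
VALID_BLOCKS = (4, 8, 12, 16)    # SBC blocks per frame (A2DP SBC specification)


def sbc_layout(fs_hz, subbands, blocks):
    """Bookkeeping for one SBC channel: each of the `subbands` coefficient streams
    is critically sampled, i.e. runs at fs/subbands."""
    if subbands not in VALID_SUBBANDS:
        raise ValueError("SBC uses 4 or 8 subbands")
    if blocks not in VALID_BLOCKS:
        raise ValueError("SBC uses 4, 8, 12 or 16 blocks per frame")
    return {
        "samples_per_frame": subbands * blocks,   # time-domain samples per frame
        "coeffs_per_stream": blocks,              # one coefficient per block per band
        "stream_rate_hz": fs_hz / subbands,       # decimation by `subbands`
        "band_width_hz": (fs_hz / 2) / subbands,  # idealized uniform band width
        "frame_ms": 1000.0 * subbands * blocks / fs_hz,
    }


if __name__ == "__main__":
    print(sbc_layout(44_100, subbands=8, blocks=16))
```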
In FIG. 1, blocks Q and Q′ indicate circuitry configured to perform quantization (during encoding) and blocks IQ and IQ′ indicate circuitry configured to perform inverse quantization (during decoding).
The system of FIG. 1 includes 576-band MP3 analysis filterbank 2 which outputs 576 streams of frequency sub-band coefficients (frequency-domain data) in response to a stream of time-domain audio data samples to be encoded. Each of these coefficients is quantized in circuit blocks Q to generate MP3-encoded audio data (quantized frequency-domain coefficients). Each of the coefficients can be quantized in one of circuit blocks Q or more than one of the coefficients can be quantized in each of at least some of blocks Q (the circuit blocks Q can but need not all receive the same number of streams of frequency band coefficients).
The MP3-encoded audio data are transcoded in circuit blocks IQ, synthesis filterbank 4, analysis filterbank 6 and circuit blocks Q′. Filterbank 4 is cascaded with filterbank 6. Circuit blocks IQ perform inverse quantization on each of the 576 streams of quantized frequency sub-band coefficients generated in response to input data samples, and the resulting inverse-quantized coefficients are processed in 576-band MP3 synthesis filterbank 4 to recover (up to the quantization error introduced by blocks Q) the audio data (a sequence of time-domain samples) that were originally input to filterbank 2.
The time-domain samples of recovered audio data then undergo SBC encoding in analysis filterbank 6 (which is an eight-band SBC analysis filterbank) and quantization circuits Q′. Filterbank 6 outputs eight streams of frequency sub-band coefficients (frequency-domain data) in response to a stream of time-domain audio data samples received from filterbank 4, and these coefficients are quantized in circuit blocks Q′ to generate SBC-encoded audio data (SBC-encoded, quantized frequency-domain coefficients). Each of the coefficients output from filterbank 6 can be quantized in one of circuit blocks Q′ or more than one of the coefficients can be quantized in each of at least some of blocks Q′ (the circuit blocks Q′ can but need not all receive the same number of streams of frequency sub-band coefficients).
The SBC-encoded audio data are decoded in circuit blocks IQ′ and SBC synthesis filterbank 8 (an eight-band SBC synthesis filterbank, matching eight-band analysis filterbank 6). More specifically, the quantized frequency sub-band coefficients output from blocks Q′ undergo inverse quantization in circuit blocks IQ′, and the resulting inverse-quantized coefficients are processed in synthesis filterbank 8 to recover (up to the quantization error introduced by blocks Q′) the audio data (a sequence of time-domain samples) that were originally input to filterbank 6.
During conventional encoding (e.g., MP3 or SBC encoding) of audio data of the types discussed above, it is known to implement an analysis filterbank as a first stage configured to perform anti-aliasing (or low-pass) filtering followed by a second stage configured to perform a discrete cosine transform (e.g., an MDCT, during MP3 encoding). A cascade of such a first stage and such a second stage is equivalent to (and can implement) a filter stage (that implements any of a broad class of filtering operations) followed by a decimation (down-sampling) stage.
During conventional decoding (e.g., MP3 or SBC decoding) of audio data of the types discussed above, it is known to implement a synthesis filterbank as a first stage configured to perform an inverse discrete cosine transform (IDCT) followed by a second stage configured to perform a multi-input multi-output low-pass filtering operation. A cascade of such a first stage and such a second stage is equivalent to (and is derived from) an up-sampling stage followed by a filter stage (that implements a bank of parallel band-pass filters that are cosine-modulated versions of a low-pass prototype filter). The IDCT-based implementation is commonly used in practice because of its computational efficiency.
The inventors have appreciated that it is inefficient to implement transcoding by using a synthesis filterbank (implemented as an up-sampling stage followed by a filter stage, or as an IDCT stage followed by a low-pass filter stage) followed by an analysis filterbank (implemented as a filter stage followed by a down-sampling stage, or as an anti-aliasing filter stage followed by a DCT stage). Among the reasons are that such filterbank implementations require undesirably complex computations and an undesirably large amount of memory for storing the coefficients used to implement the filtering operations.
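The equivalence between a transform-based analysis stage and a filter stage followed by decimation can be checked numerically in the simplest case, where the filter length equals the number of bands. The sketch assumes an orthonormal block DCT-II as the toy analysis filterbank; real MP3 and SBC filterbanks use longer, overlapping prototype filters, though the same filter-then-decimate view applies to them. Function names are hypothetical.

```python
import math


def basis(k, m, n):
    """Orthonormal DCT-II basis value c_k(m) for an n-band toy filterbank."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n) * math.cos(math.pi * (m + 0.5) * k / n)


def transform_form(x, n):
    """Analysis as a block transform: one DCT-II per non-overlapping block of n
    samples. Returns bands[k][b], the coefficient of band k in block b."""
    return [[sum(basis(k, m, n) * x[b + m] for m in range(n))
             for b in range(0, len(x), n)]
            for k in range(n)]


def filter_decimate_form(x, n):
    """The same analysis as n FIR filters, each followed by decimation by n.
    Band k uses h_k[i] = c_k(n - 1 - i); keeping outputs n-1, 2n-1, ... aligns
    the filter's support with each block."""
    bands = []
    for k in range(n):
        h = [basis(k, n - 1 - i, n) for i in range(n)]
        y = [sum(h[i] * x[j - i] for i in range(n) if j >= i) for j in range(len(x))]
        bands.append(y[n - 1::n])   # down-sample by n (decimation phase n - 1)
    return bands


if __name__ == "__main__":
    n = 8
    x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(4 * n)]
    a, b = transform_form(x, n), filter_decimate_form(x, n)
    print(max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)))
```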
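The up-sampling-then-filtering view of synthesis can likewise be checked numerically against an inverse block transform in the simplest case. The sketch assumes a toy orthonormal block DCT whose filter length equals the number of bands, and hypothetical function names; it is not the MP3 or SBC synthesis filterbank.

```python
import math


def basis(k, m, n):
    """Orthonormal DCT-II basis value c_k(m) for an n-band toy filterbank."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n) * math.cos(math.pi * (m + 0.5) * k / n)


def inverse_transform_form(bands, n):
    """Synthesis as an inverse block transform (orthonormal DCT-III per block).
    bands[k][b] is the coefficient of band k in block b."""
    out = []
    for b in range(len(bands[0])):
        out.extend(sum(basis(k, m, n) * bands[k][b] for k in range(n)) for m in range(n))
    return out


def upsample_filter_form(bands, n):
    """The same synthesis as up-sampling each band stream by n (inserting n - 1
    zeros after every coefficient), filtering band k with g_k[i] = c_k(i), and
    summing the filtered bands."""
    length = n * len(bands[0])
    out = [0.0] * length
    for k in range(n):
        up = [0.0] * length
        up[::n] = bands[k]                      # up-sample by n
        g = [basis(k, i, n) for i in range(n)]  # band-k synthesis filter
        for j in range(length):
            out[j] += sum(g[i] * up[j - i] for i in range(n) if j >= i)
    return out


if __name__ == "__main__":
    n = 8
    x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(4 * n)]
    bands = [[sum(basis(k, m, n) * x[b * n + m] for m in range(n)) for b in range(4)]
             for k in range(n)]
    y = upsample_filter_form(bands, n)
    print(max(abs(p - q) for p, q in zip(x, y)))
```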
To appreciate the following description of embodiments of the present invention, it is helpful to consider characteristics of frequency-band coefficients that are generated in a manner equivalent to time-to-frequency-domain transformation of time-domain audio data (e.g., frequency sub-band coefficients such as those generated during MP3 encoding and asserted from analysis filterbank 2 of the conventional FIG. 1 system to quantization circuits Q). Frequency-band coefficients of this type can also be viewed as time-domain samples that have been filtered by a narrow-band filter and then down-sampled, and they can be described in the same terms as if they were time-domain audio data. For example, a stream of frequency-band coefficients can usefully be described as being up-sampled or down-sampled (as if it were a stream of samples of time-domain audio data).
Also in the following description of embodiments of the invention, the expressions that frequency coefficients (e.g., frequency-band coefficients) “are indicative of” or “determine” at least one time-domain sample of audio data (in the context of processing the coefficients to decode or transcode the audio data) denote that performing predetermined decoding operations on the coefficients (e.g., processing them in a synthesis filterbank having predetermined characteristics) can recover the at least one time-domain sample of audio data therefrom.