Recent times have seen an acceleration in efforts by suppliers of consumer electronics to greatly expand the amount and quality of information provided to users. The expanded use of multimedia information in communications and entertainment systems along with user demands for higher quality and faster presentations of the information has driven communications and entertainment industries to seek systems for communicating and presenting information with higher densities of useful information. These demands have stimulated the development and expansion of digital techniques to code and format signals to carry the information.
Most communication systems of the past, particularly television broadcast systems and other systems used for home entertainment, used analog signals that filled the available bandwidth with a single real-time program in a straightforward format containing much redundant and humanly imperceptible information. Digital transmission systems, by contrast, can combine and identify multiple programs and can selectively filter out redundant or otherwise useless information, which allows the transmission of programs having higher quality and information carrying ability. As a result of the high technological demand for such capabilities, advances toward the specification and development of digital communications formats and systems have accelerated.
MPEG-1
In furtherance of these advances, the industry-sponsored Moving Picture Experts Group (MPEG) has specified a format for the encoding of multimedia programs referred to as MPEG-1 and, more formally, as ISO/IEC 11172. MPEG-1 defines a group of essentially three techniques: one for compressing digitized audio consisting of one (mono) or two (stereo) channels of sound (ISO/IEC 11172-3, Part 3 of the MPEG-1 standard); another for compressing digital video (ISO/IEC 11172-2, Part 2 of the MPEG-1 standard); and another for combining the compressed streams of audio and video for storage (e.g., CD-ROM) or transmission (e.g., digital satellite television) systems (ISO/IEC 11172-1, Part 1 of the MPEG-1 standard), such that they can be treated as a single stream of data but still be separated and decoded properly.
The overall MPEG-1 specification is targeted at digital storage media applications, typically with bit-rates up to 1.5 Mbits/second, such as could be obtained from a CD. The resulting picture and sound quality of MPEG-1 systems was anticipated to be below that of regular broadcast television or VHS playback. In certain configurations, the data can be played back on a personal computer using only software programs to decode the video and audio, although both sections of the standard allow for more complicated methods of compression, which require dedicated hardware to decode but deliver higher quality or more compression.
The audio standard, Part 3 of MPEG-1, specifies the decoding process for one or two channel audio, which can carry monaural, stereo or two multi-lingual channels. For stereo digital recording, for example, MPEG-1 specifies a stream of data of a particular format containing a series of interleaved pairs of samples representing a left channel and a right channel. As a result, in the basic two channel MPEG-1 compatible data stream, where the transformed and compressed samples are encoded alternately for each channel and grouped into relatively large frames, typically 1152 samples per channel, only 32 samples of each of the two channels need be read, stored and processed in the decoder at a given time in order to produce decompressed audio samples for output to the required channels at the required presentation rate.
A variety of compression schemes are possible in both MPEG-1 and MPEG-2. Both MPEG-1 and MPEG-2 audio, for example, provide three compression techniques, referred to as "layers", of increasing compression quality and decoder complexity. Layer I and Layer II, the two simpler compression schemes, are typically used for consumer broadcast and storage applications, while Layer III is usually reserved for professional or special applications. The above described data features are for typical Layer II coding but most are generally common to each of these schemes. The 1152 samples per channel per audio frame referred to above is a specific feature of Layer II audio compression, which is three times the 384 samples per channel per frame for Layer I compression. Layer II is the compression method usually used in DTV and other consumer applications.
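The relationship between frame size and playback time can be checked with a few lines of arithmetic. The following is a minimal sketch that uses the Layer I and Layer II frame sizes given above; the 48 kHz sampling rate is an assumed example value, and the names `SAMPLES_PER_FRAME` and `frame_duration_ms` are hypothetical.

```python
# Samples per channel per audio frame, as stated in the text.
SAMPLES_PER_FRAME = {"Layer I": 384, "Layer II": 1152}


def frame_duration_ms(layer: str, sample_rate_hz: int) -> float:
    """Return the playback time covered by one audio frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME[layer] / sample_rate_hz


if __name__ == "__main__":
    # 48 kHz is an assumed example rate, not a requirement of either layer.
    for layer in SAMPLES_PER_FRAME:
        print(layer, frame_duration_ms(layer, 48_000), "ms per frame")
```

At the assumed rate, a Layer II frame spans three times the playback time of a Layer I frame, which is one reason the amount of data a decoder must hold per frame matters more for Layer II.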
MPEG-2
MPEG-2 is designed to extend the techniques of MPEG-1 to give a quality at least as good as VHS, and potentially approaching that of a movie theater, as well as the ability to transmit or store more than one program in a single data stream. Specifically in the audio section, MPEG-2 provides for methods to encode more than two audio channels to give surround sound playback, which is typically configured as six channels, such as front left, front right, front center, rear left, rear right and a Low Frequency Effects channel, although other combinations of up to six channels are possible.
The coding of the Low Frequency Effects (LFE) channel, if present, in the surround combinations uses greater compression because of its limited audio bandwidth, which is 125 Hz rather than about 20 kHz. As a result, the LFE channel represents a much smaller proportion of the data stream than the other channels, and is often omitted from diagrams of the stream. Because of the limited bandwidth of the LFE channel, the surround channel combination that includes the LFE channel is commonly referred to as 5.1 channel, rather than 6 channel, audio.
In addition, the MPEG-2 audio standard (ISO/IEC 13818-3) provides "backward compatibility", so that if an MPEG-2 audio data stream is fed into an MPEG-1 audio decoder, a reasonable combination of the surround channels which were encoded into the stream will be decoded to the two outputs. This is possible because the MPEG-1 audio standard makes provision for "ancillary data" to be inserted into the compressed stream, which the decoder must be able to ignore or discard. The extra information for the additional channels in the MPEG-2 audio stream appears to an MPEG-1 decoder as this "ancillary data".
Bitstream Structure
The MPEG-1 bitstream may be viewed as bitstream 10, illustrated diagrammatically in FIG. 1, which is formatted to carry one or more frames 11 of audio data. A frame of audio data includes 1152 samples per channel, which at a sampling rate of, for example, 48 kHz amounts to 24 msec of audio per frame. These MPEG-1 audio frames each include a header 12 of, for example, 32 bits of identifying and coding data, followed by an audio data stream 16 in which interleaved pairs of 1152 frequency domain compressed samples of data representing each of two possible channels 13 and 14, for example left channel stereo and right channel stereo, are encoded, as illustrated in FIG. 1A. Sequential groups or "turns" of frequency domain data samples n, for each channel 13, 14, are decodable into time domain digital representations of sound in two stereo channels. The data samples are encoded in frequency subband blocks and samples of, for example, 32 frequency subbands m, for each of a plurality of, for example, 12 groups, and with, for example, one to three samples each, depending on the compression layer selected by the program transmitter. Following the audio data stream, MPEG-1 provides for the inclusion of ancillary data in an ancillary data field 15, which MPEG-1 processors might or might not ignore. The specified MPEG-1 audio standard is set forth in detail in ISO/IEC 11172-3, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbits/s--Part 3: Audio," ISO/IEC JTC 1/SC 29, expressly incorporated herein by reference.
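The 32-bit header 12 can be unpacked with a few shifts and masks. The sketch below assumes the header field layout and code tables of ISO/IEC 11172-3 as they are commonly documented (a 12-bit syncword, an ID bit, a 2-bit layer code, a protection bit, a 4-bit bitrate index, a 2-bit sampling frequency code, and so on). Only a handful of fields are extracted, and the names `AudioHeader` and `parse_header` are hypothetical.

```python
from typing import NamedTuple

# Code tables assumed from ISO/IEC 11172-3; reserved codes are left out.
LAYER_BY_CODE = {0b11: "Layer I", 0b10: "Layer II", 0b01: "Layer III"}
SAMPLE_RATE_BY_CODE = {0b00: 44_100, 0b01: 48_000, 0b10: 32_000}
MODE_BY_CODE = {
    0b00: "stereo",
    0b01: "joint_stereo",
    0b10: "dual_channel",
    0b11: "single_channel",
}


class AudioHeader(NamedTuple):
    layer: str
    sample_rate_hz: int
    mode: str
    has_crc: bool


def parse_header(word: int) -> AudioHeader:
    """Extract a few fields from a 32-bit MPEG-1 audio frame header."""
    if (word >> 20) & 0xFFF != 0xFFF:
        raise ValueError("missing 12-bit syncword")
    if (word >> 19) & 1 != 1:
        raise ValueError("ID bit does not indicate MPEG-1 audio")
    layer_code = (word >> 17) & 0b11
    rate_code = (word >> 10) & 0b11
    if layer_code not in LAYER_BY_CODE or rate_code not in SAMPLE_RATE_BY_CODE:
        raise ValueError("reserved layer or sampling frequency code")
    return AudioHeader(
        layer=LAYER_BY_CODE[layer_code],
        sample_rate_hz=SAMPLE_RATE_BY_CODE[rate_code],
        mode=MODE_BY_CODE[(word >> 6) & 0b11],
        has_crc=((word >> 16) & 1) == 0,  # protection bit 0 means a CRC follows
    )


if __name__ == "__main__":
    # Hand-built example word: Layer II, 48 kHz, stereo, no CRC.
    print(parse_header(0xFFFDA400))
```

A decoder would read these fields before touching the audio data stream 16, since the layer and sampling frequency determine how the samples that follow are grouped and timed.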
A frame of MPEG-2 multichannel audio follows the format of MPEG-1 audio frames, with the additional data of the MPEG-2 data stream replacing or augmenting the ancillary data field 15 at the end of the MPEG-1 audio data frame 11. This additional information includes a leading header of identifying fields and fields of additional channel data, collectively referred to as the mcml_extension(). The leading header fields include control or ID information that will inform an MPEG-2 decoder of the nature and format of the data that follows. The data streams that follow are multichannel audio data streams and/or multi-lingual audio data streams.
An MPEG-2 program bitstream 20 that includes an MPEG-2 audio data frame 21 is diagrammatically represented in FIG. 2 as including components corresponding to those of the MPEG-1 audio frame 11. Its components include header 12, the stream of channel 1 and channel 2 data 16, and a data stream occupying the ancillary data field 15. The ancillary data field 15 includes a data stream 17 of audio data representing audio channels 3 through 6. The data 16 includes data 13 that can be reproduced by an MPEG-1 decoder to produce output for a left stereo channel and data 14 that can be reproduced by an MPEG-1 decoder to produce output for a right stereo channel. A multichannel identifying and coding information field 22 is included at the beginning of the ancillary data field 15.
The data stream 17 includes 1152 samples per channel per frame of, for example, the three additional channels 3 through 5 of audio data 23, 24 and 25, respectively. The audio data for the additional channels is also coded in corresponding samples i, 1152 samples per frame 21, with each sample coded in 32 frequency domain sub-blocks k, as illustrated in FIG. 2A.
Fields of other data 25 besides the audio data may follow the data 23-25 in the ancillary data field 15. The specified MPEG-2 audio standard is set forth in detail in ISO/IEC 13818-3, "Generic Coding of Moving Pictures and Associated Audio Information--Part 3: Audio," ISO/IEC JTC 1/SC 29, expressly incorporated herein by reference. An explanation of the MPEG audio standards, including the syntax and semantics of the MPEG signals, can be found in Haskell et al., Digital Video: An Introduction to MPEG-2, Chapman & Hall, New York, N.Y., 1997, particularly chapter 4 thereof.
The three streams of audio data 23-25 for the additional three channels may typically represent, for example, three additional channels of surround sound audio: a front-center channel, a surround-right channel and a surround-left channel. However, where a program is encoded in multiple channels for surround sound or some other multichannel reproduction, the first two channels of the five channel system usually do not, by themselves, ideally reproduce stereo sound. Therefore, to make five channel sound backward compatible with MPEG-1 two channel stereo (and for other reasons such as compression and coding efficiency), linear combinations of the five surround channels are often transmitted instead of separate streams that each fully encode one of the five input channels. The combinations are formed by multiplying the five input signals by a 5×5 or other appropriate transformation matrix. As a result, when the first two channels are reproduced by an MPEG-1 decoder, which does not employ the inverse matrix transformation that an MPEG-2 decoder would employ, they yield a better rendition of two channel stereo. Following an inverse matrix transformation by an MPEG-2 decoder, however, the first two channels of a five channel program are reproduced in the left and right front channels of the five channel system, while the regenerated sound of the center, left surround and right surround channels is output by the corresponding channels of the five channel surround system.
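The matrixing idea can be illustrated with one particular 5×5 transformation written out row by row. This is a minimal sketch only: the weights `A` and `B`, the channel labels and the function names are assumptions chosen for illustration, not the coefficients or syntax of any MPEG-2 matrixing procedure.

```python
# Illustrative downmix weights (assumptions, not values from the standard).
A = 0.5  # weight of the center channel in each compatible stereo channel
B = 0.5  # weight of each surround channel in its compatible stereo channel


def matrix_encode(ch: dict) -> dict:
    """Form five transmitted channels T0..T4 from L, R, C, LS and RS."""
    return {
        "T0": ch["L"] + A * ch["C"] + B * ch["LS"],  # stereo-compatible left
        "T1": ch["R"] + A * ch["C"] + B * ch["RS"],  # stereo-compatible right
        "T2": ch["C"],
        "T3": ch["LS"],
        "T4": ch["RS"],
    }


def matrix_decode(t: dict) -> dict:
    """Invert matrix_encode; T2..T4 are needed before L and R can be formed."""
    c, ls, rs = t["T2"], t["T3"], t["T4"]
    return {
        "L": t["T0"] - A * c - B * ls,
        "R": t["T1"] - A * c - B * rs,
        "C": c,
        "LS": ls,
        "RS": rs,
    }


if __name__ == "__main__":
    one_sample = {"L": 0.25, "R": -0.5, "C": 0.5, "LS": 0.125, "RS": -0.25}
    sent = matrix_encode(one_sample)
    print("two channel decoder plays:", sent["T0"], sent["T1"])
    print("five channel decoder recovers:", matrix_decode(sent))
```

A two channel decoder simply plays T0 and T1, which already contain a share of the center and surround content. A five channel decoder cannot recover the true left and right channels until the values carried in the additional channels are in hand.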
As suggested above, notwithstanding a consideration of backward compatibility with MPEG-1 systems, it is usually not desirable to encode five channels of surround audio totally separately, since coding efficiency can be increased by eliminating redundancies between channels, that is, by encoding common components in the first two channels and coding only difference components for the other three channels. As a result of the transformed coding scheme, an MPEG-1 system reproduces virtually all of the audio program by decoding the first two channels, while the MPEG-2 system constructs the other three channels by copying parts of their streams from other channels, particularly from channels one and two, using any of a number of decoding algorithms, including those required to reverse the compression, predictive and other coding schemes used by the encoder of the transmitter. An MPEG-2 program, so encoded, contains information in data 21 that identifies the coding scheme employed, so that decoding of the program can be properly implemented by the receiver. This identifying data and the audio data for the additional channels will appear, in an MPEG-2 signal, in place of an ancillary data stream at the end of what is otherwise a valid MPEG-1 audio stream.
The matrixed coding discussed above imposes decoding requirements on an MPEG-2 receiver, since the data to the additional channels, and in most cases also the data to the two front stereo channels, must be completely available to the decoder before any output to the channels can be produced. That is to say, outputting of the first data to a channel must await receipt and processing of data from near the end of the bitstream of the input signal.
Backward Compatibility
For backward compatibility of MPEG-2 audio with MPEG-1 decoders, the program bitstream of an MPEG-2 signal, along with fields of appropriate identifying and coding information, is encoded in a format that is reproducible by an MPEG-1 decoder, with the additional channels that make up MPEG-2 audio being encoded into the MPEG-1 ancillary data field, where they can be ignored by an MPEG-1 decoder. However, the provision for backward compatibility increases the difficulty of the task of the MPEG-2 decoder to reproduce five or six channel sound. The difficulty arises in part from the fact that the compressed data for the third through sixth channels follow, and are received by an MPEG-2 receiver only after, the entire frame of compressed samples for the first two channels. As a result, in order to produce decoded samples for all channels, it is necessary to read and store all 1152 compressed samples for each of the first two channels before it is possible to even start to read the data for the other channels.
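The effect of this data ordering on decoder storage can be seen in a toy model. The sketch below only counts samples held while waiting for the matching samples of every channel; it assumes, purely for illustration, that data arrives in 32-sample blocks per channel and that the additional channels are interleaved block by block. All names are hypothetical, and real compressed samples vary in size.

```python
BLOCK = 32                        # samples per channel handled at one time
BLOCKS_PER_FRAME = 1152 // BLOCK  # 36 blocks per channel per frame


def stereo_arrival() -> list:
    """Arrival order (channel, block) with channels 0 and 1 interleaved."""
    return [(ch, blk) for blk in range(BLOCKS_PER_FRAME) for ch in (0, 1)]


def multichannel_arrival(extra_channels: int = 3) -> list:
    """All of the front channel data first, then the additional channels."""
    extra = [
        (ch, blk)
        for blk in range(BLOCKS_PER_FRAME)
        for ch in range(2, 2 + extra_channels)
    ]
    return stereo_arrival() + extra


def peak_buffered_samples(arrival: list, n_channels: int) -> int:
    """Peak number of samples held before a block is complete on all channels."""
    held = set()
    peak = 0
    for ch, blk in arrival:
        held.add((ch, blk))
        peak = max(peak, len(held) * BLOCK)
        if all((c, blk) in held for c in range(n_channels)):
            # Every channel of this block is present, so it can be decoded
            # and its storage released.
            for c in range(n_channels):
                held.discard((c, blk))
    return peak


if __name__ == "__main__":
    print("two channel peak:", peak_buffered_samples(stereo_arrival(), 2))
    print("five channel peak:", peak_buffered_samples(multichannel_arrival(), 5))
```

Under these assumptions the two channel case never holds more than 64 samples, while the five channel case must hold all 2304 front channel samples plus the first block of each additional channel before any output can be produced.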
A straightforward method of decoding an MPEG-2 multiple channel audio program of more than two channels is to read all of the audio data of a given frame into memory, or at least the audio data streams for the first two channels, and then to decode the data for sequential output to all of the channels. Accomplishing this requires the use of local memory in the decoder chip, typically of the type referred to as Static Random Access Memory (SRAM). This active, random access volatile memory provides the speed necessary for such a matrix transforming operation, but it is very expensive to provide SRAM in the large quantity needed to handle the data needed to decode the additional channels.
Because an MPEG-1 or MPEG-2 audio decoder can take a variable amount of time and occasionally more than the length of time between the playback of successive samples to decode a single sample for each channel, it is common to run the decoder ahead of the playback scheme and to store the pre-decoded samples in memory so that they are available for playback as required. In a combination video and audio decoder, it is further necessary to consider synchronization between the displayed video and audio, sometimes referred to as "lip-sync". Typically, a much larger buffer of pre-decoded samples stored in memory is employed. In this way, samples can be discarded without being played in order to speed up the audio and bring it into synchronization with the video. Still, current approaches to the storing of a large number of pre-decoded audio samples are unacceptably costly.
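The use of a pre-decoded sample buffer for synchronization can be sketched as follows. This is a simplified illustration with a hypothetical class name and interface: it assumes the audio lag behind the video is already known in seconds, and it handles a single channel of samples.

```python
from collections import deque


class PredecodedAudioBuffer:
    """FIFO of decoded samples that can drop samples to catch up with video."""

    def __init__(self, sample_rate_hz: int) -> None:
        self.sample_rate_hz = sample_rate_hz
        self._samples = deque()

    def push(self, samples) -> None:
        """Store samples produced by a decoder running ahead of playback."""
        self._samples.extend(samples)

    def pop(self):
        """Return the next sample due for playback."""
        return self._samples.popleft()

    def skip_ahead(self, lag_seconds: float) -> int:
        """Discard unplayed samples worth lag_seconds; return how many were dropped."""
        if lag_seconds <= 0:
            return 0
        wanted = round(lag_seconds * self.sample_rate_hz)
        to_drop = min(wanted, len(self._samples))
        for _ in range(to_drop):
            self._samples.popleft()
        return to_drop

    def __len__(self) -> int:
        return len(self._samples)


if __name__ == "__main__":
    buf = PredecodedAudioBuffer(sample_rate_hz=48_000)
    buf.push(range(1000))             # stand-in for decoded sample values
    dropped = buf.skip_ahead(0.010)   # audio is assumed to lag video by 10 ms
    print(dropped, "samples dropped;", len(buf), "remain")
```

The buffer can only skip as far as it has already decoded, so the amount of correction available is bounded by the buffer depth. That bound is what ties synchronization headroom directly to memory cost.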
Further, the approach of writing the received audio data to an external buffer of DRAM or other lower cost memory does not provide the computational performance or speed that is required where the stored data must be accessed out of the order in which it is stored and repetitive reads and writes are required.
In addition, in some applications, the audio decoder cannot be told by the controller at the system level whether the incoming data stream is MPEG-1 or MPEG-2. Therefore, the decoder must detect the MPEG standard of the incoming stream from the data stream itself. In such a case, before possessing the information needed to determine the MPEG standard being used, the decoder will already have read and at least partially decoded and stored the information relating to channels 1 and 2, and will be in the process of reading the ancillary data field that would contain information for channels 3-5, particularly the headers thereof. If the data stream is then discovered to be MPEG-1, which is first learned by the decoder when it finds that the ancillary data field does not contain an MPEG-2 header or any other information on channels 3-5, the stored data could be in the wrong layout for decoding as MPEG-1 stereo.
For the reasons set forth above, there is a need for a multichannel audio decoding method and apparatus that can rapidly and effectively decode audio data for individual audio channels from the data streams of more than one audio channel, and that can do so with low and inexpensive memory requirements.