With the introduction of compact disks, digital video disks, portable digital media players, digital wireless networks, and audio and video delivery over the Internet, digital audio and video have become commonplace. Engineers use a variety of techniques to process digital audio and video efficiently while still maintaining the quality of the digital audio or video.
Digital audio information is processed as a series of numbers representing the audio information. For example, a single number can represent an audio sample, which is an amplitude value (i.e., loudness) at a particular time. Several factors affect the quality of the audio information, including sample depth, sampling rate, and channel mode.
Sample depth (or precision) indicates the range of numbers used to represent a sample. The more values possible for the sample, the higher the quality because the number can capture more subtle variations in amplitude. For example, an 8-bit sample has 256 possible values, while a 16-bit sample has 65,536 possible values. A 24-bit sample can capture normal loudness variations very finely, and can also capture unusually high loudness.
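Because each added bit doubles the number of representable values, an n-bit sample can take 2^n values. The following Python sketch makes this concrete under an assumed convention: amplitudes are normalized to the range -1.0 to 1.0 and quantized by simple round-and-clip. The function names `levels` and `quantize` are illustrative and do not come from any particular codec. A quiet amplitude of 0.001 rounds to zero at 8 bits but is preserved at 16 and 24 bits.

```python
def levels(bits: int) -> int:
    """Number of distinct values a sample of this depth can take."""
    return 2 ** bits

def quantize(amplitude: float, bits: int) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed integer sample (round, then clip)."""
    max_code = 2 ** (bits - 1) - 1   # e.g. 32767 for 16-bit samples
    min_code = -(2 ** (bits - 1))    # e.g. -32768 for 16-bit samples
    code = round(amplitude * max_code)
    return max(min_code, min(max_code, code))  # clip out-of-range input

for bits in (8, 16, 24):
    print(bits, levels(bits), quantize(0.001, bits))
```

Running it prints `8 256 0`, `16 65536 33`, and `24 16777216 8389`, one line per depth.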
The sampling rate (usually measured as the number of samples per second) also affects quality. The higher the sampling rate, the higher the quality, because a higher sampling rate can represent a wider band of frequencies (up to half the sampling rate). Some common sampling rates are 8,000, 11,025, 22,050, 32,000, 44,100, 48,000, and 96,000 samples/second.
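The link between sampling rate and bandwidth follows from the sampling theorem. A rate of f_s samples/second can represent frequencies only up to f_s/2 (the Nyquist frequency), and a tone above that limit folds back (aliases) to a lower frequency. The following is a minimal sketch, assuming an ideal sampler and a single pure tone; the helper names are hypothetical.

```python
def nyquist(sample_rate: float) -> float:
    """Highest frequency the given sampling rate can represent without aliasing."""
    return sample_rate / 2

def alias_frequency(freq: float, sample_rate: float) -> float:
    """Apparent frequency of a pure tone after ideal sampling, folded into [0, rate/2]."""
    return abs(freq - round(freq / sample_rate) * sample_rate)

for rate in (8000, 44100, 96000):
    print(rate, nyquist(rate), alias_frequency(30000, rate))
```

For example, a 30 kHz tone sampled at 44,100 samples/second appears at 14.1 kHz. At 96,000 samples/second it stays at 30 kHz, below that rate's 48 kHz Nyquist limit.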
Mono and stereo are two common channel modes for audio. In mono mode, audio information is present in one channel. In stereo mode, audio information is present in two channels, usually labeled the left and right channels. Other modes with more channels, such as 5.1-channel, 7.1-channel, or 9.1-channel surround sound, are also commonly used.

The cost of high-quality audio information is a high bitrate: high-quality audio information consumes large amounts of computer storage and transmission capacity.
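The bitrate of uncompressed (PCM) audio is the product of sampling rate, sample depth, and number of channels. The sketch below uses hypothetical helper names and applies that product to CD-style stereo (44,100 samples/second, 16 bits, 2 channels). The result is 1,411,200 bits/second, or about 10.6 MB per minute before any encoding.

```python
def pcm_bitrate(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio, in bits per second."""
    return sample_rate * bits_per_sample * channels

def pcm_bytes(seconds: float, sample_rate: int, bits_per_sample: int, channels: int) -> float:
    """Storage needed for `seconds` of uncompressed PCM audio, in bytes."""
    return pcm_bitrate(sample_rate, bits_per_sample, channels) * seconds / 8

print(pcm_bitrate(44_100, 16, 2))          # CD-style stereo
print(pcm_bitrate(48_000, 24, 6))          # 5.1 surround: six channels at 24 bits
print(pcm_bytes(60, 44_100, 16, 2) / 1e6)  # megabytes per minute of CD-style stereo
```

For simplicity, the 5.1 case treats all six channels, including the low-frequency effects channel, as full-rate channels at 48,000 samples/second and 24 bits. This gives 6,912,000 bits/second.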
Many computers and computer networks lack the storage capacity or processing resources to handle raw digital audio and video. Encoding (also called coding or bitrate compression) decreases the cost of storing and transmitting audio or video information by converting the information into a lower bitrate form. Encoding can be lossless, in which case quality does not suffer, or lossy, in which case analytic quality suffers (though perceived audio quality may not) but the bitrate reduction is more dramatic than with lossless encoding. Decoding (also called decompression) extracts a reconstructed version of the original information from the encoded form.
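The lossless/lossy distinction can be seen with Python's standard library. In the sketch below, `zlib` (DEFLATE) stands in for a lossless encoder. Dropping the 8 least significant bits of each sample stands in for a lossy quality reduction. The test signal is hypothetical, and real lossy audio codecs use perceptual models rather than this crude truncation. The lossless path reproduces the input exactly. The lossy path yields a smaller output here, at the cost of a nonzero reconstruction error.

```python
import math
import random
import struct
import zlib

# Hypothetical test signal: one second of a 440 Hz tone plus a little noise,
# as 16-bit mono samples at 8,000 samples/second.
random.seed(0)
RATE = 8000
samples = [round(10000 * math.sin(2 * math.pi * 440 * n / RATE) + random.gauss(0, 200))
           for n in range(RATE)]
raw = struct.pack(f"<{RATE}h", *samples)

# Lossless: DEFLATE (zlib) reconstructs the exact original bytes.
lossless = zlib.compress(raw, 9)
assert zlib.decompress(lossless) == raw

# Lossy (crude illustration): keep only the top 8 bits of each sample,
# compress, and scale back up on decode. The dropped bits cannot be recovered.
coarse = [s >> 8 for s in samples]
lossy = zlib.compress(struct.pack(f"<{RATE}b", *coarse), 9)
decoded = [c << 8 for c in struct.unpack(f"<{RATE}b", zlib.decompress(lossy))]
max_error = max(abs(a - b) for a, b in zip(samples, decoded))

print(f"raw={len(raw)} lossless={len(lossless)} lossy={len(lossy)} max_error={max_error}")
```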
In response to the demand for efficient encoding and decoding of digital media data, many audio and video encoder/decoder systems (“codecs”) have been developed. For example, referring to FIG. 1, an audio encoder 100 takes input audio data 110 and encodes it to produce encoded audio output data 120 using one or more encoding modules. In FIG. 1, analysis module 130, frequency transformer module 140, quality reducer (lossy encoding) module 150 and lossless encoder module 160 are used to produce the encoded audio data 120. Controller 170 coordinates and controls the encoding process.
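The module chain of FIG. 1 can be sketched as a sequence of functions. The following Python is a toy sketch, not the encoder 100 itself. The block size, the use of a DCT-II as the frequency transform, uniform quantization as the quality reducer, and `zlib` as the lossless encoder are all assumptions made for illustration. The function names merely mirror the module labels. A matching `decode` function is included so the round trip can be checked.

```python
import math
import struct
import zlib

BLOCK = 64  # hypothetical block length, in samples

def _scale(k: int, n: int) -> float:
    """Orthonormal DCT scaling factor."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def analysis(samples: list, block: int = BLOCK) -> list:
    """Stand-in for analysis module 130: split input into blocks, zero-padding the last one."""
    padded = samples + [0.0] * (-len(samples) % block)
    return [padded[i:i + block] for i in range(0, len(padded), block)]

def frequency_transform(x: list) -> list:
    """Stand-in for frequency transformer 140: naive O(N^2) orthonormal DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def quality_reducer(coeffs: list, step: float) -> list:
    """Stand-in for quality reducer 150: uniform quantization (the only lossy step)."""
    return [round(c / step) for c in coeffs]

def lossless_encoder(ints: list) -> bytes:
    """Stand-in for lossless encoder 160: pack as 32-bit integers, then DEFLATE."""
    return zlib.compress(struct.pack(f"<{len(ints)}i", *ints), 9)

def controller(samples: list, step: float = 0.01) -> bytes:
    """Stand-in for controller 170: run the modules in order."""
    quantized = []
    for blk in analysis(samples):
        quantized.extend(quality_reducer(frequency_transform(blk), step))
    return lossless_encoder(quantized)

def decode(data: bytes, step: float = 0.01, block: int = BLOCK) -> list:
    """Inverse path for checking: inflate, dequantize, inverse DCT (DCT-III)."""
    raw = zlib.decompress(data)
    q = struct.unpack(f"<{len(raw) // 4}i", raw)
    out = []
    for b in range(0, len(q), block):
        coeffs = [v * step for v in q[b:b + block]]
        out.extend(sum(_scale(k, block) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / block)
                       for k in range(block))
                   for i in range(block))
    return out
```

The transform is orthonormal, so a quantization step of 0.01 limits each coefficient error to 0.005. That bounds the reconstruction error of every sample in a 64-sample block to at most 0.005 × √64 = 0.04.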
Existing audio codecs include Microsoft Corporation's Windows Media Audio (“WMA”) codec. Other codec systems are specified by the Moving Picture Experts Group (“MPEG”), such as the MPEG-1 Audio Layer 3 (“MP3”) standard and the MPEG-2 Advanced Audio Coding (“AAC”) standard, or are provided by other commercial providers such as Dolby (which has provided the AC-2 and AC-3 standards).
Different encoding systems use specialized elementary bitstreams for inclusion in multiplex streams capable of carrying more than one elementary bitstream. Such multiplex streams are also known as transport streams. Transport streams typically place certain restrictions on elementary streams, such as buffer size limitations, and require certain information to be included in the elementary streams to facilitate decoding. Elementary streams typically include an access unit to facilitate synchronization and accurate decoding of the elementary stream, and provide identification for different elementary streams within the transport stream.
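As a toy model of multiplexing, the sketch below interleaves access units from several elementary streams into one sequence of packets. Each packet is tagged with a stream identifier so that a receiver can separate the streams again. The packet format (a `(stream_id, payload)` tuple), the round-robin interleaving, and the identifier values are hypothetical. Real transport streams also add timing information, buffering constraints, and fixed-size packets.

```python
from collections import defaultdict
from itertools import zip_longest

def multiplex(streams: dict) -> list:
    """Round-robin interleave access units, tagging each with its (hypothetical) stream id.

    `streams` maps a stream id to a list of access units (bytes)."""
    packets = []
    for units in zip_longest(*streams.values()):
        for sid, unit in zip(streams.keys(), units):
            if unit is not None:  # a shorter stream has run out of access units
                packets.append((sid, unit))
    return packets

def demultiplex(packets: list) -> dict:
    """Recover each elementary stream by filtering packets on their stream id."""
    out = defaultdict(list)
    for sid, unit in packets:
        out[sid].append(unit)
    return dict(out)

streams = {0x21: [b"a0", b"a1", b"a2"], 0x22: [b"v0", b"v1"]}
print(multiplex(streams))
```

Here `multiplex` alternates between the two streams until the shorter one runs out, and `demultiplex` returns the original mapping.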
For example, Revision A of the AC-3 standard describes an elementary stream composed of a sequence of synchronization frames. Each synchronization frame contains a synchronization information header, a bitstream information header, six coded audio data blocks, and an error check field. The synchronization information header contains information for acquiring and maintaining synchronization in the bitstream. The synchronization information includes a synchronization word, a cyclic redundancy check word, sample rate information and frame size information. The bitstream information header follows the synchronization information header. The bitstream information includes coding mode information (e.g., number and type of channels), time code information, and other parameters.
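The synchronization information header of an AC-3 frame (the `syncinfo` element in ATSC A/52) is 40 bits long. It consists of a 16-bit syncword (0x0B77), the 16-bit `crc1`, the 2-bit sample rate code `fscod`, and the 6-bit frame size code `frmsizecod`. The following Python sketch decodes those fields and derives the frame size, using the fact that each frame carries 1,536 samples (six blocks of 256). The bitrate table and the 44.1 kHz rounding rule follow the standard's frame size table as understood here and should be checked against the specification before use. The function name is hypothetical.

```python
import struct

# fscod -> sampling rate in Hz (fscod = 3 is reserved)
FSCOD_HZ = {0: 48000, 1: 44100, 2: 32000}
# Nominal bitrate in kbit/s, indexed by frmsizecod >> 1 (frmsizecod 0..37)
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160,
                 192, 224, 256, 320, 384, 448, 512, 576, 640]

def parse_ac3_syncinfo(frame: bytes) -> dict:
    """Decode the 5-byte AC-3 syncinfo and derive the frame size."""
    if len(frame) < 5:
        raise ValueError("need at least 5 bytes")
    syncword, crc1, code = struct.unpack(">HHB", frame[:5])
    if syncword != 0x0B77:
        raise ValueError("missing AC-3 syncword 0x0B77")
    fscod, frmsizecod = code >> 6, code & 0x3F
    if fscod not in FSCOD_HZ or frmsizecod > 37:
        raise ValueError("reserved fscod or frmsizecod")
    rate = FSCOD_HZ[fscod]
    kbps = BITRATES_KBPS[frmsizecod >> 1]
    # 1,536 samples per frame; size in 16-bit words = kbps * 1000 * 1536 / (rate * 16)
    words = kbps * 96000 // rate
    if rate == 44100:
        # Not an integer at 44.1 kHz: odd frmsizecod values select the size one word larger
        words += frmsizecod & 1
    return {"sample_rate": rate, "bitrate_kbps": kbps, "crc1": crc1,
            "frame_words": words, "frame_bytes": 2 * words}

print(parse_ac3_syncinfo(bytes([0x0B, 0x77, 0x12, 0x34, 0x14])))
```

For example, `fscod` 0 with `frmsizecod` 20 selects a 48 kHz, 192 kbit/s frame of 384 words (768 bytes).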
The AAC standard describes Audio Data Transport Stream (ADTS) frames that consist of a fixed header, a variable header, an optional error check block, and raw data blocks. The fixed header contains information that does not change from frame to frame (e.g., a synchronization word, sampling rate information, channel configuration information, etc.), but is still repeated for each frame to allow random access into the bitstream. The variable header contains data that changes from frame to frame (e.g., frame length information, buffer fullness information, number of raw data blocks, etc.). The error check block includes the variable crc_check, used for cyclic redundancy checking.
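The ADTS header fields can be read with a simple big-endian bit reader. The sketch below follows the field order and widths of `adts_fixed_header()` (28 bits) and `adts_variable_header()` (28 bits). It reads the 16-bit `crc_check` only when `protection_absent` is 0, and it maps `sampling_frequency_index` values 0 to 11 to sampling rates. The `BitReader` class and the function name are hypothetical helpers, and the sketch performs no CRC verification.

```python
SAMPLING_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                  24000, 22050, 16000, 12000, 11025, 8000]

class BitReader:
    """Reads unsigned big-endian bit fields from a short byte string."""
    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def read(self, n: int) -> int:
        self.remaining -= n
        return (self.value >> self.remaining) & ((1 << n) - 1)

def parse_adts_header(data: bytes) -> dict:
    if len(data) < 7:
        raise ValueError("ADTS header needs at least 7 bytes")
    r = BitReader(data[:7])
    # adts_fixed_header(): identical in every frame of the stream
    if r.read(12) != 0xFFF:
        raise ValueError("missing ADTS syncword")
    h = {"id": r.read(1),                  # 0 = MPEG-4, 1 = MPEG-2
         "layer": r.read(2),
         "protection_absent": r.read(1),
         "profile": r.read(2)}             # 1 = AAC LC
    sf_index = r.read(4)
    h["private_bit"] = r.read(1)
    h["channel_configuration"] = r.read(3)
    h["original_copy"] = r.read(1)
    h["home"] = r.read(1)
    # adts_variable_header(): may change from frame to frame
    h["copyright_identification_bit"] = r.read(1)
    h["copyright_identification_start"] = r.read(1)
    h["aac_frame_length"] = r.read(13)     # in bytes, header included
    h["adts_buffer_fullness"] = r.read(11)
    h["number_of_raw_data_blocks_in_frame"] = r.read(2)  # raw data blocks minus one
    # adts_error_check(): crc_check present only when protection_absent == 0
    if h["protection_absent"] == 0:
        if len(data) < 9:
            raise ValueError("truncated crc_check")
        h["crc_check"] = int.from_bytes(data[7:9], "big")
    h["sampling_frequency"] = SAMPLING_RATES[sf_index] if sf_index < len(SAMPLING_RATES) else None
    return h

print(parse_adts_header(bytes.fromhex("FFF150802E7FFC")))
```

For example, the seven bytes `FF F1 50 80 2E 7F FC` decode as a 371-byte (header included), 44,100 Hz, two-channel AAC LC frame without CRC. Its buffer fullness of 0x7FF is the value that signals a variable-rate stream.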
Existing transport streams include the MPEG-2 transport stream defined by the MPEG-2 Systems standard. The MPEG-2 transport stream can include multiple elementary streams, such as one or more AC-3 streams. Within the MPEG-2 transport stream, an AC-3 elementary stream is identified by at least a stream_type variable, a stream_id variable, and an audio descriptor. The audio descriptor includes information for individual AC-3 streams, such as bitrate, number of channels, sample rate, and a descriptive text field.
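At the transport layer, each MPEG-2 transport stream packet is 188 bytes long and begins with the sync byte 0x47. It also carries a 13-bit packet identifier (PID) that assigns it to one elementary stream. The Python sketch below parses the 4-byte packet header, skips any adaptation field, and reads the PES `stream_id` when a packet starts a PES packet. The value 0xBD (`private_stream_1`) is the `stream_id` used for AC-3 in ATSC systems. Mapping PIDs to `stream_type` values and reading the audio descriptor would require parsing the PAT and PMT tables, which this sketch omits. The function names are hypothetical.

```python
TS_PACKET_SIZE = 188
PRIVATE_STREAM_1 = 0xBD  # PES stream_id used for AC-3 in ATSC systems

def parse_ts_packet(pkt: bytes) -> dict:
    """Parse one transport stream packet header and locate its payload."""
    if len(pkt) != TS_PACKET_SIZE or pkt[0] != 0x47:
        raise ValueError("not a 188-byte TS packet with sync byte 0x47")
    pusi = bool(pkt[1] & 0x40)              # payload_unit_start_indicator
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]   # 13-bit packet identifier
    afc = (pkt[3] >> 4) & 0x3               # adaptation_field_control
    offset = 4
    if afc in (2, 3):                       # adaptation field present; first byte is its length
        offset += 1 + pkt[4]
    payload = pkt[offset:] if afc in (1, 3) else b""
    info = {"pid": pid, "pusi": pusi, "continuity_counter": pkt[3] & 0x0F, "payload": payload}
    # A PES packet begins with the start code prefix 00 00 01, followed by stream_id.
    if pusi and len(payload) > 3 and payload[:3] == b"\x00\x00\x01":
        info["stream_id"] = payload[3]
    return info

def split_by_pid(stream: bytes) -> dict:
    """Group parsed packets by PID (no PSI/PMT parsing)."""
    out = {}
    for i in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        p = parse_ts_packet(stream[i:i + TS_PACKET_SIZE])
        out.setdefault(p["pid"], []).append(p)
    return out
```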
For more information about these codec systems, see the respective standards or technical publications.