Engineers use compression (also called coding or encoding) to reduce the bit rate of digital media. Compression decreases the cost of storing and transmitting media by converting the media into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original media from the compressed form.
When it converts media to a lower bit rate form, a media encoder can decrease the quality of the compressed media to reduce bit rate. By selectively removing detail in the media, the encoder makes the media simpler and easier to compress, but the compressed media is less faithful to the original media. Aside from this basic quality/bit rate tradeoff, the bit rate of the media depends on the content (e.g., complexity) of the media and the format of the media.
Media information is organized according to different formats for different devices and applications. Many attributes of format relate to resolution. For video, for example, sample depth, spatial resolution (e.g., in terms of width and height of a picture) and temporal resolution (e.g., in terms of number of pictures per second) relate to resolution. For audio, sample depth and sampling rate (e.g., in terms of number of samples per second) relate to resolution. Typically, quality and bit rate vary directly with resolution, with higher resolution resulting in higher quality and higher bit rate.
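As a rough illustration of how resolution drives bit rate, the following sketch computes the bit rate of uncompressed media from the format attributes named above. The function names and the default of three samples per pixel are assumptions made for this example only; they do not come from any particular codec or standard.

```python
def raw_video_bit_rate(width, height, pictures_per_second,
                       bits_per_sample, samples_per_pixel=3):
    """Bits per second of uncompressed video.

    samples_per_pixel=3 is an assumption (e.g., three color components
    at full resolution); subsampled formats would use a smaller value.
    """
    samples_per_picture = width * height * samples_per_pixel
    return samples_per_picture * bits_per_sample * pictures_per_second


def raw_audio_bit_rate(samples_per_second, bits_per_sample, channels=1):
    """Bits per second of uncompressed audio."""
    return samples_per_second * bits_per_sample * channels


if __name__ == "__main__":
    low = raw_video_bit_rate(320, 240, 30, 8)
    high = raw_video_bit_rate(640, 480, 30, 8)
    # Doubling width and height quadruples the uncompressed bit rate.
    print(low, high, high // low)
    print(raw_audio_bit_rate(44100, 16, channels=2))
```

Under these assumptions, doubling both spatial dimensions quadruples the uncompressed bit rate, which is the direct relationship between resolution and bit rate described above.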
Scalable media encoding and decoding facilitate delivery of media to devices with different capabilities. A typical scalable media encoder splits media into a base layer and one or more enhancement layers. By itself the base layer provides a version of the media for reconstruction at a lower quality, and the enhancement layer(s) add information that will increase quality. Some scalable encoders and decoders rely on temporal scalability of media (e.g., low frame rate to high frame rate). Other common scalable encoding/decoding schemes use scalability for either the spatial resolution or the overall encoding quality of the video (e.g., high distortion to low distortion). Scalable layers can represent different quality points for a single type of resolution (e.g., for three different spatial resolutions 320×240, 640×480 and 1280×960). Or, scalable layers can represent different quality points for different types of resolution (e.g., for a 320×240 low quality base layer, a 640×480 low quality enhancement layer, a 320×240 higher quality enhancement layer, a 640×480 higher quality enhancement layer, and so on).
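The base-plus-enhancement structure can be sketched as follows. This is a minimal illustration, assuming each enhancement layer depends on all layers before it and that each layer has a known incremental bit rate; the function name `select_layers` and the example rates are hypothetical.

```python
def select_layers(layer_bit_rates, budget):
    """Pick how many layers (base first) fit within a bit rate budget.

    layer_bit_rates[0] is the base layer; later entries are enhancement
    layers, each assumed to require every layer before it.
    Returns (number_of_layers, total_bit_rate).
    """
    total = 0
    count = 0
    for rate in layer_bit_rates:
        if total + rate > budget:
            break  # this layer and all later ones are dropped
        total += rate
        count += 1
    return count, total


if __name__ == "__main__":
    # Hypothetical incremental rates in kilobits per second.
    layers = [300, 500, 1200]
    print(select_layers(layers, 1000))  # base + first enhancement layer
    print(select_layers(layers, 5000))  # all layers
```

A device or connection with a small budget receives only the base layer, while a more capable one receives additional enhancement layers from the same encoded content.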
The base layer and one or more enhancement layers can be organized in separate bit streams for the respective layers. Or, the content for the respective scalable layers can be interleaved unit-by-unit for the encoded media. For example, for a first frame of audio, base layer data for the first frame precedes first enhancement layer data for the first frame, which precedes second enhancement layer data for the first frame. Then, base layer data, first enhancement layer data, and second enhancement layer data follow for a second frame of audio. For video, the unit can be a picture or group of pictures, with base layer data and enhancement layer data organized by unit.
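The unit-by-unit interleaving described above can be sketched as below. The tuple layout `(unit_index, layer_index, payload)` and the function name are assumptions for illustration; a real bit stream would use its own container syntax.

```python
def interleave_layers(layers):
    """Interleave per-layer data unit by unit.

    layers[0] holds the base layer payloads, one per unit (e.g., one per
    audio frame); layers[1:] hold the enhancement layers in order.
    Returns a list of (unit_index, layer_index, payload) tuples.
    """
    if not layers:
        return []
    num_units = len(layers[0])
    if any(len(layer) != num_units for layer in layers):
        raise ValueError("every layer must have one payload per unit")

    stream = []
    for unit_index in range(num_units):
        # Within a unit, base layer data comes first, then each
        # enhancement layer in increasing order.
        for layer_index, layer in enumerate(layers):
            stream.append((unit_index, layer_index, layer[unit_index]))
    return stream


if __name__ == "__main__":
    base = [b"B0", b"B1"]
    enh1 = [b"E1-0", b"E1-1"]
    enh2 = [b"E2-0", b"E2-1"]
    for entry in interleave_layers([base, enh1, enh2]):
        print(entry)
```

Organizing the layers as separate bit streams would instead keep `base`, `enh1` and `enh2` as three independent sequences.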
When scalable layers represent different quality points for a single type of resolution, and the scalable layers are organized unit-by-unit in a bit stream, the one or more layer(s) for low quality can be considered “embedded” within the layer for the next higher quality. These layers in turn can be considered embedded within the layer for the next higher quality. Selecting a target quality/bit rate can be accomplished by selecting a set of nested layers of encoded data for each of the units. One approach to creating an embedded bit stream with layers for different quality levels uses bit plane coding. In bit plane coding, the frequency transform coefficients for blocks of a picture are separated into a first plane having the most significant bit for each transform coefficient, a second plane having the next most significant bit for each coefficient, and so on, through the plane having the least significant bit for each coefficient. The respective bit planes are encoded in different scalable layers for different levels of encoding quality.
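The bit plane separation described above can be sketched as follows. This is a simplified illustration that assumes non-negative integer coefficient magnitudes with a fixed bit width (sign handling and entropy coding of each plane are omitted); the function names are hypothetical.

```python
def split_bit_planes(coefficients, num_bits):
    """Split non-negative integer coefficients into bit planes.

    Returns a list of planes, most significant bit first. Each plane
    holds one bit per coefficient.
    """
    planes = []
    for bit in range(num_bits - 1, -1, -1):
        planes.append([(c >> bit) & 1 for c in coefficients])
    return planes


def reconstruct(planes, num_bits):
    """Rebuild coefficients from the first len(planes) bit planes.

    Bits in planes that were not received are treated as zero, so using
    fewer planes gives a coarser approximation.
    """
    if not planes:
        return []
    values = [0] * len(planes[0])
    for index, plane in enumerate(planes):
        shift = num_bits - 1 - index
        values = [v | (b << shift) for v, b in zip(values, plane)]
    return values


if __name__ == "__main__":
    coeffs = [13, 6, 1, 0]
    planes = split_bit_planes(coeffs, num_bits=4)
    print(reconstruct(planes[:2], 4))  # two most significant planes only
    print(reconstruct(planes, 4))      # all planes: exact values
```

Each additional plane refines the reconstruction, which is why the planes map naturally onto nested quality layers: decoding the first two planes of `[13, 6, 1, 0]` yields `[12, 4, 0, 0]`, and decoding all four recovers the original values.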
Delivering media content over the Internet and other computer networks has become more popular. Media delivery over the Internet is typically characterized by variable bandwidth, without dedicated bandwidth between a media server that distributes media content and a media client that plays back the media content. If the bit rate of media content is too high, the media content may be dropped by the network, causing playback by the media client to stall. Alternatively, the media client can buffer a large portion of the media content before playback begins, but this results in a long delay before playback starts. On the other hand, if the bit rate of the media content is much lower than the network could deliver, the quality of the media content played back will be lower than it could be. By adjusting bit rate of media content so that bit rate more closely matches available network bandwidth, a media server can improve the media client's playback experience. While existing ways of adjusting quality and bit rate of media content provide adequate performance in many scenarios, they do not have the benefits and advantages of the techniques and tools described below.
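The tradeoff between bit rate, bandwidth and startup delay can be made concrete with a simplified model. The sketch below assumes constant media bit rate, constant network bandwidth and no latency or packet loss, which real networks do not guarantee; the function name is hypothetical.

```python
def min_startup_delay(bit_rate, bandwidth, duration):
    """Smallest pre-buffering delay (seconds) that avoids any stall.

    bit_rate and bandwidth use the same units (e.g., kilobits per
    second); duration is the playback length in seconds. Under the
    constant-rate assumption, the download takes
    duration * bit_rate / bandwidth seconds and must finish no later
    than playback ends.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if bit_rate <= bandwidth:
        return 0.0  # the network keeps up with playback
    return duration * (bit_rate / bandwidth - 1.0)


if __name__ == "__main__":
    # 60 seconds of media at 1500 kbps over a 1000 kbps connection.
    print(min_startup_delay(1500, 1000, 60))
    # The same media encoded at 800 kbps needs no pre-buffering.
    print(min_startup_delay(800, 1000, 60))
```

In this model, content encoded at 1.5 times the available bandwidth requires buffering for half the playback duration before starting, while content encoded at or below the bandwidth can start immediately, at the cost of lower quality.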