Alpha images are two-dimensional maps containing transparency information related to an image. In general, the maps allow an image to be made either completely or partially transparent by describing which portions of the image are transparent and to what degree at each position in the image. FIG. 1a illustrates an example alpha image 100 for an associated image of a bat (not shown). The white areas in the alpha image represent areas that are opaque when displaying the bat image; the black areas represent those parts of the bat image that are completely transparent.
In addition to describing areas of complete opacity or complete transparency, an alpha image oftentimes contains intermediate levels of transparency. This is illustrated, for example, in FIG. 1b, which shows a close-up 110 of the alpha image. In the close-up, intermediate levels of transparency, indicated by shades of gray between black and white, can be seen. Oftentimes these intermediate levels of transparency are found around an object boundary, as they are in FIG. 1b.
In a typical alpha image implementation, these varying levels of transparency would be represented by values in between a maximum and minimum pixel value for each pixel in the alpha image. Thus, for example in FIG. 1b, if the image represents transparency using an 8-bit value, the pixels which represent transparent areas (shown here as black pixels) would contain the minimum value of 0. The “white” pixels, which represent complete opacity, would contain, in this example, the full value of 255 (2^8 − 1). Each of the boundary pixels, then, would contain a value in between 0 and 255, depending on the opacity or transparency desired.
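As a non-limiting illustration of the 8-bit representation described above, the following sketch labels an alpha value as transparent, opaque, or partial, and blends a single foreground sample over a background sample. The function names and the rounded weighted-average blend are assumptions made for illustration only; the text above does not prescribe a particular compositing formula.

```python
def max_alpha(bits=8):
    """Largest alpha value for a given bit depth, e.g. 2**8 - 1 = 255."""
    return (1 << bits) - 1


def classify_alpha(alpha, bits=8):
    """Label one alpha pixel as transparent, opaque, or partially transparent."""
    top = max_alpha(bits)
    if not 0 <= alpha <= top:
        raise ValueError("alpha value out of range for this bit depth")
    if alpha == 0:
        return "transparent"   # drawn as black in the alpha image
    if alpha == top:
        return "opaque"        # drawn as white in the alpha image
    return "partial"           # drawn as a shade of gray


def blend(foreground, background, alpha, bits=8):
    """Blend one foreground sample over one background sample.

    Assumed formula (hypothetical, for illustration): an integer weighted
    average of the two samples, rounded to the nearest integer.
    """
    top = max_alpha(bits)
    if not 0 <= alpha <= top:
        raise ValueError("alpha value out of range for this bit depth")
    return (foreground * alpha + background * (top - alpha) + top // 2) // top
```

Under these assumptions, an alpha of 0 returns the background sample unchanged, an alpha of 255 returns the foreground sample unchanged, and a boundary value returns a mixture of the two.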
In addition to still images, alpha images can describe transparency for video as well. Typically, for video a sequence of alpha images, for example one alpha image per frame, is used. As with many pieces of information that are contained in video sequences, however, it is desirable to reduce the number of bits that are required to encode alpha images in a video sequence or bitstream, in order to reduce overall video bit rate.
Examples of Video Compression
Engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system.
Many existing encoders use at least one type of frequency transform during compression, such as a discrete cosine transform (“DCT”). For example, the encoder splits a key picture into non-overlapping blocks of samples (e.g., 8×8 blocks) and applies a forward frequency transform to individual blocks. The frequency transform maps the sample values of a block to transform coefficients, which are coefficients of basis functions that correspond to frequency components. In particular, the lowest frequency coefficient, called the DC coefficient, indicates the average sample value for the block. The other coefficients, called AC coefficients, indicate patterns of changes in sample values in the block, from gradual low-frequency variations across the block to sharper high-frequency variations within the block. In many encoding scenarios, a relatively small number of frequency coefficients (e.g., the DC coefficient and lower frequency AC coefficients) capture much of the energy or signal content in the block. The encoder quantizes the transform coefficients, resulting in a block of quantized transform coefficients. The encoder further encodes the quantized transform coefficients, for example, using entropy coding, and outputs a bitstream of compressed video information.
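The forward transform step can be sketched as follows. This is a minimal, unoptimized illustration that assumes an orthonormal type-II DCT applied separably to rows and then columns; the function names and the scaling convention are assumptions, and practical encoders typically use fast integer approximations instead.

```python
import math


def dct_1d(samples):
    """Orthonormal type-II DCT of a one-dimensional list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        # The k == 0 basis function is constant, so it gets a smaller scale.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, s in enumerate(samples)
        )
        coeffs.append(scale * total)
    return coeffs


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column.

    The result is indexed as coeffs[vertical_freq][horizontal_freq],
    so coeffs[0][0] is the DC coefficient.
    """
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A flat 8x8 block: all signal content ends up in the DC coefficient.
flat_block = [[100] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat_block)
```

With this assumed scaling, the DC coefficient of an 8×8 block equals 8 times the average sample value, which is the sense in which it "indicates the average"; for the flat block every AC coefficient is zero up to floating-point error.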
In corresponding decoding, a decoder reads the bitstream of compressed video information and performs operations to reconstruct the pictures that were encoded. When the encoding uses lossy compression (e.g., in quantization), the reconstructed pictures approximate the source pictures that were encoded but are not exactly the same. For example, to reconstruct a version of the original 8×8 block of the key picture, the decoder reconstructs quantized transform coefficients using entropy decoding. The decoder inverse quantizes the quantized transform coefficients of the block and applies an inverse frequency transform (such as inverse discrete cosine transform or “IDCT”) to convert coefficients from a frequency domain to a pixel (or “spatial”) domain, producing the reconstructed version of the original 8×8 block. Typically, an encoder also reconstructs encoded pictures, for use in subsequent motion compensation.
According to one possible definition, quantization is a term used for an approximating non-reversible mapping function commonly used for lossy compression, in which there is a specified set of possible output values, and each member of the set of possible output values has an associated set of input values that result in the selection of that particular output value.
Quantization can affect the fidelity with which coefficients are encoded, which in turn can affect bit rate. Coarser quantization tends to decrease fidelity to the original transform coefficients (and produce more distortion) as the coefficients are more coarsely approximated. Bit rate also decreases, however, when decreased complexity can be exploited with lossless compression (e.g., entropy encoding). Conversely, finer quantization tends to preserve fidelity and quality (and produce less distortion) but results in higher bit rates.
Different encoders typically use different parameters for quantization. In many encoders, a step size of quantization is set for a macroblock, block, picture, or other unit of video. The extent of quantization is parameterized in terms of the quantization step size, which is adapted to regulate quality and/or bit rate. Coarser quantization uses larger quantization step sizes. Finer quantization uses smaller quantization step sizes.
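The effect of the quantization step size can be sketched with a uniform scalar quantizer. This is an assumption made for illustration: the text does not specify a particular quantizer, and real encoders may add dead zones or per-coefficient weighting. The sample coefficient values are likewise hypothetical.

```python
def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of step."""
    if step <= 0:
        raise ValueError("step size must be positive")
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Inverse quantization: scale each index back by the step size."""
    return [level * step for level in levels]


def max_error(coeffs, step):
    """Largest absolute reconstruction error for a given step size."""
    recon = dequantize(quantize(coeffs, step), step)
    return max(abs(c - r) for c, r in zip(coeffs, recon))


# Hypothetical transform coefficients for one block.
coeffs = [360.98, -314.38, 79.47, -23.87, 2.72]

fine_levels = quantize(coeffs, 4)     # finer quantization, smaller step
coarse_levels = quantize(coeffs, 32)  # coarser quantization, larger step
```

In this sketch the reconstruction error of each coefficient is bounded by half the step size, so the larger step permits more distortion. The larger step also yields smaller indices and more zeros, which is the reduced complexity that entropy coding can exploit.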
Examples of Problems Encoding Alpha Images
Alpha images have what is sometimes thought of as a relatively simple structure, particularly when characterized, for example, by the entropy of the alpha image, which is low compared to most visual images. This does not mean, however, that they are easy to encode efficiently in a video bitstream. Oftentimes, despite the simple structure of alpha images, it is desirable to devote as small a fraction of the bitstream as possible to the alpha plane.
Furthermore, for the sake of visual quality, it is often desired that 1) white and black pixels in the alpha image are preserved accurately, and 2) edge information is preserved as well as possible. This is difficult using traditional DCT (or other Fourier-related transform) encoding techniques, because these techniques will not necessarily perfectly preserve the black and white portions of the alpha image, and may not be tuned to maintain the “gray” edges in the alpha image.
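The difficulty can be seen in a small one-dimensional sketch. It assumes an orthonormal 8-point DCT, a uniform quantizer, and a hypothetical row of alpha samples crossing an object boundary; the step size of 16 is an arbitrary choice, and none of these specifics come from the text above.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scale factor for frequency index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(samples):
    """Orthonormal type-II DCT."""
    n = len(samples)
    return [
        _scale(k, n) * sum(
            s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, s in enumerate(samples)
        )
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal type-III DCT)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def round_trip(row, step):
    """Transform, quantize, dequantize, inverse transform, clip to 8 bits."""
    levels = [round(c / step) for c in dct_1d(row)]
    recon = idct_1d([level * step for level in levels])
    return [min(255, max(0, round(v))) for v in recon]


# Hypothetical alpha row: black, a short gray edge, then white.
alpha_row = [0, 0, 0, 64, 192, 255, 255, 255]
lossy_row = round_trip(alpha_row, 16)
```

Under these assumptions the round trip does not return the original row: at least one sample comes back with a different value, even after clipping to the valid range. Pixels that were exactly black, white, or a chosen shade of gray are therefore not guaranteed to survive transform coding with quantization.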
Additionally, for the sake of computational complexity, it is often desired that any alpha codec be low in complexity, because the bulk of CPU usage should be directed toward decoding the associated video channel. Moreover, the process of compositing the alpha image with the associated video is in itself non-trivial, and there is also the likelihood of a second channel of video (for example, for a background) which must be decoded in real time as well. Finally, it is desirable for an alpha image codec to support both lossless and lossy coding of an alpha channel. What is needed is an alpha image encoding and decoding solution that addresses these issues.