1. Field of the Invention
This invention relates generally to digital video technology and more particularly to a method and apparatus for implementing efficient memory compression methods.
2. Description of the Related Art
Accessing video on mobile terminals, such as cellular phones and personal digital assistants, presents many challenges because of the inherent limitations of mobile systems. For example, low-powered, handheld devices are constrained by bandwidth, power, memory, and cost requirements. The video data received by these handheld devices are decoded by a video decoder. The video decoders associated with such terminals perform motion compensation in the spatial domain, i.e., the decompressed domain. Video compression standards, such as H.263, H.261 and MPEG-1/2/4, use a motion-compensated discrete cosine transform (DCT) scheme to encode videos at low bit rates. As used herein, low bit rates refer to bit rates less than about 64 kilobits per second. The DCT scheme uses motion estimation (ME) and motion compensation (MC) to remove temporal redundancy and DCT to remove the remaining spatial redundancy.
FIG. 1 is a schematic diagram of a video decoder for decoding video data and performing motion compensation in the spatial domain. Bit stream 102 is received by decoder 100. Decoder 100 includes variable length decoder (VLD) stage 104, run length decoder (RLD) stage 106, dequantization (DQ) stage 108, inverse discrete cosine transform (IDCT) stage 110, motion compensation (MC) stage 112 and memory (MEM) 114, also referred to as a frame buffer. The first four stages (VLD 104, RLD 106, DQ 108, and IDCT 110) decode the compressed bit stream back into the pixel domain. For an intracoded block, the output of the first four stages, 104, 106, 108 and 110, is used directly to reconstruct the block in the current frame. For an intercoded block, the output represents the prediction error and is added to the prediction formed from the previous frame to reconstruct the block in the current frame. Accordingly, the current frame is reconstructed on a block-by-block basis. Finally, the current frame is sent to the output of the decoder, i.e., display 116, and is also stored in frame buffer (MEM) 114.
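The block reconstruction rule described above can be illustrated with a minimal sketch. The function name reconstruct_block, the list-of-lists block layout, and the clipping of results to the 8-bit range 0 to 255 are assumptions made for illustration; they are not taken from FIG. 1.

```python
def reconstruct_block(idct_output, prediction=None):
    """Reconstruct one block of 8-bit samples (hypothetical helper).

    idct_output: 2-D list of ints produced by the first four decoder stages.
    prediction:  2-D list of ints taken from the previous frame for an
                 intercoded block, or None for an intracoded block.
    """
    def clip(value):
        # Assumption: samples are kept within the 8-bit range 0..255.
        return max(0, min(255, value))

    if prediction is None:
        # Intracoded block: the decoded output is used directly.
        return [[clip(v) for v in row] for row in idct_output]

    # Intercoded block: the decoded output is the prediction error,
    # which is added to the prediction from the previous frame.
    return [
        [clip(p + e) for p, e in zip(pred_row, err_row)]
        for pred_row, err_row in zip(prediction, idct_output)
    ]
```

In this sketch, the previous frame held in the frame buffer would supply the prediction argument, which is why the decoder must keep that frame in memory.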
MEM 114 stores the previously decoded picture required by motion compensation 112. The size of MEM 114 must scale with the incoming picture format. For example, H.263 supports five standardized picture formats: (1) sub-quarter common intermediate format (sub-QCIF), (2) quarter common intermediate format (QCIF), (3) common intermediate format (CIF), (4) 4CIF, and (5) 16CIF. Each format defines the width and height of the picture as well as its aspect ratio. As is generally known, pictures are coded as a single luminance component and two color difference components (Y, Cr, Cb). The components are sampled in a 4:2:0 configuration, and each component has a resolution of 8 bits/pixel. For example, the video decoder of FIG. 1 must allocate approximately 200 kilobytes of memory for MEM 114 while decoding an H.263 bit stream with CIF format. Furthermore, when multiple bit streams are being decoded at once, as required by video conferencing systems, the demands for memory become excessive.
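How the frame buffer scales with picture format can be estimated with a short calculation. The sketch below counts only the raw 8-bit samples of one 4:2:0 frame, using the luminance dimensions commonly listed for the five H.263 formats; the names H263_FORMATS and frame_bytes_420 are hypothetical, and a decoder's actual allocation may be larger than this raw figure.

```python
# Luminance width and height, in pixels, for the five H.263 picture formats.
H263_FORMATS = {
    "sub-QCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def frame_bytes_420(width, height, bits_per_sample=8):
    """Raw size in bytes of one 4:2:0 frame (hypothetical helper).

    In 4:2:0 sampling each of the two color difference components has
    half the luminance resolution both horizontally and vertically.
    """
    luma_samples = width * height
    chroma_samples = 2 * (width // 2) * (height // 2)
    return (luma_samples + chroma_samples) * bits_per_sample // 8


if __name__ == "__main__":
    for name, (width, height) in H263_FORMATS.items():
        print(f"{name}: {frame_bytes_420(width, height)} bytes per frame")
```

Because each step up in format quadruples the pixel count, the raw frame size also grows by a factor of four from QCIF to CIF, from CIF to 4CIF, and from 4CIF to 16CIF.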
MEM 114 is the single greatest source of memory usage in video decoder 100. In order to reduce memory usage, one approach might be to reduce the resolution of the color components for the incoming bit stream. For example, if the color display on the mobile terminal can only show 65,536 colors, then it is possible to reduce the resolution of the color components (Y, Cr, Cb) from 24 bits/pixel down to 16 bits/pixel. While this technique can potentially reduce memory usage by 30%, it is a display-dependent solution that must be hardwired in the video decoder. Also, this technique does not scale easily with changing peak signal-to-noise ratio (PSNR) requirements. Therefore, this approach is not flexible.
Operating on the data in the spatial domain requires increased memory capacity as compared to compressed domain processing. In the spatial domain, the motion compensation is readily calculated and applied to successive frames of an image. However, when operating in the compressed domain, motion compensation is not as straightforward as a motion vector pointing back to a previous frame, since the error values are no longer spatial values, i.e., the error values are not pixel values when operating in the compressed domain. Additionally, methods capable of efficiently handling compressed domain data are not available. Prior art approaches have focused mainly on transcoding, scaling and sharpening applications in the compressed domain. Additionally, inverse compensation applications for the compressed domain tend to give poor peak signal-to-noise ratio (PSNR) performance and at the same time have an unacceptably slow response time in terms of the number of frames per second that can be displayed.
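For reference, PSNR is conventionally computed from the mean squared error between a reference picture and the decoded picture. The following is a minimal sketch of that standard definition for 8-bit samples; the function name psnr and the flat-list input layout are assumptions made for illustration.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in decibels (hypothetical helper).

    reference, decoded: equal-length, non-empty flat lists of sample values.
    peak: largest possible sample value (255 for 8-bit samples).
    """
    if not reference or len(reference) != len(decoded):
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error between the two pictures.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        # Identical pictures: PSNR is unbounded.
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)
```

A higher value indicates that the decoded picture is closer to the reference, so a method that gives poor PSNR performance is one whose output departs noticeably from the reference.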
As a result, there is a need to solve the problems of the prior art to provide a method and apparatus that minimizes the demands on memory for decoding low bit rate video data.