1. Field of the Invention
The invention relates to the field of data compression and decompression. More specifically, the invention relates to compression and decompression of still image and/or motion video data.
2. Background Information
A frame of still or motion video typically comprises a number of frame elements referred to as pixels (e.g., a 640×480 frame comprises over 300,000 pixels). Each pixel is represented by a binary pattern that describes that pixel's characteristics (e.g., color, brightness, etc.). Motion video data usually consists of a sequence of frames that, when displayed at a particular frame rate, will appear as “real-time” motion to a human eye. Given the number of pixels in a typical frame, storing and/or transmitting data corresponding to every pixel in a frame of still or motion video data requires a relatively large amount of computer storage space and/or bandwidth. Additionally, in several motion video applications, processing and displaying a sequence of frames must be performed fast enough to provide real-time motion (typically, between 15 and 30 frames per second). For example, a system using a frame size of 640×480 pixels, using 24 bits to represent each pixel, and using a frame rate of 30 frames per second would be required to store and/or transmit over 14 megabytes of data per second.
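The arithmetic behind this example can be checked with a short sketch. The function name `raw_video_rate` is hypothetical and used only for illustration. For a 640×480 frame at 24 bits per pixel and 30 frames per second, the uncompressed rate works out to 27,648,000 bytes per second, which is consistent with the "over 14 megabytes" figure given above.

```python
def raw_video_rate(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed data rate in bytes per second."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second


pixels_per_frame = 640 * 480             # 307,200 pixels
rate = raw_video_rate(640, 480, 24, 30)  # 27,648,000 bytes per second
print(pixels_per_frame, rate)
```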
Techniques have been developed to compress the amount of data required to represent images, making it possible for more computing systems to process video data. Compression techniques may compress video data based on individual pixels (referred to as pixel compression), on blocks or regions of pixels (referred to as block compression), or on a combination of both. Typically, pixel compression techniques are easier to implement and provide higher quality than block compression techniques. Although pixel compression techniques generally provide higher quality and resolution for a restored image than block compression techniques, they suffer from lower compression ratios (e.g., higher encoding bit rates) because they consider, encode, transmit, and/or store individual pixels.
One prior art block compression technique is based on compressing motion video data representing pixel information for regions (or blocks) in each frame of a motion video sequence without using information from other frames (referred to as INTRAframe or spatial compression) in the motion video frame sequence.
One type of intraframe compression involves transform coding (e.g., discrete cosine transform). Transform encoded data requires fewer bits to represent than the original data for a frame region, and typically provides relatively high quality results. Unfortunately, transform encoding requires a substantial amount of computation. Thus, in block compression techniques, transform coding is performed only when necessary (e.g., when another compression technique cannot be performed).
Another type of block compression technique typically used in conjunction with intraframe (or transform) encoding for the compression of motion video data is referred to as INTERframe or temporal compression. Typically, one or more regions (blocks) of pixels in one frame will be the same or substantially similar to regions in another frame. The primary aim of temporal compression is to eliminate the repetitive (INTRAframe) encoding and decoding of substantially unchanged regions between successive frames in a sequence of motion video frames. By reducing the amount of intraframe encoding, temporal compression generally saves a relatively large amount of data storage and computation.
When using intraframe compression in conjunction with temporal compression, the first frame in a sequence of frames is intraframe (e.g., DCT) encoded. Once encoded, the first frame becomes the “base frame” for encoding the next “new” frame (i.e., the second frame) in the sequence of frames. Thus, the frame currently being encoded is referred to as the new frame, and the frame preceding the new frame is referred to as the base (or old) frame (which is assumed to have been previously encoded and stored).
To perform intraframe/temporal compression on a new frame, the first steps performed in nearly all temporal compression systems are frame decomposition and pixel classification. One prior art technique initially decomposes the new frame in a sequence of motion video frames into non-overlapping regions (or blocks) of a predetermined size. Next, each pixel in each region of the new frame is compared to a corresponding pixel (i.e., at the same spatial location) in the base frame to determine a “pixel type” for each pixel in the new frame. (“Corresponding” region or pixel is used herein to refer to a region or pixel in one frame, e.g., the base frame, that is in the same spatial location of a frame as a region or pixel in another frame, e.g., the new frame.) Based on a set of predetermined temporal difference thresholds, each pixel in the new frame is classified as new (non-static) or old (static).
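The pixel classification step can be illustrated with a minimal sketch. It assumes single-channel pixel values stored as nested lists and a single temporal difference threshold, whereas the text refers to a set of thresholds; the function name `classify_pixels` and the label strings are hypothetical.

```python
def classify_pixels(new_frame, base_frame, threshold):
    """Label each pixel 'new' (non-static) or 'old' (static).

    A pixel is 'new' when it differs from the corresponding base-frame
    pixel (same spatial location) by more than the threshold.
    """
    return [
        ["new" if abs(n - b) > threshold else "old"
         for n, b in zip(new_row, base_row)]
        for new_row, base_row in zip(new_frame, base_frame)
    ]


base = [[10, 10], [10, 10]]
new = [[10, 12], [40, 10]]
labels = classify_pixels(new, base, threshold=5)
print(labels)
```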
Based primarily on the classification of pixels, it is determined if each region in the new frame is substantially similar to the corresponding region at the same spatial location in the base frame. If a region in the new frame does not contain at least a predetermined threshold number of new pixels, then that region is considered to be substantially similar to the corresponding region in the base frame and is classified as “static.” Static regions are encoded by storing data indicating that the region has already been encoded as part of the base frame. The data required to indicate that a region is already encoded is substantially less than the data required to represent an uncompressed or intraframe encoded region. Thus, entire (static) regions do not need to be repeatedly intraframe encoded, stored/transmitted, and decoded, thereby saving a relatively substantial degree of computation and storage.
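As a rough illustration of the region test described above, the following sketch counts the new pixels in a square region and classifies the region as static when the count falls below a threshold. The per-pixel labels are assumed to be given as nested lists of the strings "new" and "old"; the function name and parameters are hypothetical.

```python
def classify_region(labels, top, left, size, min_new_pixels):
    """Classify a size x size region as 'static' or 'non-static'.

    The region is static when it contains fewer than min_new_pixels
    pixels labeled 'new'.
    """
    new_count = sum(
        1
        for r in range(top, top + size)
        for c in range(left, left + size)
        if labels[r][c] == "new"
    )
    return "static" if new_count < min_new_pixels else "non-static"


pixel_labels = [
    ["old", "old", "new", "new"],
    ["old", "new", "new", "new"],
    ["old", "old", "old", "old"],
    ["old", "old", "old", "old"],
]
print(classify_region(pixel_labels, 0, 0, 2, min_new_pixels=2))
print(classify_region(pixel_labels, 0, 2, 2, min_new_pixels=2))
```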
In addition to classifying regions as “static”, temporal compression techniques typically also perform motion estimation and compensation. The principle behind motion estimation and compensation is that the best match for a region in a new frame may not be at the same spatial location in the base frame, but may be slightly shifted due to movement of the image(s) in the motion video. By determining that a region in a new frame is substantially the same as another region in the base frame within a predetermined threshold distance of the region in the base frame at the same spatial location, an indication, referred to as a motion compensation (MC) vector, can be generated to indicate the change of location of the region in the new frame relative to the base frame. Thus, a static region can be considered as an MC region with a zero-magnitude MC vector. Since the region in the base frame corresponding to the MC region in the new frame has already been encoded, stored/transmitted, and decoded, the entire MC region does not have to be repeatedly intraframe encoded, stored/transmitted, and decoded. Again, by using an indication (e.g., an MC vector) to identify in a new frame a previously encoded and stored region of a base frame that is substantially the same as a region of a new frame (but spatially displaced), repeated encoding and storage can be avoided, thereby saving a relatively substantial amount of computation and storage expense.
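One common way to realize this kind of motion estimation is an exhaustive block search that minimizes the sum of absolute differences (SAD). The sketch below is an assumption for illustration, not a method stated in the text: the SAD metric, the function names, and the convention that the vector `(dy, dx)` points from the region's location to the matching location in the base frame are all hypothetical choices.

```python
def block_sad(new_frame, base_frame, top, left, dy, dx, size):
    """Sum of absolute differences between a new-frame block and a
    base-frame block displaced by (dy, dx)."""
    return sum(
        abs(new_frame[top + r][left + c]
            - base_frame[top + dy + r][left + dx + c])
        for r in range(size)
        for c in range(size)
    )


def find_mc_vector(new_frame, base_frame, top, left, size, max_shift, max_sad):
    """Return the best (dy, dx) within max_shift, or None if no candidate
    block in the base frame is similar enough (SAD <= max_sad)."""
    height, width = len(base_frame), len(base_frame[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r0, c0 = top + dy, left + dx
            if r0 < 0 or c0 < 0 or r0 + size > height or c0 + size > width:
                continue  # candidate block falls outside the base frame
            sad = block_sad(new_frame, base_frame, top, left, dy, dx, size)
            if best is None or sad < best[0]:
                best = (sad, (dy, dx))
    if best is not None and best[0] <= max_sad:
        return best[1]
    return None


base = [[9, 8, 0, 0], [7, 6, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
new = [[0, 0, 0, 0], [0, 9, 8, 0], [0, 7, 6, 0], [0, 0, 0, 0]]
vector = find_mc_vector(new, base, top=1, left=1, size=2, max_shift=1, max_sad=0)
print(vector)
```

In this sketch a static region is simply the case where the search returns the zero vector `(0, 0)`.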
Thus, region(s) in the new frame in the sequence of frames may be temporally encoded if found to be similar (within a predetermined temporal difference threshold) to a region in the already encoded base frame. Once the new frame is encoded, the encoded data from the new frame is used to update the base frame, and the updated base frame then becomes the base frame for the next “new” frame in the sequence of frames as the process is repeated.
By considering regions of pixels and determining temporal differences between such regions, block compression techniques generally provide higher compression ratios than pixel compression techniques since entire regions of pixels are considered and encoded. However, block compression techniques are relatively difficult to implement and typically suffer from some loss in quality.
To achieve higher compression ratios, some pixel compression techniques designate some pixels as “elementary” pixels and use the values of the elementary pixels to encode other pixels in proximity. For example, in some television applications, such a technique is used wherein data representing the pixels in alternating even rows of a frame are transmitted and stored, while data representing the pixels in alternating odd rows of the frame are estimated using the data representing the pixels in the even rows. By predicting, rather than encoding, values for some of the rows of pixels, higher compression ratios can be achieved. However, since some pixels cannot be predicted by other pixels, such techniques generally suffer from some loss in quality with respect to a restored image and/or relatively low compression ratios.
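The even/odd row example can be sketched as follows. The text says only that odd rows are estimated from even rows; averaging the rows above and below (and copying the row above at the bottom edge) is an assumption made here for illustration, as is the function name.

```python
def reconstruct_from_even_rows(even_rows, height):
    """Rebuild a frame of the given height from its even rows only.

    even_rows must hold rows 0, 2, 4, ... of the frame. Each odd row is
    estimated as the integer average of its vertical neighbors.
    """
    frame = [None] * height
    for i, row in enumerate(even_rows):
        frame[2 * i] = list(row)
    for r in range(1, height, 2):
        above = frame[r - 1]
        below = frame[r + 1] if r + 1 < height else above
        frame[r] = [(a + b) // 2 for a, b in zip(above, below)]
    return frame


restored = reconstruct_from_even_rows([[10, 20], [30, 40]], height=4)
print(restored)
```

Only half of the rows are stored, but any detail present solely in the odd rows is lost, which matches the quality trade-off noted above.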
Thus, what is desired is a video data compression technique that provides the relative ease of implementation and high quality associated with pixel compression techniques, yet provides the relatively high compression (e.g., low bit rate) that is typically associated with block compression techniques.
What is described is a method and apparatus for compression and decompression of still image and/or motion video data using pixel-by-pixel processing. According to one aspect of the invention, a non-static current pixel in a new frame is compared to each of a set of pixels and/or a combination thereof in a composite frame that may include previously processed pixels of the base frame, processed pixels from the new frame that have been “placed” into the base frame to form a composite frame (or updated/altered base frame), and/or a linear combination thereof (where the linear combinations can include the pixel in the base frame at the same spatial location as the current pixel). According to one aspect of the invention, the set of pixels corresponds to an initial set of directions that is then reduced to obtain a reduced set of directions, which includes substantially unique directions. Based on the comparison, if the current pixel is found to be similar within a threshold to a direction in the reduced set of directions, the current pixel is encoded as directionally estimated. If the current pixel cannot be encoded as static or directionally estimated, then the pixel is encoded as “new” using a delta value that is based on the difference between the current pixel and a reference pixel (e.g., the corresponding pixel at the same spatial location in the base frame, a pixel in proximity to that corresponding pixel, etc.).
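The per-pixel decision described above (static, directionally estimated, or new) can be sketched as follows. This is a simplified illustration under stated assumptions: pixel values are single integers, one threshold is shared by the static and directional tests, directions are reduced by dropping candidates with duplicate values, and the reference pixel for the delta is the corresponding base-frame pixel. The function name, direction names, and return format are hypothetical.

```python
def encode_pixel(current, base_value, candidates, threshold):
    """Classify one pixel.

    candidates maps a direction name to the value of the composite-frame
    pixel (or combination of pixels) in that direction.
    """
    if abs(current - base_value) <= threshold:
        return ("static",)

    # Reduce the initial directions: keep the first direction seen for
    # each distinct candidate value.
    reduced = {}
    seen_values = set()
    for direction, value in candidates.items():
        if value not in seen_values:
            seen_values.add(value)
            reduced[direction] = value

    best = min(reduced, key=lambda d: abs(current - reduced[d]), default=None)
    if best is not None and abs(current - reduced[best]) <= threshold:
        return ("directional", best)

    # Fall back to a delta against the reference (base-frame) pixel.
    return ("new", current - base_value)


print(encode_pixel(100, 98, {"left": 10}, threshold=3))
print(encode_pixel(100, 50, {"left": 99, "up": 99, "up_left": 70}, threshold=3))
print(encode_pixel(100, 50, {"left": 10, "up": 20}, threshold=3))
```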
According to yet another aspect of the invention, a method and apparatus is described wherein directionally encoded/estimated pixels in a still image and/or a frame of motion video are encoded using an adaptive variable length code (VLC). In one embodiment, a set of Huffman codes is stored and an optimum Huffman code is selected to encode a pixel based on a Huffman code used to encode other pixels in the frame. Thus, more than one VLC may be used to encode pixels in a frame of still or motion video. In another embodiment, VLCs (e.g., Huffman codes) are adaptively generated “on the fly” for further encoding directionally encoded pixels in a frame. Either or both of the number of states and the statistics associated with the VLCs may be adaptively generated for each pixel or for a set of pixels in a frame.
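To illustrate generating a Huffman code from observed statistics, the sketch below builds a code table from a sequence of direction symbols using the standard Huffman construction. It does not reproduce the adaptive selection logic of any particular embodiment; the function name and the direction symbols are hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table).
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes_a.items()}
        merged.update({s: "1" + c for s, c in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_index, merged))
        next_index += 1
    return heap[0][2]


directions = ["left"] * 5 + ["up"] * 2 + ["up_left"]
code = build_huffman_code(directions)
total_bits = sum(len(code[s]) for s in directions)
print(code, total_bits)
```

Rebuilding the table as the statistics change is one way frequently chosen directions could keep receiving the shortest codes.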
According to another aspect of the invention, compression thresholds and/or pixel processing algorithms are adaptively updated to avoid an unacceptable degradation in performance (e.g., processing time, compression ratio, quality of a restored image, etc.). For example, in certain embodiments of the invention wherein a desired performance parameter is providing a relatively high compression ratio, the threshold used to determine whether the current pixel can be encoded as static is adaptively updated based on the number of static and/or new pixels processed in the new frame. As another example, in certain embodiments of the invention certain pixels are automatically classified (e.g., directionally estimated using a default direction without comparison to other pixels; classified as static or new based on a single comparison; etc.), based on the number of static, estimated, and/or new pixels processed in the new frame.
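A minimal sketch of the first example, assuming a simple rule that is not taken from the text: when the fraction of new pixels processed so far exceeds a target, the static threshold is raised by a fixed step so that more pixels qualify as static. The function name, the default parameter values, and the update rule itself are hypothetical.

```python
def update_static_threshold(threshold, static_count, new_count,
                            max_new_fraction=0.5, step=1, max_threshold=32):
    """Return an updated static threshold based on pixels processed so far."""
    total = static_count + new_count
    if total == 0:
        return threshold  # nothing processed yet, keep the threshold
    if new_count / total > max_new_fraction:
        # Too many new pixels: loosen the static test, up to a cap.
        return min(threshold + step, max_threshold)
    return threshold


print(update_static_threshold(4, static_count=10, new_count=30))
print(update_static_threshold(4, static_count=30, new_count=10))
```

Raising the threshold trades restored-image quality for compression ratio, so a practical encoder would bound it, as the cap in this sketch does.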
According to yet another aspect of the invention, the direction in which pixels are processed (e.g., placed into the base frame to form an altered base frame) is varied to provide relatively symmetrical processing of pixels which generally results in an improved compression ratio and/or quality of restored images.
According to yet another aspect of the invention, a method and apparatus is described for decompression of motion video data that has been encoded according to the invention.