1. Field of Invention
The present invention relates to methods and apparatus for use in multimedia data processing. More particularly, the present invention relates to efficient methods and apparatus for encoding video data using motion estimation processes, and decoding the encoded video data.
2. Background
As the use of computers, specifically computers which are linked across a network, increases, the demand for more immediate and complete interaction between a computer user and computers on the network is also increasing. One computer network which is increasingly being used is the Internet, the well-known international computer network that links various military, governmental, educational, nonprofit, industrial and financial institutions, commercial enterprises, and individuals. The Internet provides many multimedia services which are data intensive. Data intensive multimedia services include video services, as for example, multicast video services and on-demand video services such as interactive video services, all of which may be real-time services.
Storing, transmitting, and "playing" multimedia data, e.g., video data, consumes a significant portion of the resources of a computer system. The amount of space occupied by the data is dependent, at least in part, upon the particular characteristics of the data. By way of example, as the resolution and the frame rate requirements of a video increase, the amount of data that is necessary to "describe" the video also increases. Hence, data relating to the video consumes a greater amount of storage space.
One approach to reducing the amount of storage space required for video, or image, data involves compressing the data. The extent to which data is compressed is typically measured in terms of a compression ratio or a bit rate. The compression ratio is generally the number of bits of an input value divided by the number of bits in the representation of that input value in compressed code. It should be appreciated that higher compression ratios are typically preferred over lower compression ratios. The bit rate is the number of bits of compressed data required to properly represent a corresponding input value.
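By way of illustration only, the compression ratio defined above may be computed as in the following sketch. The function name and the sample block size and bit counts are hypothetical and are chosen purely for the arithmetic.

```python
def compression_ratio(input_bits: int, compressed_bits: int) -> float:
    """Bits of input divided by bits of compressed code; larger means more compression."""
    if compressed_bits <= 0:
        raise ValueError("compressed_bits must be positive")
    return input_bits / compressed_bits


# Hypothetical example: a 16x16 block of 24-bit pixels compressed to 768 bits.
raw_bits = 16 * 16 * 24  # 6144 bits of input
print(compression_ratio(raw_bits, 768))  # 8.0
```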
Interdependent compression techniques, which involve using characteristics of one frame in the process of encoding another frame, often involve the calculation of complicated transforms. Although interdependent compression techniques may be widely varied, interdependent compression techniques that are well-known to those skilled in the art include motion compensation and conditional replenishment techniques, as well as variations thereof.
Motion compensation is performed either directly on adjacent frames, or on blocks, i.e., contiguous subsets, of adjacent frames, one of which is typically considered to be a "current" frame. When motion compensation is performed on blocks, the block in the current frame is compared with comparably sized blocks in a selected adjacent frame, which may either be a "previous" frame or a "subsequent" frame, relative to the current frame. If a block in the selected adjacent frame which matches the block in the current frame is identified, then a motion vector which indicates both the direction and the distance in the motion of the block in the selected adjacent frame is created, in part, to characterize the motion of the block.
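The block comparison described above may be sketched as an exhaustive search, as follows. This is a minimal illustration and not a definitive implementation: the use of a sum of absolute differences as the matching cost, the sign convention of the vector (displacement from the current block position to the matching block in the adjacent frame), and all function names are assumptions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_motion_vector(current, adjacent, top, left, size, search_range):
    """Search the adjacent frame around (top, left); return ((dy, dx), cost)."""
    target = get_block(current, top, left, size)
    height, width = len(adjacent), len(adjacent[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the adjacent frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(adjacent, y, x, size))
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


def make_frame(size, patch_top, patch_left):
    """Hypothetical test frame: black background with a 2x2 bright patch."""
    frame = [[0] * size for _ in range(size)]
    for y in range(patch_top, patch_top + 2):
        for x in range(patch_left, patch_left + 2):
            frame[y][x] = 200
    return frame


current_frame = make_frame(8, 3, 5)   # patch at row 3, column 5
adjacent_frame = make_frame(8, 2, 3)  # same patch at row 2, column 3
print(find_motion_vector(current_frame, adjacent_frame, 3, 5, 2, 2))
# ((-1, -2), 0)
```

An exhaustive search evaluates every candidate offset in the search window, which illustrates why such techniques may be computationally expensive.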
Conditional replenishment, which may be considered to be a specific case of motion compensation, is also performed on blocks of a current frame and a frame that is adjacent to the current frame. Conditional replenishment involves a determination of whether a block in the current frame should be encoded, based upon a comparison with a similarly positioned block in the selected adjacent, e.g., either previous or subsequent, frame. In the event that the difference between the block in the current frame and the similarly positioned block in the selected adjacent frame is less than a specified value, rather than encoding the block in the current frame, the block in the current frame is replenished from the similarly positioned block in the selected adjacent frame.
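The replenishment decision described above may be sketched as follows. The choice of a sum of absolute differences as the difference measure, the threshold value, and the function names are assumptions made only for illustration.

```python
def block_difference(block_a, block_b):
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def encode_or_replenish(current_block, adjacent_block, threshold):
    """Return ('replenish', None) when the blocks are close enough,
    otherwise ('encode', current_block)."""
    if block_difference(current_block, adjacent_block) < threshold:
        # Reuse the similarly positioned block from the adjacent frame.
        return ("replenish", None)
    return ("encode", current_block)


previous = [[10, 10], [10, 10]]
nearly_same = [[10, 11], [10, 10]]   # difference of 1
changed = [[90, 10], [10, 10]]       # difference of 80

print(encode_or_replenish(nearly_same, previous, threshold=8)[0])  # replenish
print(encode_or_replenish(changed, previous, threshold=8)[0])      # encode
```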
Although motion compensation techniques may be effective in the process of compressing frames, in that the number of blocks and frames which are compressed may be reduced, the motion compensation techniques themselves are typically complex and, hence, often inefficient.
Quantization methods may be used to convert a high-precision image description into a low-precision image description through a many-to-one mapping. After using techniques such as motion compensation and conditional replenishment to reduce temporal redundancy, techniques such as vector quantization are used to process images in blocks that are represented as vectors. Representative vectors are typically distributed in an n-dimensional space. When n is greater than one, as is well known in the art, there is no natural order to the representative vectors. As such, manipulating the indices to make the compression scalable is often a complex task.
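The many-to-one mapping of vector quantization may be illustrated by the following sketch, in which a block represented as a vector is replaced by the index of the nearest representative vector. The four-entry codebook and the squared Euclidean distance are hypothetical choices for this example.

```python
def nearest_codeword(vector, codebook):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))


# Hypothetical codebook for 2x2 blocks flattened into 4-dimensional vectors.
codebook = [
    (0, 0, 0, 0),          # index 0: dark block
    (255, 255, 255, 255),  # index 1: bright block
    (0, 0, 255, 255),      # index 2: dark top row, bright bottom row
    (255, 255, 0, 0),      # index 3: bright top row, dark bottom row
]

# Two different input vectors map to the same index (many-to-one).
print(nearest_codeword((10, 5, 240, 250), codebook))  # 2
print(nearest_codeword((12, 0, 251, 243), codebook))  # 2
```

The indices 0 through 3 carry no ordering that reflects closeness of the underlying vectors, which is the difficulty noted above for n greater than one.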
Further, vector quantization methods that are used to encode images are typically difficult to implement using software alone. While vector quantization methods are more easily implemented using a combination of hardware and software, the use of hardware for real-time vector quantization is impractical, as hardware is often not readily available.
Compression techniques also use colorspace conversions, or transformations, as is well known to those of skill in the art. Such colorspace conversions, which convert data from color space to luminance and chrominance space and vice versa, result in improved perceptual compression. Once colorspace conversions are made to compress data in luminance and chrominance space, in order to decompress the data, transformations must be made from luminance and chrominance space back to color space. Colorspace conversions from luminance and chrominance space to color space typically result in the loss of some color accuracy. In order to compensate for losses in color accuracy, noise is often added to decoded data, or data that has been reconverted into colorspace data. While adding noise to decoded data has been effective in neutralizing losses in color accuracy, the computing overhead associated with adding noise to decoded data is high. As such, compensating for losses in color accuracy often proves to be inefficient.
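The loss of color accuracy in a round trip between color space and luminance and chrominance space may be illustrated by the following sketch. The text above does not name a particular transformation; the full-range ITU-R BT.601 coefficients used in JFIF, and the rounding of each component to an 8-bit integer, are assumptions made for this example.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def rgb_to_ycbcr(r, g, b):
    """Color space to luminance/chrominance (assumed full-range BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y, cb, cr):
    """Luminance/chrominance back to color space (inverse of the above)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


original = (255, 0, 0)
restored = ycbcr_to_rgb(*rgb_to_ycbcr(*original))
print(original, "->", restored)  # (255, 0, 0) -> (254, 0, 0)
```

In this sketch the error arises solely from rounding the intermediate components to integers; a mid-gray input such as (128, 128, 128) survives the round trip unchanged.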
As complexity that is associated with compression and decompression techniques often reduces the efficiency with which data may be encoded and decoded, compression and decompression techniques which are less complex, but still maintain an acceptable level of quality, are desirable. In addition, the ability to accurately and efficiently determine if a block in a current frame may be estimated using a block in a previous frame is desirable, as such an ability may reduce both the amount of encoding and decoding which is performed, as well as the complexity associated with encoding and decoding processes. Therefore, in view of the foregoing, what is desired are improved apparatus and methods for estimating motion such that video data may be efficiently compressed and decompressed.
The present invention relates, in one aspect, to a method for processing video data that is divided into frames. The video data includes a current frame, which has an associated current macroblock, and an adjacent frame, which also has an associated macroblock. The method for processing video data involves obtaining an uncompressed current block that is a part of the current macroblock and an adjacent block that is part of the adjacent macroblock, and calculating a distance between the uncompressed current block and the adjacent block. It is determined whether the distance between the uncompressed current block and the adjacent block is acceptable. If the distance is unacceptable, then the motion between the uncompressed current block and the adjacent block is estimated, and the uncompressed current block is adaptively compressed.
In one embodiment, estimating the motion between the uncompressed current block and the adjacent block involves calculating the residual between the uncompressed current block and the adjacent block. In such an embodiment, the residual is adaptively compressed, and the entire current macroblock is compressed using a first compression technique. In another embodiment, the current macroblock is compressed using a second compression technique which is arranged to utilize the adaptively compressed current block.
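A residual between two blocks may be sketched as a pixel-by-pixel difference, as follows. The function names and sample pixel values are hypothetical, and the adaptive compression of the residual is not shown.

```python
def residual(current_block, adjacent_block):
    """Pixel-by-pixel difference: current block minus adjacent block."""
    return [[c - a for c, a in zip(cur_row, adj_row)]
            for cur_row, adj_row in zip(current_block, adjacent_block)]


def add_residual(adjacent_block, residual_block):
    """Recover the current block by adding the residual to the adjacent block."""
    return [[a + r for a, r in zip(adj_row, res_row)]
            for adj_row, res_row in zip(adjacent_block, residual_block)]


current = [[52, 55], [61, 59]]
adjacent = [[50, 55], [60, 62]]

res = residual(current, adjacent)
print(res)                                     # [[2, 0], [1, -3]]
print(add_residual(adjacent, res) == current)  # True
```

When the two blocks are similar, the residual values are small in magnitude, which generally makes the residual cheaper to encode than the block itself.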
In another aspect of the present invention, a method for processing video data which includes a first block and an encoded block representation, as well as additional bits associated with the block representation, involves decoding the additional bits using a table-based N-stage Huffman decoder and performing a transformation to convert the block representation from luminance and chrominance space to color space. In one embodiment, the block representation is a residual which is a pixel-by-pixel difference between the first block and a second block. In such an embodiment, the block representation is decoded and added to the first block. In another embodiment, at least some of the additional bits represent a motion vector, and adding the decoded block representation to the first block involves adding the motion vector to the first block.
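A table-based, multi-stage Huffman decoder may be sketched as follows for the case of two stages. The prefix code, the table layout, and the stage widths are hypothetical and are not taken from the description above; the sketch only shows the general idea of resolving short codes in a first lookup table and deferring longer codes to a second table.

```python
# Hypothetical prefix code: A=0, B=10, C=110, D=111.
# Each entry is (kind, value, bits_consumed):
#   ("sym", symbol, n)  -> emit symbol after consuming n bits
#   ("next", table, n)  -> consume n bits and continue in the next-stage table
STAGE2 = {"bits": 1, "entries": [("sym", "C", 1), ("sym", "D", 1)]}
STAGE1 = {
    "bits": 2,
    "entries": [
        ("sym", "A", 1),     # 00 -> A (only the first bit is consumed)
        ("sym", "A", 1),     # 01 -> A
        ("sym", "B", 2),     # 10 -> B
        ("next", STAGE2, 2), # 11 -> resolve in stage 2
    ],
}


def decode(bitstring, root):
    """Decode a string of '0'/'1' characters using staged lookup tables."""
    pos = 0
    out = []
    while pos < len(bitstring):
        table = root
        while True:
            width = table["bits"]
            # Peek `width` bits, padding with zeros past the end of the stream.
            chunk = bitstring[pos:pos + width].ljust(width, "0")
            kind, value, used = table["entries"][int(chunk, 2)]
            pos += used
            if kind == "sym":
                out.append(value)
                break
            table = value  # descend to the next stage
    return "".join(out)


# "ABCDA" encodes as 0 10 110 111 0.
print(decode("0101101110", STAGE1))  # ABCDA
```

Each symbol is resolved with at most two table lookups rather than a bit-by-bit walk of a code tree, at the cost of storing the tables.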