1. Field of Invention
The present invention relates to methods and apparatus for use in multimedia data processing. More particularly, the present invention relates to efficient methods and apparatus for encoding video data using a motion detection process, and decoding the encoded video data.
2. Background
As the use of computers, specifically computers which are linked across a network, increases, the demand for more immediate and complete interaction between a computer user and computers on the network is also increasing. One computer network which is increasingly being used is the Internet, the well-known international computer network that links various military, governmental, educational, nonprofit, industrial and financial institutions, commercial enterprises, and individuals. The Internet provides many multimedia services which are data intensive. Data intensive multimedia services include video services, for example, multicast video services and on-demand video services such as interactive video services, all of which may be real-time services.
Storing, transmitting, and "playing" multimedia data, e.g., video data, consumes a significant portion of the resources of a computer system. The amount of space occupied by the data is dependent, at least in part, upon the particular characteristics of the data. By way of example, as the resolution and the frame rate requirements of a video increase, the amount of data that is necessary to "describe" the video also increases. Hence, data relating to the video consumes a greater amount of storage space.
One approach to reducing the amount of storage space required for video, or image, data involves compressing the data. The extent to which data is compressed is typically measured in terms of a compression ratio or a bit rate. The compression ratio is generally the number of bits of an input value divided by the number of bits in the representation of that input value in compressed code. It should be appreciated that higher compression ratios are typically preferred over lower compression ratios. The bit rate is the number of bits of compressed data required to properly represent a corresponding input value.
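By way of illustration only, the compression ratio described above may be computed as in the following minimal sketch. The function name `compression_ratio` and the example block size and bit counts are assumptions chosen for illustration and are not taken from the described system.

```python
def compression_ratio(input_bits: int, compressed_bits: int) -> float:
    """Bits in the input value divided by bits in its compressed representation."""
    if compressed_bits <= 0:
        raise ValueError("compressed_bits must be positive")
    return input_bits / compressed_bits


# Hypothetical example: an 8x8 block of 24-bit pixels occupies 1536 bits.
raw_bits = 8 * 8 * 24
# Assume, for illustration, that the block is represented by 96 compressed bits.
ratio = compression_ratio(raw_bits, 96)
print(ratio)  # 16.0
```

In this hypothetical case the ratio is 16:1; a higher ratio indicates that fewer compressed bits are needed for the same input.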
Interdependent compression techniques, which involve using characteristics of one frame in the process of encoding another frame, often involve the calculation of complicated transforms. Although interdependent compression techniques may be widely varied, interdependent compression techniques that are well-known to those skilled in the art include conditional replenishment techniques, as well as variations thereof.
Conditional replenishment is performed either directly on adjacent frames, or on blocks, i.e., contiguous subsets, of adjacent frames, one of which is typically considered to be a "current" frame. Conditional replenishment involves a determination of whether a block in the current frame should be encoded, based upon a comparison with a similarly positioned block in the selected adjacent, e.g., either "previous" or "subsequent," frame. In the event that the difference between the block in the current frame and the similarly positioned block in the selected adjacent frame is less than a specified value, rather than encoding the block in the current frame, the block in the current frame is replenished from the similarly positioned block in the selected adjacent frame.
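The block-level decision described above may be sketched as follows. This is a minimal illustration under stated assumptions: the difference measure (a sum of absolute differences), the names `block_difference` and `conditional_replenish`, and the threshold value are all hypothetical, since the text does not specify how the difference is measured.

```python
from typing import List, Optional

Block = List[int]  # pixel values of one block, flattened


def block_difference(a: Block, b: Block) -> int:
    """Sum of absolute differences; an assumed measure, not one named in the text."""
    return sum(abs(x - y) for x, y in zip(a, b))


def conditional_replenish(current: List[Block], adjacent: List[Block],
                          threshold: int) -> List[Optional[Block]]:
    """For each block position, return None when the block is to be replenished
    from the adjacent frame, or the current block when it must be encoded."""
    decisions: List[Optional[Block]] = []
    for cur, adj in zip(current, adjacent):
        if block_difference(cur, adj) < threshold:
            decisions.append(None)   # replenish from the similarly positioned block
        else:
            decisions.append(cur)    # difference too large: encode this block
    return decisions


# Two 2x2 blocks per frame; only the second block changes noticeably.
previous_frame = [[10, 10, 10, 10], [50, 50, 50, 50]]
current_frame = [[10, 11, 10, 10], [90, 90, 90, 90]]
result = conditional_replenish(current_frame, previous_frame, threshold=8)
print(result)  # [None, [90, 90, 90, 90]]
```

Only the second block would be encoded in this example; the first would be copied from the adjacent frame.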
Quantization methods may be used to convert a high-precision image description into a low-precision image description through a many-to-one mapping. After using techniques such as conditional replenishment to reduce temporal redundancy, techniques such as vector quantization are used to process images in blocks that are represented as vectors. Representative vectors are typically distributed in an n-dimensional space. When n is greater than one, as is well known in the art, there is no natural order to the representative vectors. As such, manipulating the indices to make the compression scalable is often a complex task.
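The many-to-one mapping performed by vector quantization may be illustrated with the following minimal sketch, in which a block is treated as a vector and mapped to the index of the nearest representative vector. The three-entry codebook, the squared Euclidean distance, and the name `nearest_index` are assumptions made for illustration only.

```python
from typing import Sequence, Tuple

Vector = Tuple[int, ...]

# Hypothetical codebook of representative vectors for 2x2 blocks (n = 4).
CODEBOOK: Sequence[Vector] = (
    (0, 0, 0, 0),
    (128, 128, 128, 128),
    (255, 255, 255, 255),
)


def nearest_index(vector: Vector, codebook: Sequence[Vector] = CODEBOOK) -> int:
    """Return the index of the codebook vector closest to `vector`
    under squared Euclidean distance (an assumed distance measure)."""
    def squared_distance(i: int) -> int:
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=squared_distance)


# Two different input blocks map to the same index: a many-to-one mapping.
print(nearest_index((120, 130, 125, 131)))  # 1
print(nearest_index((130, 126, 129, 127)))  # 1
```

The indices 0, 1 and 2 happen to follow brightness in this toy codebook, but in general the representative vectors of a codebook admit no such ordering, which is the difficulty noted above.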
Further, vector quantization methods that are used to encode images are typically difficult to implement using software alone. While vector quantization methods are more easily implemented using a combination of hardware and software, the use of hardware for real-time vector quantization is impractical, as hardware is often not readily available.
Compression techniques also use colorspace conversions, or transformations, as is well known to those of skill in the art. Such colorspace conversions, which convert data from color space to luminance and chrominance space and vice versa, result in improved perceptual compression. Once colorspace conversions are made to compress data in luminance and chrominance space, in order to decompress the data, transformations must be made from luminance and chrominance space back to color space. Colorspace conversions from luminance and chrominance space to color space typically result in the loss of some color accuracy. In order to compensate for losses in color accuracy, noise is often added to decoded data, or data that has been reconverted into colorspace data. While adding noise to decoded data has been effective in neutralizing losses in color accuracy, the computing overhead associated with adding noise to decoded data is high. As such, compensating for losses in color accuracy often proves to be inefficient.
As the complexity associated with compression and decompression techniques often reduces the efficiency with which data may be encoded and decoded, implementing compression and decompression techniques which are less complex, but still maintain an acceptable level of quality, is desirable. Further, implementing accurate and efficient motion detection methods which may be used to reduce the amount of data which is actually encoded and, therefore, decoded, is also desired. Specifically, in view of the foregoing, what is desired are improved apparatus and methods for detecting motion such that video data may be efficiently compressed and decompressed.
The present invention relates, in one aspect, to a method for processing video data that is divided into frames. The video data includes a current frame, which has an uncompressed current block, and an adjacent frame, which has an adjacent block. The method for processing video data involves obtaining the current block and the adjacent block, and calculating a distance between the current block and the adjacent block. If the distance between the current block and the adjacent block is determined to be unacceptable, then the current block is compressed using adaptive compression.
In one embodiment, the method for processing video data further involves sending a header for motion detection that is arranged to indicate if the current block has been compressed. In such an embodiment, the header is used in conjunction with a binary tree or a quad tree to obtain compressed bits. In another embodiment, obtaining the current block involves segmenting the current frame.
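One way in which a motion detection header might be combined with a quad tree is sketched below. This is an assumed scheme offered for illustration only, since the text does not specify the bit layout: a region with no compressed blocks is signaled with a single 0 bit, and any other region is signaled with a 1 bit followed by the bits for its four quadrants. The name `quadtree_bits` and the grid of per-block flags are hypothetical.

```python
from typing import List


def quadtree_bits(changed: List[List[bool]], x: int, y: int, size: int) -> List[int]:
    """Encode per-block 'compressed' flags for a square region whose side is a
    power of two. Emits 0 for a region with no flagged blocks; otherwise emits 1
    and recurses into the four quadrants (top-left, top-right, bottom-left,
    bottom-right)."""
    region = [changed[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if not any(region):
        return [0]
    if size == 1:
        return [1]
    half = size // 2
    bits = [1]
    for dy in (0, half):
        for dx in (0, half):
            bits += quadtree_bits(changed, x + dx, y + dy, half)
    return bits


# A 4x4 grid of blocks in which only the top-left block was compressed.
flags = [[False] * 4 for _ in range(4)]
flags[0][0] = True
header = quadtree_bits(flags, 0, 0, 4)
print(header)  # [1, 1, 1, 0, 0, 0, 0, 0, 0]
```

In this example nine bits describe sixteen blocks, and a frame with no compressed blocks would need only one bit. A binary tree variant would split each region into two halves instead of four quadrants.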
In accordance with another aspect of the present invention, a method for performing a colorspace conversion on bits associated with video data involves obtaining bits which identify a codebook index for a component, and using the codebook index to obtain pixel representations for the component. The pixel representations are then dithered, and the dithered pixel representations are used to obtain an index for the component. In one embodiment, the dithered pixel representations are clipped.
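The sequence of steps described above (codebook lookup, dithering, clipping, and index generation) may be sketched as follows. The codebook contents, the dither offsets, the four-bit index width, and the names `clip` and `dithered_indices` are all assumptions made for illustration; the text does not specify any of these values.

```python
from typing import Dict, List

# Hypothetical codebook: index -> pixel representations for one component.
CODEBOOK: Dict[int, List[int]] = {
    0: [16, 16, 16, 16],
    1: [250, 252, 254, 255],
}

# Hypothetical ordered-dither offsets for a 2x2 block, in row-major order.
DITHER: List[int] = [-6, 2, 6, -2]


def clip(value: int, low: int = 0, high: int = 255) -> int:
    """Clamp a dithered value to the valid 8-bit range."""
    return max(low, min(high, value))


def dithered_indices(codebook_index: int, shift: int = 4) -> List[int]:
    """Look up pixel representations, dither and clip them, then keep the top
    bits of each value as an index into a (hypothetical) conversion table."""
    pixels = CODEBOOK[codebook_index]
    dithered = [clip(p + d) for p, d in zip(pixels, DITHER)]
    return [v >> shift for v in dithered]


print(dithered_indices(0))  # [0, 1, 1, 0]
print(dithered_indices(1))  # [15, 15, 15, 15]
```

For codebook entry 0, the dither offsets spread four identical pixel values across two neighboring table indices. For entry 1, clipping keeps the value 254 + 6 = 260 within range; without it, the resulting index would be 16, which falls outside a 16-entry table.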
In accordance with still another aspect of the present invention, a computer-implemented image processing system includes an encoder arranged to encode video data using adaptive compression techniques, and a decoder arranged to accept the encoded video data and to decode the encoded video data. The decoder includes a table-based N-stage Huffman decoder and a colorspace converter. In one embodiment, the colorspace converter is a table-based luminance and chrominance space to colorspace converter. In another embodiment, a network delivery system is arranged to channel the encoded video data from the encoder to the decoder.