This invention relates to video compression devices and methods and, in particular, to a method and apparatus for significantly compressing video image information in a digital format without significantly degrading image quality while providing a substantially constant data output rate.
Video images are composed of numerous individual picture elements, typically referred to as pixels. The pixels of a video image are tiny color and/or black-and-white dots, spaced so closely on an electronic display device that, when the aggregate of the pixels is viewed from a distance, the individual pixels are virtually indistinguishable from one another and appear instead as a single image. For example, a full-resolution NTSC (National Television System Committee) frame comprises hundreds of thousands of pixels. The pixels forming an NTSC video frame can be represented by binary data values that numerically specify the luma (Y), chroma red (Cr), and chroma blue (Cb) content of the image. A full-resolution NTSC video frame generally comprises 720 by 480 bytes of digital data for its luma content, and its chroma red and chroma blue content each comprise 360 by 480 bytes of data. At a frame rate of 30 frames per second (fps), an uncompressed real-time NTSC video signal transmitted wirelessly therefore requires approximately 166 Mbit/s of data to be transferred between a transmitter and a receiver.
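The 166 Mbit/s figure follows directly from the frame dimensions given above. The Python sketch below reproduces the arithmetic, assuming one byte (8 bits) per luma or chroma sample and ignoring blanking intervals and audio; the function names are illustrative only.

```python
# Uncompressed NTSC data rate. Assumes 8 bits per sample and the
# sampling grid stated above; blanking intervals and audio are ignored.
LUMA_W, LUMA_H = 720, 480      # luma (Y) samples per frame
CHROMA_W, CHROMA_H = 360, 480  # samples per frame for each of Cr and Cb
BITS_PER_BYTE = 8
FPS = 30


def ntsc_bytes_per_frame() -> int:
    """Bytes of Y, Cr and Cb data in one uncompressed frame."""
    luma = LUMA_W * LUMA_H            # 345,600 bytes
    chroma = 2 * CHROMA_W * CHROMA_H  # 345,600 bytes for Cr and Cb together
    return luma + chroma


def ntsc_bits_per_second(fps: int = FPS) -> int:
    """Uncompressed bit rate at the given frame rate."""
    return ntsc_bytes_per_frame() * BITS_PER_BYTE * fps


if __name__ == "__main__":
    rate = ntsc_bits_per_second()
    print(f"{rate} bit/s = {rate / 1e6:.1f} Mbit/s")
    # prints "165888000 bit/s = 165.9 Mbit/s"
```

The 345,600 luma samples per frame also account for the "hundreds of thousands of pixels" in a full-resolution frame.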
It is well known that video data should preferably be compressed in order to minimize the bandwidth required to transmit it. Compression is typically accomplished by discarding data that carries little or no perceptible information about the image or images being transferred. One class of methods for compressing video data uses wavelet transforms. For example, U.S. Pat. Nos. 5,315,670, 5,412,741, 5,321,776 and 5,563,960, issued to Shapiro, describe various video compression techniques based on wavelet transforms. Briefly, a wavelet transform converts video pixel information into a wavelet domain in which both frequency and spatial characteristics are maintained; the resulting wavelet coefficients provide a basis for encoding and decoding the video images. Additionally, Shapiro recognizes the benefits of compression schemes that produce an embedded stream as output. In an embedded stream, the encodings at all lower data rates are contained at the beginning of the stream. That is, as the compression scheme compresses data, the data carrying the greatest or most significant informational content is output first. As compression continues, additional information is appended to the output stream, progressively refining the overall quality of the compressed video data. This allows the compression encoder to stop encoding as soon as the target data rate for the output stream has been reached, thereby providing a constant output data rate. Similarly, a compression decoder decompressing the output stream can stop decoding at any point, producing an image whose quality matches that which would have been produced at the data rate of the truncated stream. These properties of embedded streams can simplify overall system design.
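To illustrate why an embedded stream can be truncated at any point, the minimal Python sketch below orders the bits of a set of non-negative integer values by bit plane, most significant plane first. This is a generic bit-plane ordering chosen for illustration, not Shapiro's zerotree coding; the function names, and the assumption that the decoder already knows the value count and the number of bit planes, are hypothetical.

```python
from typing import List


def encode_bitplanes(values: List[int], num_planes: int) -> List[int]:
    """Emit one bit per value per plane, most significant plane first,
    so the earliest bits carry the most information about every value."""
    bits: List[int] = []
    for plane in range(num_planes - 1, -1, -1):
        for v in values:
            bits.append((v >> plane) & 1)
    return bits


def decode_bitplanes(bits: List[int], count: int, num_planes: int) -> List[int]:
    """Rebuild `count` values from any prefix of the stream; bits that
    were never received are treated as 0."""
    values = [0] * count
    for i, bit in enumerate(bits[: count * num_planes]):
        plane = num_planes - 1 - i // count
        values[i % count] |= bit << plane
    return values


if __name__ == "__main__":
    stream = encode_bitplanes([13, 2, 7, 0], num_planes=4)
    # Truncating the stream at any length still decodes to an approximation.
    for n in (4, 8, 16):
        print(n, decode_bitplanes(stream[:n], count=4, num_planes=4))
```

Decoding the first 4, 8 and 16 bits of the stream for `[13, 2, 7, 0]` yields `[8, 0, 0, 0]`, `[12, 0, 4, 0]` and `[13, 2, 7, 0]` respectively: each prefix approximates every value to its leading bits, and each further bit received refines the result.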
While Shapiro and others have disclosed wavelet transform techniques for compressing video data, many of these prior art techniques have not emphasized video transmission speed. Rather, many prior art techniques emphasize reproduction accuracy, and all such prior art techniques have been computationally complex, requiring a significant amount of processing power. As a result, these prior art compression schemes are not typically suitable for use in consumer products, whose cost must be kept as low as possible. A computationally efficient video compression technique that can be used economically in consumer applications would be an improvement over the prior art. Furthermore, such a technique should produce its output as an embedded stream.
Generally, the present invention provides a computationally simple technique for producing compressed video data as an embedded stream. This is achieved by hierarchically identifying blocks of data that can be logically reduced to highly compact representations. In one embodiment of the present invention, data elements are logically divided into blocks. Proceeding one bit position at a time, each block is inspected to determine whether its data elements can be represented in a highly compact format at that bit position. If so, a single bit representing the entire block at that bit position is output. If a given block cannot be represented in this manner, it is subdivided into blocks of smaller dimensions. This process of identifying suitable blocks and subdividing is repeated recursively as necessary until minimum block dimensions are reached.
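The block-subdivision embodiment can be sketched in Python as follows. The sketch rests on several assumptions not stated in the text: the data form a square matrix whose side is a power of two; a block counts as "highly compact" at a bit position when every element in it has a 0 there; a non-compact block is split into four equal quadrants; and the minimum block is a single element, which emits its own bit. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def block_is_zero(data: Matrix, bit: int, row: int, col: int, size: int) -> bool:
    """True if every element of the size x size block has a 0 at `bit`."""
    mask = 1 << bit
    return all(
        (data[r][c] & mask) == 0
        for r in range(row, row + size)
        for c in range(col, col + size)
    )


def code_block(data: Matrix, bit: int, row: int, col: int, size: int,
               out: List[int]) -> None:
    """Recursively code one block at one bit position, appending to `out`."""
    if size == 1:
        # Minimum block dimensions (a single element, assumed here):
        # emit the element's bit directly.
        out.append((data[row][col] >> bit) & 1)
        return
    if block_is_zero(data, bit, row, col, size):
        out.append(0)  # a single bit represents the whole block
        return
    out.append(1)      # not compact: flag the block and subdivide it
    half = size // 2
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        code_block(data, bit, row + dr, col + dc, half, out)


def code_bitplanes(data: Matrix, num_planes: int) -> List[int]:
    """Code every bit plane, most significant first, giving an embedded stream."""
    out: List[int] = []
    for bit in range(num_planes - 1, -1, -1):
        code_block(data, bit, 0, 0, len(data), out)
    return out
```

For a 4 x 4 matrix whose only nonzero element is 5 (binary 101) in the top-left corner, coding three bit planes produces 19 bits instead of the 48 bits of the raw bit planes; the all-zero middle plane costs a single bit.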
In another embodiment of the present invention, a plurality of ascending tables is constructed by repeatedly forming tables of reduced data elements, each obtained by logically OR-ing individual data elements from the next-lower table. In this manner, each successively higher table represents larger blocks of data elements. The tables are then traversed recursively, one bit position at a time, descending from the highest-level table; based on the reduced data elements, blocks of data that are susceptible to the highly compact format are identified.
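The ascending tables let each block test be answered by a single lookup rather than a scan of the block. In the Python sketch below, which assumes (beyond what the text states) that each table is formed by OR-ing 2 x 2 groups of the table beneath it, an element of table k summarizes a 2^k x 2^k block of the original data: its bit at a given position is 0 exactly when every element of that block is 0 there. The input is assumed to be a square matrix whose side is a power of two, and all names are hypothetical.

```python
from typing import List

Table = List[List[int]]


def build_or_tables(data: Table) -> List[Table]:
    """Return [level 0, level 1, ...] up to a 1 x 1 top table. Each element
    of level k+1 is the bitwise OR of a 2 x 2 group in level k."""
    tables = [data]
    while len(tables[-1]) > 1:
        prev = tables[-1]
        n = len(prev) // 2
        tables.append([
            [prev[2 * r][2 * c] | prev[2 * r][2 * c + 1]
             | prev[2 * r + 1][2 * c] | prev[2 * r + 1][2 * c + 1]
             for c in range(n)]
            for r in range(n)
        ])
    return tables


def descend(tables: List[Table], level: int, r: int, c: int, bit: int,
            out: List[int]) -> None:
    """Traverse downward from element (r, c) of table `level` at one bit."""
    if level == 0:
        out.append((tables[0][r][c] >> bit) & 1)  # single element: raw bit
        return
    if not (tables[level][r][c] >> bit) & 1:
        out.append(0)  # OR is 0, so the whole block is 0 at this bit
        return
    out.append(1)
    for dr, dc in ((0, 0), (0, 1), (1, 0), (1, 1)):
        descend(tables, level - 1, 2 * r + dr, 2 * c + dc, bit, out)


def code_with_tables(data: Table, num_planes: int) -> List[int]:
    """Code all bit planes, most significant first, using the OR tables."""
    tables = build_or_tables(data)
    top = len(tables) - 1
    out: List[int] = []
    for bit in range(num_planes - 1, -1, -1):
        descend(tables, top, 0, 0, bit, out)
    return out
```

For a 4 x 4 matrix holding 1 and 4 in its top-left 2 x 2 block and 2 in its bottom-right corner, the first reduced table is `[[5, 0], [0, 2]]` and the top table is `[[7]]`; at bit position 2 only the top-left 2 x 2 block needs further descent.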
The present invention beneficially uses wavelet transforms to produce video data well suited to the compression described herein. After initial pixel conditioning, wavelet coefficients are calculated and expressed as multi-bit binary values. In one embodiment of the present invention, the wavelet coefficients are expressed in sign-magnitude format. The wavelet coefficients are stored in a two-dimensional matrix in which they are hierarchically clustered according to the frequency components of the image they represent. The wavelet coefficients are then supplied as input to the compression processing disclosed herein.
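The text does not name a particular wavelet. Purely for illustration, the Python sketch below applies one level of a two-dimensional integer Haar transform (the S-transform, which maps each pair to a floor average and a difference) and then converts a coefficient to sign-magnitude form. The low-pass coefficients collect in the top-left quadrant of the matrix and the detail coefficients in the other three, which is one way of clustering coefficients by frequency. The choice of wavelet, the square power-of-two input, and all function names are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def haar_pair(a: int, b: int) -> Tuple[int, int]:
    """Integer Haar (S-transform) of one pair: (floor average, difference)."""
    return (a + b) >> 1, a - b


def haar_2d_one_level(img: Matrix) -> Matrix:
    """One level of a 2-D integer Haar transform on a square, even-sized
    matrix. Low-pass output lands in the top-left quadrant; the other
    three quadrants hold detail (high-frequency) coefficients."""
    n = len(img)
    half = n // 2
    # Rows: low-pass results go to the left half, high-pass to the right.
    rows = []
    for row in img:
        lows, highs = [], []
        for c in range(0, n, 2):
            low, high = haar_pair(row[c], row[c + 1])
            lows.append(low)
            highs.append(high)
        rows.append(lows + highs)
    # Columns: low-pass results go to the top half, high-pass to the bottom.
    out = [[0] * n for _ in range(n)]
    for c in range(n):
        for r in range(0, n, 2):
            low, high = haar_pair(rows[r][c], rows[r + 1][c])
            out[r // 2][c] = low
            out[half + r // 2][c] = high
    return out


def to_sign_magnitude(value: int, width: int) -> int:
    """Pack into `width` bits: top bit is the sign, the rest is |value|."""
    mag = abs(value)
    if mag >= 1 << (width - 1):
        raise ValueError("magnitude does not fit in width - 1 bits")
    sign = (1 << (width - 1)) if value < 0 else 0
    return sign | mag
```

Sign-magnitude is a natural fit for bit-plane coding: a small coefficient keeps zeros in its high-order magnitude bits whatever its sign, whereas in two's complement a value such as -1 sets every bit.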
In contrast with prior art systems, the invention disclosed and claimed herein provides a computationally efficient means by which video data can be highly compressed without unacceptable picture degradation, while still providing the benefits of an embedded output stream. Furthermore, although the present invention is described specifically in terms of its application to video data, the principles taught herein may be beneficially applied to many other forms of data.