The invention relates to compressing video frames.
Referring to FIG. 1, a digital imaging system 5 may include a digital camera 12 that electrically captures a digitized representation, or pixel image, of an optical image 11. The pixel image typically is represented by a frame of data, and each portion, or pixel, of the pixel image is represented by one or more bytes of the frame. Although the camera 12 may transmit the original, uncompressed frame to a computer 14 via a bus 15 (a serial bus, for example), the camera 12 might first compress the frame before transmission due to the limited available bandwidth of the bus 15.
As an example of the bandwidth limitations, the bus 15 may be a Universal Serial Bus (USB), and the camera 12 may use an isochronous communication channel of the bus 15 to communicate with the computer 14. However, other devices (for example, an infrared transceiver 16 that communicates with an infrared keyboard 7 and a mouse 8) may also use the bus 15. Therefore, because multiple devices may use the bus 15, the camera 12 typically does not reserve all of the available bandwidth of the bus 15 but rather reserves a reasonable portion (a bandwidth of 5 Mbits/sec., for example) of the available bandwidth. Without this self-imposed restriction, the remaining available bandwidth of the bus 15 may be insufficient to support the communications required by the other bus devices.
However, the bandwidth reserved by the camera 12 may be insufficient to transmit uncompressed frames. As an example, for video, the camera 12 may transmit thirty frames every second. If the camera 12 transmits a frame of 352 columns by 288 rows thirty times every second and represents each pixel by eight bits, then the required bandwidth is 24,330,240 bits/second. Not only does this requirement greatly exceed the limited bandwidth reserved for the camera 12, it might also exceed the total available bandwidth of the bus 15. Thus, compression might be required to reduce the required bandwidth.
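The arithmetic behind this figure can be checked with a short sketch. The function name and the use of the 5 Mbits/sec. reservation as the target are assumptions made for illustration only.

```python
def required_bandwidth_bps(columns, rows, bits_per_pixel, frames_per_second):
    """Return the bandwidth, in bits per second, needed for uncompressed video."""
    return columns * rows * bits_per_pixel * frames_per_second


# Figures taken from the example above: 352 x 288 pixels, 8 bits each, 30 frames/s.
raw_bps = required_bandwidth_bps(352, 288, 8, 30)

# Assumed for illustration: the camera reserves 5 Mbits/sec. of the bus.
reserved_bps = 5_000_000

# Factor by which the data would have to shrink to fit the reservation.
min_compression_ratio = raw_bps / reserved_bps

print(raw_bps)                          # 24330240
print(round(min_compression_ratio, 2))  # 4.87
```

Under these assumptions, the frames would need to be compressed by a factor of roughly five to fit within the reserved bandwidth.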
The compression of each frame by the camera 12 may require spatial filtering. The pixel intensities of the image typically vary spatially across the image, and the rate at which these intensities vary is called the spatial frequency, which also varies across the image. As an example, the boundaries of objects generally introduce high spatial frequencies because the levels of the pixel intensities change rapidly near object boundaries.
In a technique called a wavelet transformation, the camera 12 may spatially filter the image in different directions to produce frequency sub-band images, and the camera 12 may then compress the data associated with the sub-band images. The transformation of the original pixel image into the frequency sub-band images typically includes spatially filtering the original pixel image (and thus, the associated data) in both vertical and horizontal directions. For example, referring to FIG. 2, a 9-7 bi-orthogonal spline filter may be used to filter an original pixel image 18 (having a resolution of 1280 columns by 960 rows, for example) to produce four frequency sub-band images 19 (each having a resolution of 640 columns by 480 rows, for example). Thus, the sub-band images 19 represent the image 18 after being spatially filtered along the vertical and horizontal directions.
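The horizontal and vertical filtering may be easier to follow with a small sketch. The example below is not the 9-7 bi-orthogonal spline filter mentioned above; it substitutes the much simpler Haar filter pair (pairwise averages and differences) purely to show how one level of filtering in two directions yields four sub-band images of half the resolution. All function names are hypothetical.

```python
def haar_1d(values):
    """Split an even-length sequence into low-pass averages and high-pass differences."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    """Swap the rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def filter_rows(image):
    """Filter every row, returning the low-pass and high-pass half-width images."""
    lows, highs = [], []
    for row in image:
        low, high = haar_1d(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def decompose(image):
    """Filter horizontally, then vertically, to produce four sub-band images."""
    low, high = filter_rows(image)  # horizontal direction
    # Vertical direction: filter the columns by transposing, filtering rows,
    # and transposing back.
    ll, lh = (transpose(band) for band in filter_rows(transpose(low)))
    hl, hh = (transpose(band) for band in filter_rows(transpose(high)))
    return ll, lh, hl, hh


# A 4 x 4 image with a brighter 2 x 2 block in the lower right corner.
image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
]
ll, lh, hl, hh = decompose(image)
print(ll)  # [[10.0, 10.0], [10.0, 50.0]]
```

Each sub-band has half the columns and half the rows of the input, which mirrors the 1280 by 960 image producing 640 by 480 sub-band images. In this particular input the block edges fall on even pixel boundaries, so the three high-frequency sub-bands come out as all zeros.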
Referring to FIG. 3, to compress the data associated with the original frame, the camera 12 may first transform (block 2) the corresponding pixel image into frequency sub-band images. After the transformation, the camera 12 may quantize (block 3) the data associated with the frequency sub-band images 19 to reduce the bit precision (and size) of the data and increase the number of zeros in the data. For example, the camera 12 might truncate the four least significant bits of each byte of the data. Thus, instead of being represented by eight bits, each intensity value may be represented by four bits.
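As a minimal sketch of this truncation, assuming each data value is an unsigned byte (0 to 255) and using hypothetical sample values, dropping the four least significant bits amounts to a right shift by four:

```python
def quantize(value, dropped_bits=4):
    """Truncate the least significant bits of an unsigned 8-bit value."""
    return value >> dropped_bits


# Hypothetical sub-band data values, chosen only for illustration.
samples = [3, 12, 0, 130, 7, 15, 16, 255]
quantized = [quantize(v) for v in samples]

zeros_before = samples.count(0)
zeros_after = quantized.count(0)

print(quantized)                  # [0, 0, 0, 8, 0, 0, 1, 15]
print(zeros_before, zeros_after)  # 1 5
```

Every value below 16 collapses to zero, which illustrates how quantization increases the number of zeros in the data.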
To complete the compression, the camera 12 may entropy encode (block 4) the quantized data. In entropy encoding, redundant data patterns are consolidated. Therefore, because the quantization increases the number of zeros in the data, the quantization typically enhances the effectiveness of the entropy encoding. As an example, one type of entropy encoding, called Huffman encoding, uses variable length codes to represent the data. The shorter codes are used to represent patterns of the data that occur more frequently, and the longer codes are used to represent patterns of the data that occur less frequently. By using this scheme, the total size of the data is reduced.
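The variable length coding may be illustrated with a generic textbook construction of a Huffman code. This is a sketch under stated assumptions, not the encoder used by the camera 12; the function name and the sample data (sixteen quantized four-bit values dominated by zeros) are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code in which more frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code built so far}).
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical quantized data: mostly zeros, as quantization tends to produce.
data = [0] * 12 + [1] * 2 + [8] + [15]
code = huffman_code(data)

encoded_bits = sum(len(code[value]) for value in data)
fixed_bits = 4 * len(data)  # four bits per value without entropy encoding

print(encoded_bits, fixed_bits)  # 22 64
```

Here the frequent zero value receives a one-bit code while the rare values receive three-bit codes, so the sixteen values occupy 22 bits instead of 64.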
Once the computer 14 receives the compressed frame, the computer 14 may then follow the compression steps described above in reverse order in an attempt to reconstruct the original pixel image. Referring to FIG. 4, the computer 14 may perform (block 20) an inverse entropy function to reconstruct the quantized data for the sub-band images. The computer 14 may then perform (block 22) an inverse quantization function to reconstruct the data for the original sub-band images 19 and then perform (block 24) an inverse transform function on this data to reconstruct the original pixel image. Unfortunately, the precision lost by the quantization is not recoverable. For example, if in the quantization eight bits of data are quantized to produce four bits of data, the lost four bits are not recoverable when inverse quantization is performed. As a result, one problem with the above-described compression technique is that the quantization may introduce a large change in the intensity of a pixel after compression for a relatively small change in intensity of the pixel before compression.
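The unrecoverable loss can be seen in a short round trip. The sketch assumes unsigned eight-bit values, quantization by truncating four bits, and an inverse quantization that simply shifts the value back; the function names are hypothetical.

```python
DROPPED_BITS = 4


def quantize(value):
    """Truncate the four least significant bits of an unsigned 8-bit value."""
    return value >> DROPPED_BITS


def inverse_quantize(quantized):
    """Restore the original scale; the truncated bits come back as zeros."""
    return quantized << DROPPED_BITS


original = 130
restored = inverse_quantize(quantize(original))
print(original, restored)  # 130 128

# Largest reconstruction error over every possible 8-bit value.
max_error = max(v - inverse_quantize(quantize(v)) for v in range(256))
print(max_error)  # 15
```

Under these assumptions, any two values that share the same upper four bits reconstruct to the same number, so the original value cannot be told apart from up to fifteen of its neighbors.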
For example, before quantization, a spatial intensity 6 (see FIG. 5) of pixels of a particular row of a frequency sub-band image may be close to a quantization threshold level (called I1). For video, the intensity 6 may vary slightly from frame to frame due to artifacts, or noise, present in the system, and these changes may be amplified by the quantization. The amplified artifacts (an effect called scintillation) may cause otherwise stationary objects to appear to move in the reconstructed video. The human eye may be quite sensitive to this motion, and thus, the perceived quality of the video may be degraded.
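The amplification near a quantization threshold may be sketched as follows. The threshold level I1 is assumed here to be 128, the quantization is assumed to truncate four bits of an unsigned eight-bit value, and the per-frame intensities are hypothetical values for a single stationary pixel disturbed by small noise.

```python
DROPPED_BITS = 4


def quantize(value):
    """Truncate the four least significant bits of an unsigned 8-bit value."""
    return value >> DROPPED_BITS


def inverse_quantize(quantized):
    """Restore the original scale; the truncated bits come back as zeros."""
    return quantized << DROPPED_BITS


# Hypothetical intensity of one pixel over six frames, hovering around an
# assumed threshold level I1 = 128.
frames = [127, 128, 127, 129, 126, 128]
reconstructed = [inverse_quantize(quantize(v)) for v in frames]

swing_before = max(frames) - min(frames)
swing_after = max(reconstructed) - min(reconstructed)

print(reconstructed)              # [112, 128, 112, 128, 112, 128]
print(swing_before, swing_after)  # 3 16
```

In this sketch a frame-to-frame variation of at most three intensity levels becomes a repeated jump of sixteen levels after reconstruction, which is the kind of flicker described above as scintillation.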
Thus, there is a continuing need for a digital imaging system that reduces the amplification of artifacts in the compression/decompression process.