Video “codecs” (compressor/decompressor) are used to reduce the data rate required for data communication streams by balancing image quality, processor requirements (i.e. cost/power consumption), and compression ratio (i.e. resulting data rate). The currently available compression approaches offer different ranges of trade-offs and have spawned a plurality of codec profiles, where each profile is optimized to meet the needs of a particular application.
Lossy digital video compression systems operate on digitized video sequences to produce much smaller digital representations. The reconstructed visible result looks much like the original video but is generally not a perfect match. For these systems, it is important that the information lost in the process correspond to aspects of the video that are not easily seen or not readily noticed by viewers.
A typical digital video compression system operates in a sequence of stages, comprising a transform stage, a quantization stage, and an entropy-coding stage. Some compression systems, such as MPEG and other DCT-based codec algorithms, add further stages, such as a motion compensation search. 2-D and 3-D wavelets are current alternatives to the DCT-based codec algorithms. Wavelets have been highly regarded due to their pleasing image quality and flexible compression ratios, prompting the JPEG committee to adopt a wavelet algorithm for its JPEG2000 still image standard.
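By way of illustration only, the three stages named above may be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular codec: the transform is a one-level Haar split of a 1-D sample row, the quantizer is a uniform divider with a coarser step for the high band, and zlib (DEFLATE) merely stands in for a purpose-built entropy coder. All function names and step sizes are hypothetical.

```python
import struct
import zlib


def haar_forward(samples):
    """Transform stage: pairwise averages (low band) and differences (high band)."""
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return low, high


def haar_inverse(low, high):
    """Undo haar_forward: each (average, difference) pair yields two samples."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def quantize(band, step):
    """Quantization stage: this is where information is discarded."""
    return [round(v / step) for v in band]


def dequantize(levels, step):
    return [v * step for v in levels]


def compress(samples, low_step=1, high_step=8):
    # Assumes an even number of samples; the step sizes are illustrative.
    low, high = haar_forward(samples)
    levels = quantize(low, low_step) + quantize(high, high_step)
    raw = struct.pack(f"<{len(levels)}h", *levels)
    # Entropy-coding stage (stand-in): DEFLATE instead of a dedicated coder.
    return zlib.compress(raw)


def decompress(blob, low_step=1, high_step=8):
    raw = zlib.decompress(blob)
    levels = list(struct.unpack(f"<{len(raw) // 2}h", raw))
    half = len(levels) // 2
    low = dequantize(levels[:half], low_step)
    high = dequantize(levels[half:], high_step)
    return haar_inverse(low, high)
```

In this sketch the reconstruction error is bounded by half the larger quantizer step plus half the smaller one, which shows how the quantization stage alone determines what is lost; the transform and entropy-coding stages are reversible.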
When a wavelet transform is used as the transform stage in a video compressor, the transform operates as a sequence of filter pairs that split the data into high-pass and low-pass components or bands. Standard wavelet transforms operate on the spatial extent of a single image, in 2-dimensional fashion. The two dimensions are handled by combining filters that work horizontally with filters that work vertically. Typically, these alternate in sequence, H-V-H-V, though strict alternation is not necessary. It is known in the art to apply wavelet filters in the temporal direction as well, operating on samples from successive images in time. In addition, wavelet transforms can be applied separately to the brightness or luminance (luma) and color-difference or chrominance (chroma) components of the video signal.
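The horizontal-then-vertical filtering described above may be sketched, for one H-V step, as follows. The Haar filter pair is assumed here only because it is the shortest wavelet filter pair; practical codecs commonly use longer filters. The function names are hypothetical, and the image is assumed to have even width and height.

```python
def split_pairs(seq):
    """Haar filter pair: low band = pairwise average, high band = pairwise difference."""
    low = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq) - 1, 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq) - 1, 2)]
    return low, high


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def filter_rows(image):
    """Horizontal (H) pass: split every row into a low half and a high half."""
    lows, highs = [], []
    for row in image:
        low, high = split_pairs(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def filter_cols(image):
    """Vertical (V) pass: the same filter pair applied down each column."""
    lows, highs = filter_rows(transpose(image))
    return transpose(lows), transpose(highs)


def haar_2d_level(image):
    """One H-V step, yielding four sub-bands of half the width and height.

    Returned in the order (H-low/V-low, H-low/V-high, H-high/V-low, H-high/V-high).
    """
    h_low, h_high = filter_rows(image)
    ll, lh = filter_cols(h_low)
    hl, hh = filter_cols(h_high)
    return ll, lh, hl, hh
```

Applying `haar_2d_level` again to the first returned sub-band gives the next H-V step of the H-V-H-V sequence.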
One may use a DCT or other non-wavelet spatial transform in the two spatial dimensions together with a wavelet-type transform in the temporal direction. This mixed 3-D transform serves the same purpose as a 3-D wavelet transform. It is also possible to use a short DCT in the temporal direction, yielding a 3-D DCT transform.
The temporal part of a 3-D wavelet transform typically differs from the spatial part in being much shorter. Typical sizes for the spatial transform are 720 pixels horizontally and 480 pixels vertically; typical sizes for the temporal transform are two, four, eight, or fifteen frames. These temporal lengths are smaller because handling many frames results in long processing delays, which are undesirable, and requires storing frames while they are processed, which is expensive.
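The storage and delay costs mentioned above can be made concrete with a rough calculation. The sketch below assumes 2 bytes per pixel (for example 8-bit 4:2:2 sampling) and a frame rate of 30 frames per second; neither figure is taken from the text, and the function name is hypothetical.

```python
WIDTH, HEIGHT = 720, 480   # spatial size quoted in the text
BYTES_PER_PIXEL = 2        # assumption: 8-bit 4:2:2 sampling
FRAME_RATE = 30.0          # assumption: frames per second


def temporal_cost(frames):
    """Return (buffer size in bytes, capture time in seconds) for one temporal group."""
    buffer_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL * frames
    capture_seconds = frames / FRAME_RATE
    return buffer_bytes, capture_seconds


for frames in (2, 4, 8, 15):
    size, delay = temporal_cost(frames)
    print(f"{frames:2d} frames: {size / 1e6:.2f} MB buffered, about {delay:.3f} s to capture")
```

Under these assumptions a fifteen-frame group requires roughly 10.4 MB of frame storage and about half a second before the whole group is available, which is why the temporal extent is kept far shorter than the spatial extent.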
When one looks at a picture or a video sequence to judge its quality, or when one visually compares two pictures or two video sequences, some defects or differences are harder to detect than others. This is a consequence of the human visual system having greater sensitivity to some aspects of what one sees than to others. For instance, one may see very fine details only when they are at high contrast, but can see medium-scale details even when they are very subtle in contrast. These differences are important for compression. Compression processes are designed to make the differences and errors as unnoticeable as possible. Thus, a compression process may preserve good fidelity in medium-scale brightness variations, while allowing more error in fine details.
There is thus a continuing need to exploit various psychophysics opportunities to improve compression algorithms, without significantly sacrificing perceived quality.
The foregoing compression systems are often used in Personal Video Recorders, Digital Video Recorders, Cable Set-Top Boxes, and the like. A common feature of these applications and others is that users have the possibility of pausing the video, keeping a single frame displayed for an extended time as a still image.
It is known in the art to process a video sequence, or other sequence of images, to derive a single image of higher resolution than the input images. This processing is computationally very expensive, however, as it must identify or match moving objects in the scene, camera motion, lighting shifts, and other changes, and compensate for each change individually and in combination. Contemporary applications do not presently support such computational extravagance for a simple pause function.
There is thus a continuing need to exploit various psychophysics opportunities to present a paused image that is of substantially higher visual quality than would be produced by simply repeating a frame of decompressed video.