Digital video is seldom transmitted or stored in its raw, original form. Rather, the digital video data is compressed in some fashion. Compression of video is possible because there are, depending on the type of footage, various amounts of redundancy present in the video signal. There exists spatial redundancy because, within the video frames, the signal does not change much between most pixels (picture elements of the video frame); there exists temporal redundancy because the video signal does not change much between most frames. There also exists perceptual redundancy because the pixel value fluctuations within frames and between frames contain more information than can be perceived by the human eye.
Many video compression techniques, such as the MPEG-1 and MPEG-2 standards, try to exploit these redundancies in order to compress a video signal as much as possible while preserving the visual content of the video as well as possible. Spatial redundancy is exploited by transmitting the coefficients of the discrete cosine transform (DCT) of 8×8 image blocks. Temporal redundancy is exploited by transmitting only differences between subsequent frames, where these differences are expressed using motion compensation vectors. Perceptual redundancy is exploited by limiting the color information in the signal.
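As a rough illustration of how the DCT exposes spatial redundancy, the following Python sketch computes the orthonormal 2D DCT-II of one 8×8 block directly from its definition. The function name dct_2d, the synthetic ramp block, and the threshold of 1.0 are assumptions made for this example. Practical coders use fast transforms followed by quantization and entropy coding, none of which is shown here.

```python
import math

N = 8  # block size used by the block-based coders described above


def dct_2d(block):
    """Return the orthonormal 2D DCT-II of an N x N block (list of rows)."""
    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


# Hypothetical smooth block: a gentle horizontal ramp, constant along each column.
smooth_block = [[100 + 2 * x for x in range(N)] for _ in range(N)]
coeffs = dct_2d(smooth_block)

# Count the coefficients large enough to be worth transmitting
# (the threshold of 1.0 is an arbitrary choice for this sketch).
significant = sum(1 for row in coeffs for c in row if abs(c) > 1.0)
print("significant coefficients:", significant, "of", N * N)
```

For a smooth block such as this one, only a few coefficients in the first row are non-negligible, so far fewer than 64 values carry the content of the block.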
These compression standards support high-resolution, high-frame-rate video. Lower-bandwidth video compression techniques (like H.263, H.320, and H.323) also exist, but these usually support only low-resolution images (QSIF) at low frame rates (2 fps). Such compression schemes are usually designed either as general-purpose systems for any image type, or specifically as video conferencing systems.
A more recent compression standard, which is still under development, is MPEG-4. Where MPEG-1 and MPEG-2 do not take into consideration the visual content of the individual video frames, MPEG-4 does. Rather than basing the compression on image blocks, the compression is based on image regions that may actually correspond to semantically meaningful areas of the 3D scene. For example, a textured region can be compressed as a representation of its boundary plus parameters that describe the texture, possibly with a residual image as well. Although MPEG-4 does not prescribe how the regions are to be extracted, computer vision techniques are often used. MPEG-4 also has provisions for very high-level compression of moving faces. A general geometric face model is predefined with a number of control points. The encoder just has to set the initial location of these points and provide trajectories for them as the video progresses. It is then up to the decoder to reconstruct and display a suitable face based on this parameter set.
A compressor and corresponding decompressor pair that can code a signal into a compressed form and then decode it back into its original format is called a codec. The compression can either be lossless, in which case the decoded signal is equal to the original signal, or lossy, in which case the decoded signal is merely a “good” approximation of the original signal. In the latter case, information is lost between the original and reconstructed signal, but a good compression algorithm attempts to ensure the best possible decoded signal (usually from a human perceptual standpoint) within a given bit rate. Lossless techniques could also be applied to an image or video, but generally do not yield enough data reduction to be very useful (typically compression ratios between 1.2 and 2.5, whereas MPEG-1 usually runs at 30 to 50).
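The codec round trip and the notion of a compression ratio can be sketched with Python's standard zlib module. This is an assumption-laden illustration: zlib (DEFLATE) is a general-purpose lossless coder rather than an image or video codec, the names encode and decode are chosen for this example, and the synthetic gradient frame is far more regular than real footage, so the ratio it produces says nothing about the typical figures quoted above.

```python
import zlib


def encode(frame: bytes) -> bytes:
    """Compressor half of the illustrative codec (DEFLATE, maximum effort)."""
    return zlib.compress(frame, 9)


def decode(payload: bytes) -> bytes:
    """Decompressor half of the illustrative codec."""
    return zlib.decompress(payload)


# Hypothetical 64 x 64 8-bit frame: a diagonal gradient stored row by row.
WIDTH, HEIGHT = 64, 64
frame = bytes((x + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))

payload = encode(frame)
restored = decode(payload)

# Compression ratio = original size / compressed size.
ratio = len(frame) / len(payload)

print("lossless round trip:", restored == frame)
print("compression ratio: %.1f" % ratio)
```

Because the coder is lossless, the restored frame is byte-for-byte identical to the input. A lossy codec would instead trade exact equality for a much higher ratio.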
The following reference describes examples of the state of the prior art in compression technology:
B. G. Haskell, A. P. Puri, and A. N. Netravali, Digital Video: An Introduction to MPEG-2, Chapman & Hall: New York, 1997.
Chapter 1, pages 1-13, introduces compression, standards for video conferencing (H.320), MPEG-1, and MPEG-2. The low bit-rate standard, H.263, is handled on pages 370-382. MPEG-4 is introduced on pages 387-388. These references are incorporated herein in their entirety.
The compression techniques proposed herein require computer vision techniques. The following computer vision techniques are especially relevant.
Edge detection: These are techniques to identify sharp discontinuities in the intensity profile of images. Edge detectors are operators that compute differences between pairs of neighboring pixels. Pixels with high responses to these operators are then identified as edge pixels. Edge maps can be computed in a single scan through the image. Examples of edge detection are the Gradient- and Laplacian-type edge finders and edge templates such as Sobel.
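A minimal sketch of a Sobel-style edge finder, computed in one scan over the image, might look as follows. The function name sobel_edges, the use of |gx| + |gy| as an approximation of the gradient magnitude, the threshold value, and the test image are all assumptions made for illustration.

```python
def sobel_edges(img, threshold):
    """Return a binary edge map (list of rows) for a grayscale image.

    Border pixels are left at 0 because the 3 x 3 templates do not fit there.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal difference template (responds to vertical edges).
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical difference template (responds to horizontal edges).
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            # A high response marks the pixel as an edge pixel.
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


# Hypothetical 6 x 8 image with a vertical step: dark left half, bright right half.
image = [[10] * 4 + [200] * 4 for _ in range(6)]
edge_map = sobel_edges(image, threshold=100)

for row in edge_map:
    print(row)
```

In this example the edge pixels form a two-pixel-wide vertical band straddling the intensity step, and the flat areas on either side produce no response.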
Region finding: This is a class of techniques that identify areas of continuity within an image (in a sense, the opposite of edge detection). The areas that are to be detected are constant in some image property. This property can be intensity, color, texture, or some combination of these. Using connected components techniques, regions can be computed in a single scan. Clustering approaches have also been used successfully. An example here is the detection of hands or faces in frames by finding regions with flesh tone.
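One common way to realize connected components labeling is a raster scan that assigns provisional labels and records label equivalences in a union-find structure, followed by a resolution step that maps provisional labels to final region ids. The sketch below follows that scheme under stated assumptions: 4-connectivity, an image whose pixels have already been reduced to a discrete property value (for example 1 for "flesh tone" and 0 otherwise), and the hypothetical function name label_regions.

```python
def label_regions(img):
    """Label 4-connected regions of equal value. Returns (labels, region_count)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = []  # union-find forest over provisional labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Raster scan: look only at the upper and left neighbors.
    for y in range(h):
        for x in range(w):
            up = labels[y - 1][x] if y > 0 and img[y - 1][x] == img[y][x] else None
            left = labels[y][x - 1] if x > 0 and img[y][x - 1] == img[y][x] else None
            if up is None and left is None:
                parent.append(len(parent))      # start a new provisional region
                labels[y][x] = len(parent) - 1
            elif up is not None and left is not None:
                ru, rl = find(up), find(left)
                parent[max(ru, rl)] = min(ru, rl)  # record that both are one region
                labels[y][x] = min(ru, rl)
            else:
                labels[y][x] = up if up is not None else left

    # Resolve provisional labels to compact region ids 0, 1, 2, ...
    ids = {}
    for y in range(h):
        for x in range(w):
            root = find(labels[y][x])
            labels[y][x] = ids.setdefault(root, len(ids))
    return labels, len(ids)


# Hypothetical property mask: the 1-pixels form a single U-shaped region.
mask = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
region_labels, region_count = label_regions(mask)
print(region_labels, region_count)
```

The U shape is a useful test case because its two arms receive different provisional labels during the scan and are merged only when the bottom row connects them.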
Background subtraction: This is a method where two images are used to find image regions corresponding to objects. A first image is acquired without the objects present, then a second image with the objects. Subtracting the second image from the first and ignoring regions near zero results in a segmented image of the objects.
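The subtraction step can be sketched in a few lines of Python. The function name subtract_background, the fixed threshold that decides what counts as "near zero", and the tiny grayscale images are assumptions for illustration. Practical systems typically also have to cope with noise, shadows, and lighting changes.

```python
def subtract_background(background, frame, threshold):
    """Return a binary mask: 1 where the frame differs from the background."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


# Hypothetical 3 x 4 grayscale images: the empty scene, then the scene with an object.
background = [
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
frame = [
    [51, 50, 49, 50],     # small fluctuations only (sensor noise)
    [50, 180, 175, 50],   # the object occupies the two middle pixels
    [50, 50, 52, 50],
]

object_mask = subtract_background(background, frame, threshold=10)
for row in object_mask:
    print(row)
```

Differences below the threshold are treated as near zero and ignored, so only the pixels covered by the object remain in the mask.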
Normalized correlation: This is a technique for comparing two image patches Q1 and Q2. The normalized correlation at some translation T is defined as:
NC = [E(Q1 Q2) − E(Q1) E(Q2)] / [Sigma(Q1) Sigma(Q2)]
with E(.) the expectation and Sigma(.) the standard deviation. High values here indicate that the patches are very similar, despite possible differences in lighting conditions.
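The definition can be checked numerically with a short Python sketch. It takes Sigma(.) to be the standard deviation, which is an assumption of this example and is what bounds the result to the range -1 to 1. The function name normalized_correlation and the sample patches are likewise hypothetical, and the two patches are assumed to be already aligned at the translation of interest.

```python
import math


def normalized_correlation(q1, q2):
    """Normalized correlation of two equally sized patches (lists of rows)."""
    a = [p for row in q1 for p in row]
    b = [p for row in q2 for p in row]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Covariance; algebraically equal to E(Q1 Q2) - E(Q1) E(Q2).
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    if sd_a == 0 or sd_b == 0:
        raise ValueError("normalized correlation is undefined for a constant patch")
    return cov / (sd_a * sd_b)


patch = [[10, 20], [30, 40]]
# Same pattern under a hypothetical lighting change: higher gain plus an offset.
brighter = [[2 * p + 30 for p in row] for row in patch]
# Contrast-inverted pattern.
inverted = [[255 - p for p in row] for row in patch]

nc_same = normalized_correlation(patch, brighter)
nc_inverted = normalized_correlation(patch, inverted)
print(nc_same, nc_inverted)
```

A gain and offset change leaves the correlation at its maximum, which is the sense in which the measure tolerates differences in lighting, while the inverted patch yields the opposite extreme.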
Normalized correlation and other computer vision techniques are described more fully in:
D. Ballard and C. Brown, Computer Vision, Prentice-Hall: New Jersey, 1982.
Gradient- and Laplacian-type edge finders and edge templates can be found on pages 75-80; pages 149-155 describe region finding and connected components techniques; background subtraction is covered on pages 72-73; and normalized correlation can be found on pages 68-70. These references are incorporated herein in their entirety.
Some of the above techniques are also used to process the frames in order to compute MPEG-4 compression. However, MPEG-4 (and MPEG-1 and MPEG-2) coding techniques are, in general, proprietary, and hence descriptions of the actual techniques used are not available. Yet all that is important from a functional standpoint is that decoders which adhere to the standard can decode the resulting signal.