Raw video files consume huge amounts of space. For example, a High Definition (HD) movie at 60 frames per second (fps), with a frame resolution of 1920×1080 pixels and 3 color components of 8 bits (1 byte) each, consumes: 1,920*1,080*60*3=373,248,000 bytes per second.
Two hours of such a movie would consume: 373,248,000*7,200=2,687,385,600,000 bytes≈2.7 Tera bytes (Tbytes).
To store the movie on regular DVD disks, i.e., 4.7 Giga byte (Gbyte) disks, we need: 2,687/4.7≈572, i.e., roughly 600 DVD disks.
And to transmit the movie over the Internet, say over a fast 100 Mbps channel, we need (with the size expressed in Mbytes): 2,687,386*8/100≈214,991 seconds≈60 hours.
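The arithmetic above can be checked with a short script. This is only a sketch: it assumes decimal units (1 Gbyte = 10^9 bytes) and a channel that sustains its full nominal rate, and the constant names are illustrative.

```python
# Raw-size arithmetic for the HD example (decimal units: 1 Gbyte = 10**9 bytes).
WIDTH, HEIGHT = 1920, 1080   # frame resolution in pixels
FPS = 60                     # frames per second
COMPONENTS = 3               # color components per pixel, 1 byte (8 bits) each
DURATION_S = 2 * 60 * 60     # two hours, in seconds

DVD_BYTES = 4.7e9            # capacity of a regular DVD disk
LINK_BITS_PER_S = 100e6      # 100 Mbps channel, assumed fully utilized

bytes_per_second = WIDTH * HEIGHT * FPS * COMPONENTS
total_bytes = bytes_per_second * DURATION_S
dvd_count = total_bytes / DVD_BYTES
transfer_seconds = total_bytes * 8 / LINK_BITS_PER_S
transfer_hours = transfer_seconds / 3600

print(f"{bytes_per_second:,} bytes per second")
print(f"{total_bytes:,} bytes for two hours")
print(f"{dvd_count:.0f} DVD disks")
print(f"{transfer_seconds:,.0f} seconds = {transfer_hours:.1f} hours")
```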
Video compression is the art of reducing the video size without affecting the perceived quality.
Video content is not always captured with the best equipment or by the best photographers. In such cases, digital image processing, also known as video enhancement, can substantially improve the visible quality of the video and help the video compression process. Some of the better-known video enhancement methods use video preprocessing tools such as the following:
De-interlacing: An interlaced movie can be problematic when recording fast-moving objects. The moving object can be in one place in the “even” picture and in another place in the “odd” one, yielding a “striped” picture which is very disturbing.
De-blocking: Block-like artifacts are a side effect of low-quality, highly compressed MPEG videos. De-blocking greatly improves the quality of such videos.
Sharpening emphasizes texture and detail, and is critical when post-processing most digital images. Despite its name, an “unsharp mask” is used to sharpen an image.
De-noising: Some degree of noise is always present in any electronic device that transmits or receives a “signal”. For television, this signal is the broadcast data transmitted over cable or received at the antenna; for digital cameras, the signal is the light which hits the camera sensor. Video de-noising is the process of removing noise from a video signal.
Stabilization is used to reduce blurring associated with the motion of the camera. Specifically, it compensates for pan and tilt of a camera or other imaging device. With video cameras, camera shake causes visible frame-to-frame jitter in the recorded video.
Camera Calibration is important in order to get stable and reliable images. Cameras that operate out of sync or are imprecisely adjusted can create blurry or confused images.
Such tools can greatly improve the video quality and help the compression process. Here, we ignore this issue and assume that the video has already been preprocessed as required. See Ref. [1] for more details.
A digital video consists of multiple streams, such as video, audio, and control, that are stored together in the same container file. For example, common container formats are: AVI (Audio Video Interleave), WMV (Windows Media Video), FLV (Flash Video), and MOV (Apple QuickTime Movie). The video stream itself is usually independent of the other streams, or of the container type, and can be represented in many different formats. A media player, such as Apple iTunes or Microsoft Windows Media Player, displays the video on the screen using the corresponding Codec (Encoder/Decoder) software.
The displayed video is usually represented in the raw RGB color space format because the human visual system works in a similar way, i.e., human color vision is based on red, green and blue color sensors. The raw RGB file 100 is schematically depicted in FIG. 1, comprising a header section 120 followed by frames 130. The header 120 contains the video parameters, such as: n, the number of rows; m, the number of columns; and N, the number of frames. A frame 130 contains n*m pixel values, each a triplet of R, G and B values.
The raw YUV color space format is another very useful format for video representation. Here, Y corresponds to the black and white representation of the video, and U and V to the added color differences. There are many similar formulas for converting RGB to YUV and vice versa. One of them, see Ref. [2], is exemplified in FIG. 2, where the RGB to YUV transforming formula is given in unit 210, and the YUV to RGB transforming formula is given in unit 220. The raw YUV file 300 is schematically depicted in FIG. 3, comprising a header section 310 as in unit 120 of FIG. 1, followed by the Y frames 320, the U frames 330, and the V frames 340. Typical frames for the Y, U and V components are shown. In what follows we consider only the video stream part of the container file, and without loss of generality (w.l.g.), we assume a YUV color space representation.
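As an illustration of such a conversion pair, the following sketch converts one pixel from RGB to YUV and back. The coefficients are an assumption (the commonly used BT.601-style weights); the formulas of units 210 and 220 in FIG. 2 may use different constants, and the function names are hypothetical.

```python
# Assumed BT.601-style weights; FIG. 2 may use different constants.
KR, KG, KB = 0.299, 0.587, 0.114   # luma weights for R, G, B
U_SCALE, V_SCALE = 0.492, 0.877    # scaling of the color differences


def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V)."""
    y = KR * r + KG * g + KB * b   # black-and-white (luma) component
    u = U_SCALE * (b - y)          # blue color difference
    v = V_SCALE * (r - y)          # red color difference
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv by solving the same three equations for R, G, B."""
    r = y + v / V_SCALE
    b = y + u / U_SCALE
    g = (y - KR * r - KB * b) / KG
    return r, g, b


if __name__ == "__main__":
    pixel = (200, 120, 40)
    yuv = rgb_to_yuv(*pixel)
    print("YUV:", yuv)
    print("RGB again:", yuv_to_rgb(*yuv))
```

A gray pixel (R = G = B) maps to U = V = 0 up to rounding, which matches the description of Y as the black and white representation.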
A camera may change its angle of view many times during the movie. These changes of scenes, also called the movie's cuts, are distinguished by their shape and contents, see Ref. [3]. In terms of compression this means that we have little redundancy between the cuts.
The cut file 400 is schematically depicted in FIG. 4, comprising a header section 410 followed by the cuts 420. The header is as follows:
n is the number of rows, m is the number of columns,
N is the number of frames, and M is the number of cuts.
Each cut of the file has the same structure as the YUV file format given in unit 300 of FIG. 3. For simplicity, we will proceed to consider from now on only one such component of each such cut. A generalization to all components is straightforward.
Wavelets and multiwavelets, see Ref. [4], are important mathematical tools that we use in the applications that follow. Classical discrete wavelet transform (DWT) filters are depicted in FIG. 5; a pair of low pass and high pass analysis filters is depicted in unit 510, and a pair of low pass and high pass synthesis filters is depicted in unit 520. For example, the one dimensional Haar transform is depicted in unit 530.
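A one-level, one dimensional Haar transform can be sketched as follows. The normalization is an assumption: pairwise averages and half-differences are used here, whereas unit 530 may use a different scaling (for example, the orthonormal 1/√2 form). The function names are illustrative.

```python
def haar_analysis(signal):
    """Split an even-length signal into low pass and high pass halves."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    low = [(a + b) / 2 for a, b in pairs]    # local averages
    high = [(a - b) / 2 for a, b in pairs]   # local details
    return low, high


def haar_synthesis(low, high):
    """Rebuild the signal from its low pass and high pass halves."""
    signal = []
    for avg, diff in zip(low, high):
        signal.extend((avg + diff, avg - diff))
    return signal


if __name__ == "__main__":
    x = [4, 6, 10, 12, 8, 8, 0, 2]
    low, high = haar_analysis(x)
    print("low:", low)
    print("high:", high)
    print("reconstructed:", haar_synthesis(low, high))
```

The analysis step halves the length of each output, and the synthesis step recovers the input exactly, which is the perfect-reconstruction property expected of an analysis/synthesis filter pair.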
In general, we have m>1 filters, as depicted in FIG. 6; the analysis filters are depicted in unit 610, and the synthesis filters in unit 620. For example, a 2D Haar transform is depicted in unit 630. More generally, the filters may refer to the discrete multiwavelet transform (DMWT).
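For the case of m=4 filters, a 2D Haar transform on non-overlapping 2×2 blocks can be sketched as below. Again, the scaling by 1/4 and the band names are assumptions made for illustration and need not match unit 630.

```python
def haar2d_analysis(frame):
    """Apply four 2x2 Haar filters; return four quarter-size bands."""
    rows, cols = len(frame), len(frame[0])
    if rows % 2 or cols % 2:
        raise ValueError("frame dimensions must be even")
    avg, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, rows, 2):
        bands = ([], [], [], [])
        for j in range(0, cols, 2):
            a, b = frame[i][j], frame[i][j + 1]
            c, d = frame[i + 1][j], frame[i + 1][j + 1]
            bands[0].append((a + b + c + d) / 4)  # block average
            bands[1].append((a - b + c - d) / 4)  # difference across columns
            bands[2].append((a + b - c - d) / 4)  # difference across rows
            bands[3].append((a - b - c + d) / 4)  # diagonal difference
        avg.append(bands[0])
        col_diff.append(bands[1])
        row_diff.append(bands[2])
        diag.append(bands[3])
    return avg, col_diff, row_diff, diag


def haar2d_synthesis(avg, col_diff, row_diff, diag):
    """Rebuild the full-size frame from the four bands."""
    frame = []
    for s_row, h_row, v_row, d_row in zip(avg, col_diff, row_diff, diag):
        top, bottom = [], []
        for s, h, v, d in zip(s_row, h_row, v_row, d_row):
            top.extend((s + h + v + d, s - h + v - d))
            bottom.extend((s + h - v - d, s - h - v + d))
        frame.extend((top, bottom))
    return frame


if __name__ == "__main__":
    frame = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    bands = haar2d_analysis(frame)
    print("average band:", bands[0])
    print("reconstructed:", haar2d_synthesis(*bands))
```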
The lattice of integers ℤⁿ is the set of n-tuples of integers in the Euclidean space ℝⁿ. A frame can be represented as a rectangular grid on the lattice ℤ², and a video as a cubic grid on ℤ³. A subset of a lattice which is itself a lattice is called a sub-lattice. Examples of sub-lattices of ℤ² are given in FIG. 7. The Quincunx sub-lattices are depicted in unit 710. The white circled points correspond to the even sub-lattice, and the dark circled points to the odd sub-lattice. The Dyadic sub-lattices are similarly depicted in unit 720. The Quincunx sub-lattices are determined by the dilation matrix of unit 715, and the Dyadic sub-lattices by the dilation matrix of unit 725. The number of sub-lattices is determined by the determinant of the corresponding dilation matrix, 2 in the Quincunx case, and 4 in the Dyadic case. Down-sampling refers to the process of extracting a sub-lattice from a given lattice. For example, we display a dyadic down-sampling in FIG. 8. The input signal is given in unit 810, a temporal down-sampling in unit 820, a spatial down-sampling in unit 830, and a combined spatial and temporal down-sampling in unit 840.
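The sub-lattice and down-sampling notions can be illustrated with the sketch below. The two dilation matrices are assumptions (common textbook choices whose determinants have absolute values 2 and 4); the matrices of units 715 and 725 may differ in sign or ordering. A video is modeled as a list of frames, each frame a list of rows, and all names are illustrative.

```python
QUINCUNX = [[1, 1], [1, -1]]   # assumed Quincunx dilation matrix
DYADIC = [[2, 0], [0, 2]]      # assumed Dyadic dilation matrix


def det2(matrix):
    """Determinant of a 2x2 matrix."""
    return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]


def quincunx_sublattices(rows, cols):
    """Split grid points into the even and odd Quincunx sub-lattices."""
    points = [(i, j) for i in range(rows) for j in range(cols)]
    even = [p for p in points if (p[0] + p[1]) % 2 == 0]
    odd = [p for p in points if (p[0] + p[1]) % 2 == 1]
    return even, odd


def dyadic_sublattices(rows, cols):
    """Split grid points into the four Dyadic sub-lattices."""
    subs = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for i in range(rows):
        for j in range(cols):
            subs[(i % 2, j % 2)].append((i, j))
    return subs


def downsample(video, temporal=False, spatial=False):
    """Dyadic down-sampling: keep every second frame and/or pixel."""
    frames = video[::2] if temporal else list(video)
    if spatial:
        frames = [[row[::2] for row in frame[::2]] for frame in frames]
    return frames


if __name__ == "__main__":
    video = [[[t * 100 + i * 10 + j for j in range(4)] for i in range(4)]
             for t in range(4)]
    print("sub-lattice counts:", abs(det2(QUINCUNX)), abs(det2(DYADIC)))
    both = downsample(video, temporal=True, spatial=True)
    print("frames:", len(both), "size:", len(both[0]), "x", len(both[0][0]))
```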