1. Field of the Invention
The invention relates to video compression techniques.
2. Description of the Related Art
A video information stream consists of a sequence of video frames. Each video frame can be considered as a still image. In a digital system, the video frames are represented as arrays of pixels. Each pixel carries luminance (light intensity) and chrominance (color) information, which is stored in a memory of the digital system, with a number of bits reserved for each pixel. From a programming point of view, each video frame can be considered as a two-dimensional data type. Note that fields from an interlaced video sequence can also be considered as video frames.
In principle, when the video information stream must be transmitted between two digital systems, this can be realized by sending the video frames sequentially in time, for instance by sending the pixels, and thus their bits, sequentially in time.
There exist, however, more elaborate transmission schemes enabling faster and more reliable communication between the two digital systems. These schemes are based on encoding the video information stream in the transmitting digital system and decoding the encoded video information stream in the receiving digital system. Note that the same principles can be exploited for storage purposes.
During encoding, the original video information stream is transformed into another digital representation, which is then transmitted. The goal of decoding is to reconstruct the original video information stream from this digital representation, exactly when lossless compression is used or approximately when lossy compression is used.
The encoding exploits the fact that temporally nearby video frames are often quite similar up to some motion. The pixel arrays of temporally nearby video frames often contain the same luminance and chrominance information, except that the coordinates (pixel positions) of that information in the arrays are shifted or displaced. Such a displacement in position as a function of time defines a motion, which is characterized by a motion vector. Note that although this similarity of video frames up to some motion appears only in ideal cases, it forms the basis of encoding based on a translational motion model. The transformation between a video frame and a temporally nearby video frame can also be more complicated, and such a transformation can form the basis of a more complicated encoding method.
Encoding of the video information stream is done by performing encoding of the video frames of the sequence with respect to other video frames of the sequence. The other video frames are denoted reference video frames.
The encoding is in principle based on estimating the motion between a video frame under consideration and a reference video frame. The motion estimation yields a motion vector. Motion estimation relies on calculating an error norm, given by a norm of the difference between two video frames. Often the sum of absolute differences (SAD) between the pixel values of the reference video frame and those of the video frame under consideration is used as the error norm. Other error norms can also be used. In the prior art, essentially all error norms are based on differences between pixel values of both frames.
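As a minimal sketch, assuming blocks are plain Python lists of integer rows (a representation chosen here only for illustration), the SAD error norm between two equally sized blocks could be computed as follows. The function name `sad` is hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    if len(block_a) != len(block_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)
    ):
        raise ValueError("blocks must have the same shape")
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


current_block = [[10, 12], [14, 16]]
reference_block = [[11, 12], [10, 20]]
print(sad(current_block, reference_block))  # 1 + 0 + 4 + 4 = 9
```

Any other norm of the difference (for example a sum of squared differences) would simply replace `abs(a - b)` in the generator.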
After the motion has been estimated, motion compensation is performed. Motion compensation consists of constructing a new, motion compensated video frame from the reference video frame by applying the motion defined by the motion vector. The motion compensated video frame contains the pixels of the reference video frame, but located at different coordinates. The motion compensated video frame can then be subtracted from the video frame under consideration, which results in an error video frame. Due to the temporal relation between the video frames, the error video frame will contain less information. This error video frame and the motion vectors are then transmitted, possibly after some additional coding. Motion estimation, motion compensation, subtraction and additional coding are together denoted interframe encoding.
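The sketch below illustrates motion compensation and subtraction for a single block under a pure translational model. It assumes integer motion vectors `(dy, dx)` that point from the block in the current frame to its match in the reference frame; this sign convention and all function names are hypothetical choices for illustration.

```python
def extract_block(frame, top, left, height, width):
    """Copy the height x width block whose top-left corner is (top, left)."""
    if top < 0 or left < 0 or top + height > len(frame) or left + width > len(frame[0]):
        raise ValueError("block lies outside the frame")
    return [row[left:left + width] for row in frame[top:top + height]]


def motion_compensate(reference, top, left, height, width, mv):
    """Predict a block of the current frame from the reference frame,
    displaced by the integer motion vector mv = (dy, dx)."""
    dy, dx = mv
    return extract_block(reference, top + dy, left + dx, height, width)


def error_block(current_block, predicted_block):
    """Subtract the motion compensated block from the current block."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current_block, predicted_block)]


reference = [[4 * r + c for c in range(4)] for r in range(4)]
# Current frame: the reference content moved one pixel to the right.
current = [[reference[r][c - 1] if c > 0 else 0 for c in range(4)] for r in range(4)]
predicted = motion_compensate(reference, 1, 1, 2, 2, (0, -1))
residual = error_block(extract_block(current, 1, 1, 2, 2), predicted)
print(residual)  # [[0, 0], [0, 0]]
```

When the motion is captured exactly, the error block is all zeros, which is the ideal case for the subsequent coding.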
The interframe encoding can be limited to a part of a video frame. It is usually not performed on the video frame as a whole but on pieces of the video frame: the video frame is divided into non-overlapping, or even overlapping, blocks, each of which defines a region in the video frame. The blocks can be of arbitrary shape: rectangular, triangular, hexagonal or any other shape, regular or irregular.
The blocks are thus also arrays of pixels, but smaller than the video frame array. The interframe encoding operations are then performed on essentially all blocks of the video frame. As a video frame is encoded with respect to a reference video frame, a relation is implicitly defined between the blocks of the video frame under consideration and the blocks of the reference video frame. Indeed, the sum of absolute differences, or any other error norm, is only calculated between a block of the video frame under consideration and nearby blocks of the reference video frame. These nearby locations are bounded by the maximum length of the motion vector, or more precisely by the minimum and maximum component values of the motion vector. In the case of a pure translational motion model, the minimum and maximum component values correspond to the search ranges. The resulting locations define the search area in the reference video frame.
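A minimal full-search (exhaustive) block-matching sketch is shown below. It assumes a square search range, so that both motion vector components lie in `[-search_range, search_range]`, and it simply skips candidate positions that fall outside the reference frame; both choices, and the function names, are illustrative assumptions rather than part of the described method.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def block(frame, top, left, height, width):
    """Copy the height x width block whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def full_search(current, reference, top, left, height, width, search_range):
    """Try every integer vector (dy, dx) inside the search area and keep the one
    with minimal SAD. Ties keep the first candidate in scan order."""
    cur_blk = block(current, top, left, height, width)
    rows, cols = len(reference), len(reference[0])
    best_mv, best_err = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + height > rows or x + width > cols:
                continue  # candidate lies outside the reference frame
            err = sad(cur_blk, block(reference, y, x, height, width))
            if best_err is None or err < best_err:
                best_mv, best_err = (dy, dx), err
    return best_mv, best_err


reference = [[6 * r + c for c in range(6)] for r in range(6)]
# Current frame: each pixel takes the value found one row down and one column left.
current = [[reference[r + 1][c - 1] if r + 1 < 6 and c >= 1 else 0
            for c in range(6)] for r in range(6)]
print(full_search(current, reference, 2, 2, 2, 2, 2))  # ((1, -1), 0)
```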
Wavelets have proven successful in compressing still images. Compared to the classical DCT approach (JPEG), wavelet-based compression schemes offer much better image quality at very high compression ratios. Still image compression via the wavelet transform leads to graceful image degradation at increasing compression ratios and does not suffer from the annoying blocking artefacts that are typical for JPEG at very low bit rates. Another advantage of wavelets over the DCT is the inherent multiresolution nature of the transform, so that progressive transmission based on scalability in quality and/or resolution of images comes naturally. These advantages can be efficiently exploited for sequences of video frames, especially in very low bit rate applications that can benefit from the improved image quality. Moreover, the progressive transmission capability is important for supporting variable channel bandwidths.
A straightforward approach to building a wavelet-based video codec is to replace the DCT in a classical video coder by the discrete wavelet transform [Dufaux F., Moccagatta I. and Kunt M. "Motion-Compensated Generic Coding of Video Based on a Multiresolution Data Structure". Optical Engineering, 32(7):1559-1570, 1993.][Martucci S., Sodagar I. and Zhang Y.-Q. "A Zerotree Wavelet Video Coder". IEEE Trans. on Circ. and Syst. for Video Techn., 7(1):109-118, 1997.]. A drawback of this implementation is that, for interframe encoding, the wavelet transform is applied to the complete error video frame, which contains blocking artefacts. These artificial discontinuities, introduced by the motion vector field, lead to undesirable high-frequency subband coefficients that reduce the compression efficiency.
To avoid this limitation, the discrete wavelet transform is taken out of the temporal prediction loop, which results in the video encoder depicted in FIG. 1 [Zhang Y.-Q. and Zafar S. "Motion-Compensated Wavelet Transform Coding for Color Video Compression". IEEE Trans. on Circ. and Syst. Video Techn., 2(3):285-296, 1992.]. Before motion estimation (ME) and motion compensation (MC), the discrete wavelet transform (DWT) is calculated on the video frames, yielding for each video frame an average subimage and detail subimages (FIG. 12).
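To make the average and detail subimages concrete, the following sketch computes a J-level 2-D Haar decomposition by repeatedly transforming the average subimage. The Haar filter with averaging normalisation is an assumption made here for simplicity; the text does not prescribe a particular wavelet, and the function names are hypothetical.

```python
def haar_level(img):
    """One level of a separable 2-D Haar transform (averaging normalisation).
    Returns the average subimage and three detail subimages, each half size."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image dimensions must be even")
    avg, d_lr, d_tb, d_dg = [], [], [], []
    for r in range(0, h, 2):
        row_avg, row_lr, row_tb, row_dg = [], [], [], []
        for c in range(0, w, 2):
            p00, p01 = img[r][c], img[r][c + 1]
            p10, p11 = img[r + 1][c], img[r + 1][c + 1]
            row_avg.append((p00 + p01 + p10 + p11) / 4)  # local mean
            row_lr.append((p00 - p01 + p10 - p11) / 4)   # left minus right
            row_tb.append((p00 + p01 - p10 - p11) / 4)   # top minus bottom
            row_dg.append((p00 - p01 - p10 + p11) / 4)   # diagonal
        avg.append(row_avg)
        d_lr.append(row_lr)
        d_tb.append(row_tb)
        d_dg.append(row_dg)
    return avg, (d_lr, d_tb, d_dg)


def haar_dwt2(img, levels):
    """J-level decomposition: the average subimage is transformed again at each level.
    Returns the coarsest average subimage and the detail subimages per level."""
    details = []
    avg = img
    for _ in range(levels):
        avg, det = haar_level(avg)
        details.append(det)
    return avg, details


frame = [[4 * r + c for c in range(4)] for r in range(4)]
average, details = haar_dwt2(frame, 2)
print(average)  # [[7.5]]
```

Image dimensions must be divisible by 2**J for this simple sketch; real codecs handle borders with extension rules instead.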
Both the motion estimation and compensation are performed in the wavelet transform domain, i.e. in the average subimage of the highest level and in the detail subimages. This is feasible since the wavelet subimages contain not only frequency information but also spatial information, which is not the case for the DCT. The advantages of such a codec are: (1) the blocking artefacts due to the motion vector (MV) field are no longer transformed to the wavelet transform domain and (2) no inverse discrete wavelet transform (IDWT) is needed, so that from an implementation point of view, both hard- and software, the encoder can be simplified.
However, difficulties are encountered with this approach because, due to the subsampled nature of the transform, the discrete wavelet transform is in general not shift invariant [Cafforio C., Guaragnella C. and Picco R. "Motion Compensation and Multiresolution Coding". Signal Proc.: Image Communication, 6:123-142, 1994.]. This implies that shifts in the spatial domain do not just produce shifts in the wavelet transform domain subimages, but change the values of the coefficients in the subimages as well. Motion estimation and compensation are therefore not as simple as in the spatial domain, where blocks are taken out of the reference video frame and used to predict the video frame under consideration. In the wavelet transformed video frames the required blocks are not directly available, so the same techniques as in the spatial domain cannot be used. There is, however, an exception when the shifts in the spatial domain are multiples of the sampling period. A dyadic wavelet transform is completely shift invariant if the spatial domain shift has the form $d \cdot 2^{J}$, $d \in \mathbb{Z}$, where $J$ denotes the number of decomposition levels (see FIG. 12). In this case, the same motion estimation and compensation approaches can be used in the wavelet transform domain and in the spatial domain.
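The effect can be checked numerically on a 1-D signal with a one-level ($J = 1$) Haar transform. This sketch assumes periodic (circular) extension so that boundaries do not interfere; the Haar filter and the function names are illustrative choices. A shift by $2 = 1 \cdot 2^{1}$ merely shifts the subband coefficients by one position, whereas a shift by one sample changes their values.

```python
def haar_1d(x):
    """One-level 1-D Haar analysis (averaging normalisation): (average, detail)."""
    avg = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    det = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return avg, det


def circular_shift(x, s):
    """Shift a periodic signal s samples to the right."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


x = [4, 8, 6, 2, 0, 2, 6, 10]
avg, det = haar_1d(x)

# Shift by 2 = 1 * 2**J with J = 1: the subbands are shifted by one coefficient.
avg2, det2 = haar_1d(circular_shift(x, 2))
print(avg2 == circular_shift(avg, 1), det2 == circular_shift(det, 1))  # True True

# Shift by 1 (not a multiple of 2**J): the coefficient values themselves change.
avg1, det1 = haar_1d(circular_shift(x, 1))
print(det, det1)  # [-2.0, 2.0, -1.0, -2.0] [3.0, 1.0, 1.0, -2.0]
```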
Some methods have already been introduced in [Mandal M. K., Chan E. and Panchanathan S. "Multiresolution Motion Estimation Techniques for Video Compression" (preprint)], [Zhang Y.-Q. and Zafar S. "Motion-Compensated Wavelet Transform Coding for Color Video Compression". IEEE Trans. on Circ. and Syst. Video Techn., 2(3):285-296, 1992]. They perform a hierarchical motion estimation in the wavelet detail subimages by using the mean absolute difference error (MAE), or the mean square difference error (MSE) as an error norm of the difference between two video frames or video frame blocks. To obtain the wavelet error video frame, the new motion compensated wavelet video frame is subtracted from the considered wavelet video frame (see FIG. 1), just as one would do in the spatial domain. However, since spatial shifts produce ambiguous effects in the wavelet domain, one must conclude that new methods are required for motion estimation and compensation in the wavelet transform domain.
In conclusion, the prior-art motion estimation and compensation methods create the error video frame by subtracting the motion compensated video frame from the video frame under consideration, and the motion vector estimation is based on differences between pixel values.
A method and system for video compression, compatible with and exploiting the characteristics of a state-of-the-art image transformation in the compression, is presented. In the method and the system a plurality of error norms are exploited, the error norms being intrinsically related to the characteristics of the state-of-the-art image transformation.
The invention is illustrated for video compression techniques based on a translational motion model, thus exploiting motion estimation and compensation, but is not limited hereto.
The invention is further illustrated with the wavelet transformation as image transformation but is not limited hereto.
In a first aspect of the invention, the motion vector of a block of a video frame under consideration with respect to a reference video frame is determined by exploiting a plurality of sets of error norms. The error norms within one set are determined by calculating the norm of an error given by a function, characteristic for that set, of the pixel values of the block of the video frame and the pixel values of the reference video frame, for different positions of the block with respect to the reference video frame. Each set corresponds to a different function.
In a first embodiment of this aspect of the invention, the norms are calculated for weighted sums of pixel values, each weighted sum being characterized by a weighting vector. The norms of different sets correspond to different weighting vectors.
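One possible reading of a weighting vector is sketched below, under the assumption that it holds two weights `(w1, w2)` applied pixel by pixel as `w1 * block + w2 * reference`, and that an exponent `p` selects absolute values (`p=1`) or squares (`p=2`). Under this assumption, `(1, -1)` yields the usual difference-based norms and `(1, 1)` the sum-based norms of the following embodiments. The function name is hypothetical.

```python
def weighted_error_norm(cur_block, ref_block, weights, p=1):
    """Sum over all pixels of |w1 * cur + w2 * ref| ** p.
    p=1 gives a sum of absolute values, p=2 a sum of squares."""
    w1, w2 = weights
    return sum(
        abs(w1 * c + w2 * r) ** p
        for row_c, row_r in zip(cur_block, ref_block)
        for c, r in zip(row_c, row_r)
    )


cur = [[3, -1], [2, 0]]
ref = [[-3, 1], [-2, 1]]
print(weighted_error_norm(cur, ref, (1, -1)))       # sum of absolute differences: 13
print(weighted_error_norm(cur, ref, (1, 1)))        # sum of absolute sums: 1
print(weighted_error_norm(cur, ref, (1, -1), p=2))  # sum of squared differences: 57
print(weighted_error_norm(cur, ref, (1, 1), p=2))   # sum of squared sums: 1
```

Here the block is almost the negative of the reference, so the sum-based norm is far smaller than the difference-based one.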
In a second embodiment of this aspect of the invention, one set of error norms is based on summing pixel values and another on subtracting pixel values of the block under consideration and the reference video frame. Both sets are exploited in the determination of the motion vector.
In a third embodiment of this aspect of the invention, the motion vector of a block of a video frame with respect to a reference video frame is determined by exploiting two sets of error norms: the first based on the sum of absolute differences of pixel values, the second on the sum of absolute sums of pixel values.
In a fourth embodiment of this aspect of the invention, the motion vector of a block of a video frame with respect to a reference video frame is determined by exploiting two sets of error norms: the first based on the sum of squared differences of pixel values, the second on the sum of squared sums of pixel values.
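The third and fourth embodiments can be sketched as one full search that evaluates both sets of error norms at every candidate position and keeps the overall minimum, together with the name of the set that produced it. The square search range, the tie-breaking rule (the earlier candidate wins) and the string identifiers `"difference"` and `"sum"` are assumptions for illustration; `p=1` corresponds to the third embodiment and `p=2` to the fourth.

```python
def block(frame, top, left, height, width):
    """Copy the height x width block whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def set_error(cur_blk, ref_blk, mode, p):
    """'difference' uses cur - ref, 'sum' uses cur + ref; p=1 absolute, p=2 squared."""
    sign = -1 if mode == "difference" else 1
    return sum(
        abs(c + sign * r) ** p
        for row_c, row_r in zip(cur_blk, ref_blk)
        for c, r in zip(row_c, row_r)
    )


def two_set_search(current, reference, top, left, height, width, search_range, p=1):
    """Return (motion_vector, mode, error) with the smallest error over both sets.
    'mode' is the information the decoder later needs."""
    cur_blk = block(current, top, left, height, width)
    rows, cols = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + height > rows or x + width > cols:
                continue  # candidate lies outside the reference frame
            ref_blk = block(reference, y, x, height, width)
            for mode in ("difference", "sum"):
                err = set_error(cur_blk, ref_blk, mode, p)
                if best is None or err < best[2]:  # ties keep the earlier candidate
                    best = ((dy, dx), mode, err)
    return best


reference = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
negated = [[-v for v in row] for row in reference]  # sign-flipped content
print(two_set_search(negated, reference, 1, 1, 2, 2, 1))    # ((0, 0), 'sum', 0)
print(two_set_search(reference, reference, 1, 1, 2, 2, 1))  # ((0, 0), 'difference', 0)
```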
In a fifth embodiment of the first aspect of the invention the video frame and the reference video frame contain wavelet transformed subimages. The prediction error of the detail subimages can be reduced if one considers both summing and subtracting the original and the predicted blocks [Van der Auwera G., Munteanu A., Lafruit G., Cornelis J. "Video Coding Based on Motion Estimation in the Wavelet Detail Images". Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 2801-2804, Seattle, May 1998.], [Van der Auwera G., Munteanu A., Lafruit G., Cornelis J. "A New Technique for Motion Estimation and Compensation of the Wavelet Detail Images". Eusipco, Rhodos, September 1998.].
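A contrived 1-D case illustrates why summing can help for detail subimages; it is an example, not a general argument. Assuming a one-level Haar transform with averaging normalisation, a period-2 pattern moved by one sample produces detail coefficients of opposite sign, so the sum of absolute differences is large while the sum of absolute sums is zero. All names are hypothetical.

```python
def haar_detail_1d(x):
    """Detail coefficients of a one-level 1-D Haar transform (averaging normalisation)."""
    return [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]


def sad(a, b):
    """Sum of absolute differences."""
    return sum(abs(u - v) for u, v in zip(a, b))


def sas(a, b):
    """Sum of absolute sums."""
    return sum(abs(u + v) for u, v in zip(a, b))


signal = [5, 1] * 4                   # period-2 pattern
shifted = signal[1:] + signal[:1]     # same pattern moved by one sample (circularly)
ref_detail = haar_detail_1d(signal)   # [2.0, 2.0, 2.0, 2.0]
cur_detail = haar_detail_1d(shifted)  # [-2.0, -2.0, -2.0, -2.0]
print(sad(cur_detail, ref_detail), sas(cur_detail, ref_detail))  # 16.0 0.0
```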
In a second aspect of the invention a system is presented for encoding a sequence of video frames, exploiting temporal redundancy, via motion estimation and compensation techniques, using a plurality of sets of error norms in the motion estimation and a plurality of operations for error block determination that are compatible with each set of error norms.
In a first embodiment of the second aspect of the invention, both summing and taking differences of pixel values are exploited for determining the motion vector. Furthermore, either summing or taking differences of pixel values is exploited in determining the error block. The system comprises dedicated circuitry for performing image transformation, motion estimation, motion compensation and construction of the error video frame, the motion estimation exploiting both summing and taking differences of pixel values, and the construction of the error video frame exploiting either summing or taking differences of pixel values.
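The error block construction that matches each set of error norms can be sketched as follows, assuming that a motion vector found with the difference-based norm leads to `current - predicted` and one found with the sum-based norm leads to `current + predicted`. The mode strings and the function name are hypothetical.

```python
import operator

# Encoder-side operation compatible with each set of error norms (assumed mapping).
_ERROR_OPS = {"difference": operator.sub, "sum": operator.add}


def make_error_block(current_block, predicted_block, mode):
    """Combine the current block with the motion compensated block pixel by pixel."""
    try:
        op = _ERROR_OPS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return [
        [op(c, p) for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(current_block, predicted_block)
    ]


current = [[5, 7], [1, 2]]
predicted = [[4, 7], [-1, -3]]
print(make_error_block(current, predicted, "difference"))  # [[1, 0], [2, 5]]
print(make_error_block(current, predicted, "sum"))         # [[9, 14], [0, -1]]
```

In hardware, the two entries of the mapping would correspond to the separate summing and differencing circuits mentioned below.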
In a further embodiment the image transformation is a J-level wavelet transformation.
In a further embodiment the summing and taking differences of pixel values are performed by separate circuits.
In a further embodiment the system comprises a frame encoding circuit (FIG. 11, (30)) for encoding the error video frame and a frame decoding circuit (FIG. 1) for decoding the error video frame. Furthermore, extra encoding can be provided at the output of the interframe encoding loop (FIG. 11, (140)).
In a second embodiment of the second aspect of the invention, the system is adapted for performing image transformation, motion estimation, motion compensation and construction of the error video frame, the motion estimation exploiting both summing and taking differences of pixel values, and the construction of the error video frame exploiting either summing or taking differences of pixel values. The system can be a general-purpose processor, a dedicated circuit or a combination of both.
In a third aspect of the invention a system is presented for decoding a sequence of video frames that has been encoded by exploiting temporal redundancy, via motion estimation and compensation techniques, using a plurality of sets of error norms in the motion estimation and a plurality of operations for error block determination that are compatible with each set of error norms. The system inputs or loads an error block and performs a decoding operation on the error block. The system also performs a motion compensation of a block of a reference video frame. Note that the reference video frame can be a stored image, i.e. an image transmitted earlier, or simply a previously received image. This motion compensation is based on an input motion vector. The motion vector is determined by one of a plurality of sets of error norms, each of the sets being related to a substantially different function of pixel values. Based on the motion compensated block of the reference video frame and the decoded error block, a block of a video frame is determined with operations that are compatible with the function of pixel values used for determining the motion vector.
As an example, when the motion vector has been determined using a sum of absolute differences, the error block must be added to the motion compensated block. When the motion vector has been determined using a sum of absolute sums, the motion compensated block must be subtracted from the error block.
In an aspect of the invention it is recognized that the decoder needs information on how the motion vector has been determined, i.e. which function has been used for calculating the minimal error. Therefore an identifier is introduced, which is used in the decoding methods and decoding system to select the appropriate operations for reconstructing the block. In an aspect of the invention it is further recognized that, besides the traditional stream of information such as the encoded error block and the motion vector, the extra information embedded in the identifier must also be transmitted.
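A decoder-side sketch of the identifier follows. It assumes a hypothetical per-block record carrying the motion vector, a one-bit mode identifier (0 for the difference-based set, 1 for the sum-based set) and the decoded error block; the motion compensated block is assumed to have been obtained already from the motion vector. The identifier selects the inverse operation: add for differences, subtract for sums.

```python
import operator
from dataclasses import dataclass

# Hypothetical one-bit identifier values signalled per block.
DIFFERENCE, SUM = 0, 1
# Inverse of the encoder operation selected by the identifier.
_RECONSTRUCT = {DIFFERENCE: operator.add, SUM: operator.sub}


@dataclass
class BlockPayload:
    """Hypothetical per-block record received by the decoder."""
    motion_vector: tuple
    mode_id: int
    error_block: list


def reconstruct_block(payload, predicted_block):
    """Rebuild a block from its decoded error block and the motion compensated block."""
    try:
        op = _RECONSTRUCT[payload.mode_id]
    except KeyError:
        raise ValueError(f"unknown mode identifier: {payload.mode_id}") from None
    return [
        [op(e, p) for e, p in zip(row_e, row_p)]
        for row_e, row_p in zip(payload.error_block, predicted_block)
    ]


predicted = [[4, 7], [-1, -3]]
from_difference = BlockPayload((0, -1), DIFFERENCE, [[1, 0], [2, 5]])
from_sum = BlockPayload((1, 0), SUM, [[9, 14], [0, -1]])
print(reconstruct_block(from_difference, predicted))  # [[5, 7], [1, 2]]
print(reconstruct_block(from_sum, predicted))         # [[5, 7], [1, 2]]
```

Both payloads decode to the same block, which shows that the identifier, not the error block alone, determines how reconstruction must proceed.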
One embodiment of the invention presents a method and system for video compression, compatible with and exploiting the characteristics of state-of-the-art image transformations.