The purpose of source coding (or compression) is data rate reduction. For example, the data rate of an uncompressed NTSC (National Television System Committee) TV-resolution video stream is close to 170 Mbps, which corresponds to only about 30 seconds of recording time on a regular compact disc (CD). The choice of a compression standard depends primarily on the available transmission or storage capacity as well as the features required by the application. The most often cited video standards are H.261, H.263, MPEG-1 and MPEG-2 (MPEG standing for Moving Picture Experts Group). These video compression standards are all based on the discrete cosine transform (DCT) and motion prediction, even though each standard targets a different application (i.e., different encoding rates and qualities). The applications range from desktop video-conferencing to TV channel broadcasts over satellite, cable, and other broadcast channels. Video-conferencing typically uses H.261 or H.263, while MPEG-2 is the most appropriate compression standard for video broadcast applications.
Motion prediction efficiently reduces the temporal redundancy inherent in most video signals. The resulting predictive structure of the signal, however, makes it vulnerable to data loss when delivered over an error-prone network. Indeed, when data loss occurs in a reference picture, the lost video areas corrupt the video areas predicted from them in subsequent frame(s), an effect known as temporal propagation.
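By way of illustration only, temporal propagation can be reproduced with a toy predictive decoder. The sketch below assumes a deliberately simplified scheme in which each frame is the previous decoded frame plus a transmitted residual (no motion vectors), and in which a lost residual is concealed by assuming no change. The function name and the sample values are hypothetical and are not taken from any of the standards mentioned above.

```python
def decode_predictive(first_frame, residuals, lost=()):
    """Decode frames where frame n = decoded frame n-1 + residual n.

    `lost` holds the indices of frames whose residual never arrived;
    such a residual is concealed as all zeros (assumed "no change").
    """
    frames = [list(first_frame)]
    for n, res in enumerate(residuals, start=1):
        if n in lost:
            res = [0] * len(res)
        frames.append([p + r for p, r in zip(frames[-1], res)])
    return frames


# Hypothetical 3-pixel "frames" and their prediction residuals.
first = [10, 10, 10]
residuals = [[1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]]

clean = decode_predictive(first, residuals)
lossy = decode_predictive(first, residuals, lost={2})

# Indices of the frames that differ from the error-free decoding.
damaged = [n for n in range(len(clean)) if clean[n] != lossy[n]]
print(damaged)  # [2, 3, 4]
```

A single lost residual (frame 2) leaves every later frame in error, because each of them is predicted, directly or indirectly, from the damaged frame.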
Three-dimensional (3-D) transforms offer an alternative to motion prediction. In this case, temporal redundancy is reduced in the same way as spatial redundancy; that is, by applying a mathematical transform (e.g., wavelets, DCT) along the third dimension. Algorithms based on 3-D transforms have proven to be as efficient as coding standards such as MPEG-2, and comparable in coding efficiency to H.263. In addition, error resilience is improved since compressed 3-D blocks are self-decodable.
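As a minimal sketch of how a transform along the third dimension removes temporal redundancy, the following applies a separable, orthonormal 3-D DCT to a small block indexed as block[t][y][x]. The block size, the sample values, and the function names are assumptions chosen for illustration; practical 3-D coders use larger blocks and fast transforms.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def dct_3d(block):
    """Separable 3-D DCT of block[t][y][x]: along x, then y, then t."""
    T, H, W = len(block), len(block[0]), len(block[0][0])
    # Pass 1: transform every row (x direction).
    a = [[dct_1d(block[t][y]) for y in range(H)] for t in range(T)]
    # Pass 2: transform every column (y direction).
    b = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for x in range(W):
            col = dct_1d([a[t][y][x] for y in range(H)])
            for y in range(H):
                b[t][y][x] = col[y]
    # Pass 3: transform every temporal "tube" (t direction).
    c = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for y in range(H):
        for x in range(W):
            tube = dct_1d([b[t][y][x] for t in range(T)])
            for t in range(T):
                c[t][y][x] = tube[t]
    return c


# Hypothetical static scene: the same 2x2 frame repeated over 4 time instants.
frame = [[10.0, 12.0], [14.0, 16.0]]
block = [[row[:] for row in frame] for _ in range(4)]
coeffs = dct_3d(block)

# Energy in the temporal-frequency-0 plane versus all other planes.
energy_dc_plane = sum(v * v for row in coeffs[0] for v in row)
energy_rest = sum(v * v for t in range(1, 4) for row in coeffs[t] for v in row)
```

For this static block, all of the energy lands in the plane of temporal frequency zero and the remaining planes are zero up to rounding error, so they cost almost nothing to encode. Because the block is transformed as a unit, it can also be decoded without reference to any other block.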
Non-orthogonal transforms have several properties that make them an interesting alternative to orthogonal transforms such as the DCT or the wavelet transform. Decomposing a signal over a redundant dictionary improves compression efficiency, especially at low bit rates, where most of the signal energy is captured by a few elements. Moreover, video signals resulting from decomposition over a redundant dictionary are more resistant to data loss. The main limitation of non-orthogonal transforms is encoding complexity.
Matching pursuit algorithms provide a way to iteratively decompose a signal into its most important features with limited complexity. A matching pursuit algorithm outputs a stream composed of atom parameters and their respective coefficients. The problem with the state of the art in matching pursuit is twofold: the dictionaries do not address the need for decomposition along both the spatial and temporal domains, and they do not address the optimization of source coding quality versus decoding complexity for a given bit rate.
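The iterative decomposition described above can be sketched, under simplifying assumptions, as a greedy loop over a small 1-D redundant dictionary. The dictionary here (four unit impulses plus two extra unit-norm atoms) and the test signal are hypothetical and chosen only so that the arithmetic is easy to follow; the atom index stands in for the "atom parameters" of the output stream.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy decomposition over unit-norm atoms.

    Returns the stream [(atom_index, coefficient), ...] and the final residual.
    """
    residual = list(signal)
    stream = []
    for _ in range(n_atoms):
        # Select the atom most correlated (in magnitude) with the residual.
        products = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(products[k]))
        coeff = products[best]
        stream.append((best, coeff))
        # Remove the selected component from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return stream, residual


def reconstruct(stream, dictionary, length):
    """Decoder side: sum of coefficient-weighted atoms."""
    out = [0.0] * length
    for idx, coeff in stream:
        out = [o + coeff * a for o, a in zip(out, dictionary[idx])]
    return out


# Hypothetical redundant dictionary: 6 atoms for a 4-sample signal.
N = 4
dictionary = [[1.0 if i == k else 0.0 for i in range(N)] for k in range(N)]
dictionary.append(normalize([1.0, 1.0, 1.0, 1.0]))    # index 4: flat atom
dictionary.append(normalize([1.0, 1.0, -1.0, -1.0]))  # index 5: step atom

signal = [2.0, 2.0, 3.0, 2.0]
stream, residual = matching_pursuit(signal, dictionary, n_atoms=2)
approx = reconstruct(stream, dictionary, len(signal))

print(stream)  # [(4, 4.5), (2, 0.75)]
print(approx)  # [2.25, 2.25, 3.0, 2.25]
```

In this example two atoms capture all but 0.1875 of the 21 units of signal energy. Decoding is a plain weighted sum of atoms, which is why the decoder is far simpler than the encoder, whose search over the dictionary dominates the cost.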
The art in Matching Pursuit (MP) coding is limited. A publication by S. G. Mallat and Z. Zhang, entitled “Matching Pursuits With Time-Frequency Dictionaries”, Transactions on Signal Processing, Vol. 41, No. 12, December 1993, details one application of matching pursuit coding. In addition, the following references detail the use of a 2-D matching pursuit coder to compress the residual prediction error resulting from motion prediction: the publication entitled “Very Low Bit-Rate Video Coding Based on Matching Pursuits”, by R. Neff and A. Zakhor, Circuits and Systems for Video Technology, Vol. 7, No. 1, February 1997; the publication entitled “Decoder Complexity and Performance Comparison of Matching Pursuit and DCT-Based MPEG-4 Video Codecs”, by R. Neff, T. Nomura and A. Zakhor, Circuits and Systems for Video Technology, Vol. 7, No. 1, February 1997; and U.S. Pat. No. 5,699,121.
The shortcomings of the prior art include, first, that matching pursuit has never been proposed for coding 3-D signals. Second, the basis functions have been limited to Gabor functions because they were proven to attain the lower bound of the uncertainty principle, that is, to minimize the joint spread in the spatial and frequency domains. However, these functions are generally isotropic (same scale along the x- and y-axes) and do not address image characteristics such as contours and textures.
What is needed, therefore, is a system and method for representing a video signal that provide improved source coding quality versus decoding complexity for a given compression rate, and improved resistance to data loss when the signal is delivered over an error-prone network.