1. Field of the Invention
The present invention relates generally to a method and device to code images for data transmission, and more specifically to a method and device to determine the optimal transform coefficients for an irregularly shaped image for low bit-rate transmission using standard transforms.
2. Information Disclosure Statement
Although current video coding standards may operate at very low bitrates, the trade-off between temporal and spatial resolution results in visually annoying motion or spatial artifacts. Therefore, the International Organization for Standardization is considering developing a new standard for very low bitrate A/V coding. ISO/IEC JTC1/SC29/WG11 MPEG 92/699, "Project Description for Very-Low Bitrate A/V Coding" (Nov. 5, 1992). This document reviews the state of the art and proposes a direction for future research.
In typical image coding systems, the image to be coded is usually processed in N×N blocks of picture elements (pels) regardless of the image content. This approach, however, may lead to visible distortions known as blocking and mosquito effects, particularly at low bit-rates. To avoid these visual artifacts, region-based image representation partitions the image into regions of similar motion or texture, yielding image segments of arbitrary shape instead of fixed (rectangular) blocks. Such image representation offers several advantages over the conventional block-based representation, such as adaptation to local image characteristics. Consequently, region-based image representation has received considerable attention in the MPEG-4 video coding standardization work for very low bitrate coding.
A fundamental issue in region-based image compression is the coding of arbitrarily shaped image segments. An arbitrarily shaped image segment $f(x,y)$ can be approximated by a set of basis functions optimized for the shape of the image segment to be coded: ##EQU1## where $(x,y) \in S$, $S$ is the region occupied by the image segment, $\hat{f}(x,y)$ is the approximation of the image segment, and the $\phi_i$ are the basis functions. However, such shape-adapted transform techniques require a large amount of memory for storing the set of basis functions. As a result, these techniques are only suitable for small regions. Furthermore, for each new segment a new set of basis functions has to be computed. Thus, extensive computation is involved. Since no fast algorithms exist, these techniques are not attractive for practical use.
Another approach is to use one of the most popular image compression techniques, transform coding. In transform coding, an image is transformed from the image intensity domain to a new domain prior to coding and transmission. The new domain is selected so that the energy of the image becomes concentrated in a small region of the new domain. Among the various transforms, the discrete cosine transform (DCT) is the most widely used. It has become the industry standard because it provides a good approximation of the optimal Karhunen-Loeve transform (KLT) for a certain class of images, and can be computed by means of fast algorithms.
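The energy concentration described above can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II computed directly from its definition (not a fast algorithm) and a hypothetical 8×8 test block containing a smooth intensity ramp; the names `dct_matrix`, `dct2`, and `energy` are illustrative only and do not come from the specification.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows

def dct2(block):
    """Separable 2-D DCT-II of a square block (direct evaluation, not fast)."""
    n = len(block)
    c = dct_matrix(n)
    # Transform along the second index, then along the first.
    tmp = [[sum(c[v][y] * block[x][y] for y in range(n)) for v in range(n)]
           for x in range(n)]
    return [[sum(c[u][x] * tmp[x][v] for x in range(n)) for v in range(n)]
            for u in range(n)]

def energy(values):
    """Sum of squared entries of a 2-D list."""
    return sum(v * v for row in values for v in row)

# Hypothetical smooth test block: a gentle intensity ramp (an assumption).
block = [[100 + 10 * x + 5 * y for y in range(8)] for x in range(8)]
coeffs = dct2(block)

# Share of the total energy held by the four lowest-frequency coefficients.
low = energy([row[:2] for row in coeffs[:2]])
fraction = low / energy(coeffs)
print(f"energy share of 4 of 64 coefficients: {fraction:.4f}")
```

Because the transform is orthonormal, the total energy is the same in both domains; for a smooth block such as this one, nearly all of it falls into a handful of low-frequency coefficients, which is what makes coarse coding of the remaining coefficients tolerable.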
With block transform coding, the image segment can be approximated by a set of two-dimensional basis functions defined on a rectangular block "B" which circumscribes the image segment: ##EQU2## where $(x,y) \in S$, and the $\psi_i$ are the basis functions defined on the full block B. The best approximation $\hat{f}(x,y)$ of an image segment can be found by minimizing the squared error between the image segment and the approximation, i.e.,

$$\text{error} = \sum_{(x,y) \in S} \left( f(x,y) - \hat{f}(x,y) \right)^2 \qquad (3)$$
This is equivalent to solving the Gaussian normal equations. Note that the summation is taken over the region defined by the image segment; pels outside the region are discarded. Since the number of pels of the image segment is usually less than the number of basis functions, the problem is underdetermined, and several solutions are possible. To arrive at a single solution, the problem can be solved by successive approximation. This involves starting with a small subset of basis functions and exhaustively searching for the best solution. Although successive approximation will yield a solution, the computational cost is high. Furthermore, as with the shape-adapted techniques, no fast algorithms are available to make real-time implementation possible.
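The non-uniqueness noted above can be made concrete with a small sketch. It assumes a 4×4 block (16 DCT basis functions) and a hypothetical 6-pel segment; the values and the names `basis`, `forward`, `approx`, `segment_error`, and `fill` are illustrative assumptions. It does not implement the successive approximation search; it only shows that two different coefficient sets can both drive the error of Equation (3) to zero.

```python
import math

N = 4  # the block is N x N, so there are N*N = 16 basis functions

def basis(u, v, x, y):
    """Orthonormal 2-D DCT-II basis function (u, v) evaluated at pel (x, y)."""
    def c(k, t):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        return scale * math.cos(math.pi * (2 * t + 1) * k / (2 * N))
    return c(u, x) * c(v, y)

def forward(block):
    """Transform coefficients of a full N x N block."""
    return [[sum(basis(u, v, x, y) * block[x][y]
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def approx(coeffs, x, y):
    """Value at pel (x, y) of the approximation built from all coefficients."""
    return sum(coeffs[u][v] * basis(u, v, x, y)
               for u in range(N) for v in range(N))

def segment_error(coeffs, segment):
    """Squared error of Equation (3), summed over the segment's pels only."""
    return sum((f - approx(coeffs, x, y)) ** 2
               for (x, y), f in segment.items())

# Hypothetical 6-pel segment S inside the block (values are assumptions).
segment = {(0, 0): 50, (0, 1): 52, (1, 0): 55,
           (1, 1): 57, (2, 0): 60, (2, 1): 61}

def fill(outside_value):
    """Full block with the segment's pels kept and all others set to a constant."""
    return [[segment.get((x, y), outside_value) for y in range(N)]
            for x in range(N)]

coeffs_a = forward(fill(0))    # outside pels set to 0
coeffs_b = forward(fill(55))   # outside pels set to 55

print(segment_error(coeffs_a, segment), segment_error(coeffs_b, segment))
print(coeffs_a[0][0], coeffs_b[0][0])
```

Both coefficient sets reproduce the six segment pels to within rounding, yet they differ substantially, because six constraints cannot pin down sixteen unknowns. Which of the many exact solutions is cheapest to code is the actual question.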
A more efficient approach is to perform the transform on the entire block, ##EQU3## where $(x,y) \in B$, and B is the area of the block. The transform can be performed in real-time by special purpose chips designed for block transforms. However, this technique requires that the pels outside the image segment be initialized before the transform occurs. The outside pels can be chosen such that the sum of squared errors over the image segment expressed by Equation (3) is minimized. This approach enables the transform spectrum to be optimized by choosing appropriate pel values outside the image segment. To this end, zeroing the outside pels would be an easy way to initialize them. This approach, however, introduces discontinuities at the boundary of the image segment, yielding high frequency components that degrade the coding performance. To alleviate the problem, the image segment can be extrapolated outside the boundary by mirroring or pel repetition such that a smoother transform spectrum can be obtained. This ad hoc approach, though, fails to provide consistent, satisfactory results. Consequently, a more promising method is needed. The present invention fulfills this need.
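The effect of the outside-pel initialization can be sketched as follows. The example assumes an 8×8 block, a hypothetical smooth segment with a staircase boundary, an orthonormal DCT-II evaluated directly, and a crude stand-in for low bit-rate coding that keeps only the `k` largest-magnitude coefficients. All names (`in_segment`, `zero_fill`, `repeat_fill`, `keep_largest`, `segment_error`) are illustrative assumptions, and only zeroing and row-wise pel repetition are compared; this is one hand-picked case, not a general result, and it is not the method of the invention.

```python
import math

N = 8

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]

C = dct_matrix(N)

def dct2(block):
    """Forward 2-D transform of the full block."""
    tmp = [[sum(C[v][y] * block[x][y] for y in range(N)) for v in range(N)]
           for x in range(N)]
    return [[sum(C[u][x] * tmp[x][v] for x in range(N)) for v in range(N)]
            for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D transform (transpose of the orthonormal forward transform)."""
    tmp = [[sum(coeffs[u][v] * C[v][y] for v in range(N)) for y in range(N)]
           for u in range(N)]
    return [[sum(C[u][x] * tmp[u][y] for u in range(N)) for y in range(N)]
            for x in range(N)]

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients."""
    ranked = sorted(((abs(coeffs[u][v]), u, v)
                     for u in range(N) for v in range(N)), reverse=True)
    kept = [[0.0] * N for _ in range(N)]
    for _, u, v in ranked[:k]:
        kept[u][v] = coeffs[u][v]
    return kept

def in_segment(x, y):
    """Hypothetical segment: left part of the block with a staircase boundary."""
    return y <= 3 + x // 3

def f(x, y):
    """Hypothetical smooth intensity inside the segment."""
    return 200 + 2 * x + y

def zero_fill():
    """Outside pels set to zero."""
    return [[f(x, y) if in_segment(x, y) else 0 for y in range(N)]
            for x in range(N)]

def repeat_fill():
    """Outside pels copy the last segment pel of their row (pel repetition)."""
    rows = []
    for x in range(N):
        last = max(y for y in range(N) if in_segment(x, y))
        rows.append([f(x, min(y, last)) for y in range(N)])
    return rows

def segment_error(block, k):
    """Equation (3) error over the segment after keeping k coefficients."""
    rec = idct2(keep_largest(dct2(block), k))
    return sum((block[x][y] - rec[x][y]) ** 2
               for x in range(N) for y in range(N) if in_segment(x, y))

print("zero fill:     ", segment_error(zero_fill(), 6))
print("pel repetition:", segment_error(repeat_fill(), 6))
```

In this case the discontinuity created by zeroing spreads energy into many coefficients, so truncating to six of them leaves a larger error inside the segment than pel repetition does. With all 64 coefficients kept, either initialization reconstructs the segment exactly, which shows that the choice of outside pels matters only once coefficients are discarded or coarsely quantized.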
The present invention utilizes the theory of successive projection onto convex sets (POCS). This theory is described from a theoretical standpoint in Patrick L. Combettes, "The Foundations of Set Theoretic Estimation," Proceedings of the IEEE, Vol. 81, No. 2 (Feb. 1993). The present invention applies this theory in a practical sense to image coding.