1. Field of the Invention
The present invention relates in general to communications. More particularly, the invention relates to the transmission of multidimensional signals, such as video signals.
2. Description of the Background Art
Motivation
The transmission of large amounts of data across a large decentralized network, such as the Internet, is an open problem. Motion picture data, i.e., video data, presents a particularly vexing problem, as the data tends to be particularly voluminous. Compression enables the representation of large amounts of data using fewer bits, thereby increasing storage capacity and reducing transmission times. Current techniques for video transmission include MPEG and its progeny, MPEG2 and MPEG4. MPEG-type compression schemes divide the original image frame into blocks of uniform size and shape, and transmit the motion, i.e., the change in location of blocks from one frame to another. This reduces the amount of data that needs to be transmitted and/or stored.
One relevant limitation of MPEG and other compression schemes is that as blocks or objects move, new regions within the image may be uncovered. FIGS. 1A and 1B illustrate a newly uncovered (exposed) image region and are used to illustrate a problem that serves as a motivation for the present invention. FIG. 1A illustrates an image frame composed of four regions or “objects,” marked 11 through 14 as illustrated. Each object may include multiple blocks under block-based compression schemes such as MPEG. (The objects in FIGS. 1A and 1B are rectangular for purposes of simplicity of explanation. Actual objects need not be, and typically are not, of rectangular shape. In some schemes, the objects may be of arbitrary shape.) In FIG. 1B, objects 12 and 14 have moved apart horizontally, revealing image region 15, a previously occluded, now newly uncovered region. The color values of region 15 are wholly unknown to the decoder. MPEG and similar schemes simply apply one of many still image compression techniques, such as DCT coding, to the newly uncovered regions and then transmit the result to the receiving device. This conventional way of dealing with newly uncovered regions is rather inefficient.
Multi-Scale Transforms
Examples of multi-scale transforms are found in the field of image and video processing. Their applications include spectral analysis, image de-noising, feature extraction, and, of course, image/video compression. JPEG2000, the Laplacian pyramid of Burt & Adelson [Burt and Adelson I], traditional convolution wavelet sub-band decomposition, and the lifting implementation of [Sweldens I] are all examples of multi-scale transforms. Many variations of multi-scale transforms differ with regard to how the transform coefficients are quantized and then encoded. Such variations include SPIHT by Said and Pearlman [SPIHT I], EZW (see [Shapiro I]), trellis coding (see [Marcellin I]), etc.
All multi-scale transforms operate on the principle that the efficient representation of a given multi-dimensional signal is characterized by looking at the data via a decomposition across different scales. Here a scale refers to a characteristic length scale or frequency. Coarse scales refer to smooth broad transitions in a function. The very fine scales denote the often sharp, local fluctuations that occur at or near the fundamental pixel scale of the signal.
FIG. 2A illustrates an example of different scale information for a given 1-D signal. Note that the function is actually well characterized as a smoothly varying coarse scale function f1(x) (see FIG. 2B) plus one other function depicted in FIG. 2C, f2(x). The function f2(x) contains the majority of the fine scale information. Note that f2(x) tends to oscillate or change on a very short spatial scale, whereas f1(x) changes slowly on a much longer spatial scale. The communications analogy is that of a carrier signal (i.e., coarse scale modulating signal) and the associated transmission band (i.e., high frequency or fine scale signal). Indeed, referring to FIGS. 2A-C, one can see that the complete high frequency details are well characterized by f2(x) and the low frequency or average properties of the signal are exhibited by f1(x). In practice, however, few signals are as cleanly separated into specific scales as the function depicted in FIG. 2A.
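The split of a 1-D signal into a coarse part f1(x) and a fine part f2(x) can be sketched as follows. This is only an illustration under stated assumptions: the test signal, the box (moving average) filter, and the names `moving_average`, `f`, `f1`, and `f2` are hypothetical choices, not the filters or data of FIGS. 2A-C.

```python
import math

def moving_average(signal, radius):
    """Smooth a 1-D list with a box filter of width 2*radius + 1.

    Indices that fall outside the signal are mirrored about the edge
    samples (an assumed boundary rule; requires radius < len(signal)).
    """
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if j < 0:
                j = -j                  # mirror about the first sample
            elif j >= n:
                j = 2 * (n - 1) - j     # mirror about the last sample
            total += signal[j]
        out.append(total / (2 * radius + 1))
    return out

def total_variation(signal):
    """Sum of absolute sample-to-sample changes; larger means 'rougher'."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))

# Hypothetical signal: one slow sine cycle plus a small, fast oscillation.
n = 64
f = [math.sin(2 * math.pi * x / n) + 0.2 * math.sin(2 * math.pi * 16 * x / n)
     for x in range(n)]

f1 = moving_average(f, radius=2)        # coarse scale: slow, smooth part
f2 = [a - b for a, b in zip(f, f1)]     # fine scale: what the smoothing removed

print(total_variation(f) > total_variation(f1))
```

By construction f1 + f2 reproduces f, so no information is lost by the split; the box filter only approximately isolates the slow component, and a different low pass filter would draw the line between scales differently.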
FIGS. 2D-G show a similar process in two dimensions (2-D). The original pixel data, or finest scale, is shown in FIG. 2D. The averaging filter applied at each scale is depicted in FIG. 2E, together with an example sub-sampling rule. In this case the sub-sampling rule is referred to as a quincunx lattice in two dimensions, and it preserves half the points at each step. FIGS. 2F and 2G show successive steps in building the multi-resolution pyramid for a square domain via application of the filter and sub-sampling logic depicted in FIG. 2E. At each step of the process the numbers at each pixel refer to the functional value of the pyramid at a given scale. Note that the scale depicted in FIG. 2G contains almost one quarter of the sample points in the original 2-D function shown in FIG. 2D because each application of the quincunx sub-sampling reduces the number of points by a factor of two. Other samplings are also known in the art.
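The point counts for quincunx sub-sampling can be checked with a short sketch. It assumes a checkerboard rule (keep a point when the sum of its two indices is even) and an 8 x 8 grid; the function names and the grid size are hypothetical and are not taken from FIG. 2E.

```python
def quincunx_step(points):
    """Keep the checkerboard half of a square lattice of (u, v) integer pairs."""
    return [(u, v) for (u, v) in points if (u + v) % 2 == 0]

def to_rotated(points):
    """Re-index quincunx points (r + c even) on the 45-degree rotated square
    lattice they form, so the same checkerboard rule can be applied again."""
    return [((r + c) // 2, (r - c) // 2) for (r, c) in points]

def from_rotated(points):
    """Map rotated lattice indices (u, v) back to pixel (row, column)."""
    return [(u + v, u - v) for (u, v) in points]

rows, cols = 8, 8
level0 = [(r, c) for r in range(rows) for c in range(cols)]   # finest scale
level1 = quincunx_step(level0)                                 # first step
level2 = from_rotated(quincunx_step(to_rotated(level1)))       # second step

print(len(level0), len(level1), len(level2))  # prints: 64 32 16
```

For this even-sized grid the second step leaves exactly one quarter of the points, namely those with both indices even. For other grid sizes the fraction is only approximately one quarter.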
In order to handle boundary effects for the convolution at the edge of the pictured rectangular domain, it may be assumed, for example, that the data at each scale can be extended via a mirror symmetric extension appropriate to the dimensionality of the signal across the boundary in question.
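A mirror symmetric extension can be sketched in 1-D as follows. The sketch assumes reflection about the edge samples without repeating them (whole-sample symmetry); repeating the edge sample is an equally common convention, and the text does not say which one is intended. The name `mirror_extend` is hypothetical.

```python
def mirror_extend(samples, pad):
    """Extend a 1-D list by `pad` samples on each side by reflecting
    about the first and last samples (the edge samples are not repeated)."""
    n = len(samples)
    if pad > n - 1:
        raise ValueError("pad must be smaller than the number of samples")
    left = [samples[i] for i in range(pad, 0, -1)]
    right = [samples[n - 1 - i] for i in range(1, pad + 1)]
    return left + list(samples) + right

print(mirror_extend([1, 2, 3, 4], 2))  # prints: [3, 2, 1, 2, 3, 4, 3, 2]
```

For a 2-D image, the same extension could be applied along rows and then along columns, so that a convolution near the boundary always has neighbors to read.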
Pyramidal Transform
FIG. 2H depicts a conventional forward pyramidal transform 200. The transform 200 typically operates on an image 202. The pyramidal transform 200 illustrated in FIG. 2H includes three levels (layers) of transformation.
In the first level of transformation, a low pass reduction 204 and a high pass reduction 206 are performed on the image 202. The low pass reduction 204 comprises filtering the color values of the image array through a low pass filter, then reducing by downsampling. For example, if the downsampling is by a factor of two, then every other pixel is effectively removed by the reduction. The result of such a low pass reduction is a coarser version of the image. If the downsampling is by a factor of two, then the low pass reduction 204 outputs a coarse subimage (not shown) with half as many pixels as the image 202. Similarly, the high pass reduction 206 comprises filtering the color values of the image array through a high pass filter, then reducing by downsampling. The result of such a high pass reduction is a difference subimage 208. The difference subimage 208 also has a fraction of the number of pixels in the image 202. A difference subimage may be called an error subimage. As described later, difference subimages may be recombined with coarse subimages to reconstruct the image.
The second and third levels are similar to the first level. In the second level, a low pass reduction 210 and a high pass reduction 212 are performed on the coarse subimage output by the first level's low pass reduction 204. In the third level, a low pass reduction 216 and a high pass reduction 218 are performed on the coarse subimage output by the second level's low pass reduction 210. The result of the pyramidal transform 200 is a final coarse subimage 222 output by the third level's low pass reduction 216 and three difference subimages 208, 214, and 220 (one for each level).
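The three-level forward structure can be sketched on a 1-D signal as follows. The sketch assumes the simplest filter pair, pairwise average and pairwise half-difference (Haar), with downsampling by two, and a signal whose length is a power of two. The description above does not specify the filters of FIG. 2H, so the filter choice, the function names, and the sample values are all hypothetical.

```python
def low_pass_reduce(x):
    """Average each adjacent pair, keeping one output per pair
    (a two-tap low pass filter followed by downsampling by two)."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def high_pass_reduce(x):
    """Half-difference of each adjacent pair, one output per pair
    (a two-tap high pass filter followed by downsampling by two)."""
    return [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def forward_pyramid(signal, levels=3):
    """Return (final coarse signal, list of difference signals).

    differences[0] comes from the first level and differences[-1]
    from the last, mirroring subimages 208, 214, and 220.
    """
    coarse = list(signal)
    differences = []
    for _ in range(levels):
        differences.append(high_pass_reduce(coarse))
        coarse = low_pass_reduce(coarse)
    return coarse, differences

signal = [4, 6, 10, 12, 8, 6, 5, 3]          # hypothetical sample values
coarse, differences = forward_pyramid(signal)
print(coarse)        # prints: [6.75]
print(differences)   # prints: [[-1.0, -1.0, 1.0, 1.0], [-3.0, 1.5], [1.25]]
```

Each level halves the length of the coarse signal, so eight samples become one coarse value plus 4 + 2 + 1 difference values, the same total count as the input.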
FIG. 2I depicts a conventional reverse (or inverse) pyramidal transformation 250. The inverse transform 250 operates on the coarse subimage 222 and the difference subimages 208, 214, and 220 output by the forward transform 200. Like the forward transform 200 in FIG. 2H, the reverse transform 250 in FIG. 2I includes three levels.
In the first level, an expansion low pass 252 is performed on the coarse subimage 222. The expansion low pass 252 comprises expanding by upsampling, then filtering through a low pass filter. For example, if the upsampling is by a factor of two, then a zero pixel is effectively inserted between every two pixels. Also in the first level, an expansion high pass 254 is performed on the last (in this case, the third) difference subimage 220 from the forward transform 200. The expansion high pass 254 comprises expanding by upsampling, then filtering through a high pass filter. The outputs of the expansion low pass 252 and of the expansion high pass 254 are then added together. The result is a less coarse subimage (not shown). For example, if the upsampling is by a factor of two, then the less coarse subimage should have twice the number of pixels as the coarse subimage 222.
The second and third levels are similar to the first level. In the second level, an expansion low pass 256 is performed on the less coarse subimage produced by the first level. In addition, an expansion high pass 258 is performed on the second difference subimage 214 from the forward transform 200. The outputs of the expansion low pass 256 and of the expansion high pass 258 are then added together. The result is another less coarse subimage (not shown). In the third level, an expansion low pass 260 is performed on the less coarse subimage produced by the second level. In addition, an expansion high pass 262 is performed on the first difference subimage 208 from the forward transform 200. The outputs of the expansion low pass 260 and of the expansion high pass 262 are then added together. The result is a reconstruction of the image 202. Note that the conventional transform and inverse transform as described above are lossless in that the reconstructed image 202 in FIG. 2I is the same as the original image 202 in FIG. 2H.
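The expand, filter, and add structure of the inverse transform can be sketched in 1-D as follows. The sketch assumes that the forward transform used pairwise averages and pairwise half-differences (Haar), for which the matching synthesis filters are the two-tap filters [1, 1] and [1, -1]. The filters of FIG. 2I are not specified in the description, so the filters, the function names, and the coefficient values are hypothetical.

```python
def upsample(x):
    """Insert a zero after every sample (expansion by a factor of two)."""
    out = []
    for value in x:
        out.extend([value, 0.0])
    return out

def fir(x, taps):
    """Causal FIR filter: y[n] = sum over k of taps[k] * x[n - k],
    treating samples before the start of x as zero."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def expand_low(coarse):
    """Expansion low pass: upsample, then filter with [1, 1]."""
    return fir(upsample(coarse), [1.0, 1.0])

def expand_high(difference):
    """Expansion high pass: upsample, then filter with [1, -1]."""
    return fir(upsample(difference), [1.0, -1.0])

def inverse_pyramid(coarse, differences):
    """Rebuild the signal, consuming the last difference signal first."""
    for difference in reversed(differences):
        coarse = [low + high for low, high in
                  zip(expand_low(coarse), expand_high(difference))]
    return coarse

# Hypothetical coefficients of a three-level Haar pyramid of an
# eight-sample signal: one coarse value and three difference signals.
coarse = [6.75]
differences = [[-1.0, -1.0, 1.0, 1.0], [-3.0, 1.5], [1.25]]
print(inverse_pyramid(coarse, differences))
# prints: [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0]
```

The reconstruction is exact here because the assumed filter pair has the perfect reconstruction property and the coefficients are not quantized. Quantizing the difference signals, as a practical codec would, makes the result approximate.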