This invention relates generally to the field of computer data recovery transformations, and in particular to recovery of signal data such as image data, wherein portions of the data have been lost.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright © 1998-1999, Microsoft Corporation. All Rights Reserved.
Channels used to transmit data are sometimes “noisy” and can corrupt or lose portions of the data transmitted. A channel is said to “corrupt” portions of the data if it changes some data values during transmission without notice to the receiver. A channel is said to “lose” or “erase” portions of the data if it does not transmit some data values to the receiver, but notifies the receiver as to which data items were not transmitted. Accordingly, channels may be classified as “corruption” or “erasure” channels. An example of a corruption channel is a wireless communications system wherein data are occasionally corrupted due to electromagnetic noise in the atmosphere. Another example of a corruption channel is a storage system such as a magnetic disk, a Compact Disk Read-Only Memory (“CDROM”), or a Digital Versatile Disk (“DVD”), wherein data are occasionally corrupted due to scratches and dust on the recording media or due to writing or reading errors in the channels which move data onto or off of the media. An example of an erasure channel is a packet communications network wherein certain packets are lost in transmission, and wherein the receiver can detect which data are lost using packet sequence numbers and/or other means. Another example of an erasure channel is a storage system wherein the data are striped across a number of disks, any of which can fail at random, and whose failure can be detected, such that which data items are missing is known. Any corruption channel can appear, at a high level, to be an erasure channel, by appropriate low-level error detection processing.
For example, a parity check or checksum word can be transmitted along with a block of data; if the received checksum does not agree with the received data, then an error is detected in the block of data, and the block of data can be considered erased by the channel. It is desirable to have ways to recover the data that is corrupted or lost. For example, image data in which pixels or blocks have been lost would have blank areas (perhaps white or black on a display monitor) or would appear to have “noise” or snow in the received images. A “pixel” is an individual picture element, and is generally represented by a single value indicating, for example, the intensity of the image at a point, or the intensity of each of three colors. A “block” is a group of adjacent pixels, generally a rectangle. For example, a block could be a 16-by-16-pixel portion of a larger image.
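The checksum-based detection described above, by which a corruption channel is made to appear as an erasure channel, can be illustrated with a minimal sketch. It assumes a CRC-32 checksum computed with Python's standard zlib module and appended to each block; the function names send_block and receive_block are hypothetical and are not part of the disclosure. A block whose checksum disagrees with its contents is reported as None, that is, as an erasure.

```python
import zlib

def send_block(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def receive_block(frame: bytes):
    """Return the payload, or None (an erasure) if the checksum disagrees."""
    payload = frame[:-4]
    received_crc = int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received_crc:
        return None  # corruption detected: treat the whole block as erased
    return payload

frame = send_block(b"pixel block 7")
# Simulate a corruption channel by flipping one bit of the first byte.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

intact_result = receive_block(frame)
erased_result = receive_block(corrupted)
```

The receiver never sees the corrupted values; it only learns that the block is missing, which is the defining property of an erasure channel.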
A “channel” can be a transmission channel, such as digital television broadcast or cable television transmission, or a telephony packet network that moves data over a distance or space. A “channel” can also be a storage channel that stores the data on a storage medium and then reads the data at a later time.
Error-correction codes have been devised for both corruption and erasure channels, and are well known. (See S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications, Prentice-Hall, 1983, herein Lin and Costello.) By far, the most common error-correction codes are based on linear transformations of the data to add redundancy. If x is a data vector with K elements, and F is an N×K matrix, with N≥K, then y=Fx is a vector with N elements. N−K of these elements can be considered redundant information. By transmitting the code vector y instead of x, the receiver can recover x even if some elements of y are corrupted or lost. In particular, if y is transmitted through an erasure channel, and no more than N−K elements of y are erased, then x can be recovered perfectly. For example, if x is a vector containing 3 binary elements, and

$$F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix},$$
then y is a vector containing 4 binary elements, the first three of which are equal to x (this is called the “systematic” part of y) and the last of which is equal to the sum of the bits in x, modulo 2. That is, the last bit of y is equal to the exclusive-OR, or parity, of x. If any three bits of y are recovered, then x can be recovered perfectly. On the other hand, if two or more bits of y are erased, then x cannot be recovered perfectly; the bits of x that are lost are unrecoverable in this case.
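The single-parity example above can be worked through in a short sketch. The function names encode and recover are hypothetical, and erased bits are marked with None as an illustrative convention. Because the four bits of y sum to zero modulo 2, any one erased bit equals the modulo-2 sum of the other three.

```python
# The 4x3 generator matrix F from the example: three systematic rows plus a parity row.
F = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]

def encode(x):
    """Compute y = Fx with arithmetic modulo 2."""
    return [sum(f * b for f, b in zip(row, x)) % 2 for row in F]

def recover(y):
    """Recover x from y, where erased bits are marked None.

    Returns None when two or more bits are erased, since x is then
    not perfectly recoverable.
    """
    y = list(y)
    erased = [i for i, b in enumerate(y) if b is None]
    if len(erased) > 1:
        return None
    if erased and erased[0] < 3:
        i = erased[0]
        # The erased systematic bit is the XOR of the three surviving bits.
        y[i] = sum(b for k, b in enumerate(y) if k != i) % 2
    return y[:3]

x = [1, 0, 1]
y = encode(x)
from_systematic_loss = recover([1, None, 1, 0])   # one systematic bit erased
from_parity_loss = recover([1, 0, 1, None])       # only the parity bit erased
from_double_loss = recover([None, None, 1, 0])    # two bits erased
```

With one erasure the data vector is restored exactly; with two erasures the decoder reports failure rather than an approximation, which is the all-or-nothing behavior discussed in the text.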
Error-correction codes are intended to recover symbolic data. In the example above, the elements of x are binary symbols. In general, they can be members of a finite field, e.g., they can be M-ary symbols. There is no notion of distance between members of a finite field, except the so-called discrete topology: two elements may be equal or not equal. There is no notion of approximation. Hence in error-correction coding, there is no notion of approximate recovery. An element of x is either recoverable, or it is not.
One type of data for which approximation is useful is signal data, such as image, video, or audio data. Signal data, in contrast to symbol data, has elements that are members of an infinite field, such as the rational numbers, the real numbers, or the complex numbers. In a digital computer, these elements must be represented by finite precision numbers, such as 16-bit signed integers or 32-bit floating-point numbers. However, there is a natural notion of distance between two such elements, e.g., the absolute value of their difference. As a consequence, any data vector x containing signal data (such as an image or a frame of audio) can be approximated by another data vector x̂, where the degree of approximation can be measured by the Euclidean distance, a perceptual metric, or some other measure.
It is possible to apply error-correction codes to the transmission of signal data over erasure channels, by treating each signal element as a discrete M-ary symbol. However, this is inefficient. Signals need not be transmitted exactly, provided the approximation error between the signal and its reconstruction is small, on average. Applying a high-redundancy error-correction code can ensure this, but such a solution is expensive in terms of transmission capacity. An alternative is to use “unequal error protection,” which is a technique in which only the most important parts of the signal are protected with high redundancy. The less important parts are protected with low redundancy or not at all. One way to implement unequal error protection is the following. If the signal vector x contains elements that are 16-bit integers, then x can be stratified into 16 “bit-planes”. Then each bit-plane can be protected using a binary error-correction code with a redundancy commensurate with the importance of the bit-plane in approximating the signal vector x. (See for example J. Hagenauer, “Rate-Compatible Punctured Convolutional Codes (RCPC Codes) and their Applications,” IEEE Trans. on Communications, April 1988; A. Albanese, J. Blömer, J. Edmonds, M. Luby, and M. Sudan, “Priority Encoding Transmission,” IEEE Trans. Information Theory, November 1996; G. Davis and J. Danskin, “Joint Source and Channel Coding for Image Transmission Over Lossy Packet Networks,” SPIE Conf. on Wavelet Applic. of Digital Image Processing, August 1996.)
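The bit-plane stratification described above can be sketched as follows. The sketch assumes unsigned 16-bit values and replaces a lost plane with zeros; both are simplifying assumptions, the function names are hypothetical, and the per-plane error-correction coding itself is omitted. It shows why the low-order planes need less protection: losing them perturbs each element only slightly.

```python
def to_bit_planes(x, bits=16):
    """Return a list of planes; planes[p][i] is bit p of x[i] (plane 0 is least significant)."""
    return [[(v >> p) & 1 for v in x] for p in range(bits)]

def from_bit_planes(planes):
    """Rebuild the vector from its planes; a plane given as None (lost) counts as all zeros."""
    n = next(len(plane) for plane in planes if plane is not None)
    return [
        sum(plane[i] << p for p, plane in enumerate(planes) if plane is not None)
        for i in range(n)
    ]

x = [51234, 7, 40000]
planes = to_bit_planes(x)

# Lose the two least significant planes, as might happen if they were left unprotected.
damaged = list(planes)
damaged[0] = damaged[1] = None
approx = from_bit_planes(damaged)

# The error per element is bounded by the weight of the lost planes (here at most 3).
max_error = max(abs(a - b) for a, b in zip(x, approx))
```

Losing the most significant plane instead would change an element by up to 32768, which is why the redundancy is chosen per plane.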
Even with the technique of unequal error protection, error-correction codes may not be ideally suited to protection of signal data. Ideally, all of the received bits should be used to improve the approximation of the reconstruction. However, in error-correction codes for erasure channels, if more than a critical number of symbols are received, then the additional symbols are ignored, while if fewer than the critical number of symbols are received, then any symbols not in the systematic portion of the code are ignored. In other words, if an error-correction code can correct a symbol exactly then it does not try any harder, while if it cannot correct a symbol exactly then it gives up. Although this may be ideal behavior for symbolic data, it is not ideal behavior for signal data. For signal data, the error control mechanism should try to approximate the transmitted signal with whatever information it has at its disposal.
Signal data is often voluminous. For example, a 512-by-512-pixel image has 262,144 pixel values. Therefore it is important that signal-processing techniques be computationally efficient.
Thus, there is a need for a method and apparatus that can efficiently reconstruct, to a suitable quality of approximation, an image or other set of signal data that has been transmitted on an erasure channel, or for some other reason has missing values.
The present invention provides a POCS-based (Projections Onto Convex Sets) algorithm for consistent reconstruction from multiple descriptions of overcomplete expansions. In particular, image data can benefit from the reconstructions provided by the present invention, even when too little data is available for a perfect recreation of the original image. Similarly, audio information is another type of data that can benefit. The algorithm operates in the data space R^K rather than in the expanded space R^N, where N > K. By constructing the frame from two (or more) complete transform bases, all projections can be expressed in terms of forward or inverse transforms. Since such transforms are usually efficient to compute, the present invention can perform the reconstruction much faster than with previous methods. Indeed, the method of one embodiment provides overcomplete frame expansions of an entire image and reconstruction of the image after transmission through a channel that loses some of the coefficients.
One aspect of the present invention provides a POCS-based algorithm for consistent reconstruction of a signal x ∈ R^K from any subset of quantized coefficients y ∈ R^N in an N×K overcomplete frame expansion y=Fx, N=2K. By choosing the frame operator F to be the concatenation of two K×K invertible transforms, the projections may be computed in R^K using only the transforms and their inverses, rather than in the larger space R^N using the pseudo-inverse as proposed in earlier work. This enables practical reconstructions from overcomplete frame expansions based on wavelet, subband, or lapped transforms (one or more of which are used in some of the various embodiments of the present invention) of an entire image, which has heretofore not been possible.
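The way such projections reduce to forward and inverse transforms can be written out as a brief worked derivation. It is an illustrative sketch under stated assumptions, not the claimed formulation: the two transforms T_1 and T_2 are assumed orthonormal (the text requires only invertibility), R_m denotes the set of coefficients of transform m that were received, and [l_{m,i}, u_{m,i}] denotes the quantization region known to contain received coefficient i. All of these symbols are introduced here for illustration.

```latex
\begin{aligned}
F &= \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}, \qquad
C_m = \{\, x \in \mathbb{R}^K : l_{m,i} \le (T_m x)_i \le u_{m,i}
      \ \text{for all } i \in R_m \,\}, \quad m = 1, 2, \\[4pt]
P_m(x) &= T_m^{-1}\,\operatorname{clip}_m(T_m x), \qquad
[\operatorname{clip}_m(c)]_i =
\begin{cases}
\min\bigl(\max(c_i,\, l_{m,i}),\, u_{m,i}\bigr), & i \in R_m, \\
c_i, & i \notin R_m,
\end{cases} \\[4pt]
x^{(n+1)} &= P_2\bigl(P_1(x^{(n)})\bigr).
\end{aligned}
```

Each set C_m is convex because it is a box in the transform domain. Under the orthonormality assumption, T_m preserves Euclidean distance, so the nearest point of C_m to x is found by clamping the coefficients of T_m x coordinate by coordinate and transforming back. Every step therefore operates on K-element vectors, and no N×K pseudo-inverse is needed.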
In some embodiments, a set of data x having K dimensions is provided. The set of data x is overcompletely transformed to a set of coefficients y having N dimensions, where N > K. The resulting coefficients of y are quantized to obtain a set of indices j such that each coefficient y_i lies in the j_i-th quantization region and each one of the indices j corresponds to a respective quantization region. For example, the range of the coefficients of y might be −578.4 to +931.5. This range is divided into 1024 quantization regions, each assigned an index j, where each j is a number between 0 and 1023 inclusive. In some embodiments, the quantization regions are not all the same size. Subsets of the indices j are then entropy coded to obtain binary descriptions (e.g., combined into packets). The binary descriptions are then transmitted on a channel.
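The encoding steps above can be sketched for a toy signal with K = 4. The sketch makes several assumptions that are not taken from the disclosure: the frame is the identity concatenated with an orthonormal 4-point Hadamard transform, the 1024 quantization regions are of equal size over the example range, the indices are split into one description per basis, and entropy coding is omitted. All function names are hypothetical.

```python
# Hypothetical second basis: orthonormal 4-point Hadamard transform.
H4 = [[0.5 * s for s in row] for row in
      ([1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1])]

# Range and region count from the example in the text; equal-size regions are assumed.
LO, HI, REGIONS = -578.4, 931.5, 1024
WIDTH = (HI - LO) / REGIONS

def expand(x):
    """Overcomplete expansion y = Fx with F = [I; H4], so N = 2K = 8."""
    transformed = [sum(h * v for h, v in zip(row, x)) for row in H4]
    return list(x) + transformed

def quantize(y):
    """Map each coefficient y_i to the index j_i of the region containing it."""
    return [min(REGIONS - 1, max(0, int((c - LO) / WIDTH))) for c in y]

def region(j):
    """Lower and upper bounds of quantization region j."""
    return LO + j * WIDTH, LO + (j + 1) * WIDTH

def packetize(indices, k):
    """Split the indices into two descriptions, one per basis (entropy coding omitted)."""
    return {"basis_1": indices[:k], "basis_2": indices[k:]}

x = [100.0, 120.0, 90.0, 60.0]
y = expand(x)                  # 8 coefficients describing 4 samples
j = quantize(y)                # one index in 0..1023 per coefficient
packets = packetize(j, len(x))
```

If either packet is lost, the surviving indices still confine the signal to a known set of quantization regions, which is the information the reconstruction works from.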
In some embodiments, the channel includes a storage medium, and the set of data y is stored onto the storage medium and later read from the storage medium.
Another aspect of the present invention provides an apparatus and a corresponding computerized method of reconstructing a set of data x having K dimensions. The method includes receiving a set of data y′, wherein the set of data y′ corresponds to a set of data y but with some coefficients missing, wherein the set of data y has N dimensions and represents an overcomplete transformation of the set of data x, and wherein N > K and the set of data y′ has ≤ N dimensions. The method also includes iteratively transforming the received set of data y′ using projections onto convex sets as calculated with a computer to produce a set of data x′ representing the reconstructed set of data x.
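The iterative reconstruction from an incomplete set of coefficients can be sketched as alternating projections. This is a minimal illustration under stated assumptions rather than the claimed method: the frame is the identity concatenated with an orthonormal 4-point Hadamard transform, the quantizer is uniform with step size 8, the quantization regions are computed directly from the original coefficients instead of being decoded from received indices, and lost coefficients are marked None. All names are hypothetical.

```python
import math

# Hypothetical frame F = [I4; H4]: identity basis plus orthonormal Hadamard basis.
I4 = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
H4 = [[0.5 * s for s in row] for row in
      ([1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1])]
STEP = 8.0  # uniform quantizer step size (assumed)

def matvec(M, v):
    """Multiply matrix M by vector v."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def cell(c):
    """Quantization region (lower, upper) that contains coefficient c."""
    j = math.floor(c / STEP)
    return j * STEP, (j + 1) * STEP

def project(x, T, T_inv, cells):
    """Project x onto the set whose received coefficients of T x lie in their regions."""
    coeffs = matvec(T, x)
    clipped = [c if b is None else min(max(c, b[0]), b[1])
               for c, b in zip(coeffs, cells)]
    return matvec(T_inv, clipped)

def reconstruct(cells_1, cells_2, iterations=50):
    """Alternate projections onto the two constraint sets, entirely in R^K."""
    x = [0.0] * len(cells_1)
    for _ in range(iterations):
        x = project(x, I4, I4, cells_1)
        x = project(x, H4, H4, cells_2)  # H4 is symmetric and its own inverse
    return x

x = [100.0, 120.0, 90.0, 60.0]
cells_1 = [cell(c) for c in matvec(I4, x)]
cells_2 = [cell(c) for c in matvec(H4, x)]
cells_1[1] = cells_1[2] = None   # two coefficients of the first basis are lost
cells_2[3] = None                # one coefficient of the second basis is lost
x_hat = reconstruct(cells_1, cells_2)
```

The estimate x_hat is consistent with every received coefficient, in the sense that re-expanding it places each of them back in its original quantization region. It is one such consistent point and need not equal x.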
In one embodiment, the method further includes providing an initial set of data x having K dimensions, overcompletely transforming the set of data x to a set of data y having N dimensions, where N > K; and transmitting the set of data y on a channel.
In another embodiment of the method, the transmitting on the channel further includes storing the set of data y onto a storage medium; and reading the set of data y from the medium.