In the information age, the exchange of a variety of forms of multimedia data is commonplace and ever increasing. The exchanged data may, for example, comprise images, audio, or time series data representing scientific or business related information.
Various schemes of encoding data are known. A key purpose of encoding data is to ‘compress’ the data, i.e., to reduce the byte size of the data. This is desirable for reasons including the reduction of memory space required to store the data, and reduction of the time required to transmit the data through a communication channel having a certain finite bandwidth. The byte size can be expressed as bits per sample, or as is conventional in the case of image data, as bits per pixel.
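By way of illustration only, the relationship between byte size, bits per pixel, and transmission time can be sketched as follows. The function names, the image dimensions, and the 64 kbit/s channel are hypothetical values chosen for the example and do not come from any particular system.

```python
def bits_per_pixel(encoded_bytes, width, height):
    """Average number of encoded bits spent on each pixel of the image."""
    return encoded_bytes * 8 / (width * height)


def transmit_seconds(encoded_bytes, bandwidth_bits_per_s):
    """Ideal transfer time over a channel of fixed bandwidth (no protocol overhead)."""
    return encoded_bytes * 8 / bandwidth_bits_per_s


# Hypothetical example: a 512 x 512 image encoded into 32,768 bytes.
bpp = bits_per_pixel(32_768, 512, 512)
seconds = transmit_seconds(32_768, 64_000)
```

Halving the byte size halves both figures, which is why compression benefits storage and transmission alike.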
Two classes of encoding methods are lossless and lossy. The former, more conservative approach endeavors to preserve every detail of the input data in the encoded form. Ideally the decoded version would be an exact replica of the input data.
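A minimal sketch of the lossless case is given below. It assumes the DEFLATE coder in Python's standard zlib module as a stand-in for a lossless encoder; the text above does not name any particular lossless scheme, and the sample data is hypothetical.

```python
import zlib

# Hypothetical, highly repetitive sample data (1024 bytes).
original = bytes(range(16)) * 64

encoded = zlib.compress(original)    # lossless encoding
decoded = zlib.decompress(encoded)   # decoding recovers every byte

is_exact_replica = decoded == original
```

A lossy encoder, by contrast, would not in general satisfy the equality check on the last line.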
In the case of lossy data encoding (compression), the level to which the detail of the data is preserved can be selected, and there will be a tradeoff between the level of detail preserved and the byte size of the resulting encoded data. Often in using lossy data encoding, one strives to obtain a level of detail preservation such that the differences between a decoded version and the original are imperceptible. Judgments about the design and configuration of the lossy encoder to achieve imperceptible differences will be made in consideration of human perception models (e.g., auditory or visual). A good lossy encoder and corresponding decoder will yield a decoded data set which may be distinguished from the original data set by rigorous scientific analysis but is indistinguishable to a human observer when presented in an intended format (e.g., screen resolution in the case of images).
One class of data encoding methods (applicable to image data) is transform coding. Generally speaking, in transform coding an ordered data set is projected onto an orthogonal set of basis functions to obtain a set of transformed data coefficients (inner products). The traditional type of transform coding derives from Fourier analysis. In Fourier based techniques, a data set is projected onto a set of functions derived from sinusoidal functions. The old JPEG standard (ISO/IEC 10918-1) is an example of a transform encoding method based on Fourier analysis. The old JPEG standard specifies a set of transform matrices which are discrete representations of products of a cosine function with a horizontal coordinate dependent argument and a cosine function with a vertical coordinate dependent argument. These basis functions are applied to analyze 8 by 8 pixel blocks of an input image.
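The projection onto cosine-product basis functions can be sketched as follows. This is a simplified illustration that assumes an orthonormal type-II discrete cosine transform on a single 8 by 8 block; it omits level shifting and the other steps of a complete JPEG encoder, and the helper names are hypothetical.

```python
import math

N = 8  # block size


def dct_basis(u, v):
    """8 x 8 basis matrix: a cosine in x (frequency u) times a cosine in y (frequency v)."""
    cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
    return [[cu * cv
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for x in range(N)]
            for y in range(N)]


def inner(a, b):
    """Inner product of two 8 x 8 matrices."""
    return sum(a[y][x] * b[y][x] for y in range(N) for x in range(N))


def dct2(block):
    """Transformed coefficients: the block projected onto each basis matrix."""
    return [[inner(block, dct_basis(u, v)) for u in range(N)] for v in range(N)]


# Hypothetical input: a block of uniform brightness has no detail, so all of
# its energy lands in the single zero-frequency (u = 0, v = 0) coefficient.
flat_block = [[100.0] * N for _ in range(N)]
coeffs = dct2(flat_block)
```

Because the basis is orthonormal, the inner product of a basis matrix with itself is one and with any other basis matrix is zero, which is what allows the block to be reconstructed from its coefficients.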
A shortcoming of the Fourier based techniques, which prompted workers in the data compression art to take up other techniques as will be described shortly, is the fact that sinusoidal functions repeat indefinitely out to plus and minus infinity, whereas data sets which are encoded are localized in the time (or spatial) domain and have features which are further localized within the data set. Given the unbounded domain of Fourier basis functions and the aperiodic nature of data sets to be encoded over long intervals (or spans), one is led to segment the signal (e.g., into the aforementioned 8 by 8 blocks) in order to obtain a more efficient encoding. Unfortunately, this leads to abrupt jumps in the decoded version of the signal at edges between the segments. In the image processing art these are known as blocking effects.
Newer classes of transform methods employ basis functions which are inherently localized in the spatial domain. Mathematically speaking, they are compactly supported. One example of the newer type of transform method is the class of wavelet based techniques. Wavelet based techniques employ a set of basis functions comprising a mother wavelet and a set of child wavelets derived from the mother wavelet by applying different time (or spatial) domain shifts and dilations to the mother wavelet. A wavelet basis set, comprising a set of functions with localized features at different characteristic scales, is better suited to encode data sets such as image or audio data sets which have fine, coarse and intermediate features at different locations (times).
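The derivation of child wavelets from a mother wavelet can be illustrated with the Haar wavelet, which is assumed here only because it is the simplest compactly supported example; the text above does not prescribe a particular wavelet. The indices j (scale) and k (shift) are notation introduced for this sketch.

```python
def mother(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), zero elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def child(j, k, t):
    """Child wavelet: the mother compressed by a factor 2**j and shifted by k / 2**j."""
    return 2 ** (j / 2) * mother(2 ** j * t - k)


# The child with j = 2, k = 1 is nonzero only on the interval [0.25, 0.5).
inside_first_half = child(2, 1, 0.3)
inside_second_half = child(2, 1, 0.4)
outside_support = child(2, 1, 0.6)
```

Larger values of j give narrower wavelets that respond to finer features, while k selects where along the time (or spatial) axis the wavelet sits.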
Multilevel wavelet decomposition, also called multi-resolution decomposition, is an iterative process. At each iteration, the lower frequency set of transformed data coefficients generated by the prior iteration is decomposed again to produce a substitute set of transformed data coefficients comprising a lower spatial frequency group and a higher spatial frequency group. These groups are called subbands.
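The iteration can be sketched in one dimension as follows. The sketch assumes an unnormalized Haar filter pair (pairwise averages and half-differences) for simplicity and requires the signal length to be divisible by two raised to the number of levels; the function names and sample values are hypothetical.

```python
def haar_step(signal):
    """One iteration: split a signal into a low and a high frequency subband."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_decompose(signal, levels):
    """Repeatedly decompose the low frequency subband left by the prior iteration."""
    high_subbands = []
    low = list(signal)
    for _ in range(levels):
        low, high = haar_step(low)
        high_subbands.append(high)  # finest detail first, coarsest last
    return low, high_subbands


# Hypothetical 8-sample signal, decomposed over two levels.
low, highs = haar_decompose([4, 6, 10, 12, 8, 6, 5, 5], levels=2)
```

Only the low frequency subband is carried into the next iteration, so each level halves the number of coefficients that remain to be decomposed.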
Returning to the matter of lossy encoding, whether it be Fourier, wavelet or otherwise based, the manner in which the reduction in the byte size (with the associated loss of detail) is achieved, according to the common prior art approach, is by quantizing the transformed data coefficients. Depending on the quantization and coding methods used, this step can be a computationally intensive operation which may account for up to 70% of the computational expense of a wavelet based encoder. Quantizing involves reducing the resolution with which the values of the transformed data coefficients are recorded, so that they can be recorded using fewer bits. In the case of image data, transformed data coefficients associated with basis functions that depend on finer details in the data will be quantized with less resolution. The decision to do this is based on current understanding of human visual perception.
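A minimal sketch of such quantizing is given below, assuming simple uniform scalar quantization with one step size per coefficient. The coefficient values and step sizes are hypothetical and are not taken from any standard; larger steps are assigned to the coefficients assumed to represent finer detail.

```python
def quantize(coeffs, steps):
    """Record each coefficient as a small integer index: coarser step, fewer bits."""
    return [round(c / s) for c, s in zip(coeffs, steps)]


def dequantize(indices, steps):
    """Decoder side: recover approximate coefficient values from the indices."""
    return [q * s for q, s in zip(indices, steps)]


# Hypothetical coefficients ordered from coarse detail to fine detail.
coeffs = [800.0, -37.2, 12.6, 3.1]
steps = [4, 8, 16, 32]

indices = quantize(coeffs, steps)
approx = dequantize(indices, steps)
```

The difference between coeffs and approx is the detail discarded by the lossy step; enlarging the step sizes shrinks the indices further at the cost of a larger difference.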
In connection with perception as it relates to lossy image encoding, it bears mentioning that there are quantitative measures, such as the signal-to-noise ratio (SNR), which aim to characterize the fidelity of an encoded and decoded image to the original. However, these measures are not always in accord with human assessments of quality.
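For reference, one common definition of SNR, expressed in decibels, can be sketched as follows. The choice of this particular definition (signal energy divided by error energy) and the sample values are assumptions made for illustration.

```python
import math


def snr_db(original, decoded):
    """Signal-to-noise ratio in decibels between original and decoded samples."""
    signal_energy = sum(o * o for o in original)
    noise_energy = sum((o - d) ** 2 for o, d in zip(original, decoded))
    if noise_energy == 0:
        return math.inf  # decoded samples are identical to the original
    return 10 * math.log10(signal_energy / noise_energy)


# Hypothetical samples: two of four values are off by one after decoding.
value = snr_db([10, 10, 10, 10], [9, 11, 10, 10])
```

The measure depends only on the total error energy and not on where the error falls, which is one reason it can disagree with a human observer.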
All of the quantizing values can be adjusted to yield lower image fidelity and correspondingly reduced byte size. Quantizing values that are used for an entire image can be varied, or different sets of quantizing values that are used for separate blocks of an image can be adjusted.
The new JPEG 2000 standard suggests providing “progressive transmission” whereby an encoded image can be transmitted at various resolutions. This is intended to afford adaptation to the display of the client receiving the image. According to the standard, this will be accomplished by a certain arrangement of the bit stream.
The predominant application for transform encoding is for storing data on, and sending data over, the Internet. Extension of the Internet to wireless physical networks is currently in the early stages. Wireless networks generally have less bandwidth than fiber optic or copper Internet networks; therefore, slower web browsing and longer downloads can be expected for transform encoded data (e.g., images) of the same byte size.
Portable wireless devices generally have low processing power and are therefore ill suited to perform computationally intensive operations such as quantizing and complex entropy coding.
Wireless networks are subject to connection instability. A connection may fail while an image is being transmitted and decoded. Prior art decoding schemes paint images in raster scan pattern, either in one pass or multiple passes, yielding successively better resolution. In the former case, if the connection is broken, part of the area of the image will be missing. In the latter case, the image would be left with an unsightly boundary between an upper high-quality area and a lower poor-quality area.
What is needed is a system and method for encoding and decoding data which will achieve the highest possible level of fidelity of the decoded data with respect to the original data for a given allocation of bits per sample in the encoded representation with minimal computational costs.