The number of images produced worldwide is growing constantly. Photographic and video data, by their nature, consume a large part of available digital resources, such as storage space and network bandwidth. Image compression technology plays a critical role by reducing storage and bandwidth requirements. There is therefore a need for high compression ratios combined with a quantifiably negligible loss of information.
Imaging is the technique of measuring and recording the amount of light L (radiance) emitted by each point of an object and captured by the camera. These points are usually laid out in a two-dimensional grid and called pixels. The imaging device records a measured digital value dadc(x, y) that represents L(x, y), where (x, y) are the coordinates of the pixel. The ensemble of values dadc(x, y) forms a raw digital image M. Typically M consists of several tens of millions of pixels, and each dadc(x, y) is coded over 8 to 16 bits. As a result, each raw image requires hundreds of megabits to be represented, transferred and stored.
The large size of these raw images has several practical drawbacks. Within the imaging device (photo or video camera), the size limits the image transfer speed between sensor and processor, and between processor and memory. This limits the maximum frame rate at which images can be taken, or imposes the use of faster, more expensive and more power-consuming communication channels. The same argument applies when images are copied from the camera to external memory, or transmitted over a communication channel. For example:
        transmitting raw “4K” 60 frames per second video in real time requires a bandwidth of 9 Gbps, and storing one hour of such video requires 32 Tbits;
        transmitting a single 40 MPixel photograph from a cube satellite over a 1 Mbps link takes 10 minutes;
        backing up a 64 GByte memory from a photographic camera over a 100 Mbps Internet connection takes more than one hour.
To alleviate requirements on communication time and storage space, compression techniques are required. Two main classes of image compression exist:
        A. Lossless image compression, where digital image data can be restored exactly, and there is zero loss in quality. This comes at the expense of very low compression ratios of typically 1:2 or less.
        B. Lossy image compression, where the compressed data cannot be restored to the original values because some information is lost or excised during the compression, but which can have large compression ratios of 1:10 and higher. These compression ratios, however, come at the expense of loss of image quality, i.e. distortion, and the creation of image compression artifacts, i.e. image features that do not represent the original scene, but that are due to the compression algorithm itself.
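The transfer and storage figures quoted in the examples above can be reproduced with simple arithmetic. The following is a minimal sketch, not part of the described method: the function name transfer_seconds and the choice of 16 bits per pixel for the satellite photograph are assumptions made here for illustration (the text only states that pixels are coded over 8 to 16 bits).

```python
def transfer_seconds(size_bits: int, link_bits_per_second: int) -> float:
    """Time in seconds needed to move size_bits over a link of the given rate."""
    return size_bits / link_bits_per_second


# One hour of raw video at the quoted 9 Gbps rate, expressed in terabits.
video_hour_tbits = 9 * 10**9 * 3600 / 10**12

# 40 MPixel photograph over a 1 Mbps link.
# ASSUMPTION: 16 bits per pixel (upper end of the 8 to 16 bit range in the text).
satellite_seconds = transfer_seconds(40 * 10**6 * 16, 1 * 10**6)

# 64 GByte memory over a 100 Mbps connection (8 bits per byte).
backup_seconds = transfer_seconds(64 * 10**9 * 8, 100 * 10**6)

print(f"one hour of video: {video_hour_tbits:.1f} Tbit")
print(f"satellite photo:   {satellite_seconds / 60:.1f} min")
print(f"camera backup:     {backup_seconds / 60:.1f} min")
```

Under these assumptions the results agree with the orders of magnitude given in the text: roughly 32 Tbit per hour of video, on the order of ten minutes for the satellite photograph, and well over one hour for the backup.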
Lossy image compression methods, such as those in the JPEG standard or used in wavelet compression, create distortion and artifacts. Distortion is the degree by which the decoded image Q, i.e. the result of compressing and decompressing the original image, differs from the original image M, and is usually measured as the root-mean-square of the differences in the values of each corresponding pixel between Q and M. Lossy compression typically also introduces artifacts, which are a particularly bad type of distortion that introduces image features in Q, i.e. correlations between different pixels, that are not present in M. Examples of artifacts are the “block” artifacts generated by block image compression algorithms like JPEG, as well as ringing, contouring and posterizing. These artifacts are particularly harmful, as they can be mistaken for image features.
Lossy image compression algorithms, as described for example in prior art documents [1] and [2] cited below, typically consist of several steps. First, the image is transformed into a representation where the correlation between adjacent data point values is reduced. This transformation is reversible, so that no information is lost. For example, this step would be similar to calculating the Fourier coefficients, i.e. changing an image that is naturally encoded in position space into a spatial frequency space. The second step is referred to as quantization. This step truncates the value of the calculated coefficients to reduce the amount of data required to encode the image (or block, consisting of e.g. 16×16 pixels). This second step is irreversible, and will introduce quantization errors when reconstructing the image from the aforementioned coefficients. The quantization error will cause errors in the reconstructed image or block. Predicting or simulating what the error would be for any image or block is technically impossible, due to the extremely large number of values that such an image or block may take.
For example, a 16×16 block with 16 bits per pixel may take $2^{16\times16\times16} = 2^{4096} \approx 10^{1233}$ different values, far too many to test in practice. If the transformation is not restricted to blocks, but rather is a function of the entire image, the number of possible values becomes considerably larger.
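The size of this search space can be checked with exact integer arithmetic. The sketch below assumes the block described above (16×16 pixels at 16 bits per pixel); the constant and variable names are illustrative only.

```python
BLOCK_SIDE = 16        # pixels along each edge of the block
BITS_PER_PIXEL = 16    # bit depth of each pixel

# Total number of bits needed to describe one block.
total_bits = BLOCK_SIDE * BLOCK_SIDE * BITS_PER_PIXEL

# Number of distinct blocks, as an exact Python integer.
num_values = 2 ** total_bits

# Order of magnitude: number of decimal digits minus one.
decimal_exponent = len(str(num_values)) - 1

print(f"bits per block: {total_bits}")
print(f"distinct blocks: about 10^{decimal_exponent}")
```

Even at an assumed rate of billions of test blocks per second, exhaustively evaluating a quantizer over a space of this size is out of reach, which is the point made in the text.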
More particularly, in standard lossy image compression techniques (e.g. document [1]), a lossless “image transformer” is first applied to the image, resulting in a transformed image. Each data point of the transformed image is a function of many input pixels and represents the value of a point in a space that is not the natural (x, y) position space of the image, but rather a transformed space; for example, a data point might represent a Fourier component. A lossy “image quantizer” is then applied to the transformed image as a second step. The amount of information loss cannot be accurately quantified in standard lossy image compression techniques, for the above-mentioned reason of the extremely large number of values that such an image or block may take. Quantification is further complicated by the fact that the lossy operation is applied in a space that is not the natural image space.
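The two-step structure described above (a reversible transform followed by an irreversible quantizer) can be illustrated on a single row of pixels. This is a minimal sketch under stated assumptions, not the algorithm of document [1]: a one-dimensional orthonormal discrete cosine transform stands in for the “image transformer”, a uniform rounding quantizer with a hypothetical step size stands in for the “image quantizer”, and the pixel values are invented for the example. Distortion is measured as the root-mean-square pixel difference, as defined earlier.

```python
import math


def dct(x):
    """Orthonormal DCT-II: a reversible change of representation."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(x))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct(): maps coefficients back to pixel values."""
    n = len(c)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * ck * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, ck in enumerate(c)
        )
        for i in range(n)
    ]


def quantize(coeffs, step):
    """Irreversible step: round every coefficient to a multiple of step."""
    return [step * round(c / step) for c in coeffs]


def rms_error(a, b):
    """Root-mean-square difference between corresponding values."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


pixels = [52, 55, 61, 66, 70, 61, 64, 73]      # hypothetical row of pixel values
coeffs = dct(pixels)

lossless = idct(coeffs)                         # transform alone loses nothing
lossy = idct(quantize(coeffs, step=10))         # quantization introduces error

print(f"RMS error without quantization: {rms_error(pixels, lossless):.2e}")
print(f"RMS error with quantization:    {rms_error(pixels, lossy):.3f}")
```

Note that each reconstructed pixel depends on every quantized coefficient, so the error at a given pixel is a function of the whole input row. This is the difficulty the text refers to: the loss is applied in the transformed space, while its effect must be judged in the image space.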
Attempts have been made at characterizing the quality of the output of image compression algorithms. These efforts have typically focused on characterizing the quality of the reconstructed compressed image with respect to the human visual system, as reviewed in documents [2, 3, 4, 5] described below. Several metrics have been devised to characterize the performance of the compression algorithm; however, these metrics have significant drawbacks. One such drawback is that they relate quality to the human visual system and human perception, which are highly dependent on the subject viewing the image, on the viewing conditions (e.g. eye-to-image distance, lighting of the image, environment lighting conditions, attention, angle, image size), on the image rendering algorithms (e.g. debayering, gamma correction) and on the characteristics of the output device, such as display, projector screen, printer, paper etc. A second drawback of characterizing the quality of image compression algorithms with respect to the human visual system or models thereof is that, for such methods to be relevant, no further image processing must take place after image compression, as such processing would make some unwanted compression artifacts visible. For example, a compression algorithm might simplify dark areas of an image, removing detail, judging that such detail would not be visible to the human eye. If it is later decided to lighten the image, this detail will have been lost, resulting in visible artifacts. Yet another problem of the above-mentioned methods is that they are unsuitable for applications not aimed solely at image reproduction for a human observer, such as images for scientific, engineering, industrial, astronomical, medical, geographical, satellite, legal and computer vision applications, amongst others.
In these applications, data is processed differently than by the human visual system, and a feature that is invisible to the untrained human eye, and therefore removed by the above-mentioned image compression methods, could be of high importance to the specific image processing system. For example, inappropriate image compression of satellite imagery could result in the addition or removal of geographical features, such as roads or buildings.
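The dark-area example given above can be made concrete with a toy model. The sketch below is purely illustrative: perceptual_quantize is a hypothetical stand-in for a perceptually motivated compressor (it is not taken from any cited document), and the threshold, step and gain values are arbitrary assumptions.

```python
def perceptual_quantize(value, dark_threshold=32, dark_step=8):
    """Hypothetical quantizer: coarse steps in dark areas, untouched elsewhere."""
    if value < dark_threshold:
        return (value // dark_step) * dark_step
    return value


def lighten(value, gain=8, max_value=255):
    """Later processing step: brighten the image by a fixed gain, with clipping."""
    return min(value * gain, max_value)


# Subtle detail in a shadow region of an 8-bit image (values 8 to 23).
dark_gradient = list(range(8, 24))
compressed = [perceptual_quantize(v) for v in dark_gradient]

# Count the distinct grey levels that survive after lightening.
levels_from_original = len({lighten(v) for v in dark_gradient})
levels_from_compressed = len({lighten(v) for v in compressed})

print(f"levels after lightening the original:   {levels_from_original}")
print(f"levels after lightening the compressed: {levels_from_compressed}")
```

In this toy case the smooth shadow gradient collapses to two flat bands once the image is lightened, a contouring artifact that was not visible before the later processing step.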
More recently, attempts at a more quantitative approach to the information loss in digital image compression and reconstruction have been made [6], in particular trying to examine the information loss given by the most common image compression algorithms: Discrete Wavelet Transform (DWT), Discrete Fourier Transform (DFT) and 2D Principal Component Analysis (PCA). Several “quantitative” metrics are measured on several images compressed with several algorithms, for example quantitative measures of “contrast”, “correlation”, “dissimilarity”, “homogeneity”, discrete entropy, mutual information or peak signal-to-noise ratio (PSNR). However, the effect of different compression algorithms and parameters on these metrics depends strongly on the input image, so that it is not possible to specify the performance of an algorithm, or even to choose the appropriate algorithm or parameters, as applying them to different images gives conflicting results. A further drawback of the methods described in [6] is that, although these methods are quantitative in the sense that they output a value that might be correlated with image quality, it is unclear how this number can be used: these numbers cannot be used as uncertainties on the pixel values, the compression methods do not guarantee the absence of artifacts, and it is even unclear how these numbers can be compared across different compression methods and input images. Also, these quantitative methods do not distinguish signal from noise, so that, for example, a high value of entropy could be given by a large amount of retained signal (useful) or retained noise (useless).
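Two of the metrics named above, discrete entropy and PSNR, are simple to compute, which makes their limitations easy to demonstrate. The following sketch uses the standard textbook definitions (histogram entropy in bits, and PSNR relative to an assumed 8-bit peak value of 255); it is not the implementation used in [6], and the test data are synthetic.

```python
import math
import random
from collections import Counter


def discrete_entropy(values):
    """Shannon entropy, in bits, of the histogram of pixel values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def psnr(original, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB (infinite for identical inputs)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


# A structured "signal": a repeating ramp of grey levels.
signal = [i % 256 for i in range(1024)]

# The same values in random order: all spatial structure is destroyed.
scrambled = signal[:]
random.Random(0).shuffle(scrambled)

# A decoded version in which every pixel is off by exactly one level.
decoded = [v ^ 1 for v in signal]

print(f"entropy of signal:    {discrete_entropy(signal):.3f} bits")
print(f"entropy of scrambled: {discrete_entropy(scrambled):.3f} bits")
print(f"PSNR of decoded:      {psnr(signal, decoded):.2f} dB")
```

The structured ramp and its scrambled, noise-like version have identical histogram entropy, which illustrates why such a number alone cannot tell retained signal from retained noise. Likewise, the PSNR value summarizes the average error but says nothing about whether that error forms an artifact.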
The impossibility for the above-mentioned methods to achieve compression with quantifiable information loss arises from the fact that the space in which the algorithm loses data is very large (e.g. the number of pixels in the image or block times the number of bits per pixel), and therefore cannot be characterized completely, as mentioned earlier.
Image-processing systems that act independently on individual pixels (or on a small number of pixels) have been known and used for a long time. These systems typically provide a “look-up table” of values so that each possible input value is associated with an “encoded” output value. The main purpose of such processing has been to adapt the response curves of input and output devices, such as cameras and displays respectively. The functions used to determine the look-up tables have typically been logarithmic or gamma-law functions, to better reflect the response curve of the human visual system as well as the large number of devices designed to satisfy the human eye, starting from legacy systems like television phosphors and photographic film, all the way to modern cameras, displays and printers that make use of such encodings. Although not their primary purpose, these encodings do provide an insignificant level of data compression, for example by encoding a 12-bit raw sensor value over 10 bits, thus providing a compression ratio of 1.2:1, as described, for example, in [7].
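A look-up-table encoding of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions, not the encoding of document [7]: the gamma exponent of 1/2.2 and the names encode_lut and encode are chosen here for the example; only the 12-bit input and 10-bit output widths come from the text.

```python
IN_BITS = 12           # raw sensor bit depth (from the text)
OUT_BITS = 10          # encoded bit depth (from the text)
GAMMA = 1 / 2.2        # ASSUMPTION: a typical gamma-law exponent

in_max = 2 ** IN_BITS - 1      # 4095
out_max = 2 ** OUT_BITS - 1    # 1023

# One table entry per possible input value: each pixel is encoded independently.
encode_lut = [round(out_max * (v / in_max) ** GAMMA) for v in range(in_max + 1)]


def encode(raw_value):
    """Map a 12-bit raw value to its 10-bit encoded value by table look-up."""
    return encode_lut[raw_value]


compression_ratio = IN_BITS / OUT_BITS

print(f"table size: {len(encode_lut)} entries")
print(f"encode(0) = {encode(0)}, encode({in_max}) = {encode(in_max)}")
print(f"compression ratio: {compression_ratio}:1")
```

Because the table has only 4096 entries, the behaviour of such an encoder can be examined exhaustively for every input value, in contrast with the transform-based methods discussed earlier, whose input space is far too large to enumerate.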