Microsoft approached the ISO SC29/WG1 committee, better known as the “JPEG” committee, in spring 2007 in response to the open call for Advanced Image Coding, and submitted its HDPhoto compression scheme, now publicly known as JPEG XR, for standardization. The committee carried out performance measurements, and similar tests have been run outside the JPEG committee as well. These tests measured both objective performance, using MSE, SSIM, M-SSIM and VDP, and subjective performance, using ordering tests. The results consistently indicated that while the mean-square error (MSE) of JPEG XR is close to that of JPEG2000 and appears competitive, the visual performance of the proposed encoder as perceived by humans was not competitive, falling below or only close to that of traditional JPEG. The reason lies in the codec design, which is optimal with respect to PSNR but not visually optimal.
In many respects, JPEG XR closely resembles traditional JPEG. It is a block-based design that follows the principles found in many image compression schemes: first, a linear decorrelating transformation removes redundancies in the input data; next, a quantization procedure removes irrelevant data; finally, the symbols generated by the quantizer are entropy-coded.
While JPEG uses a classical 8×8 DCT and JPEG2000 a discrete wavelet transformation, JPEG XR employs a 4×4 orthogonal overlapped block transformation. The 15 high-pass coefficients of each block form the so-called AC band, whereas the DC coefficients of 16 neighboring blocks are transformed again with the same transformation, yielding 15 lowpass coefficients that form the LP band and one DC coefficient that forms the DC band. All three bands are then quantized with a scalar deadzone quantizer. Although the quantization bucket sizes can be tuned separately for each of the three bands, all coefficients within a band share the same quantization parameter, regardless of the spatial frequencies they represent.
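The two-stage band structure and the per-band quantization can be sketched as follows. This is an illustration under loudly stated assumptions, not the JPEG XR transform: an orthonormal 4×4 DCT-II stands in for JPEG XR's integer core transform, the overlap filtering is omitted, the 16 neighboring blocks are taken as one 16×16 macroblock, and the deadzone quantizer simply truncates toward zero. All function names are hypothetical.

```python
import math

# Hypothetical stand-in: orthonormal 4x4 DCT-II instead of JPEG XR's integer
# core transform; the overlap filtering stage is omitted entirely.
C = [[(0.5 if k == 0 else math.sqrt(0.5)) * math.cos(math.pi * (2 * n + 1) * k / 8)
      for n in range(4)] for k in range(4)]


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transform4x4(block):
    """Separable 2-D transform Y = C X C^T of a 4x4 block."""
    ct = [list(col) for col in zip(*C)]
    return matmul(matmul(C, block), ct)


def split_macroblock(mb):
    """Two-stage transform of a 16x16 region into DC (1), LP (15) and AC (240) coefficients."""
    ac, dcs = [], [[0.0] * 4 for _ in range(4)]
    for by in range(4):
        for bx in range(4):
            block = [row[4 * bx:4 * bx + 4] for row in mb[4 * by:4 * by + 4]]
            y = transform4x4(block)
            dcs[by][bx] = y[0][0]  # first-stage DC, passed on to the second stage
            ac.extend(y[i][j] for i in range(4) for j in range(4) if (i, j) != (0, 0))
    z = transform4x4(dcs)  # the same transform applied again to the 16 block DCs
    lp = [z[i][j] for i in range(4) for j in range(4) if (i, j) != (0, 0)]
    return z[0][0], lp, ac


def deadzone_quantize(x, step):
    """Scalar deadzone quantizer: truncation toward zero makes the zero bin twice as wide."""
    q = int(abs(x) // step)
    return q if x >= 0 else -q


def quantize_bands(dc, lp, ac, steps):
    """One step size per band, applied to every coefficient in it regardless of frequency."""
    return (deadzone_quantize(dc, steps["DC"]),
            [deadzone_quantize(c, steps["LP"]) for c in lp],
            [deadzone_quantize(c, steps["AC"]) for c in ac])
```

Because both stages are orthonormal in this sketch, the energy of the 1 + 15 + 240 coefficients equals that of the 256 input pixels. The single step size per band is what makes the lowest- and highest-frequency AC coefficients equally coarse.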
The quantization parameters can, however, be adjusted on a block-by-block basis. The quantized symbols are then represented in binary; their high-order bits (MSBs) are entropy-coded, while the low-order bits, the so-called residual bits, are written to the codestream almost as is.
In JPEG XR, entropy coding of the MSBs of the coefficients works similarly to JPEG: the coefficients are scanned in a well-defined order, and the run length and amplitude of each non-zero coefficient are combined into a single symbol, which is then Huffman-coded. Unlike JPEG, however, the scan order adapts dynamically, similar in spirit to a move-to-front list: coefficients that are frequently non-zero gradually advance toward the start of the scan. To this end, the encoder maintains two arrays, scan[] and total[], where [] denotes an array. The array scan[] holds the current scan order, and total[] holds the number of non-zero coefficients found so far at each scan position. Whenever total[k+1] > total[k], the k-th and (k+1)-th coefficients exchange places in the scan order. Another difference is that the Huffman back end adapts itself dynamically by switching among several available code-books; these code-books are, however, static and defined by the specification. JPEG and JPEG XR also differ in smaller details, such as how the run-length code is constructed, but these differences are not important to the present invention.
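The MSB/residual split, the run/amplitude symbols and the adaptive scan can be sketched as follows. This is a simplified illustration under stated assumptions: the initial scan order is the identity, all counters start at zero, the swap is applied immediately after each non-zero coefficient, and trailing zeros are left implicit rather than signalled. JPEG XR's actual initial orders, counter initialization and reset rules, Huffman tables and end-of-block signalling are not modeled, and the helper names (`split_level`, `encode_block`, `decode_block`) are hypothetical. Because the decoder applies the identical update after every non-zero coefficient, both sides keep the same scan order without any side information.

```python
def split_level(level, residual_bits):
    """Split a quantized level into sign, high-order part (entropy-coded) and raw residual bits."""
    mag = abs(level)
    return (-1 if level < 0 else 1), mag >> residual_bits, mag & ((1 << residual_bits) - 1)


def join_level(sign, high, low, residual_bits):
    """Inverse of split_level."""
    return sign * ((high << residual_bits) | low)


def _update(scan, total, k):
    """Count a non-zero at scan position k; swap it one step forward if it now outranks its predecessor."""
    total[k] += 1
    if k > 0 and total[k] > total[k - 1]:
        scan[k - 1], scan[k] = scan[k], scan[k - 1]
        total[k - 1], total[k] = total[k], total[k - 1]


def encode_block(coeffs, scan, total):
    """Return (run, level) symbols for one block, adapting scan/total in place."""
    symbols, run = [], 0
    for k in range(len(scan)):
        level = coeffs[scan[k]]
        if level == 0:
            run += 1
        else:
            symbols.append((run, level))  # a real coder would Huffman-code this pair
            run = 0
            _update(scan, total, k)
    return symbols  # trailing zeros are implicit in this sketch


def decode_block(symbols, scan, total, size):
    """Rebuild a block from (run, level) symbols, mirroring the encoder's scan updates."""
    coeffs = [0] * size
    k = 0
    for run, level in symbols:
        k += run
        coeffs[scan[k]] = level
        _update(scan, total, k)
        k += 1
    return coeffs
```

For example, encoding four identical 16-coefficient blocks whose only non-zero value is 7 at position 3 yields the symbols (3, 7), (2, 7), (1, 7) and finally (0, 7): each block moves that position one step toward the front of the scan.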
The performance of an image compression codec is often evaluated by measuring both the image quality, as defined by an image metric, and the data rate of the final output stream, which in many cases is expressed as the average number of bits spent per pixel, i.e., the total number of bits in the coded image divided by the number of pixels. A superior codec achieves a higher quality at a given rate, or a lower rate for a given quality threshold. Traditionally, the mean-square error (MSE) between the original and the reconstructed image has been used as a simple and mathematically tractable metric, but MSE is also known to correlate imperfectly with human quality perception.
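The quantities named above can be computed as in the following minimal sketch, which assumes images given as equally sized lists of rows and, for PSNR, 8-bit samples (peak value 255). The function names are illustrative only.

```python
import math


def mse(original, reconstructed):
    """Mean-square error between two equally sized images given as lists of rows."""
    diffs = [(a - b) ** 2 for ro, rr in zip(original, reconstructed) for a, b in zip(ro, rr)]
    return sum(diffs) / len(diffs)


def psnr(original, reconstructed, peak=255):
    """PSNR in dB derived from the MSE; peak=255 assumes 8-bit samples."""
    e = mse(original, reconstructed)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)


def bits_per_pixel(stream_bytes, width, height):
    """Average number of bits spent per pixel: total coded bits divided by pixel count."""
    return 8 * stream_bytes / (width * height)
```

For example, adding 4 to every pixel of a flat 4×4 block and adding 16 to a single pixel both yield an MSE of 16, even though they are two very different kinds of distortion, a uniform brightness shift and an isolated bright spot.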
More elaborate metrics address known effects of the human visual system. A first effect often taken into consideration is that the human eye is more sensitive to low and medium spatial frequencies than to high frequencies. This is often expressed in terms of a contrast-sensitivity function (CSF). A second effect is that a structure overlaid by texture is less visible than the same structure alone; this effect is generally known as visual masking. A third consideration is that the human eye is more sensitive to luminance errors than to chrominance errors. Other effects are also known, but what should be appreciated is that the perceived visual quality of an image may differ significantly from a mathematically based quality metric such as MSE, which, while convenient to use, does not take the full effects of the human visual system into consideration.
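To make the contrast-sensitivity effect concrete, the sketch below uses the Mannos–Sakrison model, one widely cited analytic CSF approximation, with spatial frequency in cycles per degree of visual angle. The choice of model is ours for illustration; the text does not specify one. The function names and the (frequency, error) input format are hypothetical, and masking and luminance/chrominance effects are not modeled.

```python
import math


def csf_mannos_sakrison(f):
    """Contrast sensitivity at spatial frequency f (cycles/degree), Mannos-Sakrison model."""
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-((0.114 * f) ** 1.1))


def csf_weighted_error(errors):
    """Sum of squared errors, each weighted by the CSF at its spatial frequency.

    errors: iterable of (frequency_cpd, error) pairs (hypothetical input format).
    """
    return sum(csf_mannos_sakrison(f) * e ** 2 for f, e in errors)
```

With this model, sensitivity peaks near 8 cycles/degree and falls off quickly above it. An error of a given amplitude therefore counts several times more at 8 cycles/degree than at 40, whereas plain MSE weighs both equally.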
While JPEG XR performs well on objective tests such as the MSE metric, there is a need for methods and/or apparatus that allow images to be encoded using JPEG XR or similar image encoding techniques, but in a manner that yields visually superior results compared with those previously achieved, while attaining the same or similar compression as previously used image encoding approaches.