The quality of digital images has vastly improved over the past few years. The number of pixels in a picture has increased due to improvements in both cameras and displays. For example, the resolution of commercial flat panel displays used for television has increased step by step from 640×480 pixels VGA resolution to full high definition having 1920×1080 pixels. This has already brought about a change in the amount of data that needs to be handled to display an image. Furthermore, not only has the spatial resolution been improved, but also the dynamic resolution. For a black and white picture this would mean that a single pixel could represent any of a large number of different shades of grey instead of only being black or white. The overall dynamic range now easily compares to that of the human eye. A human being can distinguish objects both in starlight at night and in bright sunlight, even though on a moonless night objects receive only approximately 1/1,000,000,000 of the illumination they would on a bright sunny day: this corresponds to a dynamic range of 90 dB. However, the eye needs time to adjust to different light levels. Thus, the dynamic range of the human eye without adjustment of the pupil is only approximately 30 dB. In contrast, the differences between very dark and very light spots in a picture taken by a modern camera system can easily be larger than those the human eye can distinguish without adaptation; with a modern camera system it is thus possible to determine fine details in very dark spots even while very bright spots are also present in a picture. The dynamic range of a modern camera system may also easily surpass the dynamic range of a conventional display.
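The decibel figures quoted above can be checked with a short calculation. The following is a minimal sketch, assuming the figures use the 10·log10 convention (which is the convention under which a ratio of 1,000,000,000 to 1 gives 90 dB); the function names are illustrative only.

```python
import math

def dynamic_range_db(ratio: float) -> float:
    """Convert a brightest-to-darkest illumination ratio to decibels (10*log10 convention)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 10.0 * math.log10(ratio)

def ratio_from_db(db: float) -> float:
    """Inverse conversion: decibels back to a linear ratio."""
    return 10.0 ** (db / 10.0)

# Starlight versus bright sunlight: a ratio of about 1,000,000,000 to 1.
full_adaptation_db = dynamic_range_db(1_000_000_000)
# Range of the eye without adaptation, quoted in the text as about 30 dB.
static_ratio = ratio_from_db(30.0)
```

Under this convention, the 30 dB figure corresponds to a brightest-to-darkest ratio of roughly 1,000 to 1.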
It should also be noted that not only does the dynamic range of a picture to be produced or reproduced need adjustment, but correction is also necessary because of numerous technological limitations such as limited color gamut and contrast, limited spatial resolution, a usually limited field of view and nontrivial workarounds to achieve stereo capacity. Furthermore, in order to reproduce the correct appearance, it is often necessary to simulate the behavior of the human visual system. Here, it should be noted that the viewing conditions of an observer observing the scene and those of an observer observing a display may be completely different. Tone mapping is very helpful in the production of realistic images and several operators have been proposed. However, it is still prohibitive to have a complex tone mapping operator that produces high quality results in real time owing to limited data processing capacity, in particular for large high definition pictures. Two main tone mapping operator classes exist, namely tone reproduction curves (TRCs) and tone reproduction operators (TROs). Both will be referred to as tone mapping functions in the present application.
TRC algorithms are efficient because the operation is applied to each pixel independently and thus can be performed in parallel using a simple look-up table. In addition, models also exist that are able to capture some important aspects, such as visual adaptation. However, TRCs fail to capture the important information on local contrast that could be represented in the spatial context of neighboring image pixels, which is of great importance for the human visual system.
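The look-up-table character of a TRC can be illustrated as follows. This is a minimal sketch, not any particular published operator: the 10-bit input depth, the 8-bit output depth, the `scale` parameter and the compressive curve x/(1+x) are all assumptions chosen for illustration.

```python
def build_trc_lut(in_levels: int = 1024, out_max: int = 255, scale: float = 8.0) -> list:
    """Precompute one output code per possible input code (hypothetical curve x/(1+x))."""
    y_max = scale / (1.0 + scale)          # curve value at the brightest input code
    lut = []
    for code in range(in_levels):
        x = scale * code / (in_levels - 1)  # input code mapped to an assumed luminance range
        y = x / (1.0 + x)                   # compressive tone curve
        lut.append(round(out_max * y / y_max))
    return lut

def apply_trc(pixels: list, lut: list) -> list:
    """Map each pixel through the table; no pixel depends on its neighbors."""
    return [lut[p] for p in pixels]

lut = build_trc_lut()
mapped = apply_trc([0, 64, 512, 1023], lut)
```

Because each output depends only on the corresponding input code, the mapping can be evaluated for all pixels in parallel; for the same reason, it cannot take local contrast between neighboring pixels into account.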
The algorithms based on TROs when compared with TRCs are able to capture the important information on local contrast. Unfortunately, they typically introduce artefacts in some parts of the image, such as dark halos, and are computationally more demanding.
Now, images that have a high dynamic range have a higher pixel depth than the more conventional low dynamic range images. The increased bit depth per pixel presented by high dynamic range (HDR) image formats can account for all the dynamic range visible to the human visual system. Both the increased number of pixels in a high definition (HD) picture and the improved dynamic information in an HDR picture increase the amount of data relating to a picture.
This poses severe technical problems. The data size gives rise to problems when transmitting and/or storing image data. These problems become more severe when video data consisting of a plurality of frames need to be handled instead of single digital images. For high dynamic range still pictures, the above-mentioned tone mapping reduces the dynamic range of a picture in a way that attempts to allow the user to observe all details relevant to the human eye even if this would not be possible in a conventional scene and/or adaptation. Reducing the dynamic range may also be used to reduce the amount of data necessary to fully describe the picture. Furthermore, for video streams consisting of a sequence of digital images (frames), methods that allow reduction, i.e. compression, of data have been described such as the MPEG-standard.
However, despite the fact that both the high definition as well as high dynamic range information may be compressed, the results when applying such conventional compression schemes to a stream of video data are still not satisfying for a user.
High Dynamic Range Images (HDRIs) consume a considerable amount of memory, and efficient compression methods are not easy to implement. A typical low dynamic range image consists of 24 bit-per-pixel (bpp) RGB data that can be compressed considerably. On the other hand, an uncompressed HDR image typically uses 96 bpp for an RGB image. Accordingly, an uncompressed HDR image consumes four times the memory of an uncompressed, low dynamic range (LDR) image.
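The factor of four follows directly from the bit depths. The following minimal sketch works the arithmetic for a full high definition frame; the frame size is taken from the resolution mentioned earlier in this section, while the frame rate of 25 frames per second is an assumption used only to indicate the scale for video.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one frame in bytes."""
    return width * height * bits_per_pixel // 8

ldr_bytes = frame_bytes(1920, 1080, 24)   # 24 bpp low dynamic range RGB
hdr_bytes = frame_bytes(1920, 1080, 96)   # 96 bpp high dynamic range RGB
size_factor = hdr_bytes // ldr_bytes

# Assumed frame rate of 25 frames per second, for illustration only.
hdr_video_bytes_per_second = hdr_bytes * 25
```

For a 1920×1080 frame this gives about 6.2 megabytes for the LDR image and about 24.9 megabytes for the HDR image, so that an uncompressed HDR video at the assumed frame rate would require roughly 622 megabytes per second.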
The first attempt to compress the HDR data was introduced by G. WARD in REAL PIXELS, Graphics Gems, 2:15-31, 1991. It has been suggested that the 96 bpp image is compressed to 32 bpp using an 8-bit mantissa for each channel and a shared 8-bit exponent. The resulting RGBE representation, however, does not cover the full visible color gamut since it does not allow for negative values. Other formats have been suggested, for example in G. W. LARSON: LOGLUV ENCODING FOR FULL-GAMUT, HIGH DYNAMIC RANGE IMAGES, Journal of Graphics Tools, 3(1):15-31, 1998. Further suggestions using different data formats can be found in INDUSTRIAL LIGHT & MAGIC, OpenEXR., http://www.openexr.org, 2002 and G. McTAGGERT, C. GREEN and J. MITCHELL: HIGH DYNAMIC RANGE RENDERING IN VALVE'S SOURCE ENGINE, in SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, page 7, New York, N.Y., USA, 2006, ACM Press. Furthermore, several extensions to standard compression algorithms have been presented, for example in G. WARD: JPEG-HDR: A BACKWARDS-COMPATIBLE, HIGH DYNAMIC RANGE EXTENSION TO JPEG, in: CIC 13th: Proceedings of the Thirteenth Color Imaging Conference, The Society for Imaging Science and Technology, 2005; G. WARD: A GENERAL APPROACH TO BACKWARDS-COMPATIBLE DELIVERY OF HIGH DYNAMIC RANGE IMAGES AND VIDEO in: CIC 14th: Proceedings of the Fourteenth Color Imaging Conference, The Society for Imaging Science and Technology, 2006; G. WARD and M. SIMMONS: SUB BAND ENCODING OF HIGH DYNAMIC RANGE IMAGERY in: APGV '04: Proceedings of the 1st Symposium on Applied Perception in Graphics and Visualization, pages 83-90, New York, N.Y., USA, 2004, ACM Press.
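The shared-exponent idea behind the RGBE representation can be sketched as follows. This is a simplified illustration in the spirit of the cited format, with one byte of mantissa per channel and one byte for the common exponent; the function names, the exponent bias of 128 and the rounding by truncation are assumptions of the sketch, and details such as file headers and run-length coding are omitted.

```python
import math

def encode_rgbe(r: float, g: float, b: float) -> tuple:
    """Pack three non-negative floats into four bytes: R, G, B mantissas and a shared exponent."""
    if min(r, g, b) < 0.0:
        raise ValueError("negative channel values cannot be represented")
    v = max(r, g, b)
    if v < 1e-32:
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(v)      # v = mantissa * 2**exponent, 0.5 <= mantissa < 1
    scale = mantissa * 256.0 / v            # equals 2**(8 - exponent)
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)

def decode_rgbe(rgbe: tuple) -> tuple:
    """Recover approximate floats from the four bytes."""
    rm, gm, bm, e = rgbe
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - (128 + 8))      # 2**(exponent - 8)
    return (rm * f, gm * f, bm * f)

packed = encode_rgbe(1.0, 0.5, 0.25)
unpacked = decode_rgbe(packed)
bright = decode_rgbe(encode_rgbe(1000.0, 3.0, 0.2))
```

The exponent is determined by the brightest channel, so much dimmer channels of the same pixel lose precision (in the last example the green and blue values are quantized to zero), and the guard on negative inputs reflects the gamut limitation noted above.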
Here, backwards-compatible HDR-JPEG extends the JPEG-standard keeping retro-compatibility. Firstly, an HDR-image is tone mapped using a standard tone reproduction curve and stored as a normal JPEG. Secondly, a sub-band corresponding to HDR information is stored in the “application markers” of the standard for a maximum of 64 Kbytes which is a constraint for encoding high resolution HDR images. HDR-JPEG 2000 is an extension to JPEG 2000 that exploits the support of the standard for 16 bit integer data. With this method, HDR data is transformed into the logarithmic domain, quantized into integers and compressed using standard JPEG 2000 encoding.
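The pre-processing step described for HDR-JPEG 2000, namely a transform into the logarithmic domain followed by quantization into 16 bit integers, can be sketched as follows. The text does not specify the exact transform, so the natural logarithm, the normalization to the minimum and maximum of the data and the function names are assumptions; the subsequent JPEG 2000 encoding is not shown.

```python
import math

def log_quantize(values: list, bits: int = 16) -> tuple:
    """Map strictly positive HDR values to integer codes that are uniform in the log domain."""
    logs = [math.log(v) for v in values]
    lo, hi = min(logs), max(logs)
    levels = (1 << bits) - 1
    if hi == lo:
        return [0] * len(values), lo, hi
    codes = [round((l - lo) / (hi - lo) * levels) for l in logs]
    return codes, lo, hi

def log_dequantize(codes: list, lo: float, hi: float, bits: int = 16) -> list:
    """Invert the quantization; lo and hi must be stored alongside the codes."""
    levels = (1 << bits) - 1
    return [math.exp(lo + c / levels * (hi - lo)) for c in codes]

samples = [0.01, 1.0, 100.0, 10000.0]
codes, lo, hi = log_quantize(samples)
restored = log_dequantize(codes, lo, hi)
```

Quantizing in the logarithmic domain keeps the relative error roughly constant across the whole range, which is why a fixed integer depth can cover six orders of magnitude in this example.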
While the suggested compression schemes known in the art are helpful for still images, they leave room for improvement and are hardly applicable for HDR videos.
There are two important applications for video data streams providing HDR video data. The first application is motion pictures such as those produced by video cameras; the second is graphics applications such as video games and so forth. Here, there is a particular need to compress high dynamic range textures. HDR textures are typically used in interactive applications; due to the size of HDR images they need to be compressed. Methods related to high dynamic range texture compression can be designed either for general-purpose graphics hardware or for custom-made processors that allow decoding of the HDR textures in real time.
The problem of HDR texture compression has been addressed by numerous authors, compare e.g. J. MUNKBERG, P. CLARBERG, J. HASSELGREN, and T. AKENINE-MÖLLER: HIGH DYNAMIC RANGE TEXTURE COMPRESSION FOR GRAPHICS HARDWARE in: ACM Trans. Graph., 25(3):698-706, 2006; K. ROIMELA, T. AARNIO and J. ITÄRANTA: HIGH DYNAMIC RANGE TEXTURE COMPRESSION in: ACM Trans. Graph., 25(3):707-712, 2006; K. ROIMELA, T. AARNIO and J. ITÄRANTA: EFFICIENT HIGH DYNAMIC RANGE TEXTURE COMPRESSION in: SI3D '08: Proceedings of the 2008 Symposium on Interactive 3D Graphics and Games, pages 207-214, ACM Press, New York, N.Y., USA, 2008. One problem of the methods described in the prior art is that they require complex hardware.
For HDR videos, HDR-MPEG compression schemes have been proposed in: R. MANTIUK, A. EFREMOV, K. MYSZKOWSKI, and H.-P. SEIDEL: BACKWARD COMPATIBLE HIGH DYNAMIC RANGE MPEG VIDEO COMPRESSION, ACM Trans. Graph., 25(3):713-723, 2006; R. MANTIUK, G. KRAWCZYK, K. MYSZKOWSKI, and H.-P. SEIDEL: PERCEPTION-MOTIVATED HIGH DYNAMIC RANGE VIDEO ENCODING, ACM Trans. Graph., 23(3):733-741, 2004. The suggested schemes can be used as an extension to MPEG-4. As in backwards-compatible HDR-JPEG, the videos are tone mapped, and for each frame a reconstruction function is calculated when storing the HDR data. To improve quality, residuals of frames are saved in the video stream. While these algorithms offer high quality and high compression ratios, they are not ideally suited to real time applications, since their lack of hardware support results in complex implementations, particularly due to the complex fetching mechanisms required for decoding.
A further approach has been adopted by L. Wang, X. Wang, P-P Sloan, L-Y Wei, X. Tong and B. Guo: Rendering from compressed high dynamic range textures on programmable graphics hardware, I3D '07: Proceedings of the 2007 symposium on Interactive 3D graphics and games, 17-24, ACM Press, New York, 2007.
In that paper it has been suggested to separate HDR and LDR parts of the images and to quantize two 8-bit textures compressed using S3TC with their residuals. The reconstruction (decoding) was performed by combining HDR and LDR parts using a simple shader. A more general compression scheme has recently been proposed in S. LEFEBVRE and H. HOPPE: COMPRESSED RANDOM-ACCESS TREES FOR SPATIALLY COHERENT DATA in: Rendering techniques (Proceedings of the Eurographics Symposium on Rendering), Eurographics, 2007. The method described therein relies on a hierarchical data structure that represents spatially coherent graphics data. Despite the good compression, the shader is complex and about twenty times slower than a fetch to a compressed texture using S3TC.
It should be noted that hardware solutions for RGBE filtering and so forth have been suggested as well, compare M. KILGARD, P. BROWN and J. LEECH: GLEXT TEXTURE SHARED EXPONENT in: OpenGL Extension, http://www.opengl.org/registry/specs/EXT/texture_shared_exponent.txt, 2007 as well as in D. BLYTHE: THE DIRECT3D 10 SYSTEM in: ACM Trans. Graph., 25(3):724-734, 2006.
It has also already been suggested to use inverse tone mapping for compression. Here, a multi-scale image processing technique for both tone mapping and companding has been proposed by Y. LI, L. SHARAN and E. H. ADELSON: COMPRESSING AND COMPANDING HIGH DYNAMIC RANGE IMAGES WITH SUB BAND ARCHITECTURES in: SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pages 836-844, New York, N.Y., USA, 2005, ACM Press. However, the operation is not efficient on current hardware given the fact that the compressed low dynamic range has to be decomposed into sub bands. A compression method based on an inverse tone mapping operator and JPEG was presented in M. OKUDA and N. ADAMI: RAW IMAGE ENCODING BASED ON POLYNOMIAL APPROXIMATION, IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, ICASSP 2007, pages 15-31, 2007. The tone mapping operators based on the Hill functionals for inverse tone mapping are calculated using minimization techniques and then encoded using JPEG. The residuals are calculated for increased quality and are compressed using wavelets. However, wavelet and DCT decompression are computationally expensive to evaluate and do not provide a constant decompression time, which is particularly disadvantageous in real time critical applications.
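The principle of compressing with a tone mapping curve, reconstructing with its inverse and storing residuals can be sketched as follows. The sketch assumes that the curve is the sigmoid Hill function x^n/(x^n + k^n); the parameters n and k are fixed here rather than fitted by minimization, the 8-bit quantization stands in for the JPEG stage, and the residuals are kept uncompressed rather than wavelet coded, so this is an illustration of the principle and not the cited method.

```python
def hill(x: float, n: float = 1.0, k: float = 1.0) -> float:
    """Forward tone curve: maps [0, inf) into [0, 1)."""
    xn = x ** n
    return xn / (xn + k ** n)

def inverse_hill(y: float, n: float = 1.0, k: float = 1.0) -> float:
    """Inverse tone curve, valid for 0 <= y < 1."""
    return k * (y / (1.0 - y)) ** (1.0 / n)

def encode(hdr: list, n: float = 1.0, k: float = 1.0, levels: int = 255) -> tuple:
    """Quantize tone mapped values and keep the reconstruction error as residuals."""
    # Codes are capped below `levels` so that the inverse curve stays finite.
    codes = [min(levels - 1, round(levels * hill(x, n, k))) for x in hdr]
    approx = [inverse_hill(c / levels, n, k) for c in codes]
    residuals = [x - a for x, a in zip(hdr, approx)]
    return codes, residuals

def decode(codes: list, residuals: list, n: float = 1.0, k: float = 1.0,
           levels: int = 255) -> list:
    """Inverse tone map the codes and add the residuals back."""
    return [inverse_hill(c / levels, n, k) + r for c, r in zip(codes, residuals)]

hdr_samples = [0.05, 0.5, 2.0, 40.0]
codes, residuals = encode(hdr_samples)
restored = decode(codes, residuals)
```

In this simplified form the decoder needs only a closed-form evaluation per pixel; the cost noted above arises from the wavelet and DCT stages that a practical scheme adds around it.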
A compression scheme for still images and videos using a tone mapping operator based on a model of human cones was presented in J. H. V. HATEREN: ENCODING OF HIGH DYNAMIC RANGE VIDEO WITH A MODEL OF HUMAN CONES, ACM Trans. Graph., 25(4):1380-1399, 2006.
The above discussion shows that despite numerous attempts at compressing high dynamic range images, the results achieved thus far are not completely satisfying in view of compression efficiency, quality of the resulting images and computational load. These problems increase when streams of video frame data need to be handled. As long as the stream of video data corresponds to a slide show of only still images, it will be obvious that a tone mapping function used for compression can be altered whenever the still image (and thus, in the context of a video stream, the scene) changes. However, in a conventional video stream, there are a number of cases where the scene does not change abruptly, for example in cases where a panning, tilting or zooming in of a camera occurs and/or one or a plurality of objects move within a given picture and/or the illumination changes, for example in cases where a room is shown where lights are switched on or off. In cases like these, it is necessary to provide a compression that is both efficient and free of severe artefacts in the reproduced video.