1. Field
This application relates generally to image compression and, more specifically, to lossy compression of high dynamic range (HDR) video.
2. Related Art
High dynamic range (HDR) images have a wider dynamic range (i.e., the ratio between the brightest and darkest parts of a scene) than low dynamic range (LDR) images. The wider dynamic range allows the preservation of details that may be lost due to limiting contrast ratios. Rendered images of computer animated motion pictures or computer games are often HDR images, because HDR images create more realistic scenes.
In general, HDR images use a higher number of bits per color channel than LDR images to represent many more colors over a much wider dynamic range. Typically, a 16-bit or 32-bit floating point number is used to represent each color component of an HDR pixel.
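As a rough illustration of the storage cost and precision of these formats, the following sketch round-trips one value through 16-bit and 32-bit floating point storage using Python's standard struct module. The sample value, the helper name roundtrip, and the assumption that an LDR pixel uses 8 bits per color channel are illustrative choices and are not taken from this application.

```python
import struct

def roundtrip(value, fmt):
    """Store a Python float in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

value = 0.1  # illustrative color component value (assumption)

half = roundtrip(value, "<e")    # 16-bit floating point
single = roundtrip(value, "<f")  # 32-bit floating point

# Absolute error introduced by each storage format.
half_error = abs(half - value)
single_error = abs(single - value)

# Bytes per three-component pixel. The 8-bit LDR figure is an assumption.
bytes_ldr = 3 * 1
bytes_half = 3 * struct.calcsize("<e")
bytes_single = 3 * struct.calcsize("<f")

print(f"16-bit error: {half_error:.3e}, 32-bit error: {single_error:.3e}")
print(f"bytes per pixel: LDR {bytes_ldr}, 16-bit {bytes_half}, 32-bit {bytes_single}")
```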
Typically, HDR images encode as image data the actual physical values of luminance or radiance of the scene being depicted. The encoded image data is meant to describe the state of the image when captured. Such an image is variously described as an input-referred, scene-referred, or focal-plane-referred image. For example, an HDR image capturing the full range of luminance values in a sunlit scene containing sand or snow in full sun and other objects in deep shadow may have a dynamic range exceeding 100,000:1.
In contrast, LDR images are generally output-referred (also known as device-referred) images. LDR images contain data that is dependent on the particular imaging or display device, such as a camera, an LCD monitor, or a video projector. For example, a digital single lens reflex 35 mm camera, such as the Canon EOS-1D Mark II, may have a dynamic range of 2048:1, and a computer LCD display may have a dynamic range of 1000:1. Therefore, two LDR images shown using two different display devices may differ, as the data is meant to describe the performance of the particular display device under a certain set of viewing conditions.
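To put the contrast ratios quoted above on a common scale, the following sketch converts each ratio to photographic stops, where one stop is a doubling of luminance. The function name stops and the dictionary labels are illustrative assumptions; the ratios are the example figures given above.

```python
import math

def stops(contrast_ratio):
    """Express a contrast ratio (brightest / darkest) in stops (powers of two)."""
    return math.log2(contrast_ratio)

# Example ratios quoted in the text above.
ratios = {
    "sunlit scene (HDR)": 100_000,
    "DSLR camera": 2048,
    "LCD display": 1000,
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}:1 is about {stops(ratio):.1f} stops")
```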
Image compression is the application of data compression to a digital image, to reduce the number of bits necessary to represent the image. Image compression may be lossy or lossless. In lossy compression, there is irreversible loss of information during compression encoding, such that it is generally not possible to reconstruct an exact replica of the original uncompressed image after decoding. However, the reconstructed image may be acceptable and useful for many applications, and lossy compression has the advantage of producing potentially greater reduction in data rate than lossless compression.
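The distinction can be sketched with a few floating point samples. In the following minimal example, zlib stands in for a lossless coder and uniform quantization stands in for the irreversible step of a lossy coder. The sample values and the step size STEP are arbitrary assumptions, and neither technique is presented as the method of any particular image codec.

```python
import struct
import zlib

samples = [0.1, 1.5, 3.14159, 1024.25]  # illustrative values (assumption)
raw = struct.pack("<4f", *samples)

# Lossless: the decompressed bytes are identical to the original bytes.
restored = zlib.decompress(zlib.compress(raw))

# Lossy: quantizing to multiples of STEP discards information irreversibly,
# but bounds the error at STEP / 2 per sample.
STEP = 0.25
codes = [round(s / STEP) for s in samples]
approx = [c * STEP for c in codes]
max_error = max(abs(a - s) for a, s in zip(approx, samples))

print("lossless exact:", restored == raw)
print("lossy values:", approx, "max error:", max_error)
```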
In recent years, there has been a growing body of work examining the problem of lossy compression of HDR image files. One approach is to convert HDR image data from a floating point format to a limited-range integer format and then compress the converted data using traditional compression techniques, such as MPEG (Moving Picture Experts Group) or JPEG (Joint Photographic Experts Group) compression.
For example, an approach by Mantiuk et al. converts HDR image data from a floating point format to an integer format by using a mapping that accounts for the human visual system's perception of luminance. The converted data is then compressed using MPEG-based compression. (See, e.g., Rafal Mantiuk, Alexander Efremov, Karol Myszkowski, Hans-Peter Seidel, “Backward compatible high dynamic range MPEG video compression,” ACM Transactions on Graphics, v.25 n.3 (July 2006); Rafal Mantiuk, Grzegorz Krawczyk, Karol Myszkowski, Hans-Peter Seidel, “Perception-motivated high dynamic range video encoding,” ACM Transactions on Graphics, v.23 n.3 (August 2004); Rafal Mantiuk, Karol Myszkowski, and Hans-Peter Seidel, “Lossy compression of high dynamic range images and video,” Proc. of Human Vision and Electronic Imaging XI, vol. 6057 of Proceedings of SPIE, 60570V (February 2006).) A similar approach by Xu et al. uses JPEG 2000 compression. (See Ruifeng Xu, Sumanta N. Pattanaik, Charles E. Hughes, “High-Dynamic-Range Still-Image Encoding in JPEG 2000,” IEEE Computer Graphics and Applications 25(6): 57-64 (2005).) These approaches have a number of drawbacks. For MPEG or JPEG compression, the bit depth per color channel (the number of bits used to represent a color component of a pixel) is limited; therefore, precision is traded off against dynamic range. In addition, the perceptual mappings proposed by Mantiuk et al. are insufficient for storing high-quality output-referred images. For JPEG 2000 compression, higher bit depths are available, but JPEG 2000 is relatively complex to implement and has the disadvantage of being slow to run. In both cases, compressing negative numbers is a challenge without storing additional information.
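The trade-off between precision and dynamic range at a fixed bit depth can be sketched with a plain logarithmic mapping from floating point values to integer codes. This is not the perceptual mapping of Mantiuk et al. or the mapping of Xu et al.; it is a simplified stand-in, and the function names and the range limits lo and hi are assumptions chosen for illustration.

```python
import math

def encode_log(value, lo, hi, bits):
    """Map a float to an integer code on a log scale over [lo, hi].

    Values outside the range are clamped, so zero and negative inputs
    cannot be represented without storing additional information.
    """
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    t = math.log(clamped / lo) / math.log(hi / lo)
    return round(t * levels)

def decode_log(code, lo, hi, bits):
    """Invert encode_log, up to quantization error."""
    levels = (1 << bits) - 1
    return lo * (hi / lo) ** (code / levels)

def relative_step(lo, hi, bits):
    """Relative difference between the values of adjacent integer codes."""
    levels = (1 << bits) - 1
    return (hi / lo) ** (1.0 / levels) - 1.0

lo, hi = 0.01, 1000.0  # a 100,000:1 range (assumption)
for bits in (8, 12):
    print(f"{bits} bits: adjacent codes differ by {relative_step(lo, hi, bits):.2%}")

code = encode_log(1.0, lo, hi, 8)
print("code:", code, "decoded:", decode_log(code, lo, hi, 8))
```

With the range held fixed, fewer bits give a coarser relative step. With the bit depth held fixed, widening the range from lo to hi has the same effect.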
An alternative approach is to extract an LDR image from the source HDR image and include additional data for recovering the original image. The LDR image is then compressed using traditional techniques, and the recovery information is compressed separately. This approach also has a number of drawbacks.
For example, Ward and Simmons proposed a variant of this strategy that is backwards compatible with LDR JPEG decoders. The LDR image is extracted from the source HDR image by a tone-mapping operation, and the additional data for recovering the original image is extracted and stored in auxiliary locations of a JPEG file. The disadvantage of this approach is the overhead of performing the tone-mapping operation during the encoding stage. (See, e.g., Greg Ward, Maryann Simmons, “Subband encoding of high dynamic range imagery,” APGV 2004: 83-90; Greg Ward, Maryann Simmons, “JPEG-HDR: A backwards-compatible, high dynamic range extension to JPEG,” Thirteenth Color Imaging Conference (November 2005).)
In another example, an approach by Xu first represents the HDR image in an RGBE (Red, Green, Blue, and Exponent) format, which stores each pixel as an RGB (red, green, and blue) triplet, one byte per color component, with a one-byte shared exponent E. The LDR image is then extracted from the source HDR image by treating the RGB bytes as the LDR portion. The exponent is then compressed separately. This approach has the disadvantage of imprecision that results from the use of RGBE encoding. (See Ruifeng Xu, Real-Time Realistic Rendering and High Dynamic Range Image Display and Compression, Ph.D. thesis, School of Computer Science, University of Central Florida (December 2005).)
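The imprecision of a shared exponent can be seen in a minimal sketch of RGBE packing. The code follows the commonly used shared-exponent scheme with an exponent bias of 128 and is not taken from the cited thesis. The function names are assumptions, and the sketch assumes non-negative inputs whose exponent fits in one byte.

```python
import math

def float_to_rgbe(r, g, b):
    """Pack three non-negative floats into (R, G, B, E) bytes with a shared exponent."""
    v = max(r, g, b)
    if v < 1e-32:
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(v)  # v == mantissa * 2**exponent, 0.5 <= mantissa < 1
    scale = mantissa * 256.0 / v
    # Truncate to bytes; negative components are clamped to zero (assumption).
    return (max(0, int(r * scale)), max(0, int(g * scale)),
            max(0, int(b * scale)), exponent + 128)

def rgbe_to_float(r, g, b, e):
    """Unpack (R, G, B, E) bytes back to three floats."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - (128 + 8))  # 2**(e - 136)
    return (r * f, g * f, b * f)

pixel = (1.0, 0.5, 0.001)  # illustrative pixel with one very dark component
packed = float_to_rgbe(*pixel)
unpacked = rgbe_to_float(*packed)
print(packed, unpacked)
```

Because the exponent is set by the largest component, a component much smaller than the largest one is quantized coarsely. In this example the blue value of 0.001 is stored as zero.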
In yet another strategy, an approach by Manders et al. compresses the low byte of the floating point values in the HDR image directly. However, this approach offers limited control over the compression ratio and over the trade-off between compression rate and image quality. (See Corey Manders, Steve Mann, and Farzam Farbiz, “A Compression Method for Arbitrary Precision Floating-Point Images,” Proceedings of the 2007 International Conference on Image Processing (2007).)