1. Field of the Invention
This invention relates to the field of image and video processing, in particular to capture, compression, storage, transmission, editing, decompression, and display of digital images and video.
2. Background Art
2.1. RGB Space
At the point of display, digital color images, whether originally from computer-generated imagery (CGI), digital video, redigitized analog video, digital photography, or digitized film, consist of a rectangular array of picture elements (pixels), each pixel whereof superposes three spectral primary color intensity components—nominally red, green, and blue (RGB)—, each of which components is quantized to a fixed precision. In standard high-quality displays, the precision is almost always 8 bits per component, which is nearly enough to represent all perceptibly distinct tones within the RGB gamut of most color display technologies, such as the cathode ray tube, liquid crystal display, and plasma display. These R, G, and B stimulus values are mapped within the display by a nonlinear transfer function from gamma-corrected intensity components R′, G′, and B′ stored in display memory. Thus the set of possible pixel values spans a finite discrete three-dimensional color-intensity space, R′G′B′ space. Colors outside this finite gamut cannot be displayed, and colors representing finer spectral or tonal distinctions than this discrete gamut cannot be distinguished.
At the point of capture, too, color video and film images are represented in a spectral color space, most often {red, green, blue} or its complement, {cyan, magenta, yellow} (CMY), whether the images are retained in this form or converted to a different color space for analog video. These spectral stimulus values are mapped within the camera by a nonlinear transfer function to gamma-corrected intensity components R′, G′, and B′. In specialized cases, some or all of the captured spectral components may represent infrared or ultraviolet frequencies outside the RGB range. Other specialized cases, termed multispectral or hyperspectral, include more than three spectral components. But all of these are ultimately rendered to RGB space for display.
The reason for this ubiquitous RGB representation in image capture and especially display is its correspondence to human perception. The human retina contains three types of cone-shaped light receptors, sensitive to three different broadly overlapping regions of the light spectrum, nominally red (R), green (G), and blue (B). Because the cones in a human retina contain only three different photoreceptive pigments, it is possible, by mixing just three primary colors, one in each of these spectral regions, to produce most colors distinguishable by humans—or even, if negative proportions are allowed, all humanly distinguishable colors. This parsimonious representability in terms of three primary colors is very important for minimizing bandwidth and storage capacity requirements.
During editing, the R′G′B′ images are sometimes represented at higher precision, often 10 or 16 bits per component, to minimize accumulation of error. And often images are captured at 10 or more bits per component to leave enough play for changing lighting conditions. Some high-end systems are even capable of displaying the R′G′B′ images at a precision of 10 or more bits per channel.
2.2. Y′CBCR Space
Despite the fact that color images are universally captured and displayed in a spectral color space, nonspectral color spaces known as luma-chroma (Y′CBCR) or luminance-chrominance spaces are often used for storage and transmission—particularly for television broadcast and recording, but also for still photography, computer-generated imagery, and digitized film. The luma or luminance dimension (Y′) represents brightness, while the chroma dimensions (CB and CR) together represent hue and saturation. The polar topology of hue and saturation, however, renders these dimensions unsuitable for direct use in hardware and software implementations. In R′G′B′ space, hue corresponds to the angle about the gray (R′=G′=B′) line, while saturation corresponds to the distance from the gray line (FIG. 25). Thus chroma is represented instead by a pair of arbitrary Cartesian dimensions, ideally perpendicular to the gray line. These Cartesian dimensions, generally blue . . . yellow (CB) and red . . . cyan (CR), represent quite unintuitive opponent-color gamuts that range from a spectral hue through gray to the corresponding antihue, or complementary color. In a digital representation, each of the luma and chroma components is represented at each pixel to a finite precision, typically 8 or 10 bits, although the chroma components are often spatially sampled at a lower rate than luma.
The justification for storage and transmission of images in luma-chroma space is again the correspondence to human perception. The sensitivity of the human visual system to differences in intensity is highly nonuniform among different colors. The unequal sensitivity of the red, green, and blue cone types perceptually distorts the R′G′B′ space. Green cones are about twice as sensitive to brightness as red cones, and red cones are about three times as sensitive as blue cones. Because of the differential sensitivity of the different cone pigments, blue intensity can be quantized three times as coarsely as red, and red intensity can in turn be quantized twice as coarsely as green, with little perceptible effect; in analog terms, the blue component can be assigned a third the bandwidth of the red component, which in turn needs only half the bandwidth of the green component. Furthermore, the human retina also contains rod-shaped photoreceptors, in even greater number than cones, which do not distinguish color, but are more sensitive to brightness. Because of this discrepancy in spatial and intensity resolution, the chroma of an image can be quantized more coarsely than the luma, or assigned a smaller bandwidth than the luma, with little perceptible effect. Again, the economy of the Y′CBCR representation is very important for the potential savings in bandwidth and storage capacity.
Historically, the green>red>blue bias of the human brightness percept is reflected in the sepia bias of monochrome video and photography. Similarly, the luma>chroma bias is reflected in the mere fact that monochrome photography and videography preceded color versions of those technologies, as well as in the relatively small bandwidth allocated to chroma relative to luma in storage and transmission formats. The digital Y′CBCR luma-chroma spaces are modelled after and designed for ease of interchange with the color spaces used in international color television standards, such as Y′IQ of the North-American color-television standard NTSC, and Y′UV of the European color-television standards PAL and SECAM. The digital Y′CBCR spaces differ from these analog spaces chiefly in that, for ease of computation, the chroma axes (CB and CR) are not quite perpendicular to the luma axis (Y′).
The Y′CBCR representation is used today in most popular digital color image formats, including the lossy still-image compression standards JPEG-DCT and PhotoYCC, and the current lossy moving-image compression standards D-5, D-1, Digital Betacam, DV, Motion-JPEG, Photo JPEG, MPEG-2, H.263, and H.264. Most lossy color-image compressors take advantage of the greater perceptual relevance of Y′CBCR space, converting the spectral pixel values to a perceptually more-uniform space at the beginning of the compression phase (FIG. 2), performing the bulk of the computation in the perceptual space, and converting the pixels back to display-color space at the end of the decompression phase (FIG. 3). When operating on the image in a perceptually uniform space, a uniform computational error range or quantization error range guarantees that the peak perceptual error is no greater than the average perceptual error, making it possible to quantize the image much more coarsely (and hence compress it further) for a given image quality. JPEG-DCT was internationally adopted in 1994 as part of ISO/IEC DIS 10918-1. The most popular standard relating R′G′B′ to Y′CBCR is given in Recommendation ITU-R BT.601, adopted in 1990.
Until recently, top-quality Y′CBCR recordings used 10-bit channels, as in the D-5 tape format, in order to achieve the same tonal precision as 8-bit RGB channels. Now some cameras provide even higher precision in order to leave enough play for varying lighting conditions.
2.3. Color-Space Conversion
Because color images are initially captured and ultimately displayed in RGB space but often stored and transmitted in Y′CBCR space, the preservation of image quality demands that as little information as possible be lost in the interconversions between these two color spaces. The advent of digital video editing systems reinforces this motive, since many digital editing effects are most conveniently applied or only available at all in R′G′B′ space, necessitating multiple conversions back and forth between the two color spaces, and thus entailing further information loss and image degradation on each successive editing generation. In the trade literature, the interconversion between digital R′G′B′ and Y′CBCR representations is assumed to be inherently lossy [Izraelevitz & Koslov 1982]. Moreover, in actual implementation in prior art, despite the care often taken to reduce the information loss, these interconversions are indeed always lossy. As a result, in both the conversion from display-color space to perceptual space and the reverse, information is lost and the image quality is degraded.
The conversion from an R′G′B′ pixel value to a luma value (Y′) shrinks the red and especially the blue components relative to the green component:

Y′ ← λR×R′ + λG×G′ + λB×B′
The ideal scaling factors (λR, λG, λB) depend on the specific RGB primaries used, and several standards are in current use. In Composite NTSC (SMPTE 170M-1994), Composite PAL (ITU-R BT.470-4 System B), 720×483 progressive 16:9 (SMPTE 293M-1996), Digital 525 (SMPTE 125M-1995 (4:3 parallel), SMPTE 259M-1997 (serial), and SMPTE 267M-1995 (16:9 parallel)), and Digital 625 (ITU-R BT.470-4 System G), the luma coefficients are defined as:

λR = 0.298912 ≈ 0.299
λG = 0.586611 ≈ 0.587
λB = 0.114478 ≈ 0.114

In practice, these coefficients are generally rounded to three decimal places. This set of luma coefficients is also standardized in ITU-R BT.601-4 and used in standard digital video tape formats including SMPTE D-1 and SMPTE D-5, standard digital video links such as the Serial Digital Interface (SMPTE 259M-1997), and most digital color-image compression schemes, including DV, JPEG, MPEG-2, and H.263.
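Written out with the rounded Rec. 601 coefficients, the luma computation is a single weighted sum. The sketch below is purely illustrative: the function name is invented here, and components are assumed normalized to the nominal [0 . . . 1] range.

```python
# Luma (Y') from gamma-corrected R', G', B', using the rounded
# Rec. 601 coefficients quoted above. Components in [0..1] assumed.
LUMA_601 = (0.299, 0.587, 0.114)  # lambda_R, lambda_G, lambda_B

def luma(r, g, b, coeffs=LUMA_601):
    lr, lg, lb = coeffs
    return lr * r + lg * g + lb * b
```

Because the coefficients sum to unity, reference white (R′=G′=B′=1) yields Y′=1 exactly.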
In 1920×1035 HDTV (SMPTE 240M-1995 and SMPTE 260M-1992) and the 1920×1080 HDTV interim color implementation (SMPTE 274M-1995), the luma coefficients are defined as:

λR = 0.212
λG = 0.701
λB = 0.087
And in 1920×1080 HDTV (SMPTE 274M-1995), 1280×720 HDTV (SMPTE 296M-1997), and 1125 60/2:1 (ITU-R BT.709-2), the luma coefficients are defined to be:

λR = 0.2126
λG = 0.7152
λB = 0.0722
In any case, the luma coefficients are defined to sum to unity, giving luma a nominal range of [0 . . . 1], just as for the R′G′B′ components. Frequently, however, luma is defined to have headroom and footroom for superwhite and subblack, as described below.
The chroma dimensions in all these standards are a blue . . . yellow opponent (CB) and a red . . . cyan opponent (CR), which are defined in terms of the luma dimension as:

CB ← (B′ − Y′) / (2 − 2×λB)
CR ← (R′ − Y′) / (2 − 2×λR)

The chroma dimensions are scaled to have a range of [−½ . . . ½], again usually leaving some headroom and footroom for filter overshoot and undershoot.
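The divisors 2−2×λB and 2−2×λR are exactly what is needed to scale each chroma axis to [−½ . . . ½]: the most extreme value of B′−Y′ is ±(1−λB), reached at pure blue and pure yellow, and likewise for R′−Y′. A small sketch (Rec. 601 coefficients assumed; the function name is illustrative):

```python
# Chroma as defined above, with Rec. 601 coefficients. The divisors
# 2 - 2*lB and 2 - 2*lR scale the extreme pixels to exactly +/- 1/2.
LR, LB = 0.299, 0.114

def chroma(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / (2 - 2 * LB)
    cr = (r - y) / (2 - 2 * LR)
    return cb, cr
```

Pure blue (0,0,1) gives CB=+½, its antihue yellow (1,1,0) gives CB=−½, and pure red (1,0,0) gives CR=+½, confirming the stated range.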
In practice, the chroma functions are generally expanded and combined with the luma function to yield a simple matrix multiplication, M×R′G′B′→Y′CBCR (FIG. 4). This form is especially simple to implement and efficient for vector processors. For Rec. 601, for example:

Y′ ← 0.299×R′ + 0.587×G′ + 0.114×B′
CB ← −0.169×R′ − 0.331×G′ + 0.500×B′
CR ← 0.500×R′ − 0.418×G′ − 0.082×B′

The inverse conversion from Y′CBCR to R′G′B′ is given by inverting this matrix, M−1×Y′CBCR→R′G′B′ (FIG. 5). Again, for Rec. 601:

R′ ← 1.000×Y′ + 0.000×CB + 1.402×CR
G′ ← 1.000×Y′ − 0.346×CB − 0.714×CR
B′ ← 1.000×Y′ + 1.771×CB + 0.000×CR
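The matrix form lends itself to a direct sketch. The two matrices below are the rounded Rec. 601 values just quoted; because both are rounded to three decimal places, applying one after the other recovers a pixel only to within a small rounding error, a point taken up again below. The helper name is illustrative.

```python
# Rec. 601 conversion as a 3x3 matrix multiplication.
# M: R'G'B' -> Y'CbCr; M_INV: the rounded inverse, Y'CbCr -> R'G'B'.
M = (( 0.299,  0.587,  0.114),
     (-0.169, -0.331,  0.500),
     ( 0.500, -0.418, -0.082))

M_INV = ((1.000,  0.000,  1.402),
         (1.000, -0.346, -0.714),
         (1.000,  1.771,  0.000))

def apply(m, v):
    """Multiply a 3x3 matrix by a 3-component pixel vector."""
    return tuple(sum(row[i] * v[i] for i in range(3)) for row in m)
```

A gray pixel (R′=G′=B′) maps to zero chroma, since each chroma row of M sums to zero, while the luma row sums to one.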
For speed-conscious implementations on sequential processors, it is possible to factor out redundant steps and eliminate the multiplications by one and zero. Thus the forward conversion can be reduced from 9 multiplications and 6 additions to 5 multiplications and 4 additions, and the inverse conversion is reduced to 4 multiplications and 4 additions:

Y′ ← 0.299×R′ + 0.587×G′ + 0.114×B′
CB ← 0.564×(B′ − Y′)
CR ← 0.713×(R′ − Y′)

R′ ← Y′ + 1.402×CR
G′ ← Y′ − 0.346×CB − 0.714×CR
B′ ← Y′ + 1.771×CB
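The factored conversions can be sketched directly, with the per-line operation counts as comments. The chroma factors follow from the definitions: 0.564 ≈ 1/(2−2×λB) = 1/1.772 for CB, and 0.713 ≈ 1/(2−2×λR) = 1/1.402 for CR. The function names are illustrative.

```python
# Reduced-operation Rec. 601 conversion for sequential processors.
def rgb_to_ycbcr(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b  # 3 mul, 2 add
    cb = 0.564 * (b - y)                   # 1 mul, 1 add (subtract)
    cr = 0.713 * (r - y)                   # 1 mul, 1 add (subtract)
    return y, cb, cr                       # total: 5 mul, 4 add

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * cr                     # 1 mul, 1 add
    g = y - 0.346 * cb - 0.714 * cr        # 2 mul, 2 add (subtracts)
    b = y + 1.771 * cb                     # 1 mul, 1 add
    return r, g, b                         # total: 4 mul, 4 add
```

With the rounded constants, a round trip in floating point recovers the original pixel only to within a few thousandths, which already hints at the lossiness discussed below for fixed-point pixels.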
Since the Y′CBCR pixels are represented in fixed point, generally to the same precision as the RGB pixels, the most frugal implementations carry out the entire luma-chroma conversion (FIG. 6) and spectral conversion (FIG. 7) in fixed point, to avoid conversions to and from floating-point representation. In a fixed-point implementation, the luma conversion can be reduced to 3 multiplications, 2 additions, and 1 normalization (FIG. 8), and the blue . . . yellow and red . . . cyan chroma conversions can be reduced to 1 multiplication, 1 subtraction, and 1 normalization each (FIG. 9, 10). In the inverse direction, in a fixed-point spectral conversion, the red and blue conversions can be reduced to 1 multiplication, 1 addition, and 1 normalization each (FIG. 11, 12), and the green conversion can be reduced to 2 multiplications, 1 addition and 1 subtraction, and 1 normalization (FIG. 13). In a fixed-point implementation, if the unit u (803) for the multipliers (801, 903, 1102) is chosen to be a power of two or nearly so, then the normalization (804, 1104) can be implemented as a right-shift rather than a divide. Where fixed-point multiplication by a constant is slower than a table lookup, all the multiplications (801, 903, 1102) can be implemented with one-dimensional tables filled during an initialization phase.
In prior art, the luma-chroma transformation process is bidirectionally destructive. In other words, in an R′G′B′→Y′CBCR→R′G′B′ workflow, information is lost both on conversion to Y′CBCR space and again on conversion back to R′G′B′ space. Similarly, in a Y′CBCR→R′G′B′→Y′CBCR workflow, information is lost both on conversion to R′G′B′ space and on conversion back to Y′CBCR space. Considering that the inverse of a fixed-point (i.e. integer) matrix is a rational matrix, this lossiness would seem to be inevitable.
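The bidirectional loss described above is easy to observe. The following sketch, an illustration under stated assumptions rather than any standard's reference implementation, converts 8-bit R′G′B′ pixels to 8-bit Y′CBCR with the rounded Rec. 601 matrix (chroma offset by 128; range clamping omitted for brevity), converts back, and counts sampled pixels that fail to survive the round trip.

```python
# Round-trip R'G'B' -> Y'CbCr -> R'G'B' in 8-bit fixed point,
# using the rounded Rec. 601 matrices quoted earlier.
def to_ycbcr8(r, g, b):
    y  = round( 0.299 * r + 0.587 * g + 0.114 * b)
    cb = round(-0.169 * r - 0.331 * g + 0.500 * b) + 128
    cr = round( 0.500 * r - 0.418 * g - 0.082 * b) + 128
    return y, cb, cr

def to_rgb8(y, cb, cr):
    cb, cr = cb - 128, cr - 128
    r = round(y + 1.402 * cr)
    g = round(y - 0.346 * cb - 0.714 * cr)
    b = round(y + 1.771 * cb)
    return r, g, b

def count_lossy(step=17):
    """Count sampled pixels that do not survive the round trip."""
    rng = range(0, 256, step)  # arbitrary sampling grid
    bad = sum((r, g, b) != to_rgb8(*to_ycbcr8(r, g, b))
              for r in rng for g in rng for b in rng)
    return bad, len(rng) ** 3
```

Gray pixels survive exactly (the chroma rows sum to zero), but many colored pixels do not: for example, pure blue (0, 0, 255) comes back with B′=256.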
Prior implementations sometimes adjust the matrix elements to reduce the reversibility error. Even so, simple geometric analysis reveals that, at best, such efforts will still result in more than ¾ of all possible R′G′B′ pixels emerging incorrect when converted to Y′CBCR and back. Specifically, note that the R′G′B′={0,0,0} origin maps to the Y′CBCR={0,0,0} origin; the unit red axis R′G′B′={1,0,0} maps to Y′CBCR={0.2989,−0.169,0.5}, with a length of ˜0.607; the unit green axis R′G′B′={0,1,0} maps to Y′CBCR={0.5866,−0.331,−0.419}, with a length of ˜0.794; and the unit blue axis R′G′B′={0,0,1} maps to Y′CBCR={0.1145,0.5,−0.081}, with a length of ˜0.519. Thus if the luma-chroma transformation were isogonal, the volume of the R′G′B′ cube in Y′CBCR space would be ˜0.607×˜0.794×˜0.519=˜0.250. But since the transformation is skewed for all standard Y′CBCR spaces, the actual volume is even smaller.
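The volume figure above can be checked numerically: the image of the unit R′G′B′ cube under a linear map M has volume |det M|, and for the rounded Rec. 601 matrix this comes to about 0.236, smaller than the ≈0.250 product of the axis lengths, just as the skew argument predicts.

```python
# Volume of the R'G'B' unit cube's image in Y'CbCr space, computed as
# the absolute determinant of the rounded Rec. 601 matrix quoted above.
M = (( 0.299,  0.587,  0.114),
     (-0.169, -0.331,  0.500),
     ( 0.500, -0.418, -0.082))

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

volume = abs(det3(M))  # ~0.236, below the ~0.250 isogonal bound
```

The columns of M are the images of the unit red, green, and blue axes, so their Euclidean lengths reproduce the ≈0.607, ≈0.794, and ≈0.519 figures cited above.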