Ever since the 19th century, additive color reproductions have been represented in an RGB space of driving coordinates for generating red, green and blue primary light outputs. Giving these different primaries different strengths (luminances) is the way to make all colors within the so-called gamut, i.e. the diamond shape spanned by the three vectors corresponding to the primaries at maximum drive (e.g. Rmax), in some generic color space like XYZ. Similarly, one can define such colors in another linear space derived from the primaries (e.g. XYZ, or UVW). This is done by linear combination of the vectors, i.e. one can calculate the new color coordinates from the old ones by multiplying with a conversion matrix.
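As a minimal sketch of such a matrix conversion (assuming BT.709/sRGB primaries with a D65 white point; the matrix values are those of the sRGB specification, and any other primary set would simply use a different matrix):

```python
# Sketch: converting linear RGB to XYZ by a 3x3 conversion matrix.
# The matrix assumes BT.709/sRGB primaries with a D65 white point.

RGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]

def rgb_to_xyz(r, g, b):
    """Linear combination of the primary vectors: XYZ = M @ (R, G, B)."""
    return tuple(m[0] * r + m[1] * g + m[2] * b for m in RGB_TO_XYZ)

# Maximally driving all three channels (R = G = B = 1) yields the white
# point of the reproduction, here with luminance Y normalized to 1:
X, Y, Z = rgb_to_xyz(1.0, 1.0, 1.0)
```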
Now it is very useful, and was historically necessary for black-and-white television, to have an achromatic direction which encodes only the luminances Y, since the visual system also has a separate processing channel for this. This is obtained by putting the gamut on its tip, which is black, represented in FIG. 1a by the black dot. The gamut of a color representation space, when tied to a reference monitor (or, if the reference is undefined, any monitor the signal is sent to), is gamut 101. In the same philosophy one could also imagine theoretical primaries which can become infinitely bright, leading to a cone shape 102. Several color spaces are defined according to this principle, especially the closed ones, since they are also useful for painting, where one must mix pure colors with whites and blacks and can go no higher than paper white (the Munsell color tree, NCS and Coloroid are examples of such (bi)conical color spaces, and CIELUV and CIELAB are open cones).
In the television world and the video encoding thereof, a specific set of color spaces around this philosophy emerged. Because CRTs had a gamma which made the output luminance approximately the square of the input driving voltage (the same holding for the separate color channels), it was decided to precompensate for this and send signals to the television receivers which were defined as approximately square roots of the linear camera signals (i.e. e.g. R′ being the square root of R, the amount of red in the scene as captured by a camera, within a range of e.g. [0, 0.7 Volt]). Because one needed to build on top of the existing black-and-white transmission system (NTSC or PAL), one also made use of the philosophy of an achromatic (“black-and-white”) coordinate and two color-information-carrying signals R−Y and B−Y (from which G−Y could then be derived). In a linear system, Y would be calculable as a*R+b*G+c*B, in which a, b and c are constants dependent on the primaries.
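The chain described above can be sketched as follows, assuming an idealized pure square-root transfer (actual standards use slightly different curves) and BT.601-style luminance weights for a, b, c:

```python
# Sketch of the classical encoding chain: square-root precompensation of
# the CRT gamma, and linear luminance as a weighted sum of the channels.
# A pure square root and BT.601-style weights are assumed here.

A, B, C = 0.299, 0.587, 0.114  # weights depend on the chosen primaries

def precompensate(v):
    """R' = sqrt(R): approximate inverse of the CRT's square-law gamma."""
    return v ** 0.5

def luminance(r, g, b):
    """True luminance of a linear-light color: Y = a*R + b*G + c*B."""
    return A * r + B * g + C * b

# A linear value of 0.25 is transmitted as roughly half the signal range:
r_prime = precompensate(0.25)
y = luminance(0.25, 0.25, 0.25)
```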
However, these simple matrixing calculations were done in the non-linear space of the derived coordinates R′, G′, B′ (i.e. the square-rooted signals). Although the diamond shape of the maximum possible gamut does not change under such a mathematical operation, the position/definition of all colors within it does. This means inter alia that Y′=a*R′+b*G′+c*B′ is no longer a real luminance signal conveying the exact luminance of all colors, which is why it is called a luma (in this text we will use the word luma for all derived/redefined signals along the achromatic axis which are not linear luminance, irrespective of which mapping function is used, i.e. not necessarily a square root but any function Y-to-Y′ one likes; we will then see Y′ as a technical encoding representing a luminance Y of a color). This is the so-called constant luminance problem: some luminance information resides not in Y′ but in the chromatic coordinates Cr, Cb. These are defined as Cr=m*(R′−Y′) and Cb=n*(B′−Y′), and in this text we will call them chrominances because they grow larger with increasing luminance of a color (the term chroma also being used). So these coordinates do have a chromatic aspect, but it is mixed with a brightness aspect (psychovisually this is not per se bad, because colorfulness is also an appearance attribute which grows with brightness). The problem would not be so bad if one applied exactly the inverse decoding, but any transformation on the colors encoded in such a system (which also forms the basis of current MPEG standards) creates problems such as luminance and color errors. This occurs e.g. when one subsamples the chrominances to a lower resolution, and one should definitely avoid doing color grading in such spaces, as the results can be all over the place (although some image processing software does work in such spaces). So this is not the most convenient color space to represent colors in, having problems one had to live with.
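A small numeric illustration of the constant luminance problem, again assuming a pure square-root transfer and BT.601-style weights (illustrative only, not any particular standard's exact values): for a grey, squaring the luma recovers the true luminance, but for a saturated color much of the luminance has leaked into the chrominances.

```python
A, B, C = 0.299, 0.587, 0.114  # example BT.601-style luminance weights

def luma(rp, gp, bp):
    """Y' computed in the non-linear (square-rooted) domain."""
    return A * rp + B * gp + C * bp

# For a grey (R = G = B = 0.25), squaring the luma recovers Y exactly:
grey_y_prime = luma(0.25 ** 0.5, 0.25 ** 0.5, 0.25 ** 0.5)  # 0.5

# For saturated red (R = 1, G = B = 0) it does not:
true_luminance = A * 1.0                    # Y  = 0.299
y_prime = luma(1.0, 0.0, 0.0)               # Y' = 0.299
apparent_luminance = y_prime ** 2           # 0.299**2, only ~0.089
missing = true_luminance - apparent_luminance  # carried by Cr, Cb instead
```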
Another problem is that the coordinates can grow quite large, requiring many bits for encoding if Rmax etc. is very large (or, in other words, chrominance spaces need many bits to still have enough precision for the very small chrominance values), as with HDR signals, although this can be partially mitigated by defining strongly non-linear luma curves defining R′ from R etc. A recent example of such a coding space presented to SMPTE is the YDzDx color space, which may need at least 10 bits, or preferably more (12 bits), for good (wide-gamut yet precise) color encoding, and such large words are seen as less convenient by the hardware manufacturers.
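The precision aspect can be illustrated with a hypothetical uniform quantizer over [0, 1] (real codecs use offset and scaled code ranges, but the effect is the same): a near-neutral color has tiny chrominance magnitudes, and too few bits lose it entirely.

```python
# Sketch: relative quantization error of a small chrominance value at
# different bit depths (hypothetical uniform quantizer over [0, 1]).

def quantize(v, bits):
    levels = (1 << bits) - 1
    return round(v * levels) / levels

small_chroma = 0.0005                    # a barely colored pixel
err_8bit = abs(quantize(small_chroma, 8) - small_chroma) / small_chroma
err_12bit = abs(quantize(small_chroma, 12) - small_chroma) / small_chroma
# With 8 bits the value quantizes to zero (100% relative error);
# with 12 bits it survives with a few percent error.
```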
A second type of color space topology (FIG. 1b) emerged, of which there are fewer variants. If we project the linear colors onto a unit plane 105, we get perspective transformations of the type x=X/(X+Y+Z) and y=Y/(X+Y+Z) (and similarly for e.g. CIELUV). Since z=1−x−y, we need only two such chromaticity coordinates. The advantage of such a space is that it transforms the cone into a cylinder of finite width. I.e., one can associate a single chromaticity (x,y) or (u,v) with an object of a particular spectral reflection curve illuminated by some light, and this value is then independent of the luminance Y, i.e. it defines the color of an object irrespective of how much light falls on it. Such a color is then commonly described with dominant wavelength and purity, or the more human quantities hue and saturation. The maximum saturations for all possible hues are the monochromatic colors forming the horseshoe boundary 103, and the maximum saturation for each hue of a particular additive display (or color space) is determined by the RGB triangle. In fact, the 3D view is needed, because the gamut 104 of an additive reproduction or color space is tent-shaped, with peak white W being the condition in which all color channels (i.e. the pixels of an RGB local display subpixel triplet) are maximally driven.
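A short sketch of this chromaticity projection and its luminance invariance:

```python
# Sketch: (x, y) chromaticities are invariant to scaling the amount of
# light, i.e. they characterize the object color independently of Y.

def xy_chromaticity(X, Y, Z):
    s = X + Y + Z
    return X / s, Y / s  # z = 1 - x - y needs no separate coordinate

color = (0.3, 0.4, 0.2)                   # some XYZ color
brighter = tuple(5.0 * c for c in color)  # same object, 5x more light

x1, y1 = xy_chromaticity(*color)
x2, y2 = xy_chromaticity(*brighter)       # identical chromaticity
```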
The chrominance-based color spaces, for television/video being descendants of NTSC, BT.601 and BT.709, e.g. the Y′CrCb of the various MPEG and other digital compression standards, have been sufficiently good in practice, although there were several known issues, in particular the mixing of the various color channels due to the inappropriate non-linearities (e.g. the luminance changes when some operation is done on a color component, or the hue changes when one only wanted to change the saturation (or rather chroma), etc.). The chromaticity-based color spaces, like Yxy or Lu′v′, have never been used for image transmission, only for scientific image analysis.
In particular, R. Mantiuk et al.: “Lossy compression of high dynamic range images and video”, Proc. SPIE-IS&T Electronic Imaging Vol. 6057, 16 Jan. 2006, pages 1-10, deals with finding a color space for lossy encoding of an HDR image or video. In particular, they designed a scene-referred encoding which can handle all luminances between a moonless sky (10^−5 nit) and the surface of the sun (10 billion nit). This can clearly not be handled with the classical CIE 1976 Luv space, which was designed to handle typical reflective colors of, say, between 100% reflective white being a couple of hundred nits and some 0.5% black, i.e. LDR image content. They define a new log-type luma axis for a color space in which the luma tries to closely follow the particulars of human vision and therefore has a first linear part below a first threshold, then a power-law behavior, and above a second threshold a logarithmic behavior. The log L-uv color model based thereupon is an example of a topologically cylindrically-shaped chromaticity representation.
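The shape of such a three-segment luma can be sketched as follows. Note that the thresholds and constants below are hypothetical, chosen merely so that the segments join continuously; they are not the fitted values of the cited paper.

```python
import math

# Illustrative sketch of a luma curve that is linear below a first
# threshold T1, power-law in between, and logarithmic above a second
# threshold T2. T1, T2 and P are hypothetical example values.

T1, T2 = 1.0, 100.0   # hypothetical luminance thresholds (nit)
P = 0.5               # hypothetical power-law exponent

def luma(L):
    if L < T1:
        return L * (T1 ** (P - 1.0))         # linear segment
    elif L < T2:
        return L ** P                        # power-law segment
    else:
        return T2 ** P + math.log(L / T2)    # logarithmic segment

# The segments meet continuously at the thresholds:
at_t1 = luma(T1)
at_t2 = luma(T2)
```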
WO 2010/104624 also defines a similar log-type luma, but now of a pure log character, which can encode pragmatic luminances up to 10,000 nit. A color space is made from this by defining uv chromaticities in equations 3A and 3B in par. [0087], which makes this color space also cylindrical.
Larson G. W.: “Log Luv encoding for full-gamut, high-dynamic range images”, Journal of Graphics Tools, Association for Computing Machinery, vol. 3, no. 1, 22 Jan. 1999, pages 15-31, also describes an encoding for HDR still images. It again uses a logarithmic definition of a luma, so that a high dynamic range of luminances can be encoded with 16 bits of a pixel color word, while the color chromaticities (eqs. 3a & 3b) are encoded with 8 bits each. So the shape of this color space is again merely a cylinder, with a logarithmic luma axis. The resulting encoded image is then output in the TIFF format.
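A sketch in the spirit of such a 16-bit logarithmic luma (the exact bit layout and constants of the published format differ in detail): each code step corresponds to a fixed ratio in luminance, so the relative precision is constant over an enormous range.

```python
import math

# Sketch of a Log Luv-style 16-bit logarithmic luminance code:
# Le = floor(256 * (log2(L) + 64)), giving ~0.3% worst-case relative
# error over a range of roughly 2^-64 to 2^64. Illustrative only; the
# published format's exact layout differs in detail.

def encode_log_luma(L):
    return int(math.floor(256.0 * (math.log2(L) + 64.0)))

def decode_log_luma(Le):
    # +0.5 decodes to the middle of the quantization bin
    return 2.0 ** ((Le + 0.5) / 256.0 - 64.0)

L = 123.456
Le = encode_log_luma(L)
rel_err = abs(decode_log_luma(Le) - L) / L
```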
Masahiro Okuda and Nicola Adami: “Effective color space representation for wavelet based compression of HDR images”, 14th International Conference on Image Analysis and Processing (ICIAP), 13-17 Sep. 2007, again propose to use this Log Luv encoding of Greg Ward, but now in a wavelet framework as used in JPEG2000.
Recently a desire emerged to start encoding high dynamic range (HDR) video material. These are video images encoded to be rendered preferably on displays with a peak white of at least 1000 nit, and typically interesting images are those which also contain objects over a large span of brightnesses. E.g. a scene which contains both indoors and sunny outdoors objects may have an intra-picture luminance contrast ratio above 1000:1 and up to 10,000:1, since black may typically reflect 5% or even 0.5% of fully reflecting white, and depending on the indoors geometry (e.g. a long corridor largely shielded from the outdoors illumination and hence only indirectly illuminated) the indoors illuminance is typically some factor k times 1/100th of the outdoors illuminance. Also in night scenes, objects illuminated by e.g. 20 lux street lighting may be captured as far lower luminances in the camera pixels than e.g. lamps. There is a desire to render such scenes with high quality, so that the outdoors sunny part of the video images indeed shows relatively realistically looking sunlight, and the lamps glow on the HDR display; hence there is also a desire to encode all these pixel luminances faithfully (and preferably even more useful metadata about the scene, or the artistic grading of it). For still pictures, codecs were developed which encode the linear color coordinates, but whereas this can be done for a single still, for video the speed and hardware considerations (whether e.g. the cost of a processing IC, or the space on a BD disk) do not allow, or at least dissuade from, using such encodings, i.e. we need different ones, which are more pragmatic regarding the technical limitations.
Given the more complex constraints we have in HDR encoding, the prior art color spaces are no longer optimal, in particular the behavior for the darker parts of the image (a popular HDR scene being a dark basement with bright lights, but in any case there will statistically be a larger amount of significant pixels in the lower part of the luminance range than for LDR, i.e. classical low dynamic range, images) is not optimal. Also, since for HDR we want to have liberal control over the luma code allocation functions (which define the mapping of captured or graded luminances Y to a code Y′ representing them, see e.g. WO2012/147022), their more severely non-linear nature compared to the square root would make the erroneous behavior of the chrominance spaces typically used in television encoding, like the exemplary one of FIG. 1a, highly inappropriate. E.g. this would occur when spatially subsampling the color signals from 4:4:4 to 4:2:0, but also for many other reasons which have to do with changing a color coordinate.
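The subsampling error can be sketched numerically. The construction below is illustrative only (a pure square-root transfer, BT.601-style weights, and scaling constants m = n = 1 are assumed, not any actual standard's matrix): averaging the chrominances of two differently colored neighboring pixels, as 4:2:0 subsampling effectively does, shifts the reconstructed luminance of each pixel.

```python
A, B, C = 0.299, 0.587, 0.114  # example BT.601-style luminance weights

def encode(r, g, b):
    """Linear RGB -> (Y', Cb, Cr), square-root transfer, m = n = 1."""
    rp, gp, bp = r ** 0.5, g ** 0.5, b ** 0.5
    yp = A * rp + B * gp + C * bp
    return yp, bp - yp, rp - yp

def decode(yp, cb, cr):
    rp, bp = cr + yp, cb + yp
    gp = (yp - A * rp - C * bp) / B
    return rp ** 2, gp ** 2, bp ** 2

# Two adjacent pixels: saturated blue next to a mid grey.
p1 = encode(0.0, 0.0, 1.0)
p2 = encode(0.25, 0.25, 0.25)

# Subsampling replaces both chroma pairs by their average, while each
# pixel keeps its own luma:
cb_avg = (p1[1] + p2[1]) / 2
cr_avg = (p1[2] + p2[2]) / 2

r1, g1, b1 = decode(p1[0], cb_avg, cr_avg)
lum_before = C * 1.0                    # true luminance of the blue pixel
lum_after = A * r1 + B * g1 + C * b1    # luminance after chroma averaging
```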
Hence an object of the below presented teachings of the invention is to provide an improved color encoding space, improved encoder realizations, and improved decoder realizations which handle such problematic aspects and lead to a more suitable video encoding system capable of handling HDR content (whereby we do not mean to say that those embodiments are not also very suitable for encoding LDR content).