The present section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present disclosure that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
In the following, a color picture contains several arrays of samples (pixel values) in a specific picture/video format. The format specifies all information relative to the pixel values of a picture (or a video) and all information which may be used by a display and/or any other device to visualize and/or decode the picture (or video). A color picture comprises at least one component, in the shape of a first array of samples, usually a luma (or luminance) component, and at least one other component, in the shape of at least one other array of samples. Equivalently, the same information may be represented by a set of arrays of color samples (color components), such as the traditional tri-chromatic RGB representation.
A pixel value is represented by a vector of C values, where C is the number of components. Each value of the vector is represented with a number of bits which defines a maximal dynamic range of the pixel values.
Standard-Dynamic-Range pictures (SDR pictures) are color pictures whose luminance values are represented with a limited dynamic range, usually measured in powers of two or f-stops. SDR pictures have a dynamic range of around 10 f-stops, i.e. a ratio of 1000 between the brightest pixels and the darkest pixels in the linear domain, and are coded with a limited number of bits (most often 8 or 10 in HDTV (High Definition Television systems) and UHDTV (Ultra-High Definition Television systems)) in a non-linear domain, for instance by using the ITU-R BT.709 OETF (Opto-Electronic Transfer Function) (Rec. ITU-R BT.709-5, April 2002) or the ITU-R BT.2020 OETF (Rec. ITU-R BT.2020-1, June 2014) to reduce the dynamic range. This limited non-linear representation does not allow correct rendering of small signal variations, in particular in dark and bright luminance ranges. In High-Dynamic-Range pictures (HDR pictures), the signal dynamic range is much higher (up to 20 f-stops, a ratio of one million between the brightest pixels and the darkest pixels) and a new non-linear representation is needed in order to maintain a high accuracy of the signal over its entire range. In HDR pictures, raw data are usually represented in floating-point format (either 32-bit or 16-bit for each component, namely float or half-float), the most popular format being the OpenEXR half-float format (16 bits per RGB component, i.e. 48 bits per pixel), or in integers with a long representation, typically at least 16 bits.
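As an illustration of the relation between a linear contrast ratio and a dynamic range expressed in f-stops, the following minimal sketch computes the base-2 logarithm of the ratio. The function name `f_stops` is hypothetical and introduced here only for illustration.

```python
import math


def f_stops(contrast_ratio: float) -> float:
    """Return the dynamic range in f-stops for a linear contrast ratio.

    One f-stop corresponds to a doubling of luminance, so the number of
    f-stops is the base-2 logarithm of brightest/darkest.
    """
    if contrast_ratio <= 0:
        raise ValueError("contrast ratio must be positive")
    return math.log2(contrast_ratio)


# Ratios quoted in the text: 1000 for SDR, one million for HDR.
sdr_stops = f_stops(1_000.0)
hdr_stops = f_stops(1_000_000.0)
print(f"SDR: {sdr_stops:.2f} f-stops, HDR: {hdr_stops:.2f} f-stops")
```

Under this definition, a ratio of 1000 is slightly below 10 f-stops and a ratio of one million is slightly below 20 f-stops, which is consistent with the approximate figures given above.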
A color gamut is a certain complete set of colors. The most common usage refers to a set of colors which can be accurately represented in a given circumstance, such as within a given color space or by a certain output device.
A color gamut is sometimes defined by RGB primaries provided in the CIE1931 color space chromaticity diagram and a white point as illustrated in FIG. 1.
It is common to define primaries in the so-called CIE1931 color space chromaticity diagram. This is a two-dimensional diagram (x,y) defining the colors independently of the luminance component. Any color XYZ is then projected onto this diagram by the transform:
$$\left\{ \begin{array}{l} x = \dfrac{X}{X+Y+Z} \\[2ex] y = \dfrac{Y}{X+Y+Z} \end{array} \right.$$
The z=1−x−y component is also defined but carries no extra information.
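The projection above can be sketched as follows. The function name `xyz_to_xy` and the D65 tristimulus values used in the usage lines are illustrative assumptions, not part of the disclosure.

```python
def xyz_to_xy(X: float, Y: float, Z: float) -> tuple[float, float]:
    """Project a CIE XYZ color onto the CIE1931 (x, y) chromaticity diagram."""
    total = X + Y + Z
    if total == 0:
        # Black has no defined chromaticity.
        raise ValueError("chromaticity is undefined for X + Y + Z = 0")
    return X / total, Y / total


# Illustrative input: approximate XYZ of the D65 white point, normalized to Y = 1.
x, y = xyz_to_xy(0.95047, 1.0, 1.08883)
z = 1.0 - x - y  # carries no extra information

# Scaling XYZ (i.e. changing luminance only) leaves (x, y) unchanged.
x_dim, y_dim = xyz_to_xy(0.095047, 0.1, 0.108883)
```

Because the same factor scales the numerator and the denominator, the chromaticity is independent of the luminance, as stated in the text.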
A gamut is defined in this diagram by the triangle whose vertices are the set of (x,y) coordinates of the three primaries RGB. The white point W is another given (x,y) point belonging to the triangle, usually close to the triangle center.
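As a minimal sketch of this triangle definition, the code below tests whether a chromaticity (x,y) lies inside the triangle formed by three primaries. The helper names are hypothetical; the primaries and white point are the chromaticity coordinates published in ITU-R BT.709 and BT.2020, used here only as example data.

```python
Point = tuple[float, float]

# (x, y) chromaticities of the R, G, B primaries (example data).
BT709_PRIMARIES = ((0.640, 0.330), (0.300, 0.600), (0.150, 0.060))
BT2020_PRIMARIES = ((0.708, 0.292), (0.170, 0.797), (0.131, 0.046))
D65_WHITE = (0.3127, 0.3290)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_gamut(xy: Point, primaries: tuple[Point, Point, Point]) -> bool:
    """Return True if xy lies inside or on the edge of the primaries' triangle."""
    r, g, b = primaries
    d1 = _cross(r, g, xy)
    d2 = _cross(g, b, xy)
    d3 = _cross(b, r, xy)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # The point is inside when it is on the same side of all three edges.
    return not (has_negative and has_positive)


print(in_gamut(D65_WHITE, BT709_PRIMARIES))             # white point is inside
print(in_gamut(BT2020_PRIMARIES[1], BT709_PRIMARIES))   # BT.2020 green vs. BT.709
```

The second test illustrates that the BT.2020 green primary falls outside the smaller BT.709 triangle.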
A color volume is defined by a color space and a dynamic range of the values represented in said color space.
For example, a color gamut is defined by an RGB ITU-R Recommendation BT.2020 color space for UHDTV. An older standard, ITU-R Recommendation BT.709, defines a smaller color gamut for HDTV. In SDR, the dynamic range is defined officially up to 100 nits (candela per square meter) for the color volume in which data are coded, although some display technologies may show brighter pixels.
As explained extensively in “A Review of RGB Color Spaces” by Danny Pascale, a change of gamut, i.e. a transform that maps the three primaries and the white point from one gamut to another, can be performed by using a 3×3 matrix in linear RGB color space. A change of space from XYZ to RGB is also performed by a 3×3 matrix. As a consequence, whether the color spaces are RGB or XYZ, a change of gamut can be performed by a 3×3 matrix. For example, a gamut change from BT.2020 linear RGB to BT.709 XYZ can be performed by a 3×3 matrix.
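One common way of building such a 3×3 matrix from the (x,y) coordinates of the primaries and the white point is sketched below. This is an illustrative construction under stated assumptions, not necessarily the one used in the cited review: the function names are hypothetical, the white point is assumed to map to linear RGB = (1, 1, 1) with Y = 1, and the 3×3 linear system is solved by Cramer's rule to keep the sketch dependency-free.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve m @ s = v for s using Cramer's rule."""
    det = _det3(m)
    if det == 0:
        raise ValueError("primaries are collinear; matrix is singular")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for row in range(3):
            replaced[row][col] = v[row]
        solution.append(_det3(replaced) / det)
    return solution


def _xy_to_xyz(x, y):
    """XYZ of a chromaticity (x, y), normalized so that Y = 1."""
    return [x / y, 1.0, (1.0 - x - y) / y]


def rgb_to_xyz_matrix(primaries, white):
    """Build the linear RGB -> XYZ matrix for the given primaries and white point."""
    columns = [_xy_to_xyz(x, y) for x, y in primaries]
    p = [[columns[c][r] for c in range(3)] for r in range(3)]
    # Scale each primary so that RGB = (1, 1, 1) maps onto the white point.
    scale = _solve3(p, _xy_to_xyz(*white))
    return [[p[r][c] * scale[c] for c in range(3)] for r in range(3)]


def apply_matrix(m, v):
    """Multiply a 3x3 matrix by a 3-component vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


# Example data: BT.2020 primaries and D65 white point.
bt2020 = ((0.708, 0.292), (0.170, 0.797), (0.131, 0.046))
m_2020 = rgb_to_xyz_matrix(bt2020, (0.3127, 0.3290))
xyz_white = apply_matrix(m_2020, [1.0, 1.0, 1.0])
```

The middle row of the resulting matrix gives the contribution of each linear RGB component to the luminance Y. A gamut change between two RGB spaces would combine one such matrix with the inverse of another, which again yields a single 3×3 matrix.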
High Dynamic Range pictures (HDR pictures) are color pictures whose luminance values are represented with an HDR dynamic range that is higher than the dynamic range of an SDR picture.
The HDR dynamic range is not yet defined by a standard, but one may expect a dynamic range up to a few thousand nits. For instance, an HDR color volume is defined by an RGB BT.2020 color space and the values represented in said RGB color space belong to a dynamic range from 0 to 4000 nits. Another example of an HDR color volume is defined by an RGB BT.2020 color space and the values represented in said RGB color space belong to a dynamic range from 0 to 1000 nits.
Color-grading a picture (or a video) is a process of altering/enhancing the colors of the picture (or the video). Usually, color-grading a picture involves a change of the color volume (color space and/or dynamic range) or a change of the color gamut relative to this picture. Thus, two different color-graded versions of a same picture are either versions of this picture whose values are represented in different color volumes (or color gamuts), or versions of the picture in which at least one color has been altered/enhanced according to different color grades. This may involve user interactions.
For example, in cinematographic production, a picture and a video are captured using tri-chromatic cameras into RGB color values composed of 3 components (Red, Green and Blue). The RGB color values depend on the tri-chromatic characteristics (color primaries) of the sensor. A first color-graded version of the captured picture is then obtained in order to get theatrical renders (using a specific theatrical grade). Typically, the values of the first color-graded version of the captured picture are represented according to a standardized YUV format such as BT.2020 which defines parameter values for UHDTV.
The YUV format is typically obtained by applying a non-linear function, the so-called Opto-Electronic Transfer Function (OETF), to the linear RGB components to obtain non-linear components R′G′B′, and then applying a color transform (usually a 3×3 matrix) to the obtained non-linear R′G′B′ components to obtain the three components YUV. The first component Y is a luminance component and the two components U,V are chrominance components.
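The two steps described above can be sketched as follows, assuming, for illustration only, the BT.709 OETF and the BT.709 luma coefficients. The function names are hypothetical, inputs are linear RGB values normalized to [0, 1], and the outputs are unquantized (Y in [0, 1], U and V in [-0.5, 0.5]).

```python
# BT.709 luma coefficients (assumed here for illustration).
KR, KG, KB = 0.2126, 0.7152, 0.0722


def oetf_bt709(linear: float) -> float:
    """BT.709 OETF: map a linear-light value in [0, 1] to a non-linear value."""
    linear = min(max(linear, 0.0), 1.0)
    if linear < 0.018:
        return 4.5 * linear                    # linear segment near black
    return 1.099 * linear ** 0.45 - 0.099      # power-law segment


def linear_rgb_to_yuv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Apply the OETF per component, then the color transform to Y, U, V."""
    rp, gp, bp = oetf_bt709(r), oetf_bt709(g), oetf_bt709(b)
    # The three lines below are equivalent to one 3x3 matrix applied to R'G'B'.
    y = KR * rp + KG * gp + KB * bp
    u = (bp - y) / (2.0 * (1.0 - KB))
    v = (rp - y) / (2.0 * (1.0 - KR))
    return y, u, v


y, u, v = linear_rgb_to_yuv(1.0, 0.0, 0.0)  # pure red
print(f"Y={y:.4f} U={u:.4f} V={v:.4f}")
```

Because the transfer function is applied before the matrix, the first component is computed from non-linear R′G′B′ values rather than from linear light.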
Then, a Colorist, usually in conjunction with a Director of Photography, adjusts the color values of the first color-graded version of the captured picture by fine-tuning/tweaking some color values in order to instill an artistic intent.
The problem to be solved is the distribution of a compressed HDR picture (or video) while, at the same time, distributing an associated SDR picture (or video) representative of a color-graded version of said HDR picture (or video).
A trivial solution is to simulcast both the SDR and HDR pictures (or videos) on a distribution infrastructure, but the drawback is that this virtually doubles the needed bandwidth compared to a legacy distribution infrastructure adapted to broadcast SDR pictures (or videos), such as the HEVC Main 10 profile (“High Efficiency Video Coding”, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Recommendation ITU-T H.265, Telecommunication Standardization Sector of ITU, April 2013).
Using a legacy distribution infrastructure is a requirement to accelerate the emergence of the distribution of HDR pictures (or video). Also, the bitrate shall be minimized while ensuring good quality of both the SDR and HDR versions of the picture (or video).
Moreover, backward compatibility may be ensured, i.e. the SDR picture (or video) shall be viewable by users equipped with a legacy decoder and display. In particular, overall perceived brightness (i.e. dark vs. bright scenes) and perceived colors (for instance, preservation of hues) should be preserved.
Another straightforward solution is to reduce the dynamic range of the HDR picture (or video) by a suitable non-linear function, typically into a limited number of bits (say 10 bits), and to compress the result directly with the HEVC Main 10 profile. Such non-linear functions (curves) already exist, like the so-called PQ EOTF proposed by Dolby at SMPTE (SMPTE standard: High Dynamic Range Electro-Optical Transfer Function of Mastering Reference Displays, SMPTE ST 2084:2014).
The drawback of this solution is the lack of backward compatibility, i.e. the obtained reduced version of the picture (or video) does not have sufficient visual quality to be considered viewable as an SDR picture (or video), and compression performance is somewhat poor.
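The dynamic-range reduction described above can be sketched with the inverse of the PQ EOTF, which maps absolute luminance to a non-linear value in [0, 1], followed by quantization to 10 bits. The constants are those of SMPTE ST 2084. The function names are hypothetical, and the full-range quantization (codes 0 to 1023) is a simplifying assumption; broadcast systems often use a narrower code range.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0  # luminance mapped to the maximum code value


def pq_inverse_eotf(nits: float) -> float:
    """Map absolute luminance (cd/m^2) to a PQ non-linear value in [0, 1]."""
    y = min(max(nits / PEAK_NITS, 0.0), 1.0)
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2


def quantize_full_range(value: float, bits: int = 10) -> int:
    """Quantize a [0, 1] value to an integer code (full range, an assumption)."""
    max_code = (1 << bits) - 1
    return round(min(max(value, 0.0), 1.0) * max_code)


for nits in (0.0, 100.0, 1000.0, 4000.0, 10000.0):
    code = quantize_full_range(pq_inverse_eotf(nits))
    print(f"{nits:8.1f} nits -> 10-bit code {code}")
```

With this curve, 100 nits (the nominal SDR peak mentioned earlier) lands near the middle of the code range, which illustrates why a PQ-coded picture does not look correct when displayed directly as an SDR picture.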
The present disclosure has been devised with the foregoing in mind.