This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
The dynamic range of luminance in a picture can be defined as the ratio between the highest luminance value of an image and the lowest luminance value of the image:
r = bright / dark
where “bright” denotes the highest luminance value of the image and “dark” denotes the lowest luminance value of the image. The dynamic range “r” is generally expressed as a power of two, i.e., as a number of f-stops (or equivalent stops) equal to log2(r). For instance, a ratio of 1000 is about 10 f-stops, which is the typical dynamic range of standard non-HDR videos, also called SDR (Standard Dynamic Range) videos or equivalently LDR (Low Dynamic Range) videos.
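As a minimal sketch of the conversion from a luminance ratio to f-stops described above, the following Python snippet computes log2(bright/dark). The function name f_stops and its argument checks are illustrative assumptions, not part of any specification.

```python
import math


def f_stops(bright: float, dark: float) -> float:
    """Return the dynamic range r = bright / dark expressed in f-stops.

    Both values are linear luminances in the same unit (e.g., nits).
    The name and the argument checks are illustrative assumptions.
    """
    if bright <= 0 or dark <= 0:
        raise ValueError("luminance values must be strictly positive")
    if bright < dark:
        raise ValueError("bright must not be lower than dark")
    return math.log2(bright / dark)


if __name__ == "__main__":
    # A ratio of 1000 is slightly under 10 f-stops.
    print(round(f_stops(1000.0, 1.0), 2))
```

A ratio of 1000 therefore corresponds to roughly 9.97 f-stops, which is why it is commonly rounded to 10.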
The dynamic range of an image can be very high, and may be well beyond the range that can be represented by standard image formats, such as the 8- to 10-bit gammatized formats used in broadcasting or PC imaging. Here the term “gammatized formats” refers to image formats represented in a non-linear domain. For example, linear components, such as, but not restricted to, RGB and Y, are transformed into the gammatized domain by using a non-linear function that can be a power function, a logarithm or an OETF (Opto-Electronic Transfer Function) such as those defined in the ITU-R Recommendations BT.709 and BT.2020.
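To illustrate what transforming a linear component into the gammatized domain can look like, here is a hedged Python sketch of the BT.709 OETF followed by quantization. The function names, the clamping of the input to [0, 1], and the full-range quantization (0 to 2^bits − 1, rather than the narrow range often used in broadcasting) are assumptions made for this example.

```python
def bt709_oetf(linear: float) -> float:
    """Map a normalized linear component in [0, 1] to a non-linear value in [0, 1].

    Uses the piecewise BT.709 curve: a linear segment near black and a
    power-law segment elsewhere. Input clamping is an assumption of this sketch.
    """
    linear = min(max(linear, 0.0), 1.0)
    if linear < 0.018:
        return 4.5 * linear
    return 1.099 * linear ** 0.45 - 0.099


def quantize_full_range(value: float, bits: int = 8) -> int:
    """Quantize a value in [0, 1] to an integer code (full range, an assumption)."""
    max_code = (1 << bits) - 1
    return min(max(round(value * max_code), 0), max_code)


if __name__ == "__main__":
    for linear in (0.0, 0.01, 0.18, 1.0):
        encoded = bt709_oetf(linear)
        print(linear, round(encoded, 4), quantize_full_range(encoded, bits=8))
```

The non-linear curve spends more code values on dark tones than a purely linear quantization would, which is the motivation for gammatized formats.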
Different images may be represented by different formats, and have different dynamic ranges. For instance, consider an image I whose luminance is expressed linearly in nits by an element Y. The value of Y may correspond to the true luminosity of a captured scene in a so-called “scene reference” format (i.e., in the file format Y=1 corresponds to 1 nit of luminance in the captured scene, and Y=x corresponds to x nits). The range of Y may cover all possible luminance ranges of image scenes captured by a camera, for instance. By varying the optics (filters, aperture) and/or sensors (exposure time, ISO) of a camera, the range of scene luminance that can be captured can be very wide. Very dark scenes like telescope observation (e.g., using long exposure time) or very bright scenes like sunsets (e.g., using very small aperture and strong filters) are both possible, leading to very dark and very bright pictures. Consequently, the dynamic range may be well over 15 f-stops.
The value of element Y may also be used to represent the luminosity provided by a display on which the image has been post-produced in a so-called “display reference” format (i.e., in the file format Y=1 corresponds to 1 nit of luminance rendered by the display used for the grading, and Y=x corresponds to x nits). The dynamic range provided by the “display reference” format is usually much lower than that of a “scene reference” format, which limits both the dynamic range and the peak luminance of the associated pictures. For example, these images may have a dynamic range of 15 f-stops and a peak luminance of 1000 nits, such as those defined in some restricted broadcasting-oriented specifications.
Often an image or video of a high dynamic range is called a High Dynamic Range (HDR) image or video. The exact dynamic range that an HDR video application supports may vary. For example, the SMPTE (Society of Motion Picture and Television Engineers) defines a non-linear transfer curve, the Perceptual Quantizer EOTF (Electro-Optical Transfer Function), also known as the PQ EOTF (defined in SMPTE ST 2084), preferably coded on 12 bits, which may code the luminance over the range from 0.005 nits to 10000 nits (a nit is one candela per square meter, or cd/m², a unit of luminance), leading to a ratio of 2 million or about 21 f-stops. Practically, first deployments of HDR at home may be expected to be TV sets providing not much more than a peak brightness of 1000 nits and a dynamic range of 15 f-stops, preferably in a 10-bit data format if possible. This restricted HDR is also referred to as Extended Dynamic Range (EDR). Typically, an SDR video has a bit depth of 8 or 10 bits, and an HDR video has a bit depth of 10 bits or higher. For example, an SDR video can be a 4:2:0 Y′CbCr 10-bit video, and an HDR video can be a PQ-encoded Y′CbCr 12-bit video.
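For illustration, the PQ curve mentioned above can be sketched in Python as follows. The constants are the ones commonly published for SMPTE ST 2084 and should be checked against the standard before being relied upon; the function names are assumptions of this sketch. The inverse EOTF maps absolute luminance in nits to a normalized code value in [0, 1], and the EOTF maps it back.

```python
import math

# PQ constants as commonly published for SMPTE ST 2084 (verify against the standard).
M1 = 2610 / 16384          # 0.1593017578125
M2 = 2523 / 4096 * 128     # 78.84375
C1 = 3424 / 4096           # 0.8359375
C2 = 2413 / 4096 * 32      # 18.8515625
C3 = 2392 / 4096 * 32      # 18.6875
PEAK_NITS = 10000.0


def pq_inverse_eotf(nits: float) -> float:
    """Encode absolute luminance (nits) as a normalized PQ value in [0, 1]."""
    y = min(max(nits / PEAK_NITS, 0.0), 1.0)
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2


def pq_eotf(value: float) -> float:
    """Decode a normalized PQ value in [0, 1] back to luminance in nits."""
    v = min(max(value, 0.0), 1.0) ** (1.0 / M2)
    y = (max(v - C1, 0.0) / (C2 - C3 * v)) ** (1.0 / M1)
    return PEAK_NITS * y


if __name__ == "__main__":
    # Range quoted in the text: 0.005 nits to 10000 nits, about 21 f-stops.
    print(round(math.log2(10000.0 / 0.005), 2))
    for nits in (0.005, 100.0, 1000.0, 10000.0):
        print(nits, round(pq_inverse_eotf(nits), 4))
```

Under these constants, 100 nits lands near the middle of the code range, which shows how much of the PQ curve is devoted to luminance levels below typical SDR peak brightness.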
In the present application, for ease of notation, we classify HDR videos into “EDR videos” and “strictly HDR” videos, where “EDR videos” refer to the videos with a dynamic range between 10 and 15 f-stops, and “strictly HDR” videos refer to those above 15 f-stops, as illustrated in TABLE 1.
TABLE 1
Category              Dynamic range
SDR                   r ≤ 10 f-stops
HDR: EDR              10 f-stops < r ≤ 15 f-stops
HDR: Strictly HDR     r > 15 f-stops
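The classification in TABLE 1 can be expressed as a small Python sketch. The function name and the returned labels are illustrative assumptions; the thresholds of 10 and 15 f-stops are those given in the table.

```python
import math


def classify_dynamic_range(bright: float, dark: float) -> str:
    """Classify a luminance range according to the thresholds of TABLE 1.

    bright and dark are linear luminances in the same unit (e.g., nits).
    The name and the label strings are illustrative assumptions.
    """
    if bright <= 0 or dark <= 0 or bright < dark:
        raise ValueError("expected 0 < dark <= bright")
    stops = math.log2(bright / dark)
    if stops <= 10:
        return "SDR"
    if stops <= 15:
        return "EDR"
    return "strictly HDR"


if __name__ == "__main__":
    print(classify_dynamic_range(1000.0, 1.0))      # ratio 1000, under 10 f-stops
    print(classify_dynamic_range(1000.0, 0.05))     # ratio 20000, between 10 and 15 f-stops
    print(classify_dynamic_range(10000.0, 0.005))   # ratio 2 million, above 15 f-stops
```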
In order for HDR images to be displayed on SDR devices such as TV sets or computer monitors, the images should be converted to become viewable on the SDR devices (i.e., put in a format that is compatible with the display device and that preserves the overall perceived brightness and colorfulness of the HDR videos). We denote by R the data range on which the luminance Y should be mapped, for instance R=[0,255] for an 8-bit SDR format or R=[0,1023] for a 10-bit SDR format with a standard EOTF defined by ITU-R BT.709 or BT.2020.
An “absolute” mapping function
π: linear domain → R,
which maps a value from a linear domain to a data range R, can be used for the conversion. Here “absolute” means that a mapped value corresponds to a unique input value, i.e., the mapping function is not adapted to the content. Such an “absolute” mapping, which maps luminance Y from a linear domain to the data range R, does not always work well. For example, it may map very dark scenes uniformly to zero and very bright scenes to the upper bound (e.g., 255 or 1023) of the data range supported by the output device.
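The failure mode described above can be reproduced with a deliberately simple, content-independent mapping. In the Python sketch below, the function absolute_map, the 100-nit reference peak, and the 1/2.2 power curve are hypothetical choices made only for illustration; they are not the mapping of any particular standard or of the present application.

```python
def absolute_map(nits: float, peak_nits: float = 100.0, bits: int = 8) -> int:
    """Map linear luminance (nits) to an integer code in R = [0, 2**bits - 1].

    Hypothetical fixed mapping: normalize by an assumed peak, clip to [0, 1],
    apply a 1/2.2 power curve, then quantize. It never adapts to the content.
    """
    max_code = (1 << bits) - 1
    normalized = min(max(nits / peak_nits, 0.0), 1.0)
    return round(max_code * normalized ** (1.0 / 2.2))


if __name__ == "__main__":
    dark_scene = [0.00001, 0.00005, 0.0001]   # very dark scene, in nits
    mid_scene = [1.0, 10.0, 50.0]             # ordinary scene, in nits
    bright_scene = [2000.0, 5000.0, 10000.0]  # very bright scene, in nits

    print([absolute_map(y) for y in dark_scene])    # every value collapses to 0
    print([absolute_map(y) for y in mid_scene])     # distinct codes
    print([absolute_map(y) for y in bright_scene])  # every value saturates at 255
```

Because the mapping is fixed, all detail in the dark scene and in the bright scene is lost, even though each scene on its own spans a modest dynamic range that the output format could represent.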