As used herein, the term ‘dynamic range’ (DR) may relate to a capability of the human visual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest blacks (darks) to brightest whites (highlights). In this sense, DR relates to a ‘scene-referred’ intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to a ‘display-referred’ intensity. Unless a particular sense is explicitly specified to have particular significance at any point in the description herein, it should be inferred that the term may be used in either sense, e.g. interchangeably.
As used herein, the term high dynamic range (HDR) relates to a DR breadth that spans the some 14-15 or more orders of magnitude of the human visual system (HVS). In practice, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated, in relation to HDR. As used herein, the terms enhanced dynamic range (EDR) or visual dynamic range (VDR) may individually or interchangeably relate to the DR that is perceivable within a scene or image by a human visual system (HVS) that includes eye movements, allowing for some light adaptation changes across the scene or image. As used herein, EDR may relate to a DR that spans 5 to 6 orders of magnitude. Thus while perhaps somewhat narrower in relation to true scene referred HDR, EDR represents a wide DR breadth and may also be referred to as HDR.
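By way of illustration only, the breadth of a dynamic range may be expressed in orders of magnitude as the base-10 logarithm of the ratio between its maximum and minimum luminance. The following Python sketch assumes hypothetical luminance limits; the function name and the example values are illustrative and are not taken from this description.

```python
import math


def orders_of_magnitude(min_nits: float, max_nits: float) -> float:
    """Return the breadth of a luminance range in orders of magnitude (base 10).

    Both arguments are hypothetical luminance limits in cd/m^2 (nits).
    """
    if min_nits <= 0 or max_nits <= min_nits:
        raise ValueError("expected 0 < min_nits < max_nits")
    return math.log10(max_nits / min_nits)


# Example with assumed limits: 0.01 nits to 1,000 nits spans about
# 5 orders of magnitude, i.e., the low end of the EDR breadth noted above.
edr_example = orders_of_magnitude(0.01, 1000.0)
```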
In practice, images comprise one or more color components (e.g., luma Y and chroma Cb and Cr) wherein color components are represented by a precision of n-bits per pixel (e.g., n=8). Using linear luminance coding, images where n≤8 (e.g., color 24-bit JPEG images) are considered images of standard dynamic range, while images where n>8 may be considered images of enhanced dynamic range. EDR and HDR images may also be stored and distributed using high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.
A reference electro-optical transfer function (EOTF) for a given display characterizes the relationship between color values (e.g., luminance) of an input video signal and the output screen color values (e.g., screen luminance) produced by the display. For example, ITU Rec. ITU-R BT.1886, “Reference electro-optical transfer function for flat panel displays used in HDTV studio production,” (March 2011), which is incorporated herein by reference in its entirety, defines the reference EOTF for flat panel displays based on measured characteristics of the Cathode Ray Tube (CRT). Given a video stream, information about its EOTF is typically embedded in the bit stream as metadata. As used herein, the term “metadata” relates to any auxiliary information that is transmitted as part of the coded bitstream and assists a decoder to render a decoded image. Such metadata may include, but are not limited to, color space or gamut information, reference display parameters, and auxiliary signal parameters, such as those described herein.
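As a non-limiting illustration, the reference EOTF of BT.1886 may be sketched in Python as follows. The sketch follows the published form L = a·max(V + b, 0)^γ with γ = 2.4; the function name and the default white and black luminance values (100 and 0.1 cd/m²) are assumptions chosen for the example, not values taken from this description.

```python
def bt1886_eotf(v: float, lw: float = 100.0, lb: float = 0.1,
                gamma: float = 2.4) -> float:
    """Map a normalized video signal v in [0, 1] to screen luminance (cd/m^2).

    lw and lb are the assumed screen luminances for white and black.
    """
    root_w = lw ** (1.0 / gamma)
    root_b = lb ** (1.0 / gamma)
    a = (root_w - root_b) ** gamma      # user gain (legacy "contrast")
    b = root_b / (root_w - root_b)      # black lift (legacy "brightness")
    return a * max(v + b, 0.0) ** gamma
```

With these assumed defaults, a signal of 1.0 maps to the white luminance and a signal of 0.0 maps to the black luminance.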
Most consumer desktop displays currently support luminance of 200 to 300 cd/m² or nits. Most consumer HDTVs range from 300 to 500 nits with new models reaching 1,000 nits (cd/m²). Such displays thus typify a lower dynamic range (LDR), also referred to as a standard dynamic range (SDR), in relation to HDR or EDR. As the availability of HDR content grows due to advances in capture equipment (e.g., cameras) and HDR displays (e.g., the PRM-4200 professional reference monitor from Dolby Laboratories), HDR content may be color graded and displayed on HDR displays that support higher dynamic ranges (e.g., from 1,000 nits to 5,000 nits or more). Such displays may be defined using alternative EOTFs that support high luminance capability (e.g., 0 to 10,000 nits). An example of such an EOTF is defined in SMPTE ST 2084:2014 “High Dynamic Range EOTF of Mastering Reference Displays,” which is incorporated herein by reference in its entirety. As appreciated by the inventors here, improved techniques for encoding and decoding reversible production-quality single-layer video signals that may be used to support a wide variety of display devices are needed.
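As a non-limiting illustration, the EOTF of SMPTE ST 2084 and its inverse may be sketched in Python as follows. The constants are those published in the standard; the function names and the clamping of out-of-range inputs are assumptions made for the example.

```python
# Constants as published in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0


def pq_eotf(signal: float) -> float:
    """Map a normalized nonlinear signal in [0, 1] to luminance in nits."""
    e = min(max(signal, 0.0), 1.0) ** (1.0 / M2)
    return PEAK_NITS * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1.0 / M1)


def pq_inverse_eotf(nits: float) -> float:
    """Map luminance in nits (0 to 10,000) to a normalized nonlinear signal."""
    y = (min(max(nits, 0.0), PEAK_NITS) / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2
```

In this sketch, a signal of 1.0 maps to 10,000 nits, and 100 nits corresponds to a signal value of roughly 0.51, which illustrates how the curve allocates about half of the code range to luminance levels at or below typical SDR peak levels.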
As used herein, the term “forward reshaping” denotes the process of mapping (or quantizing) an HDR image from its original bit depth to an image of a lower or the same bit depth to allow compressing the image using existing coding standards or devices. In a receiver, after decompressing the reshaped signal, the receiver may apply an inverse reshaping function to restore the signal to its original high dynamic range. As appreciated by the inventors here, improved techniques for image reshaping of high dynamic range images are needed.
A forward reshaping look-up table (LUT) is a table in which the mapping or quantizing of the forward reshaping has been stored.
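By way of example only, a forward reshaping LUT and a corresponding inverse LUT may be sketched in Python as follows. The sketch assumes a plain linear quantization from a 16-bit input to a 10-bit output, and an inverse that returns the mean of the input values sharing each output code; both choices, and all function names, are assumptions made for illustration and do not represent the reshaping techniques of this disclosure.

```python
def build_forward_lut(in_bits: int = 16, out_bits: int = 10) -> list[int]:
    """Return a LUT mapping each input code word to a reshaped code word.

    Assumes a simple linear quantization; practical reshaping functions
    are generally content-dependent and nonlinear.
    """
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    return [round(v * out_max / in_max) for v in range(in_max + 1)]


def build_backward_lut(forward_lut: list[int], out_bits: int = 10) -> list[int]:
    """Return an inverse LUT mapping each reshaped code word back to the
    mean of the original code words that were quantized to it."""
    size = 1 << out_bits
    sums = [0] * size
    counts = [0] * size
    for value, code in enumerate(forward_lut):
        sums[code] += value
        counts[code] += 1
    # Unused codes (none occur for this linear mapping) default to 0.
    return [round(s / c) if c else 0 for s, c in zip(sums, counts)]


forward_lut = build_forward_lut()
backward_lut = build_backward_lut(forward_lut)
reshaped = forward_lut[40000]          # 16-bit input -> 10-bit code word
restored = backward_lut[reshaped]      # approximate reconstruction
```

Because several input code words share one output code word, the reconstruction is approximate; in this sketch the error is bounded by the width of one quantization bin.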
As used herein, the term backwards compatible denotes hardware and/or software designed to function with SDR, SDR with Dolby metadata, and HDR interchangeably. If the compressed video bitstream is present, then the video may be viewed in SDR. If SDR and Dolby metadata are contained within the compressed video stream, then the video may be viewed in SDR or in HDR. The underlying bitstream may be encoded by any codec, such as AVC, HEVC, VP9, or any future codec.
The term real-time may refer to real-time architectures and/or real-time implementations. Real-time architectures are those in which the data for processing is made available at the time of processing, e.g., there is little to no dependency on data that will not be available at the current time instance, so that data dependency delays are minimized. Real-time implementations are those in which processing may be performed within a fixed time interval, e.g., the processing may be completed, on average, within a certain number of frames, such as by optimized algorithms capable of quickly realizing a result. In this way, a real-time architecture provides data that is temporally near the time of processing, and real-time implementations utilize this temporally near data in algorithms that may be performed within a certain number of frames, i.e., processed quickly. The instant disclosure pertains to both aspects; it is understood that an optimized real-time result may be best realized with a real-time implementation working in conjunction with a real-time architecture.
The term single layer denotes a compressed video bitstream. Two different bitstreams may be delivered. The first stream is a compressed video bitstream, such as AVC or HEVC, which contains the compressed pixel information and is SDR. The bitstreams may be decoded by any legacy device. A second stream has Dolby metadata, which contains a backward reshaping function. With the first stream, video may be watched in SDR. If both the first stream and the second stream are present, the video may be watched in HDR. The first stream, the compressed video bitstream, does not contain the Dolby metadata.
The term central tendency as used herein is a measure used to describe at least one of the average, mean, median, mode, center of distribution, smallest absolute deviation, dispersion, range, variance, standard deviation with kurtosis and the like, e.g. it is a measure of where the middle of the dataset lies. The term linear non-linear combination may be used in referring to the central tendency measure.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.