Video signals may be characterized by multiple parameters, such as bit-depth, color space, color gamut, and resolution. Modern televisions and video playback devices (e.g., Blu-ray players) support a variety of resolutions, including standard-definition (e.g., 720×480i) and high-definition (HD) (e.g., 1920×1080p). Ultra high-definition (UHD) is a next-generation resolution format with at least a 3840×2160 resolution (referred to as 4K UHD) and options to go as high as 7680×4320 (referred to as 8K UHD). Ultra high-definition may also be referred to as Ultra HD, UHDTV, or super high-vision. As used herein, UHD denotes any resolution higher than HD resolution.
Another characteristic of a video signal is its dynamic range. Dynamic range (DR) is a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest darks to brightest brights. As used herein, the term ‘dynamic range’ (DR) may relate to a capability of the human psychovisual system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest darks to brightest brights. In this sense, DR relates to a ‘scene-referred’ intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to a ‘display-referred’ intensity. Unless a particular sense is explicitly specified to have particular significance at any point in the description herein, it should be inferred that the term may be used in either sense, e.g., interchangeably.
As used herein, the term high dynamic range (HDR) relates to a DR breadth that spans the some 14-15 orders of magnitude of the human visual system (HVS). For example, well adapted humans with essentially normal vision (e.g., in one or more of a statistical, biometric or ophthalmological sense) have an intensity range that spans about 15 orders of magnitude. Adapted humans may perceive dim light sources of as few as a mere handful of photons. Yet, these same humans may perceive the near painfully brilliant intensity of the noonday sun in desert, sea or snow (or even glance into the sun, however briefly to prevent damage). This span though is available to ‘adapted’ humans, e.g., those whose HVS has a time period in which to reset and adjust.
In contrast, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated in relation to HDR. As used herein, the terms ‘enhanced or extended dynamic range’ (EDR), ‘visual dynamic range,’ or ‘variable dynamic range’ (VDR) may individually or interchangeably relate to the DR that is simultaneously perceivable by an HVS. As used herein, EDR may relate to a DR that spans 5-6 orders of magnitude. Thus, while perhaps somewhat narrower in relation to true scene-referred HDR, EDR nonetheless represents a wide DR breadth. As used herein, the term ‘simultaneous dynamic range’ may relate to EDR.
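To make the "orders of magnitude" measure concrete, the following minimal Python sketch computes the breadth of a dynamic range as the base-10 logarithm of the ratio between the brightest and darkest intensities. The function name and the luminance values used in the example (0.01 and 1000 candelas per square meter) are illustrative assumptions and are not taken from the description above.

```python
import math


def orders_of_magnitude(l_min, l_max):
    """Return the dynamic range breadth in orders of magnitude (log10 of the ratio).

    l_min and l_max are the darkest and brightest intensities, in the same
    linear unit (e.g., cd/m^2). The name and signature are hypothetical.
    """
    if l_min <= 0:
        raise ValueError("l_min must be positive")
    if l_max < l_min:
        raise ValueError("l_max must be at least l_min")
    return math.log10(l_max / l_min)


if __name__ == "__main__":
    # Assumed example range: 0.01 to 1000 cd/m^2, a ratio of 100,000:1.
    print(orders_of_magnitude(0.01, 1000.0))
```

Under these assumed values the range spans about 5 orders of magnitude, which falls within the 5-6 orders of magnitude associated above with EDR.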
In practice, images comprise one or more color components (e.g., luma Y and chroma Cb and Cr) wherein each color component is represented by a precision of n bits per pixel (e.g., n=8). Using linear luminance coding, images where n≤8 (e.g., color 24-bit JPEG images, which carry 8 bits for each of three color components) are considered images of standard dynamic range, while images where n>8 may be considered images of enhanced dynamic range. EDR and HDR images may also be stored and distributed using low bit-depth, non-linear luminance coding (e.g., 10 bits and logarithmic luminance coding), or high-precision (e.g., 16-bit) floating-point formats, such as the OpenEXR file format developed by Industrial Light and Magic.
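The bit-depth convention described above can be sketched as follows. This is only an illustration under the stated assumption of linear luminance coding; the function names are hypothetical, and the rule does not apply to non-linear or floating-point codings, which may carry EDR or HDR content at other bit depths.

```python
def bits_per_component(bits_per_pixel, num_components=3):
    """Split a total per-pixel bit count evenly across color components."""
    if bits_per_pixel % num_components != 0:
        raise ValueError("bits_per_pixel is not evenly divisible by num_components")
    return bits_per_pixel // num_components


def classify_linear(n):
    """Classify an image by per-component bit depth n, assuming linear luminance coding.

    Returns "SDR" for n <= 8 and "EDR" for n > 8, following the convention
    described in the text. Hypothetical helper, for illustration only.
    """
    if n <= 0:
        raise ValueError("bit depth must be positive")
    return "SDR" if n <= 8 else "EDR"


if __name__ == "__main__":
    n = bits_per_component(24)   # a 24-bit color image has 8 bits per component
    print(n, classify_linear(n))
    print(10, classify_linear(10))
```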
To support backwards compatibility with legacy playback devices as well as new HDR or UHD display technologies, multiple layers may be used to deliver UHD and HDR (or EDR) video data from an upstream device to downstream devices. Given such a multi-layer stream, legacy decoders may use the base layer to reconstruct an HD SDR version of the content. Advanced decoders may use both the base layer and the enhancement layers to reconstruct a UHD EDR version of the content to render it on more capable displays. Such a coding system may require updating coding parameters at multiple coding intervals, such as a coded region, frame, scene, or a group of scenes. As used herein, the terms “scene” or “shot” of a video sequence may relate to a series or a group of consecutive frames in the video signal sharing similar characteristics (e.g., colors, dynamic range, and the like). As appreciated by the inventors here, improved techniques for scene-change or scene-cut detection in video are desirable.
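For orientation only, the notion of a scene cut as a boundary between groups of frames with similar characteristics can be illustrated with a generic luma-histogram comparison. This sketch is a common baseline and is not the technique of the present description; the function names, the bin count, the 8-bit luma assumption, and the threshold value are all hypothetical choices made for the example.

```python
def luma_histogram(frame, bins=16, max_value=255):
    """Return a normalized histogram of the luma samples in one frame.

    frame is a flat, non-empty list of integer luma values in [0, max_value].
    """
    if not frame:
        raise ValueError("frame must contain at least one sample")
    hist = [0] * bins
    for v in frame:
        idx = min(v * bins // (max_value + 1), bins - 1)
        hist[idx] += 1
    total = len(frame)
    return [h / total for h in hist]


def detect_scene_cuts(frames, threshold=0.5):
    """Return indices of frames that start a new scene.

    A cut is declared when half the sum of absolute histogram differences
    between consecutive frames (a value in [0, 1]) exceeds the threshold.
    The threshold of 0.5 is an arbitrary illustrative value.
    """
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        hist = luma_histogram(frame)
        if prev is not None:
            diff = sum(abs(a - b) for a, b in zip(hist, prev)) / 2
            if diff > threshold:
                cuts.append(i)
        prev = hist
    return cuts


if __name__ == "__main__":
    dark = [10] * 64      # synthetic frame of uniformly dark samples
    bright = [240] * 64   # synthetic frame of uniformly bright samples
    print(detect_scene_cuts([dark, dark, bright, bright]))
```

In this synthetic example, the only frame flagged is the first bright frame, since consecutive identical frames yield a histogram difference of zero.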
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.