Classical image/video technology (starting with analogue systems like NTSC and PAL, and continuing with digital video codecs like MPEG 1, MPEG 2, MPEG 4 etc.) utilizes what we now call Low Dynamic Range (LDR) or Standard Dynamic Range (SDR) coding. It is now widely recognized that for bringing a more immersive experience to the consumer, the next step in video technology needs to be the enhancement of dynamic range and peak brightness of the video signal. Individual companies and groups of companies have begun research and development towards the next generation of video codec, which is capable of handling so-called High Dynamic Range (HDR) images of HDR scenes. These developments find their base in the notion that LDR signals cannot capture the dynamic range of real life scenes, cannot represent the dynamic range of what the human visual system can see, and therefore cannot transfer the full emotional experience in a scene to the consumer. Often, HDR is seen as a necessary feature for Ultra High Definition (UHD) television, i.e. with a display resolution of 3840×2160 pixels (“4k”), but HDR is also seen as a convincing feature on its own, e.g. in combination with HD resolution video.
Capturing HDR images requires a camera which can capture an increased dynamic range of at least 11 stops, but preferably more than 16 stops. Current cameras from e.g. ARRI, RED and Sony achieve about 14 stops. Some HDR cameras use a slow and a fast exposure and combine those into a single HDR image; other cameras use beam splitting towards two or more sensors of different sensitivity.
Whereas in classical imaging a lot of information was thrown away (hard clipped, e.g. the outside view from a room or car), present imaging systems can capture all that information, and the question is what to do with it then. Further, higher dynamic range displays are emerging, which have higher peak brightness than the currently typical 350 nit (or 100 nit for grading reference monitors). Televisions with a peak brightness of around 1000 nits are now entering the consumer market, and SIM2 has a 5000 nit professional monitor in their portfolio.
Display units are currently being developed that are able to provide a high brightness level and a very high contrast between dark parts of the image and bright parts of the image. For fully exploiting the capabilities of such displays, video information may be enhanced by providing adapted video information, e.g. taking into account the higher brightness and the HDR contrast range. For distinguishing from HDR, the traditional video information is called low dynamic range [LDR] video in this document. As such, LDR video information may be displayed on an HDR display unit in HDR display mode for improved contrast. However, a more compelling image is achieved when the video information itself is generated in an HDR video format, e.g. exploiting the enhanced dynamic range for better visual effects or for improving visibility of textures in bright or dark areas while avoiding visual banding. In addition to enhancing the precision of the image data, movie directors can locally enhance the experience, by e.g. emphasizing explosions, and/or improve visibility in bright or dark scenes/areas.
Standards developing organizations are rethinking the various video format parameters that determine the picture quality. Among these is the dynamic range. The dynamic range becomes more important with increasing peak brightness of the display. While most video content is still graded for 100 nit (cd/m²) displays, the brightness of modern commercial displays is usually already much higher (typically around 350 nits, but going up to say 600-800 nits). Professional displays with a brightness of around 4000 nits are already available. These displays are able to provide a much more life-like viewing experience.
In short, HDR images are becoming more and more important. An HDR image may be an image which encodes the textures of an HDR scene (which may typically contain both very bright and dark regions) with sufficient information for high quality encoding of the color textures of the various captured objects in the scene, such that a visually good quality rendering of the HDR scene can be done on an HDR display with high peak brightness, e.g. 5000 nit. A typical HDR image comprises brightly colored parts or parts strongly illuminated compared to the average illumination. HDR is especially important for night scenes.
In contrast with day scenes in which sun and sky illuminate each point the same, at night there may be only some light sources, which light the scene in a quadratically reducing manner. This creates bright regions around a light source, and dark regions in faraway corners. Some parts get almost no light from anywhere, making it very dark. I.e. in a night scene there may at the same time be parts having region luminances (or when captured by a linear camera pixel luminances) of above 10,000 nit for the lamps themselves, and fractions of a nit, e.g. 0.001 nit for the dark regions, making the total dynamic range 10 million to 1. This being the theoretical range for the brightest versus darkest pixel, the useful dynamic range may of course be lower, since one may not need to accurately represent a couple of small lamps or a small dark patch, but in typical HDR scenes even the useful dynamic range of the normal objects of interest may be well above 10,000:1 (or 14 stops). Mapping this to a display of 2000 nit peak brightness means that it should “theoretically” (assuming the relative to peak white rendering is sufficient for visual quality of the scene rendering) have a minimum (visible) black of for instance 0.2 nit.
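The figures in the preceding paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function names are illustrative and not part of any cited system. It also shows that log2(10,000) is roughly 13.3, so the "14 stops" quoted above is a rounded-up value.

```python
import math


def contrast_ratio(brightest_nit: float, darkest_nit: float) -> float:
    """Ratio between the brightest and darkest luminance, both in nit."""
    if darkest_nit <= 0:
        raise ValueError("darkest_nit must be positive")
    return brightest_nit / darkest_nit


def ratio_to_stops(ratio: float) -> float:
    """One stop is a doubling of luminance, so stops = log2(ratio)."""
    return math.log2(ratio)


def required_black_nit(peak_nit: float, ratio: float) -> float:
    """Black level a display needs to show `ratio` relative to its peak white."""
    return peak_nit / ratio


# Lamps at 10,000 nit versus dark corners at 0.001 nit.
scene_ratio = contrast_ratio(10_000.0, 0.001)
# Useful dynamic range of 10,000:1 expressed in stops.
useful_stops = ratio_to_stops(10_000.0)
# Black level needed on a 2000 nit display to render 10,000:1.
display_black = required_black_nit(2000.0, 10_000.0)

print(f"{scene_ratio:.0f}:1, {useful_stops:.1f} stops, black {display_black} nit")
```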
HDR video (or even still image) encoding has only recently been researched. The typical belief is that one either needs to go towards significantly more bits, for encoding the brightnesses above the LDR range of scene objects (e.g. encodings which encode scene luminances directly), or one needs some two-layer approach, wherein e.g. in addition to an object reflectance image there is an illumination boost image, or similar decomposition strategies. A two layer HDR encoding approach has been published for instance in U.S. Pat. No. 8,248,486B1 and WO2005/1040035.
A simpler single image approach is disclosed in WO2011/107905 and WO2012/153224. This approach is based upon parametric encoding. In addition to simply encoding a single HDR image suitable for displays with a peak brightness at a reference value, e.g. 1500 nit, this approach also addresses displays with other peak brightnesses and dynamic ranges. Since there will also be displays of e.g. 500 or 100 nit, rather than leaving it blindly to the receiving side to change the encoded high dynamic range image into some reasonable image by auto-conversion, color processing functions are co-encoded which specify how to arrive at an appropriate image for the specific properties of the display, starting from the encoded HDR image. This process then results in an image optimized for that specific display, which a content creator could agree with.
With “high dynamic range” (HDR) images, we typically mean images as captured from the capturing side that have 1) a high luminance contrast ratio compared to legacy LDR encoding (i.e. contrast ratios of 10,000:1 or more); and 2) object luminances no less than 500, 700 or typically 1000 nits. An HDR coding system then needs to be capable of encoding this wide contrast ratio and these high object luminances. An HDR reproduction system will typically reproduce highlights above 1000 nit to generate some desired appearance of say a lit lamp or sunny exterior.
The HDR image is to be displayed on a display. As already is the case with current commercial displays, future HDR displays will have different peak brightness levels depending on technology, design choices, cost considerations, market factors, etc. The video signal received by the display will usually be graded for a specific reference display, which may not correspond to the characteristic of the display on which the video signal is to be presented. The display receiving the HDR signal tries to adapt the video signal to match its own characteristics, including peak brightness level. If the receiver/display has no knowledge about the characteristics of the video signal and/or the grading that was applied, the resulting picture might not be in line with the artistic intent or might simply look bad. Therefore, dynamic range adaptation parameters/instructions may be and preferably are included with the video or conveyed otherwise to the display to provide processing information for optimizing the picture quality for the peak brightness level and other characteristics of the display on which the signal is displayed. The adaptation parameters may operate on the whole picture area or may be constrained to certain areas of the picture.
Alternatively, the HDR display may on its own adapt the incoming HDR signal for instance if it knows the characteristics for the incoming signal, for instance if a standard has been used.
By whatever method, the display therefore adapts the incoming e.g. HDR signal. For simplicity an HDR signal is mentioned here, but the incoming signal could also be an LDR signal which is then displayed in HDR mode (note that this LDR signal may, although it is by itself suitable for direct display on an LDR display, implicitly be an encoding of an HDR image look, because it contains all necessary pixel color data which can be functionally mapped to an HDR image by co-encoded functions). More specifically, the display performs a dynamic range adaptation on the incoming signal for adjusting it to its characteristics (e.g. peak intensity, black level) before displaying it.
The display applies a mapping function which maps the incoming HDR (or LDR) data on a set of HDR data which best (or at least better or at least such is the intention) fits the capabilities of the display, such as e.g. black level and peak brightness level of the display. The so adapted HDR data is used for displaying the image on the display.
The mapping can be an upgrading of the image wherein the dynamic range of the displayed image is larger than the dynamic range of the original image as well as a downgrading of the image wherein the dynamic range is smaller than the dynamic range of the original image.
The effect of the dynamic range adaptation (below this will also sometimes be called “boost”, although when downgrading is performed the image is diminished rather than increased in dynamic range) is often most noticeable for very bright objects.
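As an illustration of such a mapping, the sketch below rescales a pixel luminance graded for one reference display to the black level and peak brightness of another. It is a deliberately simple stand-in: the text does not prescribe a particular mapping function, and the name `adapt_luminance`, the power-law shape and the `gamma` parameter are assumptions made only for this example.

```python
def adapt_luminance(l_in: float, src_peak: float, dst_peak: float,
                    dst_black: float = 0.0, gamma: float = 1.0) -> float:
    """Map a luminance (nit) graded for `src_peak` onto a target display.

    Hypothetical power-law mapping, not the method of any cited document:
    the input is normalised to the source peak, shaped by `gamma`, and
    stretched between the target black level and the target peak brightness.
    """
    if src_peak <= 0 or dst_peak <= dst_black:
        raise ValueError("need src_peak > 0 and dst_peak > dst_black")
    # Relative luminance in [0, 1]; values above the source peak are clipped.
    rel = min(max(l_in / src_peak, 0.0), 1.0)
    return dst_black + (dst_peak - dst_black) * rel ** gamma


# Downgrading: content graded for 5000 nit shown on a 500 nit display.
down = adapt_luminance(1250.0, src_peak=5000.0, dst_peak=500.0, gamma=0.5)

# Upgrading: content graded for 100 nit shown on a 1000 nit display.
up = adapt_luminance(50.0, src_peak=100.0, dst_peak=1000.0, gamma=1.0)

print(down, up)
```

With `gamma` below 1 the mid-tones are lifted relative to a plain linear rescale, which is one common way to keep a downgraded image from looking too dark. Whichever curve is used, the output scales with the target peak brightness, so the largest absolute changes fall on the brightest inputs.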
The video signal may be provided to the home in various ways, including through broadcast, through the internet or via packaged media. It may for instance be received by a set top box (STB) or via another video processing system as a compressed stream.
The set top box decodes the video and subsequently sends it as baseband video to the television set. In another example the coded video is stored on a storage medium, e.g. a DVD/Blu-ray disc or a Flash drive. In that case the playback device (media player MP) reads the content from the medium, decodes the compressed video and sends it to the television set. In both cases the separate box (VPS, video processing system) is connected with the TV through a standard interface (e.g. HDMI, DisplayPort, or a wireless video interface).
Typically set top boxes (STB) and media players (MP) do not simply pass on the decoded video, but sometimes merge the video with one or more graphics layers. For example, in the case of Blu-ray Disc (BD) there are often two overlay layers: Presentation Graphics (PG) for subtitles and the graphics plane from the Java engine (BD-J), e.g. for menu overlays. On top of those graphics planes there can be an additional plane for the user interface of the player.
When extending existing video systems for HDR, the high contrast ratio available in advanced display devices is used to achieve vivid and realistic video images. However, it has been found that, when overlaying graphics in such an HDR display mode, several problems may occur. For example, a problem that can occur with (semi-transparent) graphics overlays on top of HDR video is that some scenes in HDR video can be exceptionally bright. This will significantly reduce the legibility of graphics such as subtitles or menus shown at the same time. Another problem that can occur is that the characters in the subtitles might become so bright that they become annoying or fatiguing for the reader. Also, extremely bright subtitles or menus may cause halo effects or glare and thus degrade the perceived quality of the video.
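The legibility problem can be made concrete with a small numerical sketch. It assumes simple alpha blending in linear light, which is an assumption for illustration and not necessarily how a given set top box or player merges its planes. The subtitle luminance (100 nit), the box opacity (0.5) and the two video luminances are likewise made-up example values.

```python
def blend(graphic_nit: float, video_nit: float, alpha: float) -> float:
    """Alpha-blend a graphics pixel over a video pixel (linear light, in nit)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * graphic_nit + (1.0 - alpha) * video_nit


def subtitle_appearance(video_nit: float) -> tuple[float, float]:
    """Return (text luminance, box luminance) for a subtitle over the video.

    Hypothetical overlay: opaque 100 nit text on a black box that is
    50 % transparent, so half of the video luminance shines through the box.
    """
    text = blend(100.0, video_nit, alpha=1.0)
    box = blend(0.0, video_nit, alpha=0.5)
    return text, box


dim_scene = subtitle_appearance(50.0)       # moderately lit scene
bright_scene = subtitle_appearance(4000.0)  # exceptionally bright HDR scene

print(dim_scene, bright_scene)
```

In the dim scene the text is brighter than its box, as the graphics designer intended. In the bright scene the box ends up far brighter than the text, so the intended contrast is inverted and the subtitle becomes hard to read.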
Problems may occur both when the dynamic range is increased (from an LDR or low HDR to a higher HDR range) and when it is decreased (from an HDR to a lower dynamic range HDR or LDR). The dynamic range adaptation may be on the basis of parameters that are sent along with the video, based upon an analysis of the image in the TV, based upon information sent along with the video signal, or any other method. The dynamic range adaptation is intended for the underlying video, not for the areas that contain the graphics overlays. The dynamic range adaptation may change at certain instants (e.g. when the scene is changing), while subtitles or a menu overlay may be fixed during the change. This may e.g. result in unwanted changes in the appearance of the graphics at scene boundaries.
In US0240125696 a solution has been described wherein the overlays are adjusted in dependence on the display mode. Prior to merging (or while merging) an overlay with a video signal (which could be an LDR or an HDR signal) the overlay is adapted (or the merging is adapted) in dependence on the display mode.
However, this requires an input for the display mode and for HDR processing instructions. Furthermore, displays differ and each has its own characteristics. Therefore the same adaptation of an overlay for the same display mode may not give the same result on different displays. This would require knowledge of the display characteristics.
Hence, an improved approach for adapting video would be advantageous and in particular an approach allowing increased flexibility, improved dynamic range adaptation, improved perceived image quality, improved overlay and/or video image presentation (in particular when changing dynamic range) and/or improved performance would be advantageous.