Video content is typically prepared and distributed by way of a combination of systems and technologies that may be termed a “video delivery pipeline”. FIG. 1 is a flowchart of a conventional video delivery pipeline 100 showing various stages from video capture to video content display. A sequence of video frames 101 is captured at block 102. Video frames 101 may be digitally captured (e.g. by a digital camera) or generated by a computer (e.g. using computer animation) to provide video data 103. Alternatively, video frames 101 may be captured on film by a film camera, in which case the film is converted to a digital format to provide video data 103. In a production phase 104, video data 103 is edited to provide a video production 105.
Video data of production 105 is provided to a processor at block 106 for post-production editing. Block 106 post-production editing may include adjusting or modifying colors or brightness in particular areas of an image to enhance the image quality or achieve a particular appearance for the image in accordance with the video creator's creative intent. This is sometimes called “color timing”. Other editing (e.g. scene selection and sequencing, image cropping, addition of computer-generated visual special effects, etc.) may be performed at block 106 to yield a final version 107 of the production for distribution. During block 106 post-production editing, video images are viewed on a reference display 108. In block 106 the final production 107 is viewed on reference display 108 or another reference display for approval. It is not mandatory that the same display be used for color timing and approval.
Following post-production, video data of final production 107 is delivered at block 116 to a display subsystem 120. As seen in FIG. 1A, block 116 delivery includes an encoder stage 117A which generates encoded video data 118 embodying the content of video data 107 to be distributed by way of a video distribution medium 115 (e.g. satellite, cable, DVD, wireless communication link, internet, local area network, broadcast, etc.). A decoder stage 117B is located downstream from encoder stage 117A to decode video data 118 transmitted over medium 115.
Display subsystem 120 may perform video processing 120A and displaying 120B. Video processing 120A may be integrated with displaying 120B or may be performed separately. At block 120A, video data 118 is provided to a video processor for processing and/or decoding. At block 120B, video data 118 is output to a display 122 to display a sequence of images to a viewer.
Encoded video data 118 may have a format selected with reference to properties of medium 115 (for example to fit within bandwidth requirements and/or format requirements of medium 115). To improve the quality of displayed images, encoded video data 118 may be driven through video delivery pipeline 100 at a relatively high bit rate so as to facilitate an increased bit depth for defining RGB or chroma values for each chrominance (color) channel. For example, video data 118 may comprise 8, 10 or 12 bits of data for each color channel of a pixel. The video data may be compressed.
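The relationship between bit depth and bit rate noted above may be illustrated by the following minimal sketch. The frame size (1920 by 1080), the frame rate (24 frames per second), the three color channels, and the function names are assumptions chosen only for illustration and do not form part of pipeline 100; the figures are for uncompressed video data.

```python
def levels_per_channel(bit_depth: int) -> int:
    """Number of distinct code values available to one color channel."""
    return 2 ** bit_depth


def uncompressed_bit_rate(width: int, height: int, fps: int,
                          bit_depth: int, channels: int = 3) -> int:
    """Bits per second needed to carry uncompressed video data."""
    return width * height * fps * channels * bit_depth


if __name__ == "__main__":
    # Hypothetical format: 1920x1080 pixels at 24 frames per second.
    for bits in (8, 10, 12):
        rate = uncompressed_bit_rate(1920, 1080, 24, bits)
        print(f"{bits}-bit: {levels_per_channel(bits)} levels per channel, "
              f"{rate / 1e6:.0f} Mbit/s uncompressed")
```

Under these assumptions the bit rate grows in direct proportion to the bit depth, while the number of code values per channel doubles with each added bit.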
Despite using a high bit depth for each chrominance channel, variations in display characteristics (such as luminance range, gamut, etc.) may affect the appearance of an image rendered on a display so that the image rendered does not match the creative intent of the video's creator. In particular, the perceived color or brightness of an image rendered on a particular display subsystem may differ from the color or brightness of the image as viewed on reference display 108 during post-production block 106.
The same video content 107 may be displayed on any of a wide variety of different types of electronic displays including televisions, computer displays, special purpose displays such as medical imaging displays or virtual reality displays, video game displays, advertising displays, displays on cellular telephones, tablets, media player displays, displays in hand-held devices, displays used on control panels for equipment of different kinds and the like. Displays may employ any of a wide range of technologies. Some non-limiting examples are plasma displays, liquid crystal displays (LCDs), cathode ray tube (CRT) displays, organic light emitting diode (OLED) displays, projection displays that use any of various light sources in combination with various spatial light modulation technologies, and so on.
Different displays may vary significantly with respect to features such as: the color gamut that can be reproduced by the display; the maximum brightness achievable; contrast ratio; resolution; acceptable input signal formats; color depth; white level; black level; white point; and grey steps.
Consequently, the same image content may appear different when played back on different displays. Image content that matches a creator's creative intent when displayed on some displays may depart from the creator's creative intent in one or more ways when viewed on other displays. The appearance of displayed images is also affected by the environment in which a display is being viewed. For example, the luminance of ambient lighting, the color of ambient lighting and screen reflections can all affect the appearance of displayed images.
With the increasing availability of high-performance displays (e.g. displays that have high peak luminance and/or broad color gamut) comes the problem of how to adjust images for optimum viewing on a particular display or type of displays. Addressing this problem in simplistic ways can result in noticeable artifacts in displayed images. For example, consider the case where an image that appears properly on a display having a moderate peak luminance is displayed on a target display having a very high peak luminance. If one expands the luminance range of the image data to take advantage of the high peak luminance of the target display, the result may be poor due to objectionable artifacts that are rendered apparent by the range expansion. Artifacts may include, for example, one or more of banding, quantization artifacts, visible macroblock edges, objectionable film grain and the like. On the other hand, if the image is displayed on the target display without range expansion, no benefit is gained from the high peak luminance that the target display can achieve.
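The effect of range expansion on quantization may be illustrated by the following minimal sketch. It assumes a simple power response with an exponent of 2.4 and peak luminances of 100 and 1000 nits; these values and the function names are hypothetical and are chosen only for illustration. The sketch computes the largest luminance difference between adjacent code values, which is one factor in whether banding becomes visible.

```python
def luminance(code: int, peak_nits: float,
              bit_depth: int = 8, gamma: float = 2.4) -> float:
    """Luminance in nits for a code value, assuming a pure power response."""
    max_code = 2 ** bit_depth - 1
    return peak_nits * (code / max_code) ** gamma


def largest_step(peak_nits: float, bit_depth: int = 8,
                 gamma: float = 2.4) -> float:
    """Largest luminance jump between two adjacent code values."""
    max_code = 2 ** bit_depth - 1
    return max(
        luminance(c + 1, peak_nits, bit_depth, gamma)
        - luminance(c, peak_nits, bit_depth, gamma)
        for c in range(max_code)
    )


if __name__ == "__main__":
    moderate = largest_step(100.0)    # display with moderate peak luminance
    expanded = largest_step(1000.0)   # same 8-bit data stretched to a higher peak
    print(f"largest step at 100 nits:  {moderate:.3f} nits")
    print(f"largest step at 1000 nits: {expanded:.3f} nits")
    print(f"ratio: {expanded / moderate:.1f}")
```

Under these assumptions, stretching the same code values over ten times the luminance range makes every step between adjacent code values ten times larger in absolute terms, so contours that were below the threshold of visibility may become apparent.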
Video formats used in many current imaging systems (e.g. HDTV, UHDTV) are based on defining black and white levels with a power response. This makes it very difficult to ensure consistent mid-tones, contrast, and color when video content derived from the same video data 107 is viewed on different displays.
The system pipeline response can be represented by an end-to-end transfer function that compares an image on a reference display and the same image displayed on a target display. Makers of displays typically make assumptions about the response of the reference display (and environment) in which images were approved. Displays may process image data in various ways (e.g. applying a power function response, making color saturation adjustments, adjusting brightness, adjusting contrast etc.) to arrive at an image that the display maker thinks will be best appreciated by viewers.
For accurate reproduction of video images, video distribution systems of the type illustrated in FIGS. 1 and 1A typically require that the system response of the pipeline be tailored to match characteristics of the reference display on which the content was approved and/or color graded. Display makers can attempt to provide image processing that achieves a desired system response, for example by defining a response curve between a minimum and maximum luminance. The minimum and maximum luminance may be inferred from assumed capabilities of a reference display. Displays that perform image processing based on wrong assumptions regarding the characteristics of a reference display on which the image data was approved or color graded may produce images that are not faithful to the image as approved by its creator. For example, different interpretations of the same video signal can result in inconsistent mid-tones and in inconsistencies in other characteristics of images displayed on different displays.
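The mid-tone inconsistency described above may be illustrated by the following minimal sketch. It assumes that both the reference display and the target display apply a pure power response to a normalized signal, with exponents of 2.4 and 2.2 respectively; these exponents and the function names are hypothetical and are chosen only for illustration. Under this assumption the end-to-end transfer function is itself a power function whose exponent is the ratio of the two display exponents.

```python
def displayed_luminance(signal: float, gamma: float) -> float:
    """Relative luminance (0..1) shown for a normalized signal (0..1)."""
    return signal ** gamma


def end_to_end(reference_luminance: float, reference_gamma: float,
               target_gamma: float) -> float:
    """Relative luminance on the target display, expressed as a function
    of the relative luminance seen on the reference display."""
    return reference_luminance ** (target_gamma / reference_gamma)


if __name__ == "__main__":
    signal = 0.5  # a mid-tone code value, normalized
    on_reference = displayed_luminance(signal, 2.4)  # assumed reference response
    on_target = displayed_luminance(signal, 2.2)     # assumed target response
    print(f"reference display: {on_reference:.4f}")
    print(f"target display:    {on_target:.4f}")
    print(f"via end-to-end:    {end_to_end(on_reference, 2.4, 2.2):.4f}")
```

In this sketch black and white are reproduced identically on both displays, but the mid-tone is rendered brighter on the target display than on the reference display, which is the kind of departure from the approved image that results from a mismatch between assumed and actual reference display characteristics.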
Another issue with video distribution systems of the type illustrated in FIGS. 1 and 1A is that changing to a different reference display or altering the configuration of the reference display or its viewing environment can require changes to the system response of the pipeline if one wishes to ensure that viewers have the highest quality viewing experience. One approach to dealing with this issue is to provide metadata along with video data 118. The metadata can specify the source gamut. The target displays may then be made to adapt themselves by providing image processing selected based on the metadata to achieve the desired system response. This adds complexity and is prone to failure for some distribution channels (e.g. broadcast).
There is a general desire for systems, apparatus and methods for generating, delivering, processing and displaying video data to preserve the content creator's creative intent. There is a general desire for systems, apparatus and methods for providing information which may be used to guide downstream processing and/or display of video data.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.