1. Field of the Invention
The invention pertains to methods and systems for equalizing segments of video of different types (e.g., high dynamic range (HDR) video and standard dynamic range (SDR) video) such that a sequence of images determined by a sequence of the equalized segments has dynamic range (and optionally also at least one other characteristic, e.g., at least one of color gamut and white point) that is at least substantially constant. Examples of systems configured to perform the equalization are video sources (e.g., broadcast installations) and display systems.
2. Background of the Invention
Throughout this disclosure including in the claims, the expression performing an operation “on” signals or data (e.g., filtering, scaling, or transforming the signals or data) is used in a broad sense to denote performing the operation directly on the signals or data, or on processed versions of the signals or data (e.g., on versions of the signals that have undergone preliminary filtering prior to performance of the operation thereon).
Throughout this disclosure including in the claims, the noun “display” and the expression “display system” are used as synonyms.
Throughout this disclosure including in the claims, the term “segment” of video (e.g., “segment” of a video program) denotes video data or a video signal indicative of at least one frame (typically, a sequence of consecutive frames). A display system can display each frame as an image, and a sequence of the frames as a sequence of images, each said image having a dynamic range (a range of displayed pixel intensities).
Throughout this disclosure including in the claims, the expression that an image is “determined by” video is used in a broad sense (which contemplates that the video may be an equalized or otherwise filtered version of input video) to denote both an image determined by a frame of the video and an image determined by a corresponding frame of the input video.
Throughout this disclosure, the expression “encoding” of video (e.g., a video channel) denotes mapping a sequence of samples of the video to a set of values (“code values”) indicative of displayed intensities in a range from a minimum intensity (black level) to a maximum intensity, where each of the code values determines a displayed intensity of a pixel (or a color component of a pixel, or a luminance or chroma value of a pixel, or another pixel component) when the encoded video is displayed. For example, a video channel may be encoded in a linear manner (so that the code values of the encoded video channel are linearly related to displayed intensity values) or a nonlinear manner (so that the code values of the encoded video channel are nonlinearly related to displayed intensity values).
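To make the linear/nonlinear distinction concrete, the following Python sketch maps a displayed intensity onto N-bit code values in two ways. The function names are hypothetical, and the pure power law with exponent 1/2.4 is an assumption chosen only for illustration; practical transfer functions (e.g., those of Rec. 709 or SMPTE ST 2084) have different shapes.

```python
def _normalize(intensity, min_i, max_i):
    """Scale an intensity into [0, 1] relative to the black and maximum intensities, clamping outliers."""
    t = (intensity - min_i) / (max_i - min_i)
    return min(max(t, 0.0), 1.0)


def encode_linear(intensity, min_i, max_i, bits):
    """Linear encoding: code value proportional to displayed intensity."""
    top = 2 ** bits - 1
    return round(_normalize(intensity, min_i, max_i) * top)


def encode_power_law(intensity, min_i, max_i, bits, gamma=2.4):
    """Nonlinear encoding (assumed power law): more code values are spent on dark intensities."""
    top = 2 ** bits - 1
    return round(_normalize(intensity, min_i, max_i) ** (1.0 / gamma) * top)


# Half of a 0-100 intensity range, 8-bit codes:
print(encode_linear(50, 0, 100, 8))     # 128
print(encode_power_law(50, 0, 100, 8))  # 191
```

With the nonlinear mapping, mid-range intensity lands well above the middle code value, leaving more codes for darker intensities where the eye is more sensitive.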
Throughout this disclosure including in the claims, the expression “encoded video” denotes video determined by one or more channels of code values, each of the channels comprising a sequence of code values. For example, conventional Rec. 709 RGB video is encoded video comprising three channels of code values: a red channel comprising a sequence of red (R) code values (red color component values), a green channel comprising a sequence of green (G) code values (green color component values), and a blue channel comprising a sequence of blue (B) code values (blue color component values). For another example, conventional YCrCb video is encoded video comprising three channels of code values: a Y channel comprising a sequence of luminance or luma code values (e.g., luminance code values (Y), each of which is a weighted sum of linear R, G, and B color components, or luma code values (Y), each of which is a weighted sum of gamma-compressed R′, G′, and B′ color components), a Cr channel comprising a sequence of Cr (chroma) code values, and a Cb channel comprising a sequence of Cb (chroma) code values.
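As an illustration of the luma and chroma code values described above, the sketch below applies the ITU-R BT.709 coefficients (Kr = 0.2126, Kb = 0.0722) to gamma-compressed R′G′B′ values normalized to [0, 1], then quantizes the result to 8-bit narrow-range code values (luma 16 to 235, chroma 16 to 240). The function names are illustrative only.

```python
KR, KB = 0.2126, 0.0722   # ITU-R BT.709 luma coefficients
KG = 1.0 - KR - KB        # 0.7152


def rgb_to_ycbcr_709(r, g, b):
    """Convert gamma-compressed R'G'B' in [0, 1] to Y' in [0, 1] and Cb, Cr in [-0.5, 0.5]."""
    y = KR * r + KG * g + KB * b          # luma: weighted sum of R', G', B'
    cb = (b - y) / (2.0 * (1.0 - KB))     # blue-difference chroma
    cr = (r - y) / (2.0 * (1.0 - KR))     # red-difference chroma
    return y, cb, cr


def quantize_8bit(y, cb, cr):
    """Map to 8-bit narrow-range code values: Y' 16..235, Cb/Cr 16..240."""
    return round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr)


print(quantize_8bit(*rgb_to_ycbcr_709(1.0, 1.0, 1.0)))  # white -> (235, 128, 128)
print(quantize_8bit(*rgb_to_ycbcr_709(0.0, 0.0, 1.0)))  # blue  -> (32, 240, 118)
```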
Throughout this disclosure, the expression “peak white level” (or white point) denotes the smallest code value (of a channel of encoded video) indicative of a pixel or pixel component (e.g., a color component of a pixel, or a luminance or chroma value of a pixel) having maximum displayed intensity when the encoded video is displayed (assuming that the displayed pixels are determined by code values of the channel that include the entire range of code values available for said channel, and code values of any other channel that determine the displayed pixels are identical for all the displayed pixels). To display the encoded video channel, a video system may map any code values of the channel that are larger than the peak white level to the maximum displayed intensity (e.g., by clipping or compressing them to the maximum displayed intensity).
Throughout this disclosure, the expression “black level” denotes the largest code value (of a channel of encoded video) indicative of a pixel or pixel component (e.g., a color component of a pixel, or a luminance or chroma value of a pixel) having minimum displayed intensity when the encoded video is displayed (assuming that the displayed pixels are determined by code values of the channel that include the entire range of code values available for said channel, and code values of any other channel that determine the displayed pixels are identical for all the displayed pixels). To display the encoded video channel, a video system may map any code values of the channel that are smaller than the black level to the minimum displayed intensity (e.g., by clipping or compressing them to the minimum displayed intensity).
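The clipping behavior described in the two preceding definitions can be sketched as follows. This is a minimal example that assumes out-of-range code values are hard-clipped rather than compressed; the function name is hypothetical.

```python
def clip_to_levels(codes, black_level, peak_white):
    """Map codes below the black level to the black level, and codes above the
    peak white level to the peak white level (hard clipping)."""
    if black_level > peak_white:
        raise ValueError("black level must not exceed peak white level")
    return [min(max(c, black_level), peak_white) for c in codes]


# 8-bit video with black level 16 and peak white level 235:
print(clip_to_levels([0, 16, 128, 235, 250], 16, 235))  # [16, 16, 128, 235, 235]
```

A compressing variant would instead squeeze the out-of-range codes into a small band near the limits, preserving some detail at the cost of slightly altering in-range values.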
Throughout this disclosure, the expression “standard dynamic range” or “SDR” (or “low dynamic range” or “LDR”) channel denotes a channel of encoded video (e.g., a channel of a video signal indicative of encoded video data) having bit depth equal to N (e.g., N=8, or 10, or 12), where the code values available for the channel are in a range from a black level, X (referred to herein as a “standard black level”), to a peak white level, Z (referred to herein as a “standard white level”), where $0 \le X \le Z \le 2^N - 1$. It should be appreciated that the dynamic range of the content transmitted by a channel is often of greater importance than the dynamic range of the channel, and that either encoded video having a first dynamic range (sometimes referred to as “low dynamic range video” or “standard dynamic range video” or “SDR video”) or encoded video having a dynamic range that is greater than the first dynamic range (sometimes referred to as “high dynamic range video” or “HDR video” with reference to the low dynamic range video) could be transmitted by an SDR channel with the same bit precision but with different granularity.
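The range condition on X and Z, and the point about granularity, can be checked numerically. The sketch below assumes, for simplicity only, that code values map linearly to luminance, so the luminance step per code value is the transmitted luminance range divided by the number of code intervals; the 100-nit and 1000-nit figures and the function names are illustrative, not taken from any particular system.

```python
def is_valid_sdr_range(black, white, bits):
    """True if 0 <= X <= Z <= 2**N - 1 for black level X, white level Z, bit depth N."""
    return 0 <= black <= white <= 2 ** bits - 1


def step_per_code(lum_range_nits, black, white):
    """Luminance step between adjacent codes, assuming a linear code-to-luminance mapping."""
    if white <= black:
        raise ValueError("white level must exceed black level")
    return lum_range_nits / (white - black)


assert is_valid_sdr_range(16, 235, 8)
assert not is_valid_sdr_range(16, 300, 8)
# Same 8-bit codes 16..235 carrying SDR (100 nits) vs. HDR (1000 nits) content:
print(step_per_code(100, 16, 235))   # about 0.457 nits per code
print(step_per_code(1000, 16, 235))  # about 4.57 nits per code
```

The same bit precision therefore carries the wider-range content with ten times coarser granularity under this assumption.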
Throughout this disclosure including in the claims, the expression “standard dynamic range” (or “SDR” or “low dynamic range” or “LDR”) video system denotes a system configured to display, in response to SDR video having at least one SDR channel, an image sequence (or image) whose luminance has a dynamic range (sometimes referred to herein as a standard dynamic range). Herein, the term “luminance” (of an image or image sequence) is used in a broad sense to denote luminance of the image or image sequence, or intensity (or brightness) of the achromatic portion of the image or image sequence, or intensity (or brightness) of the image or image sequence. It should be appreciated that the peak brightness of a physical display can change depending on its white point.
Throughout this disclosure, the expression “high dynamic range” (or “HDR”) channel, used with reference to an SDR channel (or SDR video whose channels are all SDR channels), denotes a channel of encoded video (e.g., a channel of a video signal indicative of encoded video data) having dynamic range greater than that of the SDR channel (or than that of each channel of the SDR video). For example, the HDR channel may have bit depth greater than N (where each SDR channel has bit depth equal to N), or the code values available for the HDR channel may be in a range from a minimum value, Min, to a maximum value, Max, where $0 \le \mathrm{Min} < X < Z < \mathrm{Max} < 2^N - 1$, X is a standard black level, and Z is a standard white level (and where the code values available for each SDR channel are in the range from X to Z).
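The two example conditions in the preceding paragraph can be expressed as a single predicate. This is an illustrative sketch only: the paragraph gives these conditions as examples rather than as an exhaustive definition, and the function name is hypothetical.

```python
def is_hdr_relative_to_sdr(hdr_bits, hdr_min, hdr_max, sdr_bits, sdr_black, sdr_white):
    """True if the channel meets either example HDR condition relative to the SDR channel."""
    # Condition 1: greater bit depth than the N-bit SDR channel.
    deeper = hdr_bits > sdr_bits
    # Condition 2: code range strictly wider than X..Z, i.e. 0 <= Min < X < Z < Max < 2**N - 1.
    wider = 0 <= hdr_min < sdr_black < sdr_white < hdr_max < 2 ** sdr_bits - 1
    return deeper or wider


print(is_hdr_relative_to_sdr(10, 0, 1023, 8, 16, 235))  # True: 10-bit vs 8-bit
print(is_hdr_relative_to_sdr(8, 1, 254, 8, 16, 235))    # True: codes 1..254 enclose 16..235
print(is_hdr_relative_to_sdr(8, 16, 235, 8, 16, 235))   # False: same depth, same range
```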
An example of HDR video is “visual dynamic range” (VDR) video, which is video data (or a video signal) capable of being displayed by a display system with the full dynamic range perceivable by a human viewer under normal display viewing conditions. One type of VDR video is described in PCT International Application PCT/US2010/022700, by Dolby Laboratories Licensing Corporation (published as PCT International Application Publication No. WO 2010/104624 A2).
In one conventional SDR display system which operates with 8-bit YCbCr video signals, with code value 235 considered the maximum level (so that code values 236 through 254 are not used to display images), code value 16 (cast into absolute units for a reference display) represents about 0.01 cd/m² (0.01 candelas per square meter, where the unit “candelas per square meter” is sometimes referred to as “nits”) and code value 235 represents about 100 cd/m². The maximum dynamic range of the SDR content of such a system is thus approximately 0 through 100 nits. The maximum dynamic range of the SDR content of some other conventional SDR display systems is 0 through 500 nits. It should be appreciated that the present invention is applicable to encoded video of any bit depth, although some systems and methods are described with reference to encoded video of a specific bit depth (for clarity).
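The dynamic range implied by the approximate figures above can be expressed as a contrast ratio and in photographic stops (powers of two). The sketch uses the paragraph's example values of 0.01 nits at code value 16 and 100 nits at code value 235; a black level of exactly 0 nits would make the ratio unbounded, which is why the nonzero black level is used. The function name is illustrative.

```python
import math


def contrast(black_nits, white_nits):
    """Return (contrast ratio, stops, orders of magnitude) for a luminance range."""
    if black_nits <= 0 or white_nits <= black_nits:
        raise ValueError("need 0 < black_nits < white_nits")
    ratio = white_nits / black_nits
    return ratio, math.log2(ratio), math.log10(ratio)


ratio, stops, decades = contrast(0.01, 100.0)
print(f"{ratio:.0f}:1, {stops:.1f} stops, {decades:.1f} decades")  # 10000:1, 13.3 stops, 4.0 decades
```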
A video broadcast system may broadcast both SDR and HDR video content, for example, a video program comprising a sequence of HDR video segments (e.g., segments of a movie or TV program) time-division multiplexed with SDR video segments (e.g., commercials).
FIG. 1, which depicts a conventional video broadcast system, includes a simplified block diagram of national broadcaster installation 1 (e.g., NBC National) and a simplified block diagram of regional broadcaster installation 3 (e.g., Seattle NBC). Installation 1 (sometimes referred to herein as subsystem 1) is configured to broadcast a video output stream to regional installation 3 (sometimes referred to herein as subsystem 3) via delivery subsystem 2. Subsystem 2 may implement a standard (e.g., cable or satellite) transmission path. The video output stream may be stored by subsystem 2 (e.g., in the form of a DVD or Blu-ray disc), or transmitted by subsystem 2 (which may implement a transmission link or network), or may be both stored and transmitted by subsystem 2.
In subsystem 1, switcher 5 is coupled to receive video input streams 4A, 4B, and 4C (which may be stored on suitable storage media). Input streams 4A, 4B, and 4C are typically of different types in the sense that at least one video characteristic of each (e.g., at least one of color gamut, dynamic range, and white point) differs from at least one characteristic of at least one other of said input streams. Each of streams 4A, 4B, and 4C is an encoded video stream in the sense that it comprises a sequence of code words indicative of input video. Switcher 5 (sometimes referred to as “to the air” switcher 5) is configured to select which of the input streams is to be broadcast, and to time-division multiplex the selected content (or insertion spots or other markers for content) into the combined stream to be output to delivery subsystem 2. Switcher 5 typically can insert into the combined stream either insertion spots for commercials (downstream trigger points) or the commercials themselves. Within digital modulator 7 of subsystem 1, the combined (time-division multiplexed) stream is compressed (e.g., via MPEG-2 encoding), typically also scrambled, and modulated for delivery over a physical network. For simplicity, management software is not shown in FIG. 1, but installation 1 would typically employ such software to implement scheduling, tracking of commercials, and billing.
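A highly simplified model of what switcher 5 produces may help: the combined stream is an ordered sequence of segments, each drawn from one input stream and carrying that stream's characteristics. The sketch below is hypothetical (the Segment type, the characteristic labels, and the schedule format are assumptions) and ignores compression, scrambling, and insertion markers.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str       # input stream identifier, e.g. "4A"
    dyn_range: str    # assumed label, e.g. "HDR" or "SDR"
    gamut: str        # assumed label, e.g. "P3" or "Rec.709"
    frames: int


def multiplex(schedule, inputs):
    """Time-division multiplex: emit segments in air order from the selected inputs."""
    return [Segment(src, *inputs[src], frames) for src, frames in schedule]


def type_changes(stream):
    """Indices where a segment's characteristics differ from the previous segment's."""
    return [i for i in range(1, len(stream))
            if (stream[i].dyn_range, stream[i].gamut)
            != (stream[i - 1].dyn_range, stream[i - 1].gamut)]


inputs = {"4A": ("HDR", "P3"), "4B": ("SDR", "Rec.709"), "4C": ("SDR", "Rec.709")}
stream = multiplex([("4A", 1200), ("4B", 720), ("4C", 720), ("4A", 1200)], inputs)
print(type_changes(stream))  # [1, 3]
```

The indices returned by type_changes mark the transitions at which a viewer could see a change in the character of the displayed video.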
In demodulator 9 of regional broadcasting installation (subsystem) 3, a delivered signal received from subsystem 2 is demodulated, to recover an encoded video stream (e.g., an MPEG-2 encoded video stream of the type generated in modulator 7 of installation 1). In splicing subsystem 6, local commercials, live sports casts, and news shows (which are typically MPEG encoded by local encoder 8) are spliced (time-division multiplexed), as required, into the stream recovered in demodulator 9.
Throughout the delivery chain implemented by the FIG. 1 system, there are several sources of content that can be placed into distribution, including the sources coupled to the inputs of switcher 5 and sources coupled to the inputs of encoder 8 (or splicer 6). Video from each source can have a different dynamic range, gamut (color gamut), or even white point. Thus, a consumer who views a display generated in response to the broadcast output (i.e., a time-division multiplexed sequence of video segments from different ones of the sources) may notice undesirable fluctuations in brightness (and/or color gamut and/or color temperature and/or at least one other parameter) during transitions between the segments (e.g., during transitions between commercial and non-commercial content). This problem can be especially severe when the broadcast video is a sequence of HDR (e.g., VDR) video segments and SDR video segments.
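One simple way to detect the fluctuations described above is to compare the average luminance of consecutive segments. In the sketch below, the function name, the per-segment averages, and the factor-of-two threshold are hypothetical choices for illustration, not values from any standard.

```python
def brightness_jumps(mean_nits, max_ratio=2.0):
    """Return indices i where the mean luminance of segment i differs from that of
    segment i - 1 by more than a factor of max_ratio (in either direction)."""
    if any(v <= 0 for v in mean_nits):
        raise ValueError("mean luminances must be positive")
    jumps = []
    for i in range(1, len(mean_nits)):
        lo, hi = sorted((mean_nits[i - 1], mean_nits[i]))
        if hi / lo > max_ratio:
            jumps.append(i)
    return jumps


# Illustrative per-segment mean luminances, in nits:
print(brightness_jumps([40.0, 45.0, 20.0, 22.0, 80.0]))  # [2, 4]
```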
Thus, implementation of a visual dynamic range (VDR) or other HDR video delivery pipeline will encounter obstacles due to the need to deliver video content from multiple sources through the same pipeline, where the content from each source is, in general, of a different type (e.g., has different dynamic range, color gamut, and/or white point).
For example, during capture of a sporting event, a mixture of HD and SD cameras could be employed to generate both HDR (e.g., VDR) and SDR video content to be delivered. If the content is left unprocessed, the images displayed (in response to the content delivered via the pipeline) could exhibit large steps in luminance level or changes in gamut. For example, the bitstream sent down the pipeline might include SDR content, captured with SD cameras, that is half as bright as, and more washed out (owing to a smaller gamut and different white point) than, HDR studio content captured with HD cameras. In another example, an HDR television show to be delivered via a pipeline has SDR commercial content inserted into the stream delivered over the pipeline. A consumer who views the pipeline output may notice fluctuations in brightness during transitions between commercial and non-commercial content.
Other video delivery pipelines may need to deliver video source content (e.g., commercials, TV shows, and movies) having different dynamic ranges, gamuts and/or white points, via broadcast, OTT delivery (“over the top” delivery by internet-based technology), or VOD (video on demand) delivery. The inventor has recognized that, in this context, there is a need to be able to adjust all the content intelligently to ensure a consistent viewing experience. For example, there may be a need to adjust the display of delivered content at the terminating display when content switches between a VDR movie (or TV show) and commercials. The commercials may have been introduced into the delivered stream at the last moment, and this introduction could result in a mixture of SDR and HDR formats within the streamed content. The overall brightness of the displayed content could have significant jumps and not be pleasing to the viewer. During video broadcast (or OTT video delivery or VOD delivery), commercial vendors may not wish to store both an HDR (e.g., VDR) version and an SDR version of a commercial to be inserted in the stream to be delivered.
Some embodiments of the invention are methods (implemented at any of various stages in a video delivery pipeline) for equalizing the dynamic range of video content (e.g., captured camera data), typically by implementing an automated video equalization algorithm. Video equalization in accordance with the invention can be implemented at various points in a delivery pipeline, e.g., at distribution (e.g., in a broadcast installation, or an OTT or VOD delivery installation), or in a display system which displays the delivered content. The inventor has recognized that managing video equalization at the display system can ensure a consistent viewing experience while channel surfing or switching video sources, and can allow on-screen displays to be adjusted properly between the modes.
The inventor has also recognized that in order to preserve artistic intent, the inventive video equalization should be implemented with a common anchor point for the input video and equalized video, and so that the displayed image(s) determined by the equalized video have at least substantially the same average luminance as the displayed image(s) determined by the input video. In contrast, simple mapping of code values of input video (having one dynamic range) to code values of equalized video (having a different dynamic range) without a common anchor point would typically destroy the artistic intent of the input video's originator (e.g., it could cause the displayed images determined by the equalized video to have much different aesthetic characteristics than those determined by the input video).
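A minimal sketch of anchored equalization follows. It is one hypothetical way to honor the two constraints in the preceding paragraph (a common anchor point and preserved average luminance), not necessarily the method of the invention. It assumes that luminance is remapped in the log domain about a common anchor equal to the log-average (geometric-mean) luminance of the input segment, with an exponent k chosen so that the input contrast ratio maps to the target contrast ratio. Because every output value is anchor × (v / anchor)^k, the log-average luminance of the output equals that of the input, while the dynamic range is stretched or compressed; the arithmetic mean is not preserved, and clipping to the limits of a particular display is omitted. All names are illustrative.

```python
import math


def log_average(lum):
    """Geometric mean (log-average) of positive luminances."""
    return math.exp(sum(math.log(v) for v in lum) / len(lum))


def equalize(lum, src_ratio, dst_ratio):
    """Remap positive pixel luminances (nits) so that a source contrast ratio
    src_ratio becomes dst_ratio, pivoting about the log-average luminance."""
    if any(v <= 0 for v in lum):
        raise ValueError("luminances must be positive")
    # Exponent applied in the log domain: k > 1 expands, k < 1 compresses.
    k = math.log(dst_ratio) / math.log(src_ratio)
    # Common anchor point shared by the input and equalized video.
    anchor = log_average(lum)
    return [anchor * (v / anchor) ** k for v in lum]


src = [0.1, 1.0, 10.0, 100.0]            # 1000:1 input segment (illustrative)
out = equalize(src, 1e3, 1e6)            # expand to 1,000,000:1
print(round(log_average(src), 4), round(log_average(out), 4))  # 3.1623 3.1623
print(round(max(out) / min(out)))        # 1000000
```

By contrast, a plain linear rescaling of code values from one range to another has no such fixed anchor, so it would shift the overall brightness of the segment.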