Graphical overlays are common additions to modern television production. Live broadcasts, covering news or sporting events for instance, are often augmented by graphical overlays, referred to here as enhancements. When covering a sporting event, game enhancements may contain information related to the action, background data, or statistics. For example, the game-clock and game-scores usually appear in the dash-board graphics. Other graphical representations may contain information regarding the players' or the teams' current (or past) performance. Using overlays to complement audio commentary and to provide further insight into the game is an integral part of today's live production. It is also an important outlet for delivering event-related analytics, promoting upcoming programming and the broadcast company brand, and presenting sponsorships.
As the infrastructure for delivering broadband multimedia content to consumers becomes more feasible and efficient, production of content captured by ultra-high-definition (UHD) or high-dynamic-range (HDR) cameras will increase. Displays that serve UHD/HDR content are becoming more affordable, and new codecs such as HEVC (H.265) and VP9 already enable the streaming of 4K video. Moreover, high-rate cameras are used nowadays for slow-motion play-backs and are likely to replace standard cameras in future high-scale sports productions. This advanced capturing technology may produce oversampling in the spatial, tone, or temporal domains. Though this oversampling is intentional and designed to improve visualization of the action, it is redundant in areas of the video image that are relatively static or that have low detail or contrast. In particular, high sampling of video regions where enhancements are rendered may not have any added value. This possible redundancy in the representation of overlays may be used to embed additional data.
As large TV displays become mainstream in consumers' homes, television show producers will have more opportunities to augment live programming, since there is more room available on the displayed video image in which to insert overlay-graphics. FIG. 1 shows an exemplary representation of a broadcast program 105, including the cut-program 110 (typically live or play-back video of the covered event) and enhancements such as a sponsorship logo 115, a dash-board 120, a run-down 125, and a bottom-line 130. For example, the dash-board 120 overlay may contain real-time information including the game-scores and shot-clock of a basketball game. These overlay-graphics are generated based on pre-determined overlay-templates that define the structure (e.g. size, shape, font-type), appearance (e.g. color and texture), and animation rules (e.g. transition effects) of the overlay-graphics, as well as their insertion time and location within the video frame.
A challenge in live program production is the need to enhance and re-cut the video feeds at multiple locations. Typically, the video is transmitted from the remote site (upstream) all the way to the end user's display (downstream) via multiple production centers, such as the on-site production-truck, the studio at the broadcast company site, and various local distribution sites. At each production site the video received may be further enhanced and may be combined with (cut into) other video feeds. Much of the processing that a video undergoes during production is a function of metadata associated with a certain event captured in the video. For example, metadata may be the location of a certain object at a certain time during the covered event or the instantaneous pose of the camera when capturing a certain video frame. Having these metadata in synchronization with their corresponding video frames is instrumental for triggering or generating enhancements that relate to real-time events, or for inserting enhancements into the video in a way that is consistent with a camera's perspective, for example.
Methods known in the art for delivering video frames with synchronized metadata include storing the metadata in areas of the video stream that are not part of the displayed video image (frame), such as in the header or within the ancillary data region of the video bitstream. The latter refers to an area in the video stream that is not part of the displayed video image, such as the Vertical Blanking Interval (VBI) that is traditionally used to store closed-caption data. Storing metadata in these regions, though, may not be a proper solution for applications that critically rely on it, as downstream manipulation of the video by a third party may override ancillary data. In fact, devices that compress or trans-code a video stream often strip out information external to the displayed image region as they reformat the video stream. Another option is to store the metadata in a separate data stream and transmit it in another channel (such as a cellular communication link) in parallel to the video (which may be sent via a satellite communication link). A drawback of this approach is the need for additional steps to manage book-keeping and synchronization.
An alternative solution for delivering video frames in synchronization with their corresponding metadata is to employ digital watermarking methods. In recent years there has been an increasing interest in the field of digital watermarking. One enabler of this development is the ubiquity of digital content and the availability of tools and computing power for capturing, manipulating, transmitting, and viewing digital content. This trend has required methods of identifying and protecting the authorized source, distributor, or user of multimedia assets. Hence, digital watermarking is widely used for data protection and authentication, as well as for other applications such as broadcast monitoring and covert communication.
Watermarking methods have also been proposed for the application of hiding information (metadata) within a host signal. A watermark, namely the hidden information, is inserted into the host signal so that the distortion induced is not perceptible. Watermarking video frames, then, may be a vehicle for delivering video frames with their corresponding metadata. Since the metadata is already embedded into the video frame it corresponds to, no additional steps of synchronization are required.
A top-level description of a watermarking system is shown in FIG. 2. Therein, a digital signal 210, such as an audio, an image, or a video signal, may be used as a carrier (host) for a watermark (metadata) signal 220, imperceptibly embedded into it by a process employed by a watermark embedder 230. The watermarked host signal 240 is then delivered via a communication channel 250 to a watermark detector 270. Note that the received watermarked host signal 260 may be a distorted version of the watermarked host signal 240, as the communication channel 250 may represent further processing of the watermarked signal (namely “attacks” such as compression or additive channel noise). The watermark detector 270 extracts the watermark from the received watermarked signal 260, outputting the extracted watermark signal 280. Many watermarking techniques for embedding and extracting hidden information (watermarks) are known in the art. The various approaches are differentiated by characteristics such as imperceptibility (invisibility), robustness (resilience to attacks), and payload (capacity), as will be discussed further below.
In most applications imperceptibility of the watermark signal is an important requirement, especially when embedded into a broadcast video where degradation of quality is unacceptable. Yet, generally, there is a tradeoff between the imperceptibility of a watermarking technique and its robustness and payload. A watermarking method is robust when the watermark in the watermarked signal 240 can survive “attacks” introduced by further processing, either after embedding 230 (e.g. scaling, cropping, filtering, compressing), during transmission 250 (e.g. additive noise), or before detection 270 (e.g. decompressing, scaling, cropping, filtering). One way to increase robustness is to introduce redundancy, for example, by embedding each of the watermark's bits multiple times in various locations in the host signal. This approach limits the capacity (payload) of the watermarking method, because every repeated bit occupies a location that could otherwise carry a new bit. Payload is the number of bits a watermark encodes within a time unit. More specifically, when referring to a video, payload is the number of embedded bits per frame. The larger the host signal, the higher its potential capacity. For instance, the payload of an HD video signal is expected to be higher than the payload of an SD video signal for the same level of robustness and imperceptibility.
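The redundancy and payload tradeoff described above can be illustrated with a minimal sketch. It assumes a simple repetition code with majority-vote decoding, which is only one of many possible redundancy schemes; the function names and the figure of 1200 embedding locations per frame are hypothetical and chosen purely for illustration.

```python
def repeat_bits(bits, copies):
    """Repeat every watermark bit `copies` times (a simple repetition code)."""
    return [b for b in bits for _ in range(copies)]


def majority_decode(received, copies):
    """Recover each bit by majority vote over its `copies` repetitions."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append(1 if sum(group) * 2 > len(group) else 0)
    return decoded


def payload_bits(host_slots, copies):
    """Distinct watermark bits that fit into `host_slots` embedding locations."""
    return host_slots // copies


watermark = [1, 0, 1, 1]
coded = repeat_bits(watermark, copies=3)

# Simulate an "attack" that corrupts one copy of the first and third bits.
attacked = list(coded)
attacked[0] ^= 1
attacked[7] ^= 1

recovered = majority_decode(attacked, copies=3)
print(recovered == watermark)       # the vote outweighs the single corrupted copies
print(payload_bits(1200, 1))        # payload without redundancy
print(payload_bits(1200, 3))        # payload with three copies per bit
```

With three copies per bit, one corrupted copy per bit is tolerated, but only a third of the embedding locations carry distinct information.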
A watermark may be embedded into the host signal in the spatial domain, the transform domain, or a combination thereof. First, the watermark (metadata) is converted into a bitstream. Then, insertion of the watermark bitstream may be done by substitution. For example, when using the spatial domain of the host signal, the least significant bit of a pixel value may be replaced by a watermark bit (“1” or “0”). To make sure that the watermarked host signal is perceptually identical to the original host signal, the components in the spatial domain that are modified by the watermark bitstream should be perceptually least significant. Another example is using a transform domain, such as the Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Contourlet Transform (CT), or Singular Value Decomposition (SVD), to embed the watermark bitstream. In that case, one or more transform coefficients may be used to embed the watermark bits. In the transform domain most of the image energy is concentrated in the low-to-medium frequency coefficients. A modification made by substituting watermark bits into these coefficients is spread across the entire image, so further processing of the watermarked image (e.g. compression) changes the watermarked coefficients only slightly.
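The spatial-domain substitution described above can be sketched as follows. This is a minimal illustration, assuming 8-bit pixel values held in a flat list and one watermark bit per pixel; the helper names and the sample metadata string are hypothetical. Plain least-significant-bit substitution is fragile under compression, so the sketch shows only the mechanics, not a robust scheme.

```python
def bytes_to_bits(data):
    """Convert metadata bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Pack a list of bits (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed_lsb(pixels, bits):
    """Replace the least significant bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("host signal is too small for this payload")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(pixels, n_bits):
    """Read the watermark back from the least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


metadata = b"pan=12"                               # hypothetical camera metadata
host = [(37 * i + 11) % 256 for i in range(64)]    # stand-in for 8-bit pixel values

bits = bytes_to_bits(metadata)
marked = embed_lsb(host, bits)
recovered = bits_to_bytes(extract_lsb(marked, len(bits)))

# Each pixel changes by at most one intensity level.
max_change = max(abs(a - b) for a, b in zip(host, marked))
print(recovered, max_change)
```

Note that extraction here reads only the marked pixels and the payload length, without access to the original host signal.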
Additive watermarking is another approach for embedding a watermark bitstream into the host signal. For example, when using the spatial domain of the host signal to embed a watermark bit into a pixel of an image, a certain value may be added if the bit is “1”; otherwise, no addition is performed. The larger the added value, the more robust the watermarking method is, but the more perceptible the watermark becomes. To improve imperceptibility, one may divide the added value among a group of pixels (e.g. an 8×8 block). In the transform domain, multiplicative watermarking may be used, where the significant coefficients are multiplied by a certain value if an embedded watermark bit is “1”. Notice that, in both approaches, the original image is needed at the detector to extract the watermark bitstream. Watermarking techniques that require the original host signal for detection are called “informed” (non-blind) methods.
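A minimal sketch of additive embedding with informed (non-blind) detection follows. It assumes one watermark bit per pixel, a hypothetical strength of 8 intensity levels, and a detector that thresholds the difference between the received and original pixels at half the strength; none of these choices come from the description above, and the deterministic perturbation merely stands in for channel noise.

```python
def embed_additive(pixels, bits, strength):
    """Add `strength` to a pixel when its watermark bit is 1; otherwise leave it."""
    marked = list(pixels)
    for i, bit in enumerate(bits):
        if bit:
            marked[i] = min(255, marked[i] + strength)  # clip to the 8-bit range
    return marked


def detect_informed(received, original, n_bits, strength):
    """Non-blind detection: compare received pixels against the original host."""
    threshold = strength / 2
    return [1 if received[i] - original[i] > threshold else 0
            for i in range(n_bits)]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
host = [(37 * i + 11) % 200 for i in range(16)]    # stand-in for 8-bit pixel values
marked = embed_additive(host, bits, strength=8)

# Deterministic stand-in for additive channel noise of +/-2 intensity levels.
received = [p + (2 if i % 2 == 0 else -2) for i, p in enumerate(marked)]

detected = detect_informed(received, host, len(bits), strength=8)
print(detected == bits)
```

Because the detector subtracts the original host signal, the noise must exceed half the embedding strength before a bit is misread; raising the strength widens that margin at the cost of a more visible change.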
Hence, watermarking techniques are classified into blind and non-blind techniques. A blind technique is one where the embedder or the detector does not make use of information related to the original host signal to embed or detect the watermark, respectively. On the other hand, a non-blind, or informed, technique utilizes knowledge of the original host signal when embedding or detecting the watermark. Generally, informed detectors are more robust than blind detectors, as the availability of the original host signal at the detector's input is instrumental in extracting the watermark and thereby improves the detector's performance significantly. In most applications, though, the original host signal is not known at the detector side, in which case blind techniques are used. Embodiments of this invention propose embedding metadata into the broadcast video in conjunction with overlay-graphics. Also disclosed are embodiments that utilize overlay-templates to watermark metadata employing informed watermarking techniques.