Digital watermarking is a type of signal processing in which auxiliary message signals are encoded in image, audio or video content in a manner that is imperceptible to humans when the content is rendered. It is used for a variety of applications, including broadcast monitoring, device control, asset management, audience measurement, forensic tracking, and automatic content recognition. In general, a watermarking system comprises an encoder (the embedder) and a compatible decoder (often referred to as a detector, reader or extractor). The encoder transforms a host audio-visual signal to embed an auxiliary signal, whereas the decoder transforms this audio-visual signal to extract the auxiliary signal. The primary technical challenges arise from design constraints posed by real-world usage scenarios. These constraints include computational complexity, power consumption, survivability, granularity, retrievability, subjective quality, and data capacity per spatial or temporal unit of the host audio-visual signal.
Despite the level of sophistication that commercial watermarking technologies have attained, the increasing complexity of audio-visual content production and distribution, combined with more challenging use cases, continues to present significant technical challenges. Distribution of content is increasingly “non-linear,” meaning that audio-visual signals are distributed and then redistributed within the supply chain among intermediaries and consumers through a myriad of different wired and wireless transmission channels and storage media, and consumed on a variety of rendering devices. In such an environment, audio and visual signals undergo various transformations that watermark signals must survive, including format conversions, transcoding with various compression codecs and bitrates, geometric and temporal distortions of various kinds, layering of watermark signals, and mixing with other watermarked or un-watermarked content.
Encoding of watermarks at various points in the distribution path benefits from a scheme for orchestrating encoding of watermarks to avoid collision with previously embedded watermark layers. Orchestrated encoding may be implemented, for example, by including a decoder as a pre-process within an encoder to detect a previously embedded watermark layer and execute a strategy to minimize collision with it. For more background, please see our U.S. Pat. Nos. 8,548,810 and 7,020,304, which are hereby incorporated by reference.
While such orchestration is effective in some cases, it is not always possible for a variety of reasons. As such, watermarks need to be designed to withstand overlaying of different watermarks. Additionally, they need to be designed to be layered or co-exist with other watermarks without exceeding limits on perceptual quality.
When multiple watermark layers are potentially present in content, it is more challenging to design encoders and decoders to achieve the above-mentioned constraints. Both encoding and decoding speed can suffer, as encoding becomes more complex and the presence of watermark layers may make reliable decoding more difficult. Relatedly, as computational complexity increases, so does power consumption, which is particularly problematic in battery-powered devices. Data capacity can also suffer, as there is less available channel bandwidth for watermark layers within the host audio-visual signal. Reliability can decrease, as the presence of potentially conflicting signals may lead to increases in false positives or false negatives.
The challenges are further compounded in usage cases where there are stringent requirements for encoding and decoding speed. Both encoding and decoding speed are dictated by real-time processing requirements or constraints defined in terms of desired responsiveness or interactivity of the system. For example, encoding often must be performed within time constraints established by other operations of the system, such as timing requirements for transmission of content. Time consumed for encoding must be within latency limits, such as the frame rate of an audio-visual signal. Another example with stringent time constraints is encoding of live events, in which encoding is performed on an audio signal captured at a live event and then played to an audience. See U.S. Patent Application Publication 20150016661, which is hereby incorporated by reference. Another example is encoding and decoding within the time constraints of a live distribution stream, namely, as the stream is being delivered, including terrestrial broadcast, cable/satellite networks, IP (managed or open) networks, and mobile networks, or within re-distribution in consumer applications (e.g., AirPlay, WiDi, Chromecast, etc.).
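The frame-rate latency limit mentioned above can be made concrete with a minimal sketch. This assumes a frame-based stream in which the per-frame embedding time must fit within the frame interval; the `headroom` fraction is an assumption introduced here to leave time for transmission and other pipeline stages.

```python
# Minimal sketch of a real-time latency check for a frame-based
# audio-visual stream: embedding each frame must finish within the
# frame interval so the encoder keeps pace with delivery.

def frame_budget_ms(frame_rate_hz):
    """Latency budget per frame, in milliseconds."""
    return 1000.0 / frame_rate_hz

def within_budget(embed_time_ms, frame_rate_hz, headroom=0.8):
    """True if embedding fits inside a fraction (headroom) of the frame
    interval, reserving the remainder for other pipeline stages."""
    return embed_time_ms <= headroom * frame_budget_ms(frame_rate_hz)
```

For example, at 50 frames per second the budget is 20 ms per frame, so an embedder taking 10 ms per frame keeps up, while one taking 18 ms exceeds the 80% headroom threshold.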
The mixing of watermarks presents additional challenges in the encoder and decoder. One challenge is the ability to reliably and precisely detect a boundary between different watermarks, as well as boundaries between watermarked and un-watermarked signals. In some measurement and automatic recognition applications, it is required that the boundary between different programs be detected with a precision of under 1 second, and the processing time required to report the boundary may also be constrained to a few seconds (e.g., to synchronize functions and/or support interactivity within a time period shortly after the boundary occurs during playback). These types of boundaries arise at transitions among different audio-visual programs, such as advertisements and shows, for example, as well as within programs, such as in the case of product placement, scene changes, or interactive game play synchronized to events within a program. Due to mixing of watermarked and un-watermarked content and watermark layering, each program may carry a different watermark, multiple watermarks, or none at all. It is not sufficient to merely report the detection time of a watermark. Demands for precise measurement and interactivity (e.g., synchronizing an audio or video stream with other events) require more accurate localization of watermark boundaries. See, for example, U.S. Patent Application Publications 20100322469, 20140285338, and 20150168538, which are hereby incorporated by reference and which describe techniques for synchronization and localization of watermarks within host content.
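One simple way to localize payload boundaries to well under the 1-second requirement is to decode in short overlapping windows and report where the decoded payload changes. The sketch below is a hedged illustration, not a method from the cited publications: `detect()` is a stand-in for a real detector, and the hop size bounds the localization error.

```python
# Hypothetical sketch of boundary localization via overlapping decode
# windows. detect(t0, t1) is a stand-in detector returning the payload
# found in the window [t0, t1] seconds, or None if no watermark is found.

def locate_boundaries(duration_s, detect, window_s=0.5, hop_s=0.25):
    """Return (time, old_payload, new_payload) for each payload transition.

    The hop size bounds localization error: with hop_s = 0.25 the
    reported boundary is within 0.25 s of the true transition, well
    under a 1-second precision requirement.
    """
    boundaries = []
    prev = detect(0.0, window_s)
    t = hop_s
    while t + window_s <= duration_s:
        cur = detect(t, t + window_s)
        if cur != prev:
            boundaries.append((t, prev, cur))
            prev = cur
        t += hop_s
    return boundaries
```

With a toy detector that reports payload "ad" before the 2-second mark and "show" after it, the routine reports a single transition, localized within one hop of the true boundary.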
In some usage scenarios, mixing of watermark layers occurs through orchestrated or un-orchestrated layering of watermark signals within content as it moves through distribution. In others, design constraints dictate that a watermark be replaced by another watermark. One strategy is to overwrite an existing watermark without regard to pre-existing watermarks. Another strategy is to decode a pre-existing watermark and re-encode it with a new payload. Another strategy is to decode a pre-existing watermark, and seek to layer a subsequent watermark in the host content so as to minimize collision between the layers.
Another strategy is to reverse or partially reverse a pre-existing watermark. Reversal of a watermark is difficult in most practical use cases of robust watermarking because the watermarked audio-visual signal is typically altered through lossy compression and formatting operations that occur in distribution, which alters the watermark signal and its relationship with the host audio-visual content. If it can be achieved reliably, partial reversal of a pre-existing watermark enables additional bandwidth for further watermark layers and enables the total distortion of the audio-visual content due to watermark insertion to be maintained within subjective quality constraints, as determined through the use of a perceptual model. Even partial reversal is particularly challenging because it requires precise localization of a watermark as well as accurate prediction of its amplitude. Replacement further creates a need for real-time authorization of the replacement function, so that only authorized embedders can modify a pre-existing watermark layer.
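Partial reversal can be illustrated with a simplified additive model. This is a toy sketch under stated assumptions, not an implementation of the methods described here: it assumes additive embedding (marked = host + amplitude × pattern), scalar amplitudes, and a crude additive distortion model in place of a real perceptual model.

```python
# Simplified sketch of partial watermark reversal under an additive
# embedding model. Subtracting a fraction of the predicted pre-existing
# watermark frees bandwidth for a new layer while keeping combined
# distortion within a (toy) perceptual budget.

def partial_reverse(marked, pattern, predicted_amplitude, fraction=0.5):
    """Remove a fraction of the predicted pre-existing watermark signal."""
    return [m - fraction * predicted_amplitude * p
            for m, p in zip(marked, pattern)]

def layer_with_reversal(marked, old_pattern, old_amp,
                        new_pattern, new_amp,
                        distortion_budget, fraction=0.5):
    """Partially reverse the old layer, then add the new layer if the
    combined distortion stays within the budget (toy additive model)."""
    residual_amp = (1.0 - fraction) * old_amp
    if residual_amp + new_amp > distortion_budget:
        raise ValueError("combined layers exceed perceptual budget")
    partially = partial_reverse(marked, old_pattern, old_amp, fraction)
    return [s + new_amp * p for s, p in zip(partially, new_pattern)]
```

The design point this illustrates is the one in the text: the residual amplitude of the partially reversed layer plus the amplitude of the new layer must remain inside the quality budget, which is exactly why amplitude prediction must be accurate.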
As noted, an application of digital watermarking is to use the encoded payload to synchronize processes with the watermarked content. This application space encompasses end user applications, where entertainment experiences are synchronized with watermarked content, as well as business applications, such as monitoring and measurement of content exposure and use.
When connected with an automatic content recognition (ACR) computing service, the user's mobile device can enhance the user's experience of content by identifying the content and providing access to a variety of related services.
Digital watermarking identifies entertainment content, including radio programs, TV shows, movies and songs, by embedding digital payloads throughout the content. It enables recognition-triggered services to be delivered on receivers such as set-top boxes and smart TVs. It also enables recognition-triggered services to be delivered through ambient detection within an un-tethered mobile device, as the mobile device samples signals from its environment through its sensors.
Media synchronization of live broadcast is needed to provide a timely payoff in broadcast monitoring applications, in second screen applications, as well as in interactive content applications. In this context, the payoff is an action that is executed to coincide with a particular time point within the entertainment content. This may be rendering of secondary content, synchronized to the rendering of the entertainment content, or another function to be executed responsive to a particular event or point in time relative to the timeline of the rendering of the entertainment content.
Further features will be described with reference to the following detailed description and accompanying drawings.