1. Field of the Invention
The present invention relates to a system and method for seamless content insertion on transmitted network content using audio-video fingerprinting and watermarking.
2. Prior Art
Local content insertion on network content is used for targeted ad insertion or local content in cable Multiple System Operator (MSO) headends, set-top boxes, Internet browsers, and Internet content delivery networks. One application of local content insertion is targeted advertising, wherein advertisements are placed to appeal to consumers or potential consumers who are profiled based on demographics and a number of other variables. Advances in digital transmission technology enable new and improved methods for targeted advertising, where advertisers can further verify the effectiveness of the ads and refine strategies using quick feedback loops. Traditional broadcasting models, where the same advertisement or content is seen by all users, are slowly being replaced by local content insertion to increase the spectrum of local advertisements. Among the first advances in technology, video on demand (VOD) and switched digital video (SDV) enabled advertisers to customize content for specific groups of viewers. The introduction of feedback loops from the viewer's end back to the headend further enabled point-wise monitoring of user preferences. Several standards have also been introduced during this evolution, including those from the Society of Cable Telecommunications Engineers (SCTE), which defines various standards for cable telecommunication transmission systems. Advertising content is also changing to include data tracks, over and above audio and video tracks in MPEG format [1].
In present-day systems, the network content being streamed is accompanied by an out-of-band identification marker that determines the point of insertion and guarantees seamless local content insertion. This marker could be a Dual Tone Multi Frequency (DTMF) cue tone, an SCTE-35 message, or an Adobe Flash ad insertion trigger. The marker is sent over the network from the source of transmission in an out-of-band manner, synchronized with the point of insertion on the audio-video content. These systems therefore require the transmission source to identify each point of insertion and to send the relevant marker for it.
WO 2006097825 discloses a system and method for household-targeted advertising wherein a Set Top Box (essentially housed at a consumer's premises) has targeted ads delivered to it, wherein the system is specifically programmed for an IP stack. In this patent, customised advertisements are delivered to a user by pulling on demand from a media storage device using SCTE-35 cues.
US2006075449 discloses a distributed architecture for digital program insertion in video streams delivered over packet networks wherein a head-end unit inserts Internet Protocol (IP) splice points into a digital video transport stream, which is embedded with cue tone signals. By using a splicing device downstream, this invention later inserts a specific ad at the splice points inserted by the head-end to customise the content based on demographic information.
US2005015816 discloses a system and method of providing triggered event commands via digital program insertion splicing, wherein a DVS380-compliant message is extracted to determine the point at which local content is inserted.
In these works of prior art, a complex workflow is implicit when hundreds of advertising content items are planned for local replacement across different points of insertion. In addition, the replacement of local content is predicated on the marker, which exists in several incompatible formats across transmission systems, being inserted by the transmission source; this requires the transmission business to participate in the ecosystem of local content insertion. This requirement renders present-day systems top-heavy and unrealistic in the long term, and offers no potential for increasing the spectrum of local ads, as methods to insert them seamlessly do not exist.
There are systems available in the market that use only audio-video fingerprinting to effect local content insertion, such as DVEO's Gen2 ad insertion system, and patent applications such as “Video Detection and Insertion”, by Konig; Richard (US), et al. The key problem with these systems is uniquely identifying the content assets to be replaced. Because multiple content assets can share long stretches of identical content, identifying a unique asset using fingerprints requires capturing and matching fingerprints for longer than the anticipated duration of the audio-video content that the non-unique assets have in common. Such a system must therefore anticipate, and design its workflow for, the worst-case duration of identical audio-video content across non-unique content assets.
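The limitation of fingerprint-only identification can be illustrated with a minimal sketch. It assumes, purely for illustration, that each content asset is reduced to a list of integer per-frame fingerprints; the function name `frames_needed_to_identify` and the asset names are hypothetical and do not describe any of the cited systems.

```python
from typing import Dict, List, Optional


def frames_needed_to_identify(
    assets: Dict[str, List[int]], target: str
) -> Optional[int]:
    """Return how many leading fingerprints of `target` must be observed
    before it is the only asset consistent with the observation.

    Returns None if the target never becomes unique.
    Assumption: one integer fingerprint per frame (illustrative only).
    """
    observed = assets[target]
    for n in range(1, len(observed) + 1):
        # Assets whose first n fingerprints match what has been observed.
        candidates = [
            name for name, fps in assets.items() if fps[:n] == observed[:n]
        ]
        if candidates == [target]:
            return n
    return None


# Hypothetical assets: two regional variants share their first five frames.
assets = {
    "ad_regional_a": [11, 12, 13, 14, 15, 21, 22],
    "ad_regional_b": [11, 12, 13, 14, 15, 31, 32],
    "ad_other": [41, 42, 43, 44, 45, 46, 47],
}

print(frames_needed_to_identify(assets, "ad_regional_a"))  # 6
print(frames_needed_to_identify(assets, "ad_other"))       # 1
```

In this sketch the two regional variants cannot be told apart until the sixth frame, one frame past their shared content, which is why the matching window has to be sized for the longest shared stretch.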
The present invention eliminates the need for any intervention on the transmission source for identifying the point of insertion. This is accomplished by using an in-band marker on the network content (also referred to as a network content asset) to be replaced, instead of an out-of-band marker. This in-band marker is a combination of a watermark embedded into the content and fingerprints extracted from the content. The watermark that is embedded can be composed of audio or video or some combination of the two. At the point of local insertion, where a local content asset replaces the network content asset, the method and system of the present invention are able to uniquely identify the network content asset to be replaced and identify the specific point of insertion that guarantees seamless local content insertion, in real-time.
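One way such a combined in-band marker might be used is sketched below. This is an illustrative assumption, not the claimed method: it supposes that the decoded watermark yields an asset identifier, and that a short list of leading fingerprints is stored per asset. The function `locate_insertion_point`, the integer fingerprints, and the asset names are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def locate_insertion_point(
    stream_fps: Sequence[int],
    watermark_id: str,
    reference: Dict[str, List[int]],
) -> Optional[int]:
    """Return the index in `stream_fps` where the watermarked asset begins.

    Assumption: the watermark payload names the asset, so only a short run
    of its leading fingerprints is needed to locate the insertion point.
    Returns None if the asset is unknown or its fingerprints are not found.
    """
    ref = reference.get(watermark_id)
    if not ref:
        return None
    k = len(ref)
    for i in range(len(stream_fps) - k + 1):
        # Compare a sliding window of the stream with the stored fingerprints.
        if list(stream_fps[i:i + k]) == ref:
            return i
    return None


# Hypothetical data: both variants begin with the same fingerprints, so the
# fingerprints alone are ambiguous; the watermark identifier resolves which
# asset is present.
reference = {
    "ad_regional_a": [11, 12],
    "ad_regional_b": [11, 12],
}
stream = [1, 2, 3, 11, 12, 13, 21]

print(locate_insertion_point(stream, "ad_regional_a", reference))  # 3
```

Under these assumptions the watermark supplies the asset's identity and the fingerprints supply its position in the stream, so the matching window does not need to outlast the content shared between similar assets.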