Broadcasting companies or broadcasters transmit news, shows, sports events and films as programs to viewers who receive the programs through terrestrial, satellite and/or cable broadcast signals.
Advertisements accompanying such programs are very important for the business model of broadcasters. It is common practice for broadcasters to include advertisements in dedicated advertisement breaks during a program. With the emergence of TV receivers offering time-shift recording and viewing functionality, many viewers tend to skip the advertisement breaks by jumping forward in the recorded program or by switching to fast-forward mode. They do so because, firstly, most of the time the advertisements are not relevant to the majority of viewers and, secondly, the time-shift functionality makes it very easy to avoid the advertisement breaks. Under such circumstances, the main goal of the broadcaster's client, who pays for the advertisement placement, is missed because the advertisement no longer reaches potential customers of the company that placed it.
The obvious weakness of placing advertisements in advertisement breaks can be alleviated by embedding the advertisement in the program itself. The simplest approach is to create a composed image by inserting the advertisement as a text box or banner into a number of video frames of the broadcast program. This concept is known from the prior art and will be explained in greater detail with reference to FIGS. 1A and 1B.
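The simplest approach can be illustrated in a few lines. The following Python is a minimal sketch under assumptions that are not taken from this document: a frame is modelled as a 2D list of grayscale pixel values, the banner is pasted opaquely at a fixed position into a contiguous range of frames, and the name `overlay_banner` is hypothetical.

```python
from typing import List

# Hypothetical frame model: a 2D grid of grayscale pixel values (rows of ints).
Frame = List[List[int]]


def overlay_banner(frames: List[Frame], banner: Frame,
                   first: int, last: int, top: int, left: int) -> List[Frame]:
    """Return copies of `frames` with `banner` pasted opaquely into frames
    first..last (inclusive) at row `top`, column `left`. Input is not modified."""
    composed: List[Frame] = []
    for index, frame in enumerate(frames):
        copy = [row[:] for row in frame]  # copy each row so the source stays intact
        if first <= index <= last:
            banner_width = max(len(row) for row in banner)
            if top + len(banner) > len(copy) or left + banner_width > len(copy[0]):
                raise ValueError("banner does not fit inside the frame")
            for r, banner_row in enumerate(banner):
                for c, value in enumerate(banner_row):
                    copy[top + r][left + c] = value  # banner pixel replaces frame pixel
        composed.append(copy)
    return composed
```

Because the banner simply replaces pixels without regard to the scene, the result is the "text box or banner" style of composition, not the scene-adapted insertion discussed next.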
A more elegant approach is to insert the advertisement as an integral part of the video sequence, e.g., by displaying the advertisement on a billboard shown in the video sequence. However, in order to create a good impression and maintain a natural look of the composed image, the advertisement needs to be adapted to the rest of the scene. Typically, this approach requires human intervention to obtain results of good quality.
Embedding the advertisement in a composed image makes it practically impossible for the viewer to avoid the advertisement. However, embedding alone does not make the advertisement more relevant to the viewer. To address this issue, the displayed advertisement needs to take into account the individual interests of the viewer; in other words, the advertisements need to be targeted to the viewer.
The approach of providing targeted content is known, for example, from video games, where the advertisements are selected by means of individual information stored in the game console. WO 2007/041 371 A1 describes how user interactions in a video game are used to target advertisements: e.g., if the user selects a racing car of a specific brand, an advertisement of the same brand is displayed in the video game.
The insertion of targeted content in video games is comparatively simple because the creator of the video game has full control of the scenery and can therefore provide scenes that are suitable for advertisement insertion. In addition, in a video game the video processing is completely controlled inside the game console. In a broadcast environment, the insertion of targeted content is more complex.
In the co-pending European patent application EP 13 305 151.6 of the same applicant, it is suggested to identify, in a video sequence, a set of frames suitable for inserting advertisements as targeted content. According to that method, two sets of metadata are created. The first set relates to the video content, e.g., the frame numbers of the frames suitable for inlaying the advertisement, the coordinates where the advertisement should be placed, the geometrical shape of the advertisement, the color map used, the light setting, etc. The second set provides the information required for selecting the appropriate content to be inserted into the video sequence. It therefore comprises information about the inserted content itself, the context of the scene, the distance of a virtual camera, etc. The method of inserting targeted content described in EP 13 305 151.6 works well as long as all metadata are completely available.
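The two sets of metadata can be pictured as two simple records. The sketch below is an assumption for illustration only: the actual format used in EP 13 305 151.6 is not given here, so every class, field, and function name (`InsertionMetadata`, `SelectionMetadata`, `is_complete`) is hypothetical. A missing item is modelled as `None`, and `is_complete` checks the condition under which the method is said to work well, namely that all metadata are available.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class InsertionMetadata:
    """First set (hypothetical fields): where and how to inlay the content."""
    frame_numbers: Optional[List[int]] = None        # frames suitable for inlaying
    corners: Optional[List[Tuple[int, int]]] = None  # placement coordinates
    shape: Optional[str] = None                      # geometrical shape of the ad
    color_map: Optional[str] = None
    light_setting: Optional[str] = None


@dataclass
class SelectionMetadata:
    """Second set (hypothetical fields): information for choosing the content."""
    content_description: Optional[str] = None
    scene_context: Optional[str] = None
    camera_distance: Optional[float] = None          # distance of the virtual camera


def is_complete(*records) -> bool:
    """True only if every field of every record is present (not None)."""
    return all(getattr(rec, f.name) is not None
               for rec in records for f in fields(rec))
```

In this sketch, losing a single item along the distribution chain, such as the frame numbers, makes `is_complete` return `False`, which mirrors the limitation described in the following paragraph.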
However, in a video broadcast system, the video signal is transformed along its distribution chain from the broadcaster to the viewer's premises. It may be transcoded, re-encoded, or converted from digital to analog and vice versa, and audio tracks may be edited, removed or changed. These transformations are generally not under the control of a single entity. Therefore, time markers or any other metadata may be lost during these transformations. A potential remedy for this problem is video and/or audio watermarking. Video and audio watermarks are robust against the transformations mentioned above and could therefore serve as invariant markers in the video and/or audio sequence. However, content owners do not always agree to include watermarks because they are concerned about a potential negative effect on the viewer's perception of quality. Some broadcasters refuse to include watermarks because they do not want to modify their content broadcast workflow.
Watermarking is also not a preferred technology for the sole purpose of synchronizing two video streams or identifying corresponding frames in two video streams, for the following reasons. Watermarking relies on a symmetric key for embedding and decoding the watermarks. The key and the watermarking process must rely on secure hardware, which is too costly for many consumer electronics applications. In addition, scaling watermarking to a large number of devices is an issue.
For these reasons, video and/or audio watermarks are not a feasible solution to compensate for the loss of time markers and metadata.
Video fingerprinting is another technique that may provide frame-accurate synchronization of a broadcast or multicast video stream with the corresponding original video stream. However, matching a video fingerprint (signature) extracted by the video player against all signatures of the video provided by a server is costly and cannot be carried out in real time by a set-top box (STB).
Therefore, there remains a need for a solution that matches one or several frames of a broadcast multimedia stream with the corresponding frames of the original multimedia stream with frame accuracy.