TV advertising commercials exist in virtually all video data streams, subsidizing some or all of the cost of providing the content to the viewer. The ability to identify where the commercials exist in the video data stream has become an important goal for two main reasons. First, advertisers who pay to place the commercials wish to verify that the commercials were actually played, either by being “aired” during a broadcast, or “streamed” during an internet-based viewing session. The auditing process can be greatly enhanced if commercials can be identified as they are being played so that there can be a recordation to document the airing or streaming. Second, technology built into a video playing device, or executing concurrently with a video playing device, can “skip” commercials, assuming that the location of the commercials can be accurately identified so that no programming is skipped.
Some conventional technology for identifying where commercials exist in a video data stream is described in an article entitled “Automatic Detection of TV Commercials” (Satterwhite, B.; Marques, O.; Potentials, IEEE, Volume 23, Issue 2, April-May 2004 pp. 9-12). Satterwhite et al. describe two main categories of methods for detecting commercials, namely, “feature-based detection” and “recognition-based detection.” Feature-based detection uses general characteristics of commercials to detect their possible presence. Recognition-based detection works by trying to match commercials with ones that were already learned. Some general characteristics (heuristics) of commercials and commercial breaks include the following:
i. Multiple frames of black are displayed at the beginning and end of each commercial block and between each commercial in the block. There is no audio during these frames.
ii. If a network displays a logo in the corner of the screen, the logo will not appear during the commercials.
iii. Duration is typically some increment of 15 seconds, up to 90 seconds.
iv. Commercials are high in “action,” measured by a relatively larger number of cuts per minute between frames compared to a TV show.
v. Commercial breaks tend to occur at the same time in each episode of a given TV series.
Recently, a third reason has arisen to identify where the commercials exist in a video data stream. Mobile devices (e.g., tablets, smartphones) are now in heavy use while viewers watch television (TV). This provides a new platform for synchronized advertising delivery, wherein the TV advertiser may extend their reach to the mobile device. For example, when a particular commercial is airing on, or streaming to, the TV, another commercial may be delivered to the mobile device in either near real-time or in a coordinated delayed time. The mobile ad may be for the same or a different product or service as shown in the commercial that was aired on, or streamed to, the TV. To implement such a system, the commercial that was aired on, or streamed to, the TV must be instantly identifiable.
To facilitate such a system, a database of commercials is maintained so that near real-time matching and identification occurs as a commercial is aired or streamed to a TV, computer, or mobile device. To build such a database, video data streams are analyzed by automated content recognition (ACR) systems. Such systems are well-known in the art. One type of ACR system uses audio fingerprints within video signals to perform the content recognition. One commercially available audio ACR system is made by Audible Magic Corporation, Los Gatos, Calif. Another commercially available audio ACR system is Gracenote Entourage™ commercially available from Gracenote, Inc., Emeryville, Calif. Other ACR systems are disclosed in U.S. Patent Application Publication Nos. 2011/0289114 (Yu et al.), 2013/0071090 (Berkowitz et al.), and 2013/0205318 (Sinha et al.), each of which is incorporated by reference herein. Accordingly, the details of the search engine 214 and database 216 of FIG. 2 with respect to the recognition processing are not further described.
As is well-known in the art, search engines associated with ACR systems perform the comparisons on representations of content, such as fingerprints of the content. Thus, in one preferred embodiment, the database 216 maintains content fingerprints of known commercials for comparison with fingerprints of content in the incoming video data stream.
One known technique for performing ACR is to match audio samples from a video data stream against audio files in a database. If the audio sample matches a portion of one of the audio files in the database, the time window of the audio sample in the audio file can be used to calculate the start and end time of the commercial. See, for example, FIG. 1, wherein the ACR found that a four second sample from the video data stream matched a four second audio portion of a one minute (60 second) commercial C1. More specifically, the four second sample S1 matched audio starting at 20 seconds into commercial C1 and ending at 24 seconds into the commercial C1. If the time at the beginning of the sample S1 is t1, then the start time of the commercial (T1) is (t1−20 seconds) and the end time of the commercial (T2) is (t1+4 seconds+36 seconds=t1+40 seconds). Thus, for example, if t1 occurred at 11:00:20 am, the commercial can be presumed to have aired between 11:00:00 (start time) and 11:01:00 am (end time). Since the sampling process occurs continuously, another matching four second sample S2 may start at 40 seconds into the commercial C1 (t3) and end at 44 seconds into the commercial (t4). In fact, there may be many matching samples for a one minute commercial. The air time of the commercial C1 would be exactly the same for S2 as it is for S1. That is, the offset times for the samples that determine when the commercial starts and ends will vary but the actual start and end times will be identical for the two samples. The number of times that samples match a specific commercial and also match the same start and end times can be used to verify the matching process. For example, benchmarks may establish that a 60 second commercial should have a predetermined number of matching samples to be counted as an actual match.
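The offset arithmetic of the FIG. 1 example, together with the counting of samples that agree on the same commercial and the same start and end times, may be sketched as follows. This is only an illustrative sketch, not the implementation of any particular ACR system: the function name `airing_window`, the calendar date, and the benchmark value `MIN_MATCHES` are assumptions introduced for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def airing_window(sample_start, offset_s, commercial_len_s):
    """Return (T1, T2) for a sample that begins offset_s seconds into a commercial."""
    start = sample_start - timedelta(seconds=offset_s)
    end = start + timedelta(seconds=commercial_len_s)
    return start, end


# Sample S1 begins at t1 = 11:00:20 am and matches audio 20 s into the
# 60 s commercial C1 (the date itself is arbitrary).
t1 = datetime(2016, 1, 1, 11, 0, 20)
s1_window = airing_window(t1, offset_s=20, commercial_len_s=60)

# Sample S2 begins 20 s later and matches audio 40 s into C1.
t3 = t1 + timedelta(seconds=20)
s2_window = airing_window(t3, offset_s=40, commercial_len_s=60)

# Count how many samples agree on the same commercial and the same window.
votes = Counter([("C1", s1_window), ("C1", s2_window)])

MIN_MATCHES = 2  # hypothetical benchmark; a real value would be tuned per ad length
confirmed = [key for key, count in votes.items() if count >= MIN_MATCHES]
```

Although S1 and S2 have different offsets into C1, both resolve to the same window of 11:00:00 to 11:01:00 am, so both count toward the same candidate.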
Despite the high accuracy rates and extensive algorithmic techniques used by ACR systems, numerous scenarios may exist that will result in samples matching multiple commercials. Some of these scenarios are as follows:
1. Identical commercials having slightly different time lengths are stored in the database. For example, a version of C1 (e.g., C1′) having identical audio and video content may be used that is 58 seconds long vs. 60 seconds for C1. That is, the commercial is exactly the same, but is sped up to fit within a slightly shorter time window. The commercial itself would be visually and audibly indistinguishable to the viewer. Since the commercial has a different overall length, the time offset calculations will result in different start and/or end times. The sampling process will thus identify two candidate commercials (C1 and C1′) as matches.
2. Similar commercials having significantly different time lengths are stored in the database. It is common in the industry to produce commercials of multiple lengths for the same ad campaign, such as a 15 second, 30 second and 60 second ad. Portions of each ad may have identical audio and/or video content, such as slogans or tag lines and brand names. Four second samples may thus match multiple commercials in the database. Since the commercials have completely different overall lengths, there will be different start and/or end times for each matching commercial. While the number of times that samples match a specific commercial could potentially be used to determine which ad length is the correctly aired one, there are many instances where this technique will not satisfactorily identify the correctly aired commercial.
3. Different commercials having similar audio and/or visual content are stored in the database. Two completely unrelated brands may use similar audio phrases and/or video content. For example, two completely unrelated brands may license the same song to accompany the commercial, or may coincidentally or deliberately use similar audio content in portions of the commercial.
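To illustrate how a single sample can yield several candidate commercials with different start and/or end times, consider scenario 2 with a tag line shared by three ad lengths. The sketch below is hypothetical: the ad identifiers, the offsets at which the tag line begins, and the sample time are assumed values chosen only to show the arithmetic.

```python
def airing_window(sample_start_s, offset_s, length_s):
    """Return (start, end) in seconds for a sample matched offset_s into an ad."""
    start = sample_start_s - offset_s
    return start, start + length_s


# Hypothetical database entries: ad id -> (length in s, offset in s where the
# shared four second tag line begins). Here the tag line closes each ad.
CANDIDATES = {
    "ad_15": (15, 11),
    "ad_30": (30, 26),
    "ad_60": (60, 56),
}

# The sample containing the tag line begins 100 s after an arbitrary reference.
sample_start_s = 100

windows = {
    ad: airing_window(sample_start_s, offset_s, length_s)
    for ad, (length_s, offset_s) in CANDIDATES.items()
}
```

In this assumed case all three candidates share one end time (104 s) but have three different start times (89 s, 74 s and 44 s), so the sample alone cannot determine which ad length was actually played.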
Another scenario may exist where the samples match the exact same commercial, but the start and end times cannot be accurately identified because the matching content appears in different portions of the commercial. Referring again to FIG. 1, consider an example where the database contains only one commercial that has an audio portion that states “2016 Toyota Camry,” but the commercial includes two different audio portions that state “2016 Toyota Camry,” one at S1 and another at S2. Upon hearing the first instance of “2016 Toyota Camry” in the aired commercial, the ACR system may not be sure which instance of “2016 Toyota Camry” in the stored commercial is the correct one for matching, and thus may match it to both instances, resulting in two different start and end times for the same exact commercial. While one of the instances is correct, the ACR system is not sure which one is correct, and thus may log them both.
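The repeated-phrase case can be sketched in the same way. The example below assumes, purely for illustration, that the phrase occurs at 20 seconds (S1) and 40 seconds (S2) into a 60 second stored commercial; the constant and function names are hypothetical.

```python
COMMERCIAL_LEN_S = 60
PHRASE_OFFSETS_S = [20, 40]  # assumed positions of the phrase at S1 and S2


def candidate_windows(sample_start_s):
    """Return one (start, end) window per stored instance of the phrase."""
    return [
        (sample_start_s - offset_s, sample_start_s - offset_s + COMMERCIAL_LEN_S)
        for offset_s in PHRASE_OFFSETS_S
    ]


# The commercial actually starts at 1000 s, so the first spoken instance of
# the phrase is sampled at 1020 s.
logged = candidate_windows(1020)
```

Here the first window (1000 s to 1060 s) is the correct one, while the second (980 s to 1040 s) results from matching the first spoken instance against the second stored instance. Both are logged because the sample alone does not distinguish them.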
Accordingly, there is a need to automatically detect which matched commercial is likely to be the correctly played one when multiple candidate commercials are identified by the ACR system. The present invention fulfills such a need.