The present invention relates generally to video signal processing, and more particularly to techniques for processing video signals to identify and extract commercials or other types of video content having particular characteristics.
Many different systems have been developed for the detection and extraction of commercials from broadcast or recorded video signals. For example, U.S. Pat. No. 4,782,401 entitled “Editing Method and Apparatus for Commercials During Video Recording” describes a hardware-oriented solution for editing out commercials in the analog domain, based on the presence of dark or blank frames used to delineate commercials.
A similar system is described in PCT Application No. WO 83/00971, entitled “Reciprocating Recording Method and Apparatus for Editing Commercial Messages from Television Signals.” This system edits out commercials based on fade-in and fade-out at the beginning and end, respectively, of a commercial break.
Another approach, described in U.S. Pat. No. 4,750,052 entitled “Apparatus and Method for Deleting Selected Program Intervals from Recorded Television Broadcasts,” utilizes a fade detector to edit commercials from a recorded broadcast program.
PCT Application No. WO 94/27404, entitled “Method and Apparatus for Classifying Patterns of Television Programs and Commercials,” uses feature extraction and a neural network to classify video signals. The system detects changes in features such as power amplitude over the frequency spectrum, color and brightness, vertical interval time code, closed caption signal, and color carrier jitter signal.
A system described in PCT Application No. WO 95/06985, entitled “Process and Device for Detecting Undesirable Video Scenes,” stores an image from a broadcast program that precedes a commercial break so that the end of the commercial break may be detected by means of the stored image. This approach makes use of the fact that broadcasters often repeat a small part of the program after the end of the commercial break.
European Patent Application No. EP 735754, entitled “Method and Apparatus for the Classification of Television Signals,” uses a set of features and associated rules to determine if the current commercials satisfy the same criteria with some degree of “fuzziness.” The set of features includes, e.g., stereo versus mono, two-channel audio, sound level, image brightness and color, and logos, used to characterize commercials. An extensive set of rules is required to accommodate thresholds and parameter variations for these features.
U.S. Pat. No. 5,708,477, entitled “Video Signal Identifier for Controlling a VCR and Television Based on the Occurrence of Commercials,” uses a video signal identifier to recognize previously-identified commercial material and to reject it by muting the television sound and/or pausing the VCR when it is in record mode. A significant problem with this approach is that it fails to provide automatic detection, i.e., it requires the material to be identified in some way prior to its detection.
A system described in U.S. Pat. No. 5,668,917, entitled “Apparatus and Method for Detection of Unwanted Broadcast Information,” uses the repetitiveness of commercials to identify commercial material. This system stores video frames in a compressed format and compares frames in original “raw” format pixel by pixel. If the pixels match, within some threshold, then the frames are considered similar. A serious drawback of this approach is the excessive memory and computational resources that it requires. More particularly, storing video even in a compressed format takes an impractically large amount of memory space, e.g., approximately 200 GB per day for one channel of high definition television (HDTV) content. In addition, comparing raw video is very time consuming. Even assuming that compressing and decompressing video can be implemented at no additional computational cost, comparing frames will be a very slow process. A given incoming frame must be compared with the above-noted large amounts of stored video material, and the comparison completed before the next frame arrives.
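The storage figure quoted above can be related to a sustained bit rate with simple arithmetic. The following sketch assumes that 1 GB means 10^9 bytes and that recording runs continuously for 24 hours. Both are assumptions made for illustration and are not taken from the cited reference. The function name `implied_bit_rate_mbps` is likewise hypothetical.

```python
# Back-of-the-envelope check of the "approximately 200 GB per day" figure.
# Assumptions (not from the source): 1 GB = 10**9 bytes, 24 hours of recording.
BYTES_PER_DAY = 200 * 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def implied_bit_rate_mbps(bytes_per_day: int = BYTES_PER_DAY) -> float:
    """Average bit rate, in megabits per second, implied by a daily byte count."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    print(f"{implied_bit_rate_mbps():.1f} Mbit/s")
```

Under these assumptions the figure corresponds to an average of roughly 18.5 Mbit/s for a single channel, which illustrates why storing the compressed video itself, rather than a compact signature, is costly.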
As is apparent from the above, a need exists for improved techniques for identification and extraction of commercials and other types of video content, which avoid the problems associated with the above-described conventional systems.
The invention provides improved techniques for spotting, learning and extracting commercials or other particular types of video content in a video signal. In accordance with the invention, a video signal is processed to identify segments that are likely to be associated with a commercial or other particular type of video content. A signature is extracted from each of the segments so identified, and the extracted signatures are used, possibly in conjunction with additional temporal and contextual information, to determine which of the identified segments are in fact associated with the particular type of video content. The temporal information may include, e.g., an indication of the amount of time elapsed between a given signature and a matching signature from a prior segment of the video signal. The contextual information may include, e.g., program information, such as program name, channel, time slot and rating, as obtained from an electronic programming guide or other information source.
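As an illustration only, the temporal and contextual information described above might be carried alongside each extracted signature in a small record. The sketch below is not the claimed implementation. The class names `ProgramContext` and `SegmentObservation`, the function `elapsed_since_prior_match`, the use of a string as the signature, and exact equality as the matching rule are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProgramContext:
    """Contextual information, e.g. from an electronic programming guide."""
    program_name: str
    channel: str
    time_slot: str
    rating: str


@dataclass(frozen=True)
class SegmentObservation:
    """One identified segment: its signature, when it was seen, and its context."""
    signature: str              # hypothetical: any comparable signature value
    seen_at: float              # seconds on an arbitrary clock
    context: Optional[ProgramContext] = None


def elapsed_since_prior_match(
    current: SegmentObservation,
    history: Iterable[SegmentObservation],
) -> Optional[float]:
    """Seconds since the most recent earlier observation with an equal signature.

    Returns None when no earlier matching observation exists. Exact equality
    stands in for whatever matching rule a real system would use.
    """
    prior_times = [
        obs.seen_at
        for obs in history
        if obs.signature == current.signature and obs.seen_at < current.seen_at
    ]
    if not prior_times:
        return None
    return current.seen_at - max(prior_times)
```

A decision stage could then combine the elapsed time with fields of `ProgramContext` when deciding whether a segment is in fact a commercial. How those inputs are weighted is left open here.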
One or more of the extracted signatures may be, e.g., a visual frame signature based at least in part on a visual characteristic of a frame of the video segment, as determined using information based on DC and motion coefficients of the frame, or based on DC and AC coefficients of the frame. Other visual frame signature extraction techniques may be based at least in part on color histograms. As another example, a given extracted signature may be an audio signature based at least in part on a characteristic of an audio signal associated with at least a portion of the video segment. Other signatures in accordance with the invention include, e.g., closed caption text describing an advertised product or service, a frame number plus information from a subimage of identified text associated with the frame, such as an 800 number, a company name, a product or service name, a uniform resource locator (URL), etc., or a frame number and a position and size of a face or other object in the image, as identified by an appropriate bounding box, as well as various combinations of these and other signature types.
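To make the idea of a compact visual frame signature concrete, the following is a minimal sketch of one color-histogram variant. It assumes a frame is available as rows of 8-bit (R, G, B) tuples. The quantization to 2 bits per channel, the L1 distance, the default threshold of 0.1, and all function names are illustrative assumptions, not details given in this description.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B) values, assumed for this sketch


def color_histogram_signature(
    frame: Sequence[Sequence[Pixel]], bits: int = 2
) -> List[float]:
    """Normalized color histogram using `bits` bits per channel.

    With the default of 2 bits the signature has 4**3 = 64 bins, far smaller
    than the frame it summarizes.
    """
    levels = 1 << bits
    shift = 8 - bits
    hist = [0] * (levels ** 3)
    count = 0
    for row in frame:
        for r, g, b in row:
            idx = ((r >> shift) * levels + (g >> shift)) * levels + (b >> shift)
            hist[idx] += 1
            count += 1
    if count == 0:
        raise ValueError("frame has no pixels")
    return [h / count for h in hist]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (0.0 to 2.0)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def signatures_match(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.1
) -> bool:
    """Treat two signatures as matching when their L1 distance is small."""
    return histogram_distance(a, b) <= threshold
```

Comparing two 64-element signatures is far cheaper than comparing two raw frames pixel by pixel, which is the kind of saving the signature approach is aiming for. Signatures derived from DC, AC, or motion coefficients would follow a different extraction path that is not shown here.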
In accordance with another aspect of the invention, a video processing system maintains different sets of lists of signatures, the sets of lists including one or more of a set of probable lists, a set of candidate lists and a set of found lists, with each entry in a given one of the lists corresponding to a signature associated with a particular video segment. The sets of lists are updated as the various extracted signatures are processed. For example, a given one of the signatures identified as likely to be associated with the particular video content is initially placed on one of the probable lists if it does not match any signature already on one of the probable lists. If the given signature matches a signature already on one of the probable lists, the given signature is placed on one of the candidate lists. A given one of the signatures on a candidate list is moved to a found list if it matches a signature already on one of the candidate lists. A given signature may also be removed from one or more of the lists in the event that the signature is not repeated within a designated time period.
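The promotion of a signature from a probable list to a candidate list to a found list can be sketched as a small state machine. The example below is a simplified illustration under stated assumptions: it keeps a single list of each kind rather than sets of lists, matches signatures by exact equality, removes an entry from its old list when it is promoted, and expires only probable and candidate entries. The class name `SignatureLists` and its methods are hypothetical.

```python
from typing import Dict, Hashable


class SignatureLists:
    """Single probable/candidate/found lists keyed by signature (a simplification)."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age                      # designated time period, in seconds
        self.probable: Dict[Hashable, float] = {}   # signature -> time last seen
        self.candidate: Dict[Hashable, float] = {}
        self.found: Dict[Hashable, float] = {}

    def expire(self, now: float) -> None:
        """Drop probable and candidate entries not repeated within max_age."""
        for table in (self.probable, self.candidate):
            stale = [sig for sig, seen in table.items() if now - seen > self.max_age]
            for sig in stale:
                del table[sig]

    def observe(self, signature: Hashable, now: float) -> str:
        """Record one sighting and return the list the signature ends up on."""
        self.expire(now)
        if signature in self.found:
            self.found[signature] = now
            return "found"
        if signature in self.candidate:
            # Matches a candidate entry: promote to the found list.
            del self.candidate[signature]
            self.found[signature] = now
            return "found"
        if signature in self.probable:
            # Matches a probable entry: promote to the candidate list.
            del self.probable[signature]
            self.candidate[signature] = now
            return "candidate"
        # First sighting: start on the probable list.
        self.probable[signature] = now
        return "probable"
```

In this sketch a signature reaches the found list on its third sighting, provided each repeat arrives within `max_age` of the previous one. A signature that is not repeated in time is dropped and starts over as probable if it appears again.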
In accordance with a further aspect of the invention, the system may be configured to involve a user in the commercial spotting, learning and extraction process. For example, a user remote control for use with a television, set-top box or other video processing system may be configured to include a “never again” button, such that when the user presses that button, the commercial signature is automatically extracted and stored directly to a particular found list, without first passing through the above-noted probable and candidate lists.
In accordance with yet another aspect of the invention, particular user actions can be detected and used to trigger the automatic extraction of a signature from a given segment of a video signal. For example, the system can be configured to automatically extract a signature from a portion of a video signal that a user fast-forwards through when watching a playback of a previously-recorded broadcast.
Advantageously, the invention allows commercials and other types of video content to be identified, learned by the system and extracted, with a significantly reduced complexity relative to the above-noted conventional systems. More particularly, through the use of extracted signatures, the invention reduces the amount of memory and computational resources required to implement video content identification and extraction. These and other features and advantages of the present invention will become more apparent from the accompanying drawings and the following detailed description.