The present invention is directed to audio classification. More particularly the present invention is directed to a method and apparatus for classifying and separating different types of multi-media events based upon an audio signal.
Multi-media presentations simultaneously convey both audible and visual information to their viewers. This simultaneous presentation of information in different media has proven to be an efficient, effective, and well received communication method. Multi-media presentations date back to the first xe2x80x9ctalking picturesxe2x80x9d of a century ago and have grown, developed, and improved not only into the movies of today but also into other common and prevalent communication methods including television and personal computers.
Multi-media presentations can vary in length from a few seconds or less to several hours or more. Their content can vary from a single uncut video recording of a tranquil lake scene to a well edited and fast paced television news broadcast containing a multitude of scenes, settings, and backdrops.
When a multi-media presentation is long, and only a small portion of the presentation is of interest to a viewer, the viewer can, unfortunately, spend an inordinate amount of time searching for and finding the portion of the presentation that is of interest to them. The indexing or segmentation of a multi-media presentation can, consequently, be a valuable tool for the efficient and economical retrieval of specific segments of a multi-media presentation.
In a news broadcast on commercial television, stories and features are interrupted by commercials interspersed throughout the program. A viewer interested in viewing only the news programs would, therefore, also be required to view the commercials located within the individual news segments. Viewing these interposed and unwanted commercials prolongs the entire process for the viewer by increasing the time required to search through the news program in order to find the desired news pieces. Conversely, some viewers may instead be interested in viewing and indexing the commercials rather than the news programs. These viewers would similarly be forced to wade through the lengthy news programs in order to find the commercials that they sought to review. Thus, in both of these examples, it would benefit the user if the commercials and the news segments could be easily separated, identified, and indexed, so that the segments of the news program that were of specific interest to a viewer could be easily identified and located.
Various attempts have been made to identify and index the commercials placed within a news program. In one known labor intensive process the news program is indexed through the manual observation and indexing of the entire programxe2x80x94an inefficient and expensive endeavor. In another known process researchers have utilized the introduction or re-introduction of an anchor person in the news program to provide a queue for each segment of the broadcast. In other words, every time the anchor person was introduced a different news segment was thought to begin. This method has proven to be a complex and inaccurate process relying upon the individual intricacies of the various news stations and their various news anchor people; one that can not be implemented on a widespread basis but is, rather, confined to a restrictive number of channels and anchor people due to the time required in establishing the system.
It is, therefore, desirable to provide a simpler process for identifying and indexing commercials in a television news program: one that does not rely on the individual queues of a particular news network or reporter; one that can be efficiently and accurately implemented over a wide range of news programs and commercials; one that overcomes the shortcomings of the processes used today.
The present invention includes a method and apparatus for segmenting a multi-media program based upon audio events. In one embodiment a method of classifying an audio stream is provided. This method includes receiving an audio stream. Sampling the audio stream at a predetermined rate and then combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and are analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.
In an alternative embodiment of the present invention a computer-readable medium is provided. This medium has stored thereon instructions that are adapted to be executed by a processor and, when executed, define a series of steps to identify commercial segments of a television news program. These steps include selecting samples of an audio stream at a preselected interval and then grouping these samples into clips which are then analyzed to determine if a commercial is present within the clip. This analysis includes determining: the non silence ratio of the clip; the standard deviation of the zero crossing rate of the clip; the volume standard deviation of the clip; the volume dynamic range of the clip; the volume undulation of the clip; the 4 Hz modulation energy of the clip; the smooth pitch ratio of the clip; the non-pitch ratio of the clip; and, the energy ratio in the sub-band of the clip.