This invention relates to searching for information, and more specifically to a method for searching multimedia data and creating a personalized multimedia summarization based upon user-input criteria. In particular, the multimedia summarization is based upon a user-specified theme: multimedia imagery is searched, and imagery is selected for the summarization according to a statistical framework. The invention is embodied in a method, a computer system, and a computer program product that creates a personalized multimedia information summary.
The following papers provide useful background information on the indicated topics, all of which relate to the invention, and are incorporated herein by reference.
Heuristics used in video frame selection:
M. A. Smith and T. Kanade, Video Skimming And Characterization Through The Combination Of Image And Language Understanding Techniques, Proceedings of CVPR '97 (1997), pp. 775-781.
Comprehensive user studies:
H. Wactlar et al., Lessons Learned From Building A Terabyte Digital Video Library, IEEE Computer, vol. 32, February 1999.
Video summaries based exclusively on visual features:
D. DeMenthon et al., Video Summarization By Curve Simplification, Tech. Rep. LAMP-TR-018, Language and Media Processing laboratory, University of Maryland (1998).
Topic clustering:
Y. Yang, An Evaluation of Statistical Approaches To Text Categorization, Information Retrieval Journal (1999).
The following discussion of related topics provides a foundation for understanding the invention.
Rapid progress in computing, data storage and telecommunications has brought about a multimedia information era in which image, audio and video data are becoming the information highways of our society. The advent of the Internet and the World Wide Web has dramatically changed the way people acquire and disseminate information and knowledge, and computer and telecommunication giants are teaming up with content providers to exploit the Internet's huge business potential. Television broadcasters, newspaper publishers, entertainment companies, consumer product retailers and service retailers are expanding their presence on the Internet. Personal computers are quickly becoming information housekeepers for consumers, responsible for accessing and storing information from sources such as online newspapers and broadcasters.
An inevitable consequence of this evolution is an overwhelming information glut. Finding relevant pieces of information on the Internet is fast becoming a difficult task. To facilitate rapid search and retrieval of relevant information from a huge number of information sources, the following two techniques will become indispensable:
Effective content-based indexing and search schemes for multimedia data including text, audio, image and video.
Automatic content summarizations for multimedia data collections.
Automatic content summarization is as important as effective content-based indexing and search for the rapid search and retrieval of desired information. By way of example, assume that an attorney wants to study a particular lawsuit that has received much media attention, and that hundreds of digital video libraries are available on the Internet for searching. If each digital video library provides a content summary listing the top one hundred topics in the library, or a content overview that digests the library's whole news collection on the lawsuit in question, the attorney will be able to easily determine which subset of the libraries deserves a thorough search. In another example, assume that a personal information housekeeper records news broadcasts from all the major television broadcasters on a daily basis for a corporate CEO, who reviews the gathered information each day. If the information housekeeper can automatically create an updated summary of the gathered news broadcasts, the CEO will be able to quickly identify news reports that are of interest and require further attention.
The above examples are but two representative cases of searching for particular information. In the first type of search, the person has a particular topic to research but does not know which data collection would provide the most useful information and therefore deserves further investigation. In the second type, the person has no particular topic to research, but instead wishes to find out whether a particular data collection contains any interesting subjects worthy of attention.
Content summaries are equally important in everyday searches for information. By way of example, assume a student enters a bookstore to buy an HTML reference book. Because dozens of HTML reference books are available on the shelf, the student has to rely on the book summaries, tables of contents, or a quick browse through the books in order to select one that matches the student's level of understanding and personal preference. Further, assume that another student comes to the bookstore wanting an interesting novel to read during the coming summer vacation. The student may first scan the category signs on the shelves and then choose a particular category to browse. Next, the student may read the summary on the book jacket, or glance at some chapters if necessary, in order to select a suitably interesting novel.
There are three levels of content summaries for a digital video library: (1) a library-wide summary that lists the major topics collected by the library; (2) a topic-wide summary that summarizes the whole data collection for a particular topic; and (3) an item-wide summary that digests an original video program into a shorter version. To date, most research studies in the literature have addressed the automatic creation of item-wide summaries.
The video skimming technique developed by Carnegie Mellon University summarizes individual TV news programs to a user-specified length based on a set of primitive heuristics. These heuristics include the selection of frames that precede the introduction of a proper name, frames from short shots, frames containing both human faces and superimposed captions, and frames before or after a camera pan or zoom. Although the video skimming technique employs closed captions, video optical character recognition and various visual features with the intention of obtaining content-based video summaries, comprehensive user studies received mixed responses from the evaluators who participated in the study.
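As a rough illustration of how heuristic frame selection of this kind can work, the following Python sketch scores each frame by how many of the listed heuristics it satisfies and keeps the top-scoring frames in playback order. The `Frame` fields, the equal weighting of heuristics and the function names are hypothetical assumptions for illustration; they are not the Carnegie Mellon implementation.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    # Hypothetical per-frame flags, assumed to come from upstream analysis.
    index: int
    precedes_proper_name: bool = False
    in_short_shot: bool = False
    has_face: bool = False
    has_caption: bool = False
    near_camera_motion: bool = False


def heuristic_score(f: Frame) -> int:
    # One point per satisfied heuristic (equal weights are an assumption).
    return sum([
        f.precedes_proper_name,
        f.in_short_shot,
        f.has_face and f.has_caption,   # face and caption must co-occur
        f.near_camera_motion,
    ])


def skim(frames: list, length: int) -> list:
    # Keep the `length` highest-scoring frames (ties go to earlier frames),
    # then return their indices in playback order.
    ranked = sorted(frames, key=lambda f: (-heuristic_score(f), f.index))
    return sorted(f.index for f in ranked[:length])
```

For example, with a frame that has a face plus a caption and another that is both in a short shot and near a camera pan, `skim(frames, 2)` keeps those two frames and drops frames matching no heuristic.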
In contrast, the video summarization method of the Language and Media Processing group at the University of Maryland creates video summaries of controllable length based exclusively on the visual features of the video programs. The method first represents a video sequence as a trajectory curve in a high-dimensional feature space, and then uses a recursive binary curve-splitting algorithm to approximate the video curve with a sequence of perceptually significant points. Each point on the video curve corresponds to a frame in the original video sequence. By showing the thumbnail images of the frames represented by these significant points, or by playing these frames sequentially, the user receives a summarized view of the original video sequence. Because the curve-splitting algorithm assigns more points to curve segments with larger gradients, this summarization method naturally uses more frames to represent shots with more variation.
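The recursive binary curve-splitting step can be sketched as follows, assuming each frame has already been reduced to a numeric feature vector. This sketch uses the familiar split-at-the-farthest-point recursion (as in the Ramer-Douglas-Peucker algorithm); the `tolerance` parameter, which controls summary length only indirectly, and all function names are illustrative assumptions rather than the Maryland group's published code.

```python
import math


def _sub(u, v):
    return [x - y for x, y in zip(u, v)]


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def distance_to_chord(p, a, b):
    """Euclidean distance from feature vector p to the line through a and b."""
    ab, ap = _sub(b, a), _sub(p, a)
    denom = _dot(ab, ab)
    if denom == 0:                         # degenerate chord: a == b
        return math.sqrt(_dot(ap, ap))
    t = _dot(ap, ab) / denom               # projection coefficient onto the chord
    residual = [x - t * y for x, y in zip(ap, ab)]
    return math.sqrt(_dot(residual, residual))


def split_curve(curve, tolerance, lo=0, hi=None):
    """Indices of perceptually significant points on curve[lo..hi] (inclusive)."""
    if not curve:
        return []
    if hi is None:
        hi = len(curve) - 1
    if hi - lo < 2:                        # no interior points left to test
        return [lo, hi] if hi > lo else [lo]
    # Find the interior frame farthest from the chord joining the endpoints.
    dmax, split = max(
        (distance_to_chord(curve[i], curve[lo], curve[hi]), i)
        for i in range(lo + 1, hi)
    )
    if dmax <= tolerance:                  # segment is well approximated by its chord
        return [lo, hi]
    left = split_curve(curve, tolerance, lo, split)
    right = split_curve(curve, tolerance, split, hi)
    return left[:-1] + right               # the split point appears only once
```

A smaller `tolerance` keeps more points, and because high-gradient stretches of the curve deviate most from their chords, those stretches receive more points first, consistent with the behavior described above.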
Despite methodological differences, a separate method from Panasonic (Patent No. JP-8-251540) produces video summaries with characteristics very similar to those of the Language and Media Processing group's method. More specifically, Panasonic's method is based on a set of video summarization rules that coarsely sub-sample long shots with little change and finely sub-sample short shots with large changes.
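A minimal rule-based sketch of this kind of adaptive sub-sampling follows. The thresholds, strides and the per-shot `change` measure are hypothetical values chosen for illustration, not parameters from the Panasonic patent.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    start: int      # first frame index (inclusive)
    end: int        # last frame index (exclusive)
    change: float   # hypothetical visual-change measure in [0, 1]


# Hypothetical rule parameters: frame strides and the thresholds that pick them.
COARSE, MEDIUM, FINE = 60, 30, 10
LONG_SHOT, LOW_CHANGE, HIGH_CHANGE = 300, 0.2, 0.6


def sampling_stride(shot: Shot) -> int:
    length = shot.end - shot.start
    if length >= LONG_SHOT and shot.change <= LOW_CHANGE:
        return COARSE   # long, static shot: sample sparsely
    if length < LONG_SHOT and shot.change >= HIGH_CHANGE:
        return FINE     # short, busy shot: sample densely
    return MEDIUM       # everything else


def subsample(shots: list) -> list:
    # Concatenate the sampled frame indices of every shot, in order.
    frames = []
    for shot in shots:
        frames.extend(range(shot.start, shot.end, sampling_stride(shot)))
    return frames
```

In this sketch a 600-frame static shot contributes 10 frames while a 50-frame busy shot contributes 5, so busy material is represented at a much higher rate per frame.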
Other studies in the literature exploit the audio characteristics of the video sequence for summarization purposes. The method of Nippon Telegraph and Telephone Corp. (Patent No. JP-3-80728) detects semantically important audio patterns (e.g., applause, cheers) based on a predefined audio dictionary, and composes video summaries by selecting the video segments that contain these audio patterns.
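The dictionary-matching step can be sketched as a set intersection, assuming an upstream audio classifier (not shown, and hypothetical here) has already labeled each segment with the audio patterns it contains. The dictionary contents and the `Segment` type are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical predefined dictionary of semantically important audio patterns.
AUDIO_DICTIONARY = frozenset({"applause", "cheers"})


@dataclass
class Segment:
    start: float                                     # seconds
    end: float                                       # seconds
    audio_labels: set = field(default_factory=set)   # from an assumed classifier


def select_by_audio(segments, dictionary=AUDIO_DICTIONARY):
    # Keep every segment whose detected labels include at least one dictionary pattern.
    return [s for s in segments if s.audio_labels & dictionary]
```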
A different method developed by MLS (Patent No. JP-8-334479) scores the importance of each shot based on its audio characteristics, such as the inclusion of music, speech or special sound effects, and then creates video summaries by selecting the shots with the highest scores.
Other video summarization methods have been reported in the literature, all very similar to the methods described above in methodology and underlying heuristics.
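The score-and-select idea can be sketched as follows, assuming each shot's audio has been summarized as the fraction of the shot occupied by each audio class. The class names and weights are hypothetical; the actual scoring used in the MLS method is not described here.

```python
# Hypothetical weights per audio class; not taken from the MLS patent.
AUDIO_WEIGHTS = {"music": 1.0, "speech": 2.0, "sound_effect": 1.5}


def audio_score(features: dict) -> float:
    """features maps an audio class to the fraction of the shot it occupies."""
    return sum(AUDIO_WEIGHTS.get(cls, 0.0) * frac for cls, frac in features.items())


def top_shots(shots: list, k: int) -> list:
    """shots: list of (shot_id, features). Return the k best ids in original order."""
    # sorted() is stable even with reverse=True, so equal scores keep input order.
    ranked = sorted(range(len(shots)), key=lambda i: audio_score(shots[i][1]), reverse=True)
    return [shots[i][0] for i in sorted(ranked[:k])]
```

Returning the selected shots in their original order keeps the summary chronological even though selection is driven purely by score.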
In general, however, most of these existing methods share two major problems. First, they are usually based on a fixed set of heuristics derived either from common video production patterns or from common sense. Because a given production pattern or common-sense rule may or may not be a good indicator of important scenes, depending on the video domain under consideration, video summaries created with these heuristics may satisfy only very limited types of applications.
Second, these methods do not take the user's requirements into consideration when composing video summaries. Whether a video summary is good depends heavily on the user's initial requirements. For example, a user looking for news reports about a particular celebrity may want a human-centric video summary that includes as many stored video scenes of that celebrity as possible. On the other hand, a user interested in examining the damage caused by a natural disaster may want a nature-centric video summary that includes nature scenes. Video summaries that adapt to personal needs are especially important when used as a means of information search and retrieval.
Although content summaries greatly facilitate information retrieval tasks, automatic multimedia content summarization has yet to receive as much attention from the multimedia information retrieval community as research on content-based indexing and retrieval.
The invention has been made in view of the above circumstances and has an object to overcome the above problems and limitations of the prior art, and has a further object to provide the capability to create a multimedia summarization based upon a user-specified information need from a plurality of multimedia databases.
Additional objects and advantages of the invention will be set forth in part in the description that follows and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
It is a further object of the invention to provide a method, a computer system and a computer program product for a multimedia summarization, wherein keywords or topic clustering is used to retrieve multimedia imagery from multimedia databases.
It is yet a further object of the invention to provide a method, a computer system and a computer program product for a multimedia summarization, wherein the user selects a theme in order to select relevant multimedia scene shots from the retrieved multimedia imagery.
It is still a further object of the invention to provide a method, a computer system and a computer program product for a multimedia summarization, wherein multimedia features are extracted from multimedia scene shots by using natural language processing and video analysis techniques to determine the relevance thresholds of the multimedia scene shots.
The above and other objects of the invention are accomplished by providing a method, a computer system and a computer program product for a multimedia summarization, wherein relevance thresholds are set for determining whether a multimedia scene shot is included in the final multimedia summarization.
The above and further objects of the invention are further accomplished by providing a method, a computer system and a computer program product for a multimedia summarization, wherein the setting of relevance thresholds further includes the use of heuristic rules to assign relevance values to extracted multimedia features.
The above objects are further achieved by providing a method, a computer system and a computer program product for a multimedia summarization, wherein selected multimedia scene shots with high relevance scores to the user-specified theme are assembled into a multimedia summarization.
According to the invention, a predetermined number of multimedia features are extracted from each multimedia scene shot, and the relevance between each feature and the user-specified theme is measured.
According to the invention, video analysis is performed on the multimedia scene shots to detect and match human features, extract prominent colors, extract spatial features and detect motion.
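Prominent-color extraction, one of the video analysis operations listed above, can be sketched with a coarse color histogram: quantize each pixel's RGB value into bins and report the most populated bins. This is an illustrative assumption about one possible technique, not the method specified by the invention; the bin count and function name are hypothetical.

```python
from collections import Counter


def prominent_colors(pixels, k=3, bins_per_channel=4):
    """Return the k most frequent quantized colors as bin-center RGB tuples.

    pixels: iterable of (r, g, b) with channel values in 0..255.
    bins_per_channel is assumed to divide 256 evenly (hypothetical default: 4).
    """
    step = 256 // bins_per_channel
    # Histogram over quantized colors: each pixel falls into one (r, g, b) bin.
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    half = step // 2
    # Report each winning bin by its center color.
    return [
        (rb * step + half, gb * step + half, bb * step + half)
        for (rb, gb, bb), _ in counts.most_common(k)
    ]
```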
In further accordance with the above objects, the invention forms a subset of the selected multimedia scene shots based upon the relevance thresholds.
According to the invention, duplicate multimedia scene shots are removed from the most relevant multimedia scene shots before the multimedia summarization is assembled.
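The scoring, thresholding, duplicate-removal and assembly steps described above can be sketched end to end. This is a minimal sketch under stated assumptions: feature values are normalized to [0, 1], a theme is a hypothetical table of per-feature weights standing in for the heuristic rules that assign relevance values, relevance is a weighted sum, and two shots count as duplicates when the cosine similarity of their feature vectors reaches a hypothetical cutoff. None of these specifics comes from the description above.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneShot:
    shot_id: str
    start: float      # seconds from the start of the source video
    features: dict    # hypothetical: feature name -> value in [0, 1]


# Hypothetical theme profiles: how much each feature counts toward a theme.
THEMES = {
    "human-centric": {"face": 1.0, "speech": 0.5, "name_in_transcript": 0.8},
    "nature-centric": {"sky_color": 0.7, "green_color": 0.7, "motion": 0.3},
}


def relevance(shot: SceneShot, theme: str) -> float:
    # Weighted sum of the shot's features under the theme's weights.
    weights = THEMES[theme]
    return sum(w * shot.features.get(name, 0.0) for name, w in weights.items())


def _cosine(u: dict, v: dict) -> float:
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def summarize(shots, theme, threshold, dup_similarity=0.98):
    # 1. Score every shot against the theme; keep those at or above the threshold.
    relevant = [(relevance(s, theme), s) for s in shots]
    relevant = [(r, s) for r, s in relevant if r >= threshold]
    # 2. Visit shots from most to least relevant; skip near-duplicates of kept shots.
    relevant.sort(key=lambda rs: rs[0], reverse=True)
    kept = []
    for _, s in relevant:
        if all(_cosine(s.features, k.features) < dup_similarity for k in kept):
            kept.append(s)
    # 3. Assemble the summary in the original playback order.
    return [s.shot_id for s in sorted(kept, key=lambda s: s.start)]
```

Visiting shots in descending relevance before de-duplicating means that, of two near-identical shots, the more relevant one survives; changing the theme changes only the weight table, which is how the same scene shots can yield a human-centric or a nature-centric summary.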