This invention relates to multimedia content summarization. More particularly, the invention relates to generating and providing summaries of multimedia content based on user feedback.
Multimedia streaming, the continuous delivery of synchronized media data such as video, audio, text, and animation, is a critical link in the digital multimedia revolution. Today, streamed media is primarily about video and audio, but a richer, broader digital media era is emerging with a profound and growing impact on the Internet and digital broadcasting.
Synchronized media means multiple media objects that share a common timeline. Video and audio are examples of synchronized media: each is a separate data stream with its own data structure, but the two data streams are played back in synchronization with each other. Virtually any media type can have a timeline. For example, an image object can change like an animated .gif file, text can change and move, and animation and digital effects can happen over time. This concept of synchronizing multiple media types is gaining greater meaning and currency with the emergence of more sophisticated media composition frameworks implied by MPEG-4, Dynamic HTML, and other media playback environments.
The term "streaming" is used to indicate that the data representing the various media types is provided over a network to a client computer on a real-time, as-needed basis, rather than being pre-delivered in its entirety before playback. Thus, the client computer renders streaming data as it is received from a network server, rather than waiting for an entire "file" to be delivered.
In comparison to text-based or paper-based presentations, multimedia presentations can be very advantageous. Synchronized audio/visual presentations, for example, are able to capture and convey many subtle factors that are not perceivable from paper-based documents. Even when the content is a spoken presentation, an audio/visual recording captures gestures, facial expressions, and various speech nuances that cannot be discerned from text or even from still photographs.
Although streaming multimedia content compares favorably with textual content in most regards, one disadvantage is that it requires significant time for viewing. It cannot be "skimmed" like textual content. Thus, a "summarized" version of the multimedia content would be very helpful.
Various technologies are available for summarizing or "previewing" different types of media content. For example, technology is available for removing pauses from spoken audio content. Audio content can also be summarized with algorithms that detect "important" parts of the content as identified by pitch emphasis. Similarly, techniques are available for removing redundant or otherwise "unimportant" portions or frames of video content. Similar schemes can be used with other types of media streams, such as animation streams and script streams.
Although such techniques are available for previewing media content, these techniques lack a semantic understanding of the multimedia content. These techniques rely on assumptions regarding the multimedia content based on the way in which the presentation is made (e.g., the manner in which words and sentences are spoken, the manner in which video frames are sequenced, etc.), rather than an understanding of the importance of the different portions of the multimedia content.
Furthermore, current techniques lack the ability to distinguish between different groups of users. Different groups of people (e.g., the sales department and the legal department of a corporation) may feel that different portions of the multimedia content are interesting. Current techniques do not provide any ability to distinguish between such different user interests.
The invention described below addresses these disadvantages of existing techniques for summarizing multimedia content, providing an improved way to summarize such content.
A system includes a multimedia server computer or other device that can provide multimedia content, as well as summaries of the multimedia content, to one or more client computers. Summaries are generated to include those portions of the multimedia content that are most interesting to previous users, as identified by feedback from the previous users. Thus, the summary presented to a user includes the portions identified as interesting by previous users.
According to one aspect of the invention, each of the users of a client computer is identified as being part of a group and different summaries are generated for each group. Each summary includes those portions of the multimedia content that are most interesting to previous users of the corresponding group. Thus, the summary presented to a user includes only the portions identified as interesting by previous similar users (that is, users in the same group).
According to another aspect of the invention, the summaries are continually updated as each user is presented with the multimedia content and/or a summary of the content. Feedback from each user is collected and used to further refine the summary. For example, a portion of the multimedia content may be dropped from the summary if feedback from subsequent users indicates the portion is not interesting.
According to another aspect of the invention, the multimedia content is separated into multiple different segments or portions. The segments may be pre-determined or alternatively may be dynamically defined. Each of these segments is given a different "score" for each group. These scores are then modified as user feedback is received. User feedback indicating a segment is interesting increases the score of that segment, while user feedback indicating a segment is not interesting decreases the score of that segment. The highest scoring segments are then provided as the summary of the multimedia content.
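The per-group scoring described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class name `SegmentScores`, the fixed score step of 1.0, the initial score of 0.0, and the choice to return the selected segments in timeline order are all assumptions made for illustration.

```python
from collections import defaultdict


class SegmentScores:
    """Hypothetical per-group score table for the segments of one piece of content."""

    def __init__(self, segment_ids, initial_score=0.0):
        # Segment ids are assumed to be listed in timeline order.
        self._segment_ids = list(segment_ids)
        self._initial_score = initial_score
        # group name -> {segment id -> score}
        self._scores = defaultdict(dict)

    def _group_scores(self, group):
        scores = self._scores[group]
        for segment_id in self._segment_ids:
            scores.setdefault(segment_id, self._initial_score)
        return scores

    def record_feedback(self, group, segment_id, interesting, step=1.0):
        """Raise the score for 'interesting' feedback, lower it otherwise."""
        if segment_id not in self._segment_ids:
            raise KeyError(f"unknown segment: {segment_id!r}")
        scores = self._group_scores(group)
        scores[segment_id] += step if interesting else -step

    def summary(self, group, max_segments):
        """Return the highest-scoring segments for a group, in timeline order."""
        scores = self._group_scores(group)
        top = sorted(self._segment_ids, key=lambda s: scores[s], reverse=True)
        chosen = set(top[:max_segments])
        return [s for s in self._segment_ids if s in chosen]


# Example: two groups give different feedback on the same content.
table = SegmentScores(["intro", "pricing", "legal", "demo"])
table.record_feedback("sales", "pricing", interesting=True)
table.record_feedback("sales", "pricing", interesting=True)
table.record_feedback("sales", "demo", interesting=True)
table.record_feedback("sales", "intro", interesting=False)
table.record_feedback("legal", "legal", interesting=True)

sales_summary = table.summary("sales", max_segments=2)
legal_summary = table.summary("legal", max_segments=1)
```

Because each group keeps its own scores, the same feedback loop yields different summaries for, say, a sales group and a legal group, and a segment drops out of a summary once later feedback lowers its score below that of other segments.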
According to another aspect of the invention, the user feedback includes both explicit and implicit feedback. A user's inputs during presentation of the multimedia content and/or the summary of the multimedia content are monitored. Explicit feedback may be provided by the user, such as selection of a "this is interesting" button or a "this is not interesting" button. Additionally, implicit feedback may be provided by the user, such as selection of a fast forward button (implying the portion is not interesting) or selection of a rewind or replay button (implying the portion is interesting).
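As a rough sketch of how monitored inputs might be turned into feedback, the following maps each user action to a signed score adjustment. The event names, the function `feedback_delta`, and in particular the weights are hypothetical; the description above does not state that explicit feedback counts more than implicit feedback, so the 1.0 and 0.5 values are assumptions chosen only to make the example concrete.

```python
from enum import Enum


class Event(Enum):
    """Hypothetical names for the user inputs monitored during playback."""
    INTERESTING_BUTTON = "interesting_button"
    NOT_INTERESTING_BUTTON = "not_interesting_button"
    FAST_FORWARD = "fast_forward"
    REWIND = "rewind"
    REPLAY = "replay"


# Assumed weights: explicit button presses count fully, inferred signals count half.
EXPLICIT_WEIGHT = 1.0
IMPLICIT_WEIGHT = 0.5

_DELTAS = {
    Event.INTERESTING_BUTTON: +EXPLICIT_WEIGHT,      # explicit: interesting
    Event.NOT_INTERESTING_BUTTON: -EXPLICIT_WEIGHT,  # explicit: not interesting
    Event.FAST_FORWARD: -IMPLICIT_WEIGHT,            # implicit: not interesting
    Event.REWIND: +IMPLICIT_WEIGHT,                  # implicit: interesting
    Event.REPLAY: +IMPLICIT_WEIGHT,                  # implicit: interesting
}


def feedback_delta(event):
    """Return the signed score adjustment for one monitored user input."""
    return _DELTAS[event]


def is_explicit(event):
    """True for the two dedicated feedback buttons, False for inferred signals."""
    return event in (Event.INTERESTING_BUTTON, Event.NOT_INTERESTING_BUTTON)


# Example: net adjustment for a segment a user rewound once and then marked interesting.
net = sum(feedback_delta(e) for e in (Event.REWIND, Event.INTERESTING_BUTTON))
```

Keeping the mapping in a single table makes the interpretation of each input easy to audit and adjust, for example if a deployment chose to weight implicit and explicit signals equally.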