Multimedia streaming—the continuous delivery of synchronized media data like video, audio, text, and animation—is a critical link in the digital multimedia revolution. Today, streamed media is primarily about video and audio, but a richer, broader digital media era is emerging with a profound and growing impact on the Internet and digital broadcasting.
Synchronized media means multiple media objects that share a common timeline. Video and audio are examples of synchronized media—each is a separate data stream with its own data structure, but the two data streams are played back in synchronization with each other. Virtually any media type can have a timeline. For example, an image object can change like an animated .gif file, text can change and move, and animation and digital effects can happen over time. This concept of synchronizing multiple media types is gaining greater meaning and currency with the emergence of more sophisticated media composition frameworks implied by MPEG-4, Dynamic HTML, and other media playback environments.
The term “streaming” is used to indicate that the data representing the various media types is provided over a network to a client computer on a real-time, as-needed basis, rather than being pre-delivered in its entirety before playback. Thus, the client computer renders streaming data as it is received from a network server, rather than waiting for an entire “file” to be delivered.
In comparison to text-based or paper-based presentations, multimedia presentations can be very advantageous. Synchronized audio/visual presentations, for example, are able to capture and convey many subtle factors that are not perceivable from paper-based documents. Even when the content is a spoken presentation, an audio/visual recording captures gestures, facial expressions, and various speech nuances that cannot be discerned from text or even from still photographs.
Although streaming multimedia content compares favorably with textual content in most regards, one disadvantage is that it requires significant time for viewing. It cannot be “skimmed” like textual content. Thus, a “summarized” or “skimmed” version of the multimedia content would be very helpful.
Various technologies are available for “summarizing” or “previewing” different types of media content. For example, technology is available for removing pauses from spoken audio content. Audio content can also be summarized with algorithms that detect “important” parts of the content as identified by pitch emphasis. Similarly, techniques are available for removing redundant or otherwise “unimportant” portions or frames of video content. Similar schemes can be used with other types of media streams, such as animation streams and script streams.
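As an illustration of the pause-removal idea mentioned above, the following is a minimal sketch and not a description of any particular existing technology. It assumes audio is available as a list of floating-point samples and treats a "pause" as a run of consecutive frames whose mean energy falls below a threshold. The function name `remove_pauses` and the parameters `frame_size`, `threshold`, and `min_pause_frames` are hypothetical and chosen only for this example.

```python
from typing import List


def remove_pauses(
    samples: List[float],
    frame_size: int = 4,           # samples per analysis frame (assumed value)
    threshold: float = 0.01,       # mean-square energy below this counts as silent
    min_pause_frames: int = 2,     # shorter silent runs are kept as natural gaps
) -> List[float]:
    """Return samples with long low-energy runs (pauses) removed."""
    # Split the signal into fixed-size frames; the last frame may be shorter.
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]

    def energy(frame: List[float]) -> float:
        # Mean-square energy of one frame.
        return sum(s * s for s in frame) / len(frame)

    kept: List[float] = []
    silent_run: List[List[float]] = []  # silent frames not yet kept or dropped

    def flush_run() -> None:
        # Keep a silent run only if it is too short to count as a pause.
        if len(silent_run) < min_pause_frames:
            for f in silent_run:
                kept.extend(f)
        silent_run.clear()

    for frame in frames:
        if energy(frame) < threshold:
            silent_run.append(frame)
        else:
            flush_run()
            kept.extend(frame)
    flush_run()  # handle a silent run at the very end
    return kept


if __name__ == "__main__":
    speech = [0.5, -0.5, 0.5, -0.5]
    silence = [0.0] * 4
    shortened = remove_pauses(speech + silence * 3 + speech)
    print(len(shortened))  # three silent frames dropped, two speech frames kept
```

Even this simple scheme must examine every frame of the content, and practical summarizers that analyze pitch emphasis or video redundancy do considerably more work per frame.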
Although such previewing techniques are available, they typically require significant processing power and a significant amount of time to complete. Such constraints make it difficult to generate previews “on the fly” as the data is being streamed to its destination.
One solution is to pre-generate and store a “preview” version of the multimedia content, thereby avoiding the need for “on the fly” calculations. However, generating and storing such a preview version creates a storage problem. The multimedia content itself frequently requires a significant amount of storage space. Storing an additional preview version of the multimedia content increases the storage space requirements further, thereby placing significant constraints on the media storage device. This problem is exacerbated if multiple preview versions are generated and stored.
The invention described below addresses these disadvantages of previewing multimedia content, providing an improved way to generate and maintain such preview content.