Digital media (also known as multimedia) can be any of the following: movies, e.g., short clips, television shows, movie trailers, and feature-length movies; imagery, e.g., photographs, images, or parameterizations of images such as histograms; text, e.g., printed words or symbols (such as sheet music) in digital form; music, including visual representations of sound such as sheet music, notes on musical staffs, or spectrograms; and so on. Today, multimedia is a principal part of a majority of Web sites and, accordingly, media analytics, summarization, and skimming are of increasing importance. Summarization is the technique of condensing and abstracting multimedia, while analytics is the general technique of gaining insight by examining media segments and information. Skimming is the act of navigating through the summarization, and optionally through the original source, with the help of a computer interface (typically visual in nature, but possibly involving other human senses).
The search for interaction and visualization techniques for digital media has a long and diverse heritage. Various two-dimensional (2D) techniques are prevalent on the Web, the mobile Web, and the desktop. The best known include chart-like visualizations in which two dimensions of information are shown, such as a long, horizontally scrollable pick-list of images. Different types of graphs with a third dimension (3D) have long been used to capture multidimensional information and are used ubiquitously. Info-graphics, like those found in magazines and newspapers such as USA Today™, sometimes present information in print in highly stylized scenes to create the visual effect of information within scenes. These current techniques do not employ scenes, are not typically interactive, and are seldom applied to complex multimedia.
There is currently a need for drastically different and improved visual interaction techniques to support comprehensive exploration and analysis of multimedia, particularly video. Virtually all Web sites involved with the storage and transmission of Internet video offer only the ability to search through a pre-selected shortlist of scenes via linear, limiting, and coarse-grained techniques. As multimedia is complex, however, queries become more conceptual, such as users wondering, “Is this the episode where the microwave oven catches fire at one point?” Such concept queries are problematic for the current art.
FIG. 1 illustrates two such techniques. On the top, a Horizontal Picker of scenes (also known as a ‘gallery’) is shown; this is typically accompanied by the tip, “Choose a scene to begin playing from there.” This technique is effective if, for example, the system has fortuitously pre-selected scenes in which the user has an interest, but largely it is not effective as a skimming or summarization technique, since only the pre-selected scenes are offered as candidates. On the bottom of the figure, the hover-over playback technique is shown, in which the keyframe begins a simple playback when the user hovers the mouse over the icon and stops when the mouse leaves the icon. During playback, the entire video may be played, or just segments, or a fast-forward version, and so on. This technique almost always limits users to viewing a particular subset of media segments, e.g., video frames, played back when the mouse hovers over the icon. In addition, the playback is almost always very coarse, non-interactive, and limited to a small, predetermined set of keyframes, preventing the user from “exploring” the media in a meaningful or deep way. Further, this class of solution is linear and provides only the most basic assistance to users with higher-level conceptual searches in mind.
Mobile media skimming is as primitive as in the desktop case, and the problems are exacerbated by the smaller screen sizes of mobile devices. None of the many mobile offerings from companies such as Sling Media, Joost, Veoh, Flixster, AT&T® and Sprint® enable rich or effective within-video skimming. Most of the above provide keyword search, simple “TV-guide”-like interfaces, and extremely limited “choose a scene”-type action indexing.
Other approaches to viewing media using only a sphere representation do not allow dynamic adaptation and do not support the interactive exploration of the media units on the sphere with respect to range, focus, and time. For example, many approaches exist for adjusting the virtual camera in three-dimensional (“3D”) gaming worlds, as well as for defining textures that are mapped to 3D virtual objects to give them their “skin”. One such approach in video games wraps texture maps around 3D characters in order to create visually convincing characters. However, this approach is not interactive, and the texture does not convey media semantics at all. In the video game use case, the textures are not typically loaded from a remote server but are loaded once, locally, from the same place as the shapes. Places on the character are not interactive to the “touch” of users. These solutions are effective in their own right but do not address an interactive and information-centric approach.
There is a need to improve media analytics and summarization, and to do so in a way that does not force the end user to enter search terms or otherwise understand textual information (thus the user could be illiterate and still search effectively). The problem revolves around the use of digital technologies to provide insight into media in rapid skimming sessions that are neither overly long to perform nor unintuitive. And while 2D is effective, one can now display 3D representations (or visual “metaphors”) on virtually all devices, including laptops, tablets, and smartphones, thanks to improved software and hardware graphics acceleration. An effective use of a 3D metaphor can drastically improve the skimming experience, so the problem becomes: how to create a 3D metaphor on the device screen (independent of the device type) that effectively and intuitively conveys a skimming session for the user who is looking to perform analytics and experience summarization on the media.