1. Field of the Invention
This invention is related to the summarization of video or motion images. The invention is more particularly related to determining a measure of importance for each shot or segment of a video. The invention is also related to creating or printing a video or motion picture summary packed in a configuration that either emphasizes or de-emphasizes each segment or shot in the summary according to its importance. The invention is further related to a method for packing frames of different sizes into a video summary such that the packing with the least cost (the least weighted amount of resizing) is used for displaying the summary. The invention is still further related to utilizing the least-cost packed video summary as an interface to a video browsing system.
2. Discussion of the Background
With the increasing use of video for recording events and for communication (Internet communications, increased television bandwidths and channels, increased use of video in newscasts, etc.), video users and managers are confronted with the additional tasks of storing, accessing, identifying important scenes or frames, and summarizing videos as efficiently as possible.
A "shot" is a segment of video or motion image that is typically contiguous in time and visual space. Techniques exist to automatically segment video into its component shots, typically by finding the large frame differences that correspond to cuts, or shot boundaries. In many applications it is desirable to automatically create a summary or "skim" of an existing video, motion picture, or broadcast. This can be done by selectively discarding or de-emphasizing redundant information in the video. For example, repeated shots need not be included if they are similar to shots already shown (i.e., discarding less important information, such as repeated or common scenes).
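To make the cut-detection idea concrete, here is a minimal Python sketch, assuming each frame is a flat list of 8-bit gray levels and that a cut is declared whenever the L1 distance between the normalized gray-level histograms of consecutive frames exceeds a threshold. The function names, the 16-bin histogram, and the 0.5 threshold are illustrative assumptions, not the method of any particular system.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized gray-level histogram of one frame (pixel values 0..255)."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    n = len(frame)
    return [c / n for c in counts]


def find_cuts(frames: Sequence[Sequence[int]], threshold: float = 0.5,
              bins: int = 16) -> List[int]:
    """Return indices i where a new shot starts (frame i differs sharply from frame i-1)."""
    if not frames:
        return []
    cuts: List[int] = []
    prev = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = histogram(frames[i], bins)
        # L1 distance between normalized histograms lies in [0, 2]
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts


def segment_into_shots(n_frames: int, cuts: List[int]) -> List[range]:
    """Turn cut positions into consecutive frame ranges, one per shot."""
    bounds = [0] + cuts + [n_frames]
    return [range(a, b) for a, b in zip(bounds, bounds[1:])]
```

For example, seven frames that are dark, then bright, then dark again yield cuts at frames 3 and 5 and therefore three shots. Histogram differences are used in this sketch because they are less sensitive to small motion within a shot than pixel-by-pixel differences; practical detectors often add adaptive thresholds to handle gradual transitions such as fades and dissolves.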
Shahraray et al. at AT&T Research have worked on using key frames for an HTML presentation of video. They picked one key frame from each shot without any particular effort to reduce the number of key frames and thereby decrease redundancy. The key frames are uniformly sized and simply laid out in a column alongside text extracted from closed-caption data. No packing was incorporated into their work.
Taniguchi et al. have summarized video using a 2-D packing of "panoramas", which are large images formed by compositing video pans. In their method, key frames were extracted from every shot and used for a 2-D representation of the video content. Frames were not selected to reduce redundancy. Their packing procedure was suboptimal, leaving white space in the resulting composites.
Yeung et al. have made pictorial summaries of video using a "dominance score" for each shot in the video; however, the details of how to compute such a score and how to use it are ad hoc. Also, the pictorial summaries use a special, predetermined structure that can be used only for a poster-like representation, in which the time order of frames is often discarded.
Other tools built for browsing the content of a video are known, but they either summarize inefficiently or merely display the video in sequence, "as is".
The present inventors have realized that, in order to increase the efficiency with which video summaries are generated, a quantitative measure of shot or segment importance is needed. Such a quantitative measure could be used to determine which shots or segments of a video are most meaningful. The present inventors have also realized that such a measure is best determined objectively, via calculation or formula, so that the shot or segment selection process can be automated.
In addition, the present inventors have also realized that the quantitative measure may also be utilized to generate a video summary having only the most important shots or segments, and may be utilized in determining which shots or segments of a summary to emphasize (more important information) or de-emphasize (less-important information) by either increasing or reducing the sizes of representative frames (also referred to as keyframes).
Furthermore, the present inventors have developed a packing method for efficient 2-D presentation of the emphasized and de-emphasized shots or segments selected for the summary. Thus, once shot importance is determined, a 2-dimensional still representation can be constructed by efficiently packing a representative keyframe for each shot or segment, sized according to its importance.
Accordingly, it is an object of the present invention to provide a method of determining importance of shots or segments in a video, including the steps of segmenting the video into shots or related frames; and calculating an amount of importance for each shot or segment. The step of segmenting also includes the step of clustering the frames of the video based on at least one of a common attribute and matching criteria or algorithm.
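The text states only that frames are clustered and that an amount of importance is then calculated for each shot; it does not give a formula here. The sketch below assumes one plausible score: a shot is more important the longer it is and the rarer its cluster, I = L * log(1/W), where L is the shot length and W is the fraction of the total video length occupied by all shots in the same cluster. The greedy one-dimensional clustering on a single scalar feature per shot (for example, a hypothetical mean-color value) is likewise an assumption chosen for brevity.

```python
import math
from typing import Dict, List, Sequence


def cluster_shots(features: Sequence[float], tol: float) -> List[int]:
    """Greedy 1-D clustering (assumed): a shot joins the first cluster whose
    representative feature is within tol, otherwise it starts a new cluster."""
    reps: List[float] = []
    labels: List[int] = []
    for f in features:
        for k, r in enumerate(reps):
            if abs(f - r) <= tol:
                labels.append(k)
                break
        else:
            reps.append(f)
            labels.append(len(reps) - 1)
    return labels


def shot_importance(lengths: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Assumed score I = L * log(1/W): long shots from rare clusters score high."""
    total = sum(lengths)
    cluster_len: Dict[int, float] = {}
    for length, k in zip(lengths, labels):
        cluster_len[k] = cluster_len.get(k, 0.0) + length
    # log(1/W) with W = cluster_len / total; a cluster covering the whole video scores 0
    return [length * math.log(total / cluster_len[k])
            for length, k in zip(lengths, labels)]
```

With four equal-length shots (length 10) where the first and third fall in the same cluster, the repeated pair scores 10 ln 2 (about 6.93) each and the two unique shots score 10 ln 4 (about 13.86), so repeated material is de-emphasized as the background describes.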
It is another object of the present invention to provide a method of summarizing a video, including the steps of determining an importance of component shots of the video; selecting component shots or segments to be used in a summary based on their importance and extracting representative frames from the selected component shots; and presenting the representative frames in a video summary. The step of presenting includes the step of sizing each representative frame based on the importance of the shot from which the frame is extracted and an amount of space in a pre-determined bounded area for display of the summary, and packing the representative frames into the pre-determined bounded area.
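Selecting and sizing can be sketched as follows, assuming importance scores are already available and that frame sizes are drawn from a small set of integer size classes (1 to 3 grid units here). The threshold `min_importance`, the proportional mapping, and the size classes are hypothetical choices for illustration; the text above only requires that each frame's size depend on the importance of its shot and on the available display area.

```python
import math
from typing import List, Sequence, Tuple


def select_and_size(importance: Sequence[float], min_importance: float,
                    max_size: int = 3) -> List[Tuple[int, int]]:
    """Keep shots whose importance reaches min_importance and give each a
    size class in 1..max_size proportional to its importance (time order kept)."""
    kept = [(i, imp) for i, imp in enumerate(importance) if imp >= min_importance]
    if not kept:
        return []
    top = max(imp for _, imp in kept)
    sized: List[Tuple[int, int]] = []
    for i, imp in kept:
        fraction = imp / top if top > 0 else 1.0
        # With max_size == 3, ceil maps (0, 1/3] -> 1, (1/3, 2/3] -> 2, (2/3, 1] -> 3
        size = min(max_size, max(1, math.ceil(fraction * max_size)))
        sized.append((i, size))
    return sized
```

In this sketch the display area is not yet considered; fitting the sized frames into a fixed, bounded area is the separate packing step named in the text.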
It is also an object of the present invention to provide a method for packing a sequence of frames into a bounded area, including the steps of fitting frame sequences to the bounded area, and selecting the frame sequence having the least cost for the bounded area.
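The least-cost selection can be illustrated with a simplified, one-dimensional version of the problem. This simplification is an assumption of the sketch, not the two-dimensional packing the invention addresses: frames stay in time order, each is assigned a width from a set of allowed sizes, the widths must exactly fill a row of fixed width, and the cost of a candidate sequence is the weighted sum of |assigned size - desired size|. Dynamic programming over (frame index, remaining width) considers every fitting sequence and keeps the cheapest; `pack_row` and its parameters are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def pack_row(desired: Sequence[int], width: int, allowed: Sequence[int] = (1, 2, 3),
             weights: Optional[Sequence[float]] = None) -> Tuple[float, List[int]]:
    """Resize frames (kept in order) so their sizes sum exactly to `width`,
    minimizing sum(weight_i * |new_i - desired_i|). Returns (cost, sizes)."""
    n = len(desired)
    w = list(weights) if weights is not None else [1.0] * n
    INF = float("inf")
    # best[i][u]: least cost to place frames i..n-1 into exactly u remaining units
    best = [[INF] * (width + 1) for _ in range(n + 1)]
    choice = [[0] * (width + 1) for _ in range(n + 1)]
    best[n][0] = 0.0
    for i in range(n - 1, -1, -1):
        for u in range(width + 1):
            for s in allowed:
                if s <= u and best[i + 1][u - s] < INF:
                    c = w[i] * abs(s - desired[i]) + best[i + 1][u - s]
                    if c < best[i][u]:  # strict: ties keep the earlier allowed size
                        best[i][u] = c
                        choice[i][u] = s
    if best[0][width] == INF:
        raise ValueError("frames cannot fill the row exactly")
    # Walk the stored choices forward to recover the least-cost size sequence
    sizes: List[int] = []
    u = width
    for i in range(n):
        sizes.append(choice[i][u])
        u -= choice[i][u]
    return best[0][width], sizes
```

For desired sizes [3, 3, 1] in a row 6 units wide, one frame must shrink by one unit. With uniform weights the tie is broken toward shrinking the first frame (cost 1); with weights [5, 1, 1] the first frame is protected and the second one shrinks instead. Weighting by importance is one way to make resizing important frames more expensive.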
It is also an object of the present invention to provide an interface for viewing the video summary. The interface may be, but is not limited to, paper having the video summary printed along with reference codes linked to a starting point or portion of the video for each frame of the summary, or a web based interface with links corresponding to one or more starting points or portions of the video.
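For the web-based interface, each keyframe can be wrapped in a link that starts playback at its shot. The sketch below assumes that the video player honors the W3C Media Fragments temporal syntax (`#t=<seconds>`), that keyframes occupy square cells, and that `unit_px` (a hypothetical parameter) is the pixel size of one size unit; the file names in the usage example are placeholders.

```python
from html import escape
from typing import List, Sequence, Tuple


def summary_html(video_url: str, frames: Sequence[Tuple[str, float, int]],
                 unit_px: int = 80) -> str:
    """frames: (keyframe_image_url, shot_start_seconds, size_units) in time order.
    Each keyframe links to the point in the video where its shot begins."""
    parts: List[str] = []
    for img, start, size in frames:
        href = f"{video_url}#t={start:g}"  # Media Fragments temporal offset
        px = size * unit_px                # square cell scaled by size class
        parts.append(f'<a href="{escape(href)}"><img src="{escape(img)}" '
                     f'width="{px}" height="{px}"></a>')
    return "\n".join(parts)
```

For example, `summary_html("talk.mp4", [("k0.jpg", 0.0, 3), ("k1.jpg", 12.5, 1)])` produces two links, one pointing at `talk.mp4#t=0` with a 240-pixel keyframe and one at `talk.mp4#t=12.5` with an 80-pixel keyframe. A printed summary could carry the same start times encoded as reference codes next to each frame.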