An increasing number of people own and use camcorders to make videos that capture their experiences and document events in their lives. One of the primary problems with consumer home video acquisition devices such as camcorders is that they are linear, tape-based devices, and a single tape may contain multiple “events” (e.g. a birthday party, a soccer game, or a vacation video). Each event may in turn consist of multiple “shots” (i.e. the sequence of contiguous video frames between the time when the camera is instructed to start recording and the time when it is instructed to stop recording). Unfortunately, the linear nature of the videotape often makes it difficult to find and play back a segment of the video showing a specific event.
Multimedia editing applications (MEAs) bring versatility to such linear video recordings by allowing the user to capture or transfer the video from a videotape onto a personal computer and then to manually segment the digital video file into events of the user's choosing. Some MEAs make this easier by automatically segmenting the video file into shots displayed in a library and then allowing the user to manually select shots and combine them to form events. MEAs that automatically segment video into shots typically do so by analyzing the time of day recorded with each frame on a Digital Video (DV) camera to find discontinuities, by analyzing the image content of each frame recorded by an analog camera using color histograms or other image analysis techniques, or by simply segmenting the video at fixed time intervals (e.g. every 15 seconds).
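The first of these techniques, finding discontinuities in the recorded time of day, can be illustrated with a minimal sketch. The function name split_into_shots, the one-second threshold max_gap, and the sample timestamps are assumptions made for illustration only; they are not taken from any particular MEA.

```python
from datetime import datetime, timedelta


def split_into_shots(frame_times, max_gap=timedelta(seconds=1)):
    """Group frame timestamps into shots.

    A new shot starts wherever the time of day jumps forward by more
    than max_gap or runs backwards (both are assumed heuristics).
    """
    shots = []
    for t in frame_times:
        if shots and timedelta(0) <= t - shots[-1][-1] <= max_gap:
            shots[-1].append(t)   # continuous with the previous frame
        else:
            shots.append([t])     # discontinuity: begin a new shot
    return shots


# Hypothetical data: two bursts of frames recorded about an hour apart.
start = datetime(2003, 1, 1, 10, 0, 0)
frames = [start + timedelta(seconds=i) for i in range(3)]
frames += [start + timedelta(hours=1, seconds=i) for i in range(2)]

shots = split_into_shots(frames)
print([len(s) for s in shots])  # [3, 2]
```

Real DV streams carry many frames per second, so a practical threshold would likely be tuned to the frame rate; one timestamp per second is used here only to keep the sample short.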
Unfortunately, existing MEAs provide little or no assistance if the user wants to browse the digital video file at an intermediate segmentation somewhere below the entire tape or video library and somewhere above the individual shots. At best, existing MEAs might group items within a tape or video library by day, month, or year, but such grouping is relatively unintelligent. For example, a video file containing a recording of a New Year's party spanning the period Dec. 31, 2002 to Jan. 1, 2003 would be split across two days, two months, or two years, depending on the view selected, whereas to the user the recording documents a single event. Furthermore, existing MEA segmentation is often too granular (too many segments) or too coarse (too few segments). For instance, a tape viewed ‘by day’ might contain 27 separate days, whereas to the user this may correspond to three vacations, each lasting between one and two weeks. Another tape might have been recorded entirely on a single day and thus show only one segment when viewed ‘by day’, but to the user it is really two separate events, one in the morning and one in the afternoon. To obtain a meaningful segmentation of the digital video file, the user must undergo a lengthy and tedious process either to split the entire digital video file into segments manually or to combine shots to form the desired segments. Furthermore, such user-initiated segmentation is static, and it is hard to create a more granular (more segments) or less granular (fewer segments) view of the digital video file without starting over.
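The New Year's party example can be reproduced with a short sketch of calendar-based grouping. The helper group_by, the three view keys, and the shot start times are hypothetical and chosen only to show how one event ends up in two groups under every calendar view.

```python
from collections import defaultdict
from datetime import datetime


def group_by(shot_starts, key):
    """Group shot start times by a calendar key (day, month, or year)."""
    groups = defaultdict(list)
    for t in shot_starts:
        groups[key(t)].append(t)
    return dict(groups)


# Hypothetical shots from a single party that runs past midnight.
party = [
    datetime(2002, 12, 31, 22, 30),
    datetime(2002, 12, 31, 23, 55),
    datetime(2003, 1, 1, 0, 5),
]

views = {
    "by day": lambda t: t.date(),
    "by month": lambda t: (t.year, t.month),
    "by year": lambda t: t.year,
}

counts = {name: len(group_by(party, key)) for name, key in views.items()}
print(counts)  # {'by day': 2, 'by month': 2, 'by year': 2}
```

Although the last two shots start only ten minutes apart, every calendar view places them in different groups.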
When an MEA is used to create an optical video disc (e.g. a DVD-video disc or a VCD video disc), the user will typically create chapters within the digital video file, allowing other people viewing the disc to navigate quickly to a relevant section. Creating chapters is similar to the segmentation process described above but is further complicated by the fact that certain segmentations are undesirable given the limitations of DVD-video menu navigation. For instance, each page in a DVD-video menu typically has up to six (6) items on it (any more and the icons become too small to see easily across the room), so a segmentation with seven (7) segments is undesirable because it leaves a ‘hanging chapter’ orphaned on a separate menu page. A better segmentation might be one with six (6) segments or twelve (12) segments because these result in full menu pages. There is, however, no simple rule that can be applied, because a segmentation into 7 segments may be the only logical segmentation of the video (e.g. a tape consisting of 10-minute shots recorded once each day for a week, starting at 8 AM each morning, can only sensibly be segmented into 7 segments).
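The menu arithmetic behind this example can be sketched as follows. The six-items-per-page limit comes from the text above; the function names and the reading of a ‘hanging chapter’ as a final page holding exactly one item are assumptions made for illustration.

```python
def menu_pages(num_segments, items_per_page=6):
    """Return the number of chapter items on each menu page, in order."""
    full, rest = divmod(num_segments, items_per_page)
    return [items_per_page] * full + ([rest] if rest else [])


def has_hanging_chapter(num_segments, items_per_page=6):
    """Assumed definition: more than one page, and the last holds one item."""
    pages = menu_pages(num_segments, items_per_page)
    return len(pages) > 1 and pages[-1] == 1


for n in (6, 7, 12):
    print(n, menu_pages(n), has_hanging_chapter(n))
# 6 [6] False
# 7 [6, 1] True
# 12 [6, 6] False
```

A check of this kind can only flag an awkward layout. As the week-long tape example shows, it cannot decide on its own whether seven segments should be merged or split.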
Similar considerations apply when browsing the digital video file (or a library consisting of multiple digital video files) on a computer screen, either within an MEA or using a Multimedia Browser (MB) such as Internet Explorer® offered by Microsoft Corporation. For any given display resolution and window size, there is an optimal segmentation of the digital video file, determined both by the number of segments that can be displayed and by how well the selected segments divide the digital video file. Again, there is no easy rule for deciding this, and in some cases it may be appropriate to list all segments even if the user then needs to scroll the display to see some of them.
In some cases, even if a perfect segmentation can be found, users may still want to force more segments or fewer segments to be used for reasons of personal preference. When a static segmentation of the digital video file has been created by the user, there is no easy way to move to a more granular (more segments) or less granular (fewer segments) clustering of the video.
Still images taken using Digital Still Cameras (DSC) also have dates and times associated with them and can be clustered on a computer display or on an optical disc in a similar fashion. Mixed collections of still images and video shots can also be clustered. For example, a user might use a Multimedia Browser to view a ‘birthday’ event consisting of video shots and still images taken at a birthday party.
For these reasons, a system for automatically creating the optimal clustering of video shots and/or still images (collectively Media Elements (ME)) for browsing, disc creation or other manipulation is desired. Such a system needs to consider the shots within the digital video file itself, the times associated with each video shot or still image, the means by which the clusters will be presented to the user (e.g. in a menu on an optical disc like DVD-video, or in a browseable library on a computer) and user input in determining how many clusters to create and precisely how to allocate video shots and still images to clusters.
Similar issues exist when a user is dealing with any other collection of objects occurring at known times, for example presentation and word processing documents. As used herein, Media Elements refers to video shots, still images, audio clips, or any other objects having a known start time and optionally a known duration or end time.
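One possible way to represent a Media Element in software is sketched below. The class name, field names, and sample values are hypothetical; the only properties taken from the definition above are the known start time and the optional duration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class MediaElement:
    """A video shot, still image, audio clip, or other timed object."""
    kind: str                              # e.g. "video shot" (free text)
    start: datetime                        # known start time
    duration: Optional[timedelta] = None   # None when no duration is known

    @property
    def end(self) -> datetime:
        # An element without a duration is treated as instantaneous.
        return self.start + (self.duration or timedelta(0))


# Hypothetical mixed collection from a birthday party.
elements = [
    MediaElement("still image", datetime(2003, 5, 4, 15, 10)),
    MediaElement("video shot", datetime(2003, 5, 4, 15, 0), timedelta(minutes=2)),
]

# Because every element has a start time, mixed media sort onto one timeline.
timeline = sorted(elements, key=lambda e: e.start)
print([e.kind for e in timeline])  # ['video shot', 'still image']
```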