Digital photographers capture digital content with digital still cameras, video cameras, camera phones, and other random access digital capture devices. The captured content is initially stored on the capture device and, commonly, is then moved to personal computer disk memory or online storage systems. Whether the images are stored on the device or on larger computer systems, the photographer can either manually or automatically organize their images in a hierarchical fashion into digital content containers (typically called albums or folders). These containers can contain images and other image containers, creating a hierarchical storage scheme. Organizing digital content by real-life events, such as birthdays, holiday parties, and the like, is one of the most common organization methods used by digital photographers.
When searching and browsing hierarchical digital content collections, digital capture devices, personal computer file systems, image organization applications, and online storage systems typically represent a collection of digital content with an icon and/or a small-scaled image from the collection usually called a “thumbnail”. The thumbnail image gives the user a view of one image from a potentially large collection of images to assist them in recognizing the event and content of the collection and is advantageous for this purpose over an icon and collection name. The specific image selected from the collection to represent the collection is sometimes referred to as a “key” image. The same approach is utilized to provide representative images, also referred to as “key frames”, of video sequences.
The term “image record” is used herein to refer to a digital still image, video sequence, or multimedia record. An image record includes one or more digital images and can also include metadata, such as sounds or textual annotations. A particular image record can be a single digital file or multiple, but associated, digital files. Metadata can be stored in the same image file as the associated digital image or can be stored separately. Examples of image records include multiple spectrum images, scannerless range images, digital album pages, and multimedia video presentations. With a video sequence, the sequence of images is a single image record. Each of the images in a sequence can alternatively be treated as a separate image record. Discussion herein is generally directed to image records that are captured using a digital camera. Image records can also be captured using other capture devices and by using photographic film or other means and then digitizing. As discussed herein, image records are stored digitally along with associated information.
Consumers tend to capture image records episodically, reflecting different occurrences. Grouping image records to reflect different episodes can be based on spatio-temporal differences, such as time or distance, or, in a more sophisticated manner, based upon event and subevent. These approaches tend to be convenient for many people, but have the shortcoming that the subject matter of the image records in groups and subgroups is not necessarily apparent from the grouping procedure. A representative image can make the subject matter of a group or subgroup apparent, but is relatively difficult to determine. This is unlike groupings in which all members necessarily have the same subject matter, such that a member of a group is representative of the group. For example, any member of the group “pictures of the new baby” would be capable of representing the group as a picture of the new baby.
Many systems use the first image record in a set of image records as the representative image record. The first image record is typically chronologically the earliest. The selection of the first image record often does not adequately reflect the context or content of the other image records in the set. For example, many digital photographers capture content before an actual event begins simply to verify camera operation. The content captured for this purpose is arbitrary and may or may not reflect the content captured during the actual event. Actual event content captured at the beginning of a lengthy event also frequently does not accurately reflect the context and setting of the entire event.
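The first-record approach described above can be illustrated with a minimal sketch. It assumes a hypothetical `ImageRecord` type holding only a file name and a capture time, groups records by an arbitrarily chosen time gap, and then takes the chronologically earliest record of each group as its representative. The names, the threshold, and the sample data are illustrative assumptions and are not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ImageRecord:
    # Hypothetical minimal record: a file name plus a capture time.
    name: str
    captured_at: datetime


def group_by_time_gap(records, max_gap=timedelta(hours=6)):
    """Split records into groups wherever consecutive capture times
    differ by more than max_gap (an assumed, arbitrary threshold)."""
    ordered = sorted(records, key=lambda r: r.captured_at)
    groups = []
    for record in ordered:
        if groups and record.captured_at - groups[-1][-1].captured_at <= max_gap:
            groups[-1].append(record)
        else:
            groups.append([record])
    return groups


def first_record_as_representative(group):
    """Baseline selection: the chronologically earliest record in the group."""
    return min(group, key=lambda r: r.captured_at)


# Illustrative data only: a camera test shot precedes the actual event.
records = [
    ImageRecord("test_shot.jpg", datetime(2006, 7, 4, 17, 55)),
    ImageRecord("guests_arrive.jpg", datetime(2006, 7, 4, 18, 30)),
    ImageRecord("fireworks.jpg", datetime(2006, 7, 4, 21, 15)),
    ImageRecord("beach.jpg", datetime(2006, 7, 8, 10, 0)),
]

groups = group_by_time_gap(records)
representatives = [first_record_as_representative(g).name for g in groups]
```

With this sample data, the representative of the first group is the camera test shot rather than an image of the event itself, which is the shortcoming noted above.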
U.S. Pat. No. 6,847,733 to Savakis et al. discloses a method in which images are grouped and representative images of the groups are determined based upon an understanding of the content (semantic saliency features) in the images. U.S. Pat. No. 6,721,454 to Qian et al. discloses a method in which video sequences are analyzed to determine semantic and other saliency features that are then summarized in textual descriptions. These approaches can be effective, but tend to be uniformly computationally intensive and complex.
It would thus be desirable to provide an improved method and system that overcome these shortcomings.