With the ubiquitous presence of digital cameras and camera phones, people capture large numbers of digital still images and digital videos to mark both events that are important to them and day-to-day occurrences that chronicle their lives. Large digital media collections, including digital still images and digital videos accumulated over time, contain a wealth of information that can be useful for understanding users. Analyzing the content and timing of a user's digital media assets in a media collection that spans several years or more can yield a view into the user's interests. This knowledge can enable organization of the personal collection, sharing with contacts, and semi-automated storytelling. For example, if evidence from a user's personal photo collection suggests that he/she regularly takes pictures of flowers, the images in this group can be organized appropriately with links to other similar images and fed as input to a photo-book generator.
Typical browsing tools provide a temporal view of a media collection, and some also support browsing by tags or by the faces of recognized people (e.g., Picasa and iPhoto). In a system that supports tags, a user could find interesting groups by specifying a set of tags, but many patterns in a collection are based on a complex set of features, and a few high-level tags can generate only a limited variety of groups. In addition, such groups can contain too many or too few images to be useful.
An automated system could potentially be used to create stories of a pre-defined format (e.g., pictures of one person over time, or images at the same GPS location), but it is not possible to create stories that are customized to a user's interests without attempting to understand the specific user's media collection. For example, in the case where a user primarily captures photographs at his home and is a gardening enthusiast who has a lot of pictures of his flowers, a system that creates a location-based story would detect a large single group at the same location and not be able to identify that the flower pictures in the collection form a distinct group.
There has been work in trying to understand images through object detection, tagging based on similar images on the web, and through the use of content-based features. People are the subject of a large fraction of consumer images, which can be tagged through the use of commercially available packages providing face detection and recognition capability. Captured media may also include GPS information identifying location of capture. So far, research on the use of these types of metadata has been focused on providing better ways of searching and organization.
There has been recent work in grouping images into events. U.S. Pat. No. 6,606,411 to Loui et al., entitled “A method for automatically classifying images into events,” and U.S. Pat. No. 6,351,556, to Loui et al., entitled “A method for automatically comparing content of images for classification into events,” disclose algorithms for clustering image content by temporal events and sub-events. Briefly described, the histogram of time differences between adjacent images or videos is clustered into two classes: time differences that correspond to event boundaries, and those that do not. A color block-based visual similarity algorithm is used to refine the event boundaries. Using this method, a reduction can be made in the amount of browsing required by the user to locate a particular event by viewing representative thumbnail images from the identified events along a timeline, instead of viewing all of the thumbnail images. However, related events with large temporal separation are spaced far apart on the timeline and are not easy to visualize as a group.
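The two-class clustering of time differences described above can be illustrated with a minimal sketch. It assumes capture times given in seconds and applies a simple one-dimensional two-means clustering to log-scaled gaps between adjacent captures; the function name `split_into_events`, the log scaling, and the iteration count are illustrative assumptions rather than details of the cited patents, and the color block-based refinement of event boundaries is omitted.

```python
import math


def split_into_events(timestamps, iterations=20):
    """Group capture times (in seconds) into events.

    Hypothetical sketch: gaps between adjacent captures are clustered into
    two classes (within-event gaps and event-boundary gaps) using a simple
    1-D two-means on log-scaled gaps. Not the patented algorithm.
    """
    times = sorted(timestamps)
    if len(times) < 2:
        return [times] if times else []

    # Log scaling keeps very long gaps from dominating the cluster means.
    gaps = [math.log1p(later - earlier) for earlier, later in zip(times, times[1:])]

    lo, hi = min(gaps), max(gaps)
    if lo == hi:
        return [times]  # all gaps identical: no evidence of a boundary

    # Two-means: alternate between assigning gaps and updating the two means.
    for _ in range(iterations):
        small = [g for g in gaps if abs(g - lo) <= abs(g - hi)]
        large = [g for g in gaps if abs(g - lo) > abs(g - hi)]
        lo = sum(small) / len(small)
        hi = sum(large) / len(large)

    # Start a new event wherever the gap falls in the "boundary" class.
    events = [[times[0]]]
    for t, g in zip(times[1:], gaps):
        if abs(g - hi) < abs(g - lo):
            events.append([t])
        else:
            events[-1].append(t)
    return events


if __name__ == "__main__":
    capture_times = [0, 60, 130, 90000, 90050, 90200, 400000]
    for event in split_into_events(capture_times):
        print(event)
```

In this sketch, two events that are close in content but years apart still land in separate groups far apart on the timeline, which is the limitation noted above.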
In the generally unrelated area of data mining, transaction histories for people (purchases, online activities, social network interactions, etc.) have been used to derive useful rules about individual and group behaviors. A transaction typically contains a transaction identifier and a set of items belonging to the transaction. This is also called “market basket” style data, from its roots in listing the contents of the supermarket shopping carts of individual shoppers. A transactions database contains a large set of transactions. Standard data mining techniques provide tools for extracting frequently occurring groups of items (itemsets) in transactions databases. There has been some work in using data mining techniques in the image collection domain.
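As a rough illustration of “market basket” style data and frequent itemsets, the sketch below treats each image as a transaction whose items are descriptive labels. The image identifiers, labels, and the function name `frequent_itemsets` are hypothetical, and the counting is a brute-force enumeration chosen for brevity; standard techniques such as Apriori or FP-growth prune the search rather than enumerating every combination.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return itemsets (as sorted tuples) that occur in at least
    `min_support` transactions.

    `transactions` maps a transaction identifier to its set of items.
    Brute-force sketch: every combination up to `max_size` is counted.
    """
    counts = Counter()
    for items in transactions.values():
        unique = sorted(set(items))
        for size in range(1, min(max_size, len(unique)) + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical transactions: one per image, items are illustrative labels.
    image_transactions = {
        "img001": {"flower", "home", "summer"},
        "img002": {"flower", "home"},
        "img003": {"dog", "home"},
        "img004": {"flower", "park", "summer"},
    }
    result = frequent_itemsets(image_transactions, min_support=2)
    for itemset, support in sorted(result.items()):
        print(itemset, support)
```

With a minimum support of 2, the itemset ("flower", "home") is reported while ("dog",) is not, since only one hypothetical image carries that label.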
U.S. Patent Application Publication 2011/0072047 to Wang et al., entitled “Interest learning from an image collection for advertising,” focuses on suggesting user-targeted ads based on automatically detecting a user's interests from their images. The techniques described include computer-annotating images with learned tags, performing topic learning to obtain an interest model, and performing advertisement matching and ranking based on the interest model. However, this method uses topic ontology identification based on a large collection of human-identified categories. This can work for the advertisement selection problem addressed, since advertisement descriptors also have a limited vocabulary. But it is not possible to enumerate and detect all types of semantic relationships that may exist in users' collections because of the vast diversity of subject matter in the photographs captured. Therefore, semantic themes need to be customized to each particular user's collection based on analysis of the content of the collection.
Other work in related areas includes U.S. Pat. No. 6,598,054 to Schuetze et al., entitled “System and method for clustering data objects in a collection,” which describes a system and method for browsing, retrieving, and recommending information from a collection using multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior; and U.S. Patent Application Publication 2008/0275861 to Baluja et al., entitled “Inferring user interests,” which describes a method that includes determining, for a portion of users of a social network, label values each comprising an inferred interest level of a user in a subject indicated by a label. Both of these methods are targeted at databases containing primarily textual and structured data.
Since a typical user has already accumulated many years' worth of digital images, finding images that fit a narrative thread by browsing through temporally distant media is difficult and time-consuming. There remains a need for a method to detect groups of images that are semantically related to each other but are temporally separated by long time differences.