Digital imaging has vastly increased people's ability to amass very large numbers of still images, video image sequences, and multimedia records that combine one or more images with other content, for their personal collections. (Still images, video sequences, and multimedia records are referred to collectively herein by the terms “image records” or “images”, as appropriate.)
At the same time, with the pervasiveness of digital media, the use of digital images in computing, especially in human-computer interaction (HCI) for user interfaces and design, as well as in such wide-ranging areas as education, social media, art, science, advertising, marketing and politics, is rapidly becoming more and more significant. All of these applications present challenges to the organization, selection and retrieval of the most appropriate images for any given purpose.
Efforts have been made to aid users in organizing and utilizing image records by assigning metadata to individual image records. Some types of metadata provide an indication of the expected value to the user. For example, many online databases and photo sharing communities allow users to designate images as favorites by selecting appropriate tags and labels, or to assign a rating for photos, such as an image quality rating or an aesthetic appeal rating, or to otherwise express their opinions by writing notes, issuing virtual awards and invitations to special user groups. An online photo-enthusiast community, Flickr, for example, introduced the selection of the most interesting images for any point in time, wherein the “interestingness” is determined by considering several aspects associated with images including “click” statistics, the presence/absence of comments, favorite tags, and who made them. In some applications, “favorite” tags or other comparable tags (e.g., Facebook's “like” tag) are counted to provide a type of popularity ranking. The DPChallenge and the Photobucket photo sharing sites encourage users to rate images on overall quality on a scale of 1 to 10 through contests and challenges. By doing so, all these databases allow users to efficiently access the best or most popular images. Many of these photo sharing websites cater to photo-enthusiasts, amateur photographers, or even professional photographers who attempt to capture and create unique and artistic-looking images. They often choose unusual subject matter, lighting, and colors or create specific effects by editing their images with various creative and photo editing tools. Other online photo storage and sharing services, such as Kodak Gallery, Shutterfly, and Picasa, are primarily intended to serve consumers who capture and share snapshots of everyday events and special moments with family and friends.
Social networking sites, such as Facebook, enable users to collectively accumulate billions of images as a means of keeping in touch with friends. Users can upload their photos and share them with friends, as well as create prints, photo books and other photo-related items. Similar to online photo sharing communities, these services allow users to selectively mark images as favorites, for example, by using the “like” tag, and to create other tags and annotations. In addition to pictures, users increasingly upload and share video snippets, video files and short movies. YouTube is one of the most prominent examples of a video sharing and publishing service, wherein users can upload videos, short movies and multimedia presentations to share personal experiences, broadcast multimedia information for educational purposes, and promote specific services/products. However, compared to the relative abundance of tags and rankings in photos shared by communities of photo-enthusiasts and in public and commercial image and multimedia databases, tags and rankings are used considerably less frequently for images of friends and family. This limits their applicability for efficient image organization and retrieval.
To assist users in selecting and finding the best or most suitable images, various methods have been developed. Typically, these methods analyze and evaluate subject matter categories, locations, scene types, faces of people in the photo and their identities, and other image attributes extracted directly from image data or associated metadata for image organization and retrieval purposes. For example, the article “Inferring generic activities and events from image content and bags of geo-tags” (Proc. 2008 International Conference on Content-based Image and Video Retrieval, pp. 37-46, 2008) by Joshi et al. describes a method for classifying an image into a plurality of activity/event scene categories in a probabilistic framework by leveraging image pixels and image metadata.
The article by Yanagawa et al., entitled “Columbia University's baseline detectors for 374 LSCOM semantic visual concepts” (Columbia University ADVENT Technical Report #222-2006-8, 2007) describes an activity/event classification method where image pixel information is analyzed using support vector machine (SVM) based classifiers. These classifiers use image color, texture, and shape information to determine an activity/event classification for an image. In a related method, GPS metadata associated with the images can be leveraged to obtain location specific geo-tags from a geographic database. Subsequently, a bag of words model can be combined with the SVM data to provide an improved activity/event classification.
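As a rough illustration of how a bag of words model built from geo-tags might be combined with SVM classifier outputs, the following minimal sketch simply concatenates per-category SVM scores with geo-tag term counts to form one feature vector for a downstream activity/event classifier. The combination by concatenation, the function names, and the example vocabulary are assumptions made for illustration; the cited articles do not specify this exact scheme.

```python
from collections import Counter


def bag_of_geo_tags(geo_tags, vocabulary):
    """Count how often each vocabulary term occurs among an image's geo-tags.

    Tags are normalized (trimmed, lower-cased) before counting. Terms that
    are not in the vocabulary are ignored.
    """
    counts = Counter(tag.strip().lower() for tag in geo_tags)
    return [counts[term] for term in vocabulary]


def combined_features(svm_scores, geo_tags, vocabulary):
    """Concatenate visual SVM scores with the geo-tag bag-of-words vector.

    Assumption: the two sources are fused by simple concatenation; other
    fusion schemes are equally possible.
    """
    return list(svm_scores) + bag_of_geo_tags(geo_tags, vocabulary)


# Hypothetical vocabulary and inputs, for illustration only.
VOCABULARY = ["beach", "park", "stadium"]
example = combined_features([0.8, -0.3], ["Beach", "park ", "beach"], VOCABULARY)
```

Here `example` holds the two hypothetical SVM scores followed by the counts for "beach", "park", and "stadium".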
While the organization and retrieval of images based on image understanding and semantic analysis can be useful in some applications, selection of images based on subjective attributes, such as image quality, user preference, subjective importance, and predicted aesthetic/emotional value is valuable to enable users to quickly access the best and/or most popular images in a collection. For example, U.S. Pat. No. 6,671,405, to Savakis et al., entitled “Method for automatic assessment of emphasis and appeal in consumer images,” discloses a method for automatically computing a metric of “emphasis and appeal” of an image without user intervention. A first metric is based upon a number of factors, which can include: image semantic content (e.g., detected people, faces); objective features (e.g., colorfulness, sharpness, overall image quality); and main subject features (e.g., main subject size). A second metric compares the factors relative to other images in a collection. The factors are integrated using a trained reasoning engine. U.S. Patent Application Publication 2004/0075743, to Chatani, entitled “System and method for digital image selection,” uses a similar method to perform image sorting based upon user selected parameters of semantic content or objective features in the images.
Commonly-assigned U.S. Patent Application Publication 2003/0128389, to Matraszek et al., entitled “Method for creating and using affective information in a digital imaging system cross reference to related applications,” discloses another approach that provides a measure of image record importance (i.e., “affective information”), which can take the form of a multi-valued metadata tag. The affective information can be manually entered by a user. It can also be automatically detected by monitoring user reactions (e.g., facial expressions or physiological responses), or user initiated utilization of a particular image (e.g., how many times an image was printed or sent to others via e-mail). The resulting affective information can be stored as metadata associated with a particular user. The use of affective metadata is generally limited in that it requires exposure and accumulation of tags with respect to already viewed images and does not directly translate to novel, unseen, or untagged image content.
Commonly-assigned U.S. Pat. No. 7,271,809 to Fedorovskaya et al., entitled “Method for using viewing time to determine affective information in an imaging system,” discloses a method for providing image metadata based on image viewing time. With this approach, the time intervals during which the user chooses to view each of the still digital images on the electronic displays are electronically monitored and used to determine the degree of interest for each image. Subsequently, the metadata can be used to assist in retrieving one or more images.
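The general idea of deriving an interest measure from viewing time can be sketched as follows. This is only an illustrative sketch and not the method of the cited patent: the normalization by the longest total viewing time, the function names, and the input format of (image identifier, seconds) pairs are all assumptions.

```python
from collections import defaultdict


def interest_from_viewing_time(view_events):
    """Map each image to a degree of interest in [0, 1].

    view_events is an iterable of (image_id, seconds) pairs, one per viewing
    interval. Intervals for the same image are summed, then divided by the
    longest total so the most-viewed image scores 1.0 (an assumed scheme).
    """
    totals = defaultdict(float)
    for image_id, seconds in view_events:
        if seconds < 0:
            raise ValueError("viewing time must be non-negative")
        totals[image_id] += seconds
    longest = max(totals.values(), default=0.0)
    if longest == 0.0:
        return {image_id: 0.0 for image_id in totals}
    return {image_id: total / longest for image_id, total in totals.items()}


def retrieve_top(interest, k):
    """Return the k image ids with the highest interest (ties broken by id)."""
    return sorted(interest, key=lambda image_id: (-interest[image_id], image_id))[:k]


# Hypothetical viewing log, for illustration only.
events = [("a", 2.0), ("b", 6.0), ("a", 1.0), ("c", 0.0)]
interest = interest_from_viewing_time(events)
top_two = retrieve_top(interest, 2)
```

The resulting per-image values could be stored as metadata and later used to rank images during retrieval, as `retrieve_top` does here.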
Commonly-assigned U.S. Pat. No. 8,135,684, to Fedorovskaya et al., entitled “Value index from incomplete data,” describes another method that includes combining data about an image from multiple sources. The data that is combined includes capture related data, intrinsic image data (e.g., image quality data and image content data) and image usage data, and is used to generate value indices for the images, which can then be used to manage image sets.
Considering the very large numbers of image records, the rapid expansion of social networks and shared social media, and the increasing range of applications, there is a growing need for new and improved image and multimedia selection methods. These new methods should take into consideration how users will respond to the selected content, even if it is novel and untagged. Preferably, the methods should determine whether a user will find an image interesting, and worthy of their attention. In this regard, research in psychology, neuroscience, communication and advertising is providing useful information with respect to the nature of people's preferences, interests and reactions to objects and situations, including complex imagery, and to the underlying perceptual and cognitive processing. This information can be utilized in developing algorithms and methods for rating and selecting images and multimedia content suitable for personal usage, as well as for visual communication, persuasion, advertising and other uses.
Photographs are not mere artifacts, but represent semiotic systems, from which viewers derive meaning. As discussed by Scott in the article “Images in Advertising: The Need for a Theory of Visual Rhetoric” (Journal of Consumer Research, Vol. 21, pp. 252-273, 1994), people draw on accumulated past experiences in order to make sense of photographs. Although they may be initially attracted to an image because of its quality, aesthetic properties, or low-level features, it has been found that viewers subsequently determine what is worthy of longer study based on the potential that they see in the image of generating deeper meaning.
It has been found that there is a link between what people find interesting and their familiarity with the communicated information. Unlike “recollection,” which entails consciously “remembering” an item, familiarity spurs a form of associative recognition and has been explained as arising when “fluent processing of an item is attributed to past experience with that item” (see: Yonelinas, “The Nature of Recollection and Familiarity: A Review of 30 Years of Research.” Journal of Memory and Language, Vol. 46, pp. 441-517, 2002). Familiarity has been defined and measured in two ways. Familiarity with an item's meaning involves the amount of perceived knowledge a person has about an item or its meaningfulness to that person. Familiarity with regard to frequency of exposure is measured by how often a person encounters the item.
The concept of “interestingness” (or equivalently “interest level”) has been the subject of multiple interpretations. Interestingness has been interpreted as the attribute of an item, as the response of a user to an item, as an emotion, or simply as a psychological or behavioral reaction. Vaiapury et al., in the article “Finding Interesting Images in Albums using Attention” (Journal of Multimedia, Vol. 3, pp. 2-13, 2008), specify interestingness as “an entity that arises from interpretation and experience, surprise, beauty, aesthetics and desirability”, a process based on “how one interprets the world and one's accumulation of experience as embodied in the human cognition system”.
Interestingness has also been commonly equated to attention. For example, Katti et al., in the article “Pre-attentive Discrimination of Interestingness in Images” (2008 IEEE International Conference on Multimedia and Expo, pp. 1433-1436, 2008), describe interestingness as “an aesthetic property that arouses curiosity and is a precursor to attention.”
Interest level has been put forward as a reaction of the cognitive system to a stimulus, and has also been studied as an emotion (for example, see: Silvia, “What Is Interesting? Exploring the Appraisal Structure of Interest” (Emotion, Vol. 5, No. 1, pp. 89-102, 2005)). Apart from the variables of novelty, complexity and surprise, “personal connection” and “thought-provoking” have been identified as attributes that contribute to the interestingness of pictures (for example, see: Halonen et al., “Naturalness and interestingness of test images for visual quality evaluation,” Proc. SPIE 7867, 78670Z, 2011).
There remains a need for incorporating measures of familiarity into methods for evaluating the interest level of images or multimedia items in order to improve ways of selecting information that can personally appeal to the viewers and users of various multimedia collections, online communities, social networks and databases.