Digital imaging has vastly increased people's ability to amass, for their personal collections, very large numbers of still images, video image sequences, and multimedia records that combine one or more images with other content. (Still images, video sequences, and multimedia records are referred to collectively herein with the term “image records”, or “images” as appropriate.)
Efforts have been made to aid users in organizing and utilizing image records by assigning metadata to individual image records that indicates a metric of expected value to the user. For example, many online databases and photo sharing communities allow users to designate images as favorites by selecting tags and labels, to assign a rating for photos, such as for image quality or aesthetics, or otherwise to express their opinions by writing notes or issuing virtual awards and invitations to special user groups. An online photo-enthusiast community, Flickr, for example, introduced selection of the most interesting images for any point in time, wherein the “interestingness” is determined by considering several aspects associated with images, including clicks (e.g. number, authorship), presence or absence of comments, favorite tags, and who made them. Often, a favorite tag or other comparable tag (e.g. Facebook's “like” tag) is counted to provide a sort of popularity ranking. Sites such as the DCPchallenge photosharing site or, similarly, Photobucket, encourage users to rate images on overall quality on a scale of 1 to 10 through contests and challenges. By doing so, all these databases allow users to efficiently access the best or most popular images. These communities consist of photo-enthusiasts, amateurs, or even professional photographers who attempt to capture and create unique and artistic images. They often choose unusual subject matter, lighting, or colors, or create specific effects by editing their images with various creative and photo editing tools.
Several online photo storage and sharing services, such as Kodak Gallery, Shutterfly, or Picasa, primarily serve consumers who capture and share snapshots of everyday events and special moments with family and friends. Social media sites, such as Facebook, are also increasingly accumulating millions of consumer images as a means of keeping in touch with friends. Users can upload their photos and share them with friends, as well as create prints, photo-books and other photo-related items. Similarly to online photo sharing communities, these services allow users to selectively mark images as favorites, for example, by using the “Like” tag, and to create other tags and annotations. In addition to pictures, users increasingly upload and share video snippets, video files and short movies. YouTube is one of the most prominent examples of a video sharing and publishing service, wherein users can upload videos, short movies or commercials to share personal experiences, broadcast multimedia information for educational purposes, or promote specific services and products. However, compared to the communities of photo-enthusiasts and to public and commercial image and multimedia databases, tags and rankings are used considerably less frequently for images of friends and family, thereby limiting their applicability for efficient image organization and retrieval.
To assist users in selecting and finding the best or most suitable images on demand, various algorithms and methods have been developed. These methods analyze and evaluate subject matter categories, location, scene types, faces of people in the photo and their identities, and other image attributes extracted directly from image data or associated metadata, for image organization and retrieval purposes. For example, the published article of D. Joshi and J. Luo, “Inferring Generic Activities and Events using Visual Content and Bags of Geo-tags”, Proceedings of Conference on Image and Video Retrieval, 2008, provides a method for classifying an image into a plurality of activity/event scene categories in a probabilistic framework leveraging image pixels and image metadata. The image pixel information is analyzed using the state-of-the-art support vector machine (SVM)-based event/activity scene classifiers described in the published article of A. Yanagawa, S. F. Chang, L. Kennedy, and W. Hsu, “Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts”, Columbia University ADVENT Technical Report #222-2006-8, 2007. These classifiers use image color, texture, and shape information for activity/event classification.
The metadata information in the form of GPS data available with pictures is leveraged to obtain location-specific geo-tags from a geographic database. Subsequently, a bag-of-words model is used for detecting activity/event scenes in pictures, and its output is combined with the SVM data to provide a final classification.
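The combination of visual classifier output with a bag-of-words treatment of geo-tags can be illustrated with a minimal sketch. It is not the method of the cited article: the keyword vocabularies in `GEO_TAG_MODEL`, the function names, and the simple weighted average controlled by `visual_weight` are all assumptions chosen for illustration, and the visual scores are taken as given rather than produced by an SVM.

```python
from collections import Counter

# Hypothetical per-category vocabularies standing in for a learned geo-tag model.
GEO_TAG_MODEL = {
    "beach": {"beach", "coast", "pier"},
    "skiing": {"ski", "resort", "mountain"},
}


def geo_tag_scores(geo_tags):
    """Score each category by the fraction of geo-tags found in its vocabulary."""
    counts = Counter(tag.lower() for tag in geo_tags)  # bag of words: order is ignored
    total = sum(counts.values())
    scores = {}
    for category, vocab in GEO_TAG_MODEL.items():
        hits = sum(n for tag, n in counts.items() if tag in vocab)
        scores[category] = hits / total if total else 0.0
    return scores


def fuse(visual_scores, tag_scores, visual_weight=0.5):
    """Weighted average of visual and geo-tag scores for every category seen."""
    categories = set(visual_scores) | set(tag_scores)
    return {
        c: visual_weight * visual_scores.get(c, 0.0)
        + (1.0 - visual_weight) * tag_scores.get(c, 0.0)
        for c in categories
    }


def classify(visual_scores, geo_tags, visual_weight=0.5):
    """Return the category with the highest fused score."""
    fused = fuse(visual_scores, geo_tag_scores(geo_tags), visual_weight)
    return max(fused, key=fused.get)
```

In this sketch, an image whose visual scores slightly favor "skiing" can still be classified as "beach" when most of its geo-tags fall in the beach vocabulary, which is the kind of correction location metadata is meant to supply.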
While organization and retrieval of images based on image understanding and semantic analysis are very useful, selection based on subjective attributes, such as image quality, preference, subjective importance, and predicted aesthetic and emotional value, allows users to quickly access the best or most popular images in the collection.
For example, U.S. Pat. No. 6,671,405 to Savakis et al. discloses a method for automatically computing a metric of “emphasis and appeal” of an image without user intervention. A first metric is based upon a number of factors, which can include: image semantic content (e.g. people, faces); objective features, such as colorfulness and sharpness; and main subject features, such as size of the main subject. A second metric compares the factors relative to other images in a collection. The factors are integrated using a trained reasoning engine. U.S. Patent Publication No. 2004/0075743 is somewhat similar and discloses sorting of images based upon user-selected parameters of semantic content or objective features in the images.
U.S. Patent Publication No. 2003/0128389 A1, filed by Matraszek et al., discloses another approach, providing a measure of image record importance, “affective information”, that can take the form of a multi-valued metadata tag. The affective information can be a manual entry or can be automatically detected user reactions, e.g. facial expressions or physiological responses, or user-initiated utilization of a particular image, such as how many times an image was printed or sent to others via e-mail. In these cases, affective information is identified with a particular user.
A method for providing image metadata using viewing time is disclosed in U.S. Pat. No. 7,271,809 B2 by Fedorovskaya et al. In this disclosure, the time intervals during which the user chooses to view each of the still digital images on the electronic displays are electronically monitored and used to determine the degree of interest for each of the stored images. Subsequently, the metadata can be stored in each respective digital image file and can be used to assist in retrieving one or more still digital images.
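The idea of deriving a degree of interest from monitored viewing intervals can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the cited patent: the event format `(image_id, start_seconds, end_seconds)`, the function name `degree_of_interest`, and the normalization by the longest total viewing time are all hypothetical.

```python
from collections import defaultdict


def degree_of_interest(view_events):
    """Map each image to its total viewing time divided by the longest total.

    view_events: iterable of (image_id, start_seconds, end_seconds) tuples.
    """
    totals = defaultdict(float)
    for image_id, start, end in view_events:
        if end > start:  # skip malformed or zero-length intervals
            totals[image_id] += end - start
    longest = max(totals.values(), default=0.0)
    if longest == 0.0:
        return {}
    return {image_id: t / longest for image_id, t in totals.items()}
```

The resulting values lie between 0 and 1 and could be written into each image file as metadata, so that later retrieval can sort or filter on them.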
Another method, described in U.S. Pat. No. 8,135,684 B2 by Fedorovskaya et al., combines data from multiple sources with respect to images, including capture-related data, intrinsic image data, image-quality data, image-content data, and image-usage data, to generate a value index for the images, and then manages the image sets using thresholded value indices.
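A value index built from several data sources and then thresholded can be sketched as below. The weights in `WEIGHTS`, the assumption that every source has already been reduced to a score between 0 and 1, and the linear combination itself are illustrative choices and are not taken from the cited patent.

```python
# Hypothetical weights for the five data sources named above.
WEIGHTS = {
    "capture": 0.1,
    "intrinsic": 0.1,
    "quality": 0.3,
    "content": 0.3,
    "usage": 0.2,
}


def value_index(scores, weights=WEIGHTS):
    """Weighted sum of per-source scores; a missing source counts as 0."""
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())


def partition_by_threshold(collection, threshold):
    """Split image ids into those at or above the threshold and those below it."""
    keep, review = [], []
    for image_id, scores in collection.items():
        target = keep if value_index(scores) >= threshold else review
        target.append(image_id)
    return keep, review
```

A collection manager could, for example, keep the first list on fast storage and offer the second list to the user for archiving or deletion.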
While the above approaches of rating, ranking, and tagging images are useful, they are predominantly oriented toward selecting favorite, high-quality images for personal use. In some cases, the content (or subject matter) of images can be specified by the user, and selection and retrieval often rely on availability of tagging and annotation. Even if these methods employ user reactions that were previously tagged, they do not take into account the behavior, associations, habits and preferences that users form in their everyday lives and that affect how people will react to photographs of different content and appearance. Affective metadata tagging is also limited in that it requires exposure and accumulation of tags with respect to already viewed images and does not directly translate to novel, unseen, or untagged content. At the same time, ranking and tagging of publicly available multimedia entries in online databases and communities do not by themselves allow selection of material personalized according to individual preferences, interests and needs, but rather produce an account of item popularity on average.
With very large numbers of image records, rapid expansion of social networks and shared social media, as well as with an increasing range of applications, there is a growing need for new and improved image and multimedia selection methods that take into consideration how users will respond to the selected content, even if it is novel and untagged, and specifically whether individual users will find it interesting and worthy of their attention.
In this regard, research in psychology, neuroscience, communication and advertising is providing useful information with respect to the nature of people's preferences, interests and reactions to objects and situations, including complex imagery, and underlying perceptual and cognitive processing. This information can be used in developing algorithms and methods for rating and selecting images and multimedia content suitable for personal usage, as well as for visual communication, persuasion, advertising and other uses.
Photographs are not mere artifacts but represent semiotic systems from which viewers derive meaning. In doing so, people draw on accumulated past experiences to make sense of photographs (Scott, “Images in Advertising: The Need for a Theory of Visual Rhetoric”, The Journal of Consumer Research, Vol. 21, No. 2 (September, 1994), pp. 252-273). They may thus be attracted to an image at first glance because of its quality, aesthetic properties or low-level features, but viewers subsequently determine what is worthy of further study based on the potential they see in the image for generating deeper meaning.
Previous research has shown that verbal communication on familiar topics or persons was thought of as more interesting than verbal communication on unfamiliar ones, indicating that the inherent interestingness of a communication is directly related to its degree of perceived informativeness, with both depending upon the “possibility of getting to know something new about something already sufficiently well known” (Teigen, K., “The novel and the familiar: Sources of interest in verbal information”, Current Psychology, 1985. 4(3): p. 224-238.). This and other work highlight the link between what people find interesting and their familiarity with the communicated information.
In contrast to ‘recollections’ that entail consciously ‘remembering’ an item, familiarity spurs a form of associative recognition and has been explained as arising when “fluent processing of an item is attributed to past experience with that item” (Yonelinas, A., “The Nature of Recollection and Familiarity: A Review of 30 Years of Research”, Journal of Memory and Language, 2002. 46(3): p. 441-517). Familiarity has been defined and measured in two ways: familiarity with an item's meaning, involving the amount of perceived knowledge one has about an item or its meaningfulness to the person, and familiarity with regard to frequency of exposure, i.e. the frequency with which one encounters an item.
The concept of ‘interestingness’ has been the subject of multiple interpretations. Interestingness has been interpreted as the attribute of an item, as the response of a user to an item, as an emotion, or simply as a psychological or behavioral reaction. Vaiapury and Kankanhalli, in “Finding Interesting Images in Albums using Attention”, Journal of Multimedia, 2008: p. 2-13, for instance, specify interestingness as “an entity that arises from interpretation and experience, surprise, beauty, aesthetics and desirability”, a process based on “how one interprets the world and one's accumulation of experience as embodied in the human cognition system”. Interestingness has also been routinely equated to attention. Katti et al., in “Pre-attentive discrimination of interestingness in images”, Multimedia and Expo, 2008 IEEE International Conference, 2008, Hannover, Germany, qualified interestingness as “an aesthetic property that arouses curiosity and is a precursor to attention”.
Interest has not only been put forward as a reaction of the cognitive system to a stimulus, but has also been studied as an emotion. Apart from the variables of novelty, complexity and surprise, subjects in Halonen, R., S. Westman, and P. Oittinen, “Naturalness and interestingness of test images for visual quality evaluation”, in Image Quality and System Performance VIII, SPIE, 2011, IEEE, also identified ‘personal connection’ and ‘thought-provoking’ as attributes that contribute to the interestingness of pictures.
As digital media becomes ever-more pervasive, the role of digital images in computing, especially in human-computer interaction (HCI) for user interfaces and design, as well as in such wide-ranging areas as education, social media, art, science, advertising, marketing, and politics, is rapidly becoming more significant. At the same time, the amount of communication between individuals and organizations is increasing rapidly and it is increasingly important that such communications meet the needs of the recipients; otherwise the recipient might ignore the communications or respond in undesired ways. Moreover, commercial communications, such as advertising, are increasingly targeted to ever-smaller groups and commercial organizations have an increased need to communicate clearly and persuasively to the smaller groups and even to specific individuals.
There is a need, therefore, for an improved automated method for communicating with individuals that increases the individuals' interest in the communication and the likelihood of a desirable response to the communication.