With the advent of digital photography, consumers can now easily accumulate a large number of images over their lifetime. These images are often stored in “shoeboxes” (or their electronic equivalent), rarely looked at, occasionally put into albums, but usually lying around unused and unviewed for years.
The “shoebox problem” is particularly relevant because “shoeboxes” are an untapped source for communicating shared memories that are currently lost. After initially viewing pictures (after they are returned from film developing or downloaded to a computer), many people accumulate their images in large, informal archival collections. In the case of hardcopy photos or printouts, these pictures are often accumulated in conveniently sized shoeboxes or albums. Images in shoeboxes, or their electronic equivalent in folders or removable media, are often never (or very rarely) seen again, because of the difficulty of retrieving specific images, browsing unmanageably large collections, and organizing them. Typically, any organizing apart from rough reverse-chronological order involves so much effort on the part of the user that it is seldom performed. Consequently, retrieval is an ad hoc effort usually based on laborious review of many, mostly irrelevant, images.
Potentially, of course, the images could be annotated with text labels and stored in a relational database and retrieved by keyword. However, until computer vision reaches the point where images can be automatically analyzed, most automatic image retrieval will depend on textual keywords manually attached to specific images. But annotating images with keywords is a tedious task, and, with current interfaces, ordinary people cannot reasonably be expected to put in the large amount of upfront effort to annotate all their images in the hopes of facilitating future retrieval. In addition, even if the images can be automatically interpreted, many salient features of images exist only in the user's mind and need to be communicated somehow to the machine in order to index the image. Therefore, retrieval, based on textual annotation of images, will remain important for the foreseeable future.
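As a minimal sketch of the keyword-based approach just described, and not a description of any particular product, the following Python fragment stores manually entered annotations in an in-memory SQLite database and retrieves images by keyword. The table layout, the file names, and the function names open_catalog, annotate, and find_by_keyword are hypothetical and chosen only for illustration.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog. The schema is an illustrative assumption."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        " image TEXT NOT NULL,"
        " keyword TEXT NOT NULL,"
        " UNIQUE (image, keyword))"
    )
    return conn


def annotate(conn, image, keywords):
    """Attach manually chosen keywords to one image (stored in lower case)."""
    rows = [(image, kw.lower()) for kw in keywords]
    conn.executemany(
        "INSERT OR IGNORE INTO annotation (image, keyword) VALUES (?, ?)", rows
    )


def find_by_keyword(conn, keyword):
    """Return the names of all images annotated with the given keyword."""
    cur = conn.execute(
        "SELECT DISTINCT image FROM annotation WHERE keyword = ? ORDER BY image",
        (keyword.lower(),),
    )
    return [row[0] for row in cur]


# Hypothetical sample data: every keyword below had to be typed in by hand.
catalog = open_catalog()
annotate(catalog, "img_001.jpg", ["wedding", "cake"])
annotate(catalog, "img_002.jpg", ["wedding", "dancing"])
annotate(catalog, "img_003.jpg", ["beach"])
```

With this sample data, find_by_keyword(catalog, "wedding") returns the two wedding images. The sketch also makes the cost visible: retrieval is only as good as the keywords the user was willing to enter in advance.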
Furthermore, retrieval applications themselves are awkward enough that they often go unused even in cases where the user would find images from the library useful. For instance, retrieval involves dealing with a search engine or other application that imposes its own overhead on the process, even if only the overhead of starting and exiting the application and entering keywords. Because of this overhead, opportunities to use images are often overlooked or ignored.
A primary opportunity for use of consumer picture-taking is in connecting people through pictures and the stories they tell about events. Pictures convey emotions in a way that words cannot. For instance, imagine recently attending a wedding, and consider the resulting electronic mail message describing the event that might be sent to a friend. The mail would be greatly enhanced if the story could be illustrated by including pictures of the event, and perhaps also pictures of related people, places, and events in the past. What is needed to accomplish this? Here is an example of what a person might have to do:
Take pictures at significant events in the wedding: exchanging vows, cutting the cake, the couple kissing, etc. Take pictures at each dinner table, people dancing, conversing, etc.
Get the pictures into the computer. This might involve: Removing the storage medium [memory card, floppy disk] from the camera and inserting it into a reader. Possibly connecting the reader device or the camera with a cable to the computer. Launching the communications software or setting a mode to perform the transfer. Selecting a place on the computer for the pictures to go. Selecting a name for the set of pictures so you don't forget what they are.
Launching an image viewing/manipulation/cataloging program [e.g., Adobe Photoshop™, PicturePage™]. Initially scanning the pictures and removing the “duds”, exposures that were not successful. Possibly changing the file name of an individual picture to describe its contents. If you do have an image database, you may attach keywords to individual images or sets. Possibly performing image manipulation on the picture [cropping, adjusting brightness, etc.] using the same or a separate application. Possibly printing hardcopy of images for storage or sending to others. Possibly e-mailing pictures to others or posting on Web pages.
Perhaps weeks or months later, you would like to use the images when composing an e-mail message to a friend or family member about the wedding. In addition to launching and operating the e-mail application itself, you must launch another application, an image viewer/catalog/search application. Perhaps you may search around in the file system to find a folder containing relevant images, either by browsing or by retrieval by file name. Perhaps relevant images are stored on your own or acquaintances' Web pages, necessitating launching the Web browser and typing URLs or using search engines. Perhaps you may search the image database via keywords. You switch back and forth between applications as the need arises. If you succeed in finding a picture, you cut the picture from the image application and paste it into the e-mail editor.
Nothing about this interaction is easy, and nothing would make this task easier to do the next time, for example, if you wanted to tell a related story to a different person in the future. One approach to alleviating this problem is by use of an agent, which is a program that performs some information gathering or processing task in the background. Typically, an agent is given a very small and well-defined task. More specifically, two types of agents that are useful in this connection are interface agents, software that actively assists a user in operating an interactive interface, and autonomous agents, software that takes action without user intervention and operates concurrently, either while the user is idle or taking other actions.
Autonomous interface agents have been extended from the field of natural language communication to the field of memory extension. For example, from the B. J. Rhodes et al. article “Remembrance Agent: A Continuously Running Automated Information Retrieval System” which appeared in the 1996 Proc. of the First International Conference on the Practical Application of Intelligent Agents and Multi Agent Technology (PAAM '96), pp. 487–495, it is known that a remembrance agent can automatically assist a system user by providing a continually running automated information retrieval system for monitoring a user's data entry and, thus, the thought processes of the user. Specifically, the system provides a remembrance agent which continuously monitors text in a window around the user's typing activity. It periodically performs a match of the text in the window against a database of stored personal documents, such as e-mail archives, based on the frequency of words common to the query text and the reference documents. It then presents at the bottom of the user's screen a ranked list of suggestions for the k most relevant entries to the current activity (k is set by the user). The user may then easily retrieve and view an entry's text. The remembrance agent does not require human preprocessing of the archive. However, the remembrance agent, which is designed to scan stored text entries, does not lend itself to retrieval of image materials and does not facilitate the annotation of such materials.
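The matching step described above can be illustrated with a short Python sketch. It is not the remembrance agent's actual algorithm: the scoring rule (summing, over the shared words, the smaller of the two word counts) is an assumption standing in for “frequency of words common to the query text and the reference documents”, and the names tokenize, overlap_score, and suggest, as well as the sample archive, are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z']+", text.lower())


def overlap_score(query_counts, doc_counts):
    """Assumed scoring rule: for each shared word, add the smaller count."""
    return sum(
        min(n, doc_counts[word])
        for word, n in query_counts.items()
        if word in doc_counts
    )


def suggest(window_text, documents, k):
    """Rank stored documents against the text near the user's cursor.

    documents maps a document name to its text; k is the number of
    suggestions to show, which the source says is set by the user.
    """
    query = Counter(tokenize(window_text))
    scored = []
    for name, text in documents.items():
        score = overlap_score(query, Counter(tokenize(text)))
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical archive of stored personal documents.
archive = {
    "mail_ann.txt": "Ann and Bob set the wedding date for June",
    "mail_trip.txt": "Flights for the ski trip are booked",
    "mail_cake.txt": "The wedding cake tasting is on Friday, bring Ann",
}
```

Calling suggest("We loved the wedding cake", archive, 2) ranks mail_cake.txt ahead of mail_ann.txt, because the former shares three words with the typed text and the latter two. Note that the archive needs no manual preparation, but the approach only works when the stored items are themselves text.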
An autonomous interface agent (named “Letizia”) for web browsing is described in several articles by H. Lieberman, including “Letizia: An Agent that Assists Web Browsing”, which appeared in the International Joint Conference on Artificial Intelligence, Montreal, 1995 and “Autonomous Interface Agents”, which appeared in Proceedings of CHI '97, Atlanta, Ga., March 1997, pp. 67–74. Letizia is a user interface agent that assists a user browsing the World Wide Web. Letizia records the URLs chosen by the user and reads the pages to compile a profile of the user's interests. Consequently, as the user operates a conventional Web browser, the agent tracks user behavior and attempts to anticipate items of interest by doing concurrent, automatic exploration of links from the user's current position. The agent automates a browsing strategy consisting of a best-first search augmented by heuristics inferring user interest from browsing behavior. Letizia then uses the browser's own interface to present its results, using an independent window in which the agent browses pages thought likely to interest the user. However, as with the remembrance agent, Letizia is not designed for the retrieval of image materials and does not facilitate the annotation of such materials.
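A best-first exploration of links of the kind mentioned above can be sketched as follows. This is a generic illustration under stated assumptions, not Letizia's implementation: the web is modeled as a hypothetical in-memory dictionary of links, and the interest function is a stand-in for whatever heuristics infer user interest from browsing behavior. The names best_first, web, profile, and score are invented for the example.

```python
import heapq


def best_first(start, links, interest, limit):
    """Explore pages reachable from start, most interesting first.

    links maps a page to the pages it links to (a hypothetical in-memory
    web); interest maps a page to a numeric score; limit caps the number
    of pages recommended.
    """
    seen = {start}
    frontier = []  # min-heap of (-score, page), so the best page pops first

    def push_links(page):
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-interest(target), target))

    push_links(start)
    recommended = []
    while frontier and len(recommended) < limit:
        _, page = heapq.heappop(frontier)
        recommended.append(page)
        push_links(page)  # keep exploring outward from the chosen page
    return recommended


# Hypothetical link structure and interest profile.
web = {
    "home": ["news", "photos"],
    "photos": ["wedding", "cats"],
    "news": ["sports"],
}
profile = {"photos": 0.9, "wedding": 0.8, "news": 0.3, "cats": 0.2, "sports": 0.1}


def score(page):
    """Assumed interest estimate; unknown pages score zero."""
    return profile.get(page, 0.0)
```

With this data, best_first("home", web, score, 3) recommends photos, then wedding, then news: the search follows the most promising link two levels deep before returning to a less interesting page linked from the user's current position.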
In the article by J. Budzik and K. Hammond, “Watson: Anticipating and Contextualizing Information Needs”, Proc. of the Sixty-second Annual Meeting of the American Society for Information Science (1999), Information Today, Inc.: Medford, N.J., an information management assistant (nicknamed “Watson”) detects opportunities for performing special-purpose searching in the context of document composition. For example, when a user inserts into a document a caption with no image to accompany it, Watson uses the stop-listed words in the caption to form a query to an image search engine. Users can then drag and drop the images presented into their documents. However, as with the remembrance agent, there is no effort to facilitate the annotation of such materials.
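The query-forming step can be sketched in a few lines of Python, assuming that “stop-listed” means the caption with common function words removed. The stop list, the function name caption_to_query, and the sample caption are hypothetical; the sketch stops at building the query string and does not call any real image search engine.

```python
import re

# A deliberately tiny, hypothetical stop list; real lists are much longer.
STOP_WORDS = frozenset(
    {"a", "an", "and", "at", "in", "of", "on", "the", "to", "with"}
)


def caption_to_query(caption):
    """Drop stop words and duplicates from a caption, keeping word order."""
    words = re.findall(r"[a-z0-9]+", caption.lower())
    kept = []
    for word in words:
        if word not in STOP_WORDS and word not in kept:
            kept.append(word)
    return " ".join(kept)
```

For the caption “The bride and groom cutting the cake at the reception”, caption_to_query produces the query string “bride groom cutting cake reception”, which could then be submitted to an image search engine.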
It has been recognized that more effective information exploration tools could be built by blending cognitive and perceptual constructs. As observed by A. Kuchinsky in the article, “Multimedia Information Exploration”, CHI98 Workshop on Information Exploration, FX Palo Alto Laboratory, Inc.: Palo Alto, Calif. (1998), if narrative and storytelling tools were treated not as standalone but rather embedded within a framework for information annotation and retrieval, such tools could be leveraged as vehicles for eliciting metadata from users. This observation of a potential path forward, however, is still largely divorced from the contextual use of the images in an application like e-mail and does not propose any observational learning from the user.
Despite the aforementioned suggestion to try a different approach, the conventional view remains that annotation and retrieval are two completely separate operations, to be addressed by applications operating independently from each other, and from any application in which the images might be used. This leaves the burden on the user to enter and leave applications when appropriate, and to explicitly transfer data from one application to another, usually via cut and paste. Users are inclined to think about their own tasks, as opposed to applications and data transfer. Each user's task, such as sending an e-mail message, carries with it a context, including data being worked with, tools available, goals, etc., which tends to be naturally separate from the context of other applications.
Consequently, there is a need for a user interface agent that facilitates, rather than fully automates, the textual annotation and retrieval process in connection with typical uses of consumer picture-taking. The role of the agent would lie not so much in automatically performing the annotation and retrieval as in detecting opportunities for annotation and retrieval and alerting the user to those opportunities. The agent should also make it as easy as possible for the user to complete the operations when appropriate. Indeed, there is a particular need for a user interface agent that assists users by proactively looking for opportunities for image annotation and image retrieval in the context of the user's everyday work.