1. Field of the Invention
The present invention is related to automated information retrieval, and more particularly to systems, methods, and computer program products for asynchronously retrieving relevant information from a number of sources and presenting the retrieved information to an end user in a manner that avoids the need for conscious effort by the end user.
2. Description of the Background Art
Modern office appliances, including computers, photocopiers, digital cameras, meeting recorders, personal digital assistants, visitor kiosks, printers, and the like, are capturing increasing amounts of digital information. This information includes, for example, office communications (such as e-mails, voicemails, and faxes) and other corporate knowledge (documents, presentations, visitor records, meetings, reports, spreadsheets, videos, and the like). Such information is often stored in a distributed fashion among many different devices and at many different locations. In addition, related information on various topics may be available from other sources such as the World Wide Web, publicly available databases, and the like. In general, some subset of such information is available to users via conscious retrieval methods, such as by browsing file structures and hyperlinks, navigating through file systems, searching by keyword, scrolling, and the like.
Conscious retrieval methods suffer from several limitations. The first is information overload: the sheer volume of digital data available makes it difficult for users to locate a particular desired piece of information. In many instances, information available on the World Wide Web may be particularly difficult to locate because of the unstructured and open-ended nature of the medium.
A second limitation is source overload: the large number of information sources often requires a user to search in several different places, often using different search mechanisms which must each be performed consciously. For example, information concerning a particular person may be available from a variety of sources, including an address book or contact list, a directory available on the World Wide Web, a company database, and the like. Conventionally, conscious searches would have to be performed on each of these information sources separately, and in many cases such searches would have to be formulated in different ways according to the particular characteristics of each of the information sources.
A third limitation is a lack of awareness of available information: a user may simply be unaware that a piece of relevant and useful information is available. This problem is particularly evident when large amounts of information are available in a distributed format or without a central organization or collection scheme. For example, information may be available on a relatively obscure website, or on a remotely located information appliance of which the end user is unaware. The information may have been collected by another user and stored on the other user's machine rather than in public data storage. Or the information may have been collected by an information appliance, such as a photocopier, that automatically retains copies of digital information, and the end user may not be aware that such retention has taken place with respect to a relevant piece of data. Finally, the end user may simply have forgotten that a piece of information exists, even though he or she may have previously been aware of its existence.
A fourth limitation is the overhead associated with retrieving information consciously: the user must often change contexts in order to initiate a search, and furthermore must spend some time formulating searches (as well as acquiring the expertise to formulate an effective search). Thus, conscious retrieval often presents significant barriers, which consume valuable time and which may engender cognitive interruptions that limit the user's overall productivity.
One example of the type of information whose retrieval is subject to the above limitations is information about people. In a typical office environment, contact information and other descriptive information about people are often stored in many different locations. Such information may be stored in information appliances (which may include records of telephone calls, e-mails, records of meetings, and the like), contact lists, databases, and the like. Since the information is highly distributed among several storage facilities, the above-described limitations are particularly pertinent.
Existing techniques of automatic retrieval rely primarily on text-matching algorithms to determine relevancy, along with some knowledge of user actions. Some existing systems employ pattern matching.
Remembrance Agent, developed at Massachusetts Institute of Technology (MIT) Media Lab, uses the content of a document to recommend related files on the user's file system. The user's context, including location and activity, may also be taken into account. The system presents a list of documents related to the user's current document. The list is continually updated as the user inputs text, navigates through e-mails, or otherwise changes the on-screen view.
Margin Notes, also developed at MIT Media Lab, uses the content of a web page being viewed in a web browser to recommend related files on a user's file system. The system compares sections of the web page to pre-indexed document stores, based on keyword co-occurrence. Relevant documents are presented to the end user via margin annotations adjacent to the appropriate section of the web page.
Remembrance Agent and Margin Notes are both further described in B. J. Rhodes & P. Maes, “Just-In-Time Information Retrieval Agents,” in IBM Systems Journal, vol. 39, nos. 3 & 4, pp. 685-704 (2000), and B. J. Rhodes, “Just-In-Time Information Retrieval,” (Ph.D. dissertation, Massachusetts Institute of Technology, 2000). Further description of Remembrance Agent is provided in U.S. Pat. No. 6,236,768 to Rhodes et al., “Method and Apparatus for Automated, Context-Dependent Retrieval of Information,” issued on May 22, 2001.
Watson, developed at Northwestern University Infolab, and described at http://dent.infolab.nwu.edu/infolab/projects/project.asp?ID=5, directs queries to external search engines based on the content of a document being composed or viewed by a user, together with a model of user actions. Watson profiles the user, monitors behavior, and searches for relevant information.
Simple User Interest Tracker (SUITOR), developed at the IBM Almaden Research Center, and described at www.almaden.ibm.com/cs/blueeyes/suitor.html, uses the content of active documents together with a gaze-tracking system to suggest relevant documents from personal and company-wide repositories. SUITOR monitors the user's activities, infers what sorts of information will likely be most interesting at a given moment, and then delivers that information to the user. For example, by monitoring the user's web browsing activity, SUITOR can find additional information on topics related to the currently viewed page.
Kenjin, available from Autonomy Systems Ltd. of San Francisco, Calif., and described at www.autonomy.com, automatically delivers links to information relevant to a document or web page currently open in the user's browser, e-mail client, or application.
Yogi Internet Discovery System, available from PurpleYogi, Inc. of Mountain View, Calif., and described at www.purpleyogi.com, suggests relevant materials from an indexed selection, using personal profiles and a topic classification system.
Active Knowledge, available from Autonomy Systems Ltd. of San Francisco, Calif., and described at www.autonomy.com, uses text pattern recognition software to categorize documents in distributed locations and to dynamically add hyperlinks.
Flyswat, available from Flyswat of San Francisco, Calif., and described at www.flyswat.com, automatically highlights words and phrases within web pages being viewed by a user. Users can click a highlighted item to see a window containing a list of links to additional information about the item.
RichLink, available from Sentius Corporation of Palo Alto, Calif., and described at www.sentius.com/RichLink/english/index.html, automatically adds contextual content to web pages. The content is presented to a user upon the user's request. Third parties can install the RichLink software and provide databases to be used for retrieval of contextual content.
Although the above-referenced prior art systems provide various types of automated information retrieval, they are, in general, only able to retrieve and provide relevant information in a synchronous, real-time mode. Queries are formulated and executed on databases or other storage mechanisms that are available at the time the user is viewing the related document; thus such schemes are generally incapable of retrieving related information that may not be available at the moment the user would find it useful or at the time a search is run. In addition, such prior art schemes are generally unable to obtain related data from other users' computers or from a network of information appliances, but rather are limited to information retrieval from servers or other centrally located sources.
What is needed is a system and method for retrieving and presenting relevant information asynchronously, automatically, and in the context of an end user's activities, so as to avoid the limitations and burdens associated with conscious retrieval. What is further needed is a system and method that performs the retrieval and presentation operations while avoiding the limitations of the prior art. What is further needed is a system and method of automatically retrieving and presenting relevant information to an end user with a minimum of user effort. What is further needed is a system and method of automatically and asynchronously retrieving and presenting relevant information that is stored on other users' computers.