The present application is directed to information retrieval, and more particularly to interactive systems for foraging and retrieval of documents and information.
The term retrieval, used in association with information and/or document retrieval, is commonly understood to relate to a system for locating, making accessible, or making known information, documents, or citation data for documents.
The term foraging when used in association with information and/or document foraging is intended to emphasize the process by which a person interacts with an information system—searching, browsing, sampling, reading, and so on—when working to identify and acquire information.
These terms have overlapping concepts, and effective information systems should support both foraging and retrieval. Retrieval is more concerned with actually obtaining documents or information, while foraging is more concerned with the process by which the search for those documents or that information takes place. Because the concepts overlap, however, the two terms are at times used in the following discussion to refer to related activities.
Many types of information retrieval systems are in existence. Often they take the form of search engines configured to search across numerous document systems and/or databases connected together via an electronic data communication network such as the internet, or local and/or privately controlled networks.
Existing information retrieval systems (e.g., search engines) commonly undertake the document retrieval task via what a linguist might call a “bag of words” approach. Under this analogy, a document or web page is broken into individual words, and these words are then placed into a bag for purposes of generating matches. A user specifies a search request as a query made up of search terms, which typically include several individual words. The search engine searches the document database and identifies all of the documents whose “bags” contain these search terms. It then returns a collection of text snippets from the matching documents and ranks the documents in accordance with a relevancy determination. Individual search engines may identify or rank sources of information according to differing standards, which influences both which documents are returned and how they are ranked. For example, a search engine may be designed to give precedence to those snippets whose documents include all of the search terms. Other designs may give precedence to the most popular or authoritative documents, based on an analysis of how frequently the documents are cited by links from other documents. As a refinement, a search engine may permit the selected search terms to be connected via Boolean-type operators, or to be constrained by the distance and/or order between the words.
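The bag-of-words matching and ranking described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular search engine: the function names `to_bag` and `search` are hypothetical, and the ranking rule (more distinct query terms matched first, then higher total term frequency) is an assumed stand-in for the "all of the search terms" precedence mentioned in the text.

```python
import re
from collections import Counter


def to_bag(text):
    """Break a document into a bag (multiset) of lower-cased words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def search(docs, query, require_all=True):
    """Return (doc_id, score) pairs for documents whose bags contain the query terms.

    Hypothetical ranking rule: a document matching more distinct query terms
    ranks higher; ties are broken by the total count of matched terms.
    """
    terms = set(to_bag(query))
    results = []
    for doc_id, text in docs.items():
        bag = to_bag(text)
        matched = {t for t in terms if bag[t] > 0}
        # Skip documents with no matches, or with partial matches when all terms are required.
        if not matched or (require_all and matched != terms):
            continue
        score = (len(matched), sum(bag[t] for t in matched))
        results.append((doc_id, score))
    # Highest score first; Python's sort is stable, so equal scores keep input order.
    results.sort(key=lambda r: r[1], reverse=True)
    return results


docs = {
    "a": "The cat sat on the mat",
    "b": "The cat chased the dog",
    "c": "A dog barked",
}
print(search(docs, "cat dog"))                     # only "b" contains both terms
print(search(docs, "cat dog", require_all=False))  # partial matches are ranked lower
```

Switching `require_all` off shows how a different design standard changes both which documents are returned and their order.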
The retrieval of information is commonly considered to involve two distinct fields.
The first field is information retrieval, which addresses the case where the information is organized in terms of documents. Examples include searches within the World Wide Web and/or digital libraries. In this field, there is an assumption that interesting information is stored as the content of documents and is represented in terms of language, graphics or pictures. Automated natural language processing or human reading is used to classify documents or to extract information from them. In this field, documents are retrieved in several ways, such as by keyword matching of contents, by browsing and selecting from manually constructed hierarchies, or by matching metadata such as authors and publication dates. Retrieval of documents in this field of pursuit thus focuses on identifying and retrieving relevant documents.
The second field is related to database management, and addresses a situation where the information is organized in terms of tables or databases. A well-known approach to organizing information in databases is through the use of relational database structures. This field assumes that interesting information is stored in databases and is represented formally and consistently in terms of numbers and symbols in fields of records, which are typically organized in tables. Information may be retrieved by specifying values for some of the fields of records and then returning the records whose fields match. Retrieval in this area therefore focuses on fetching and combining data from the records.
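By contrast with document retrieval, the record-matching retrieval described above can be sketched with Python's built-in `sqlite3` module. The table, its columns and the `retrieve` helper are hypothetical; the point is only that the caller specifies values for some fields, and the rows whose fields match are returned. Field names are checked against a whitelist because they are interpolated into the SQL text, whereas the values are passed as query parameters.

```python
import sqlite3

# Hypothetical table layout used only for illustration.
FIELDS = ("name", "dept", "city")


def build_db():
    """Create an in-memory database holding a few example records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("Ada", "research", "Palo Alto"),
            ("Ben", "sales", "Boston"),
            ("Cy", "research", "Boston"),
        ],
    )
    return conn


def retrieve(conn, **field_values):
    """Return the rows whose named fields equal the given values.

    Fields that are not named are left unconstrained.
    """
    unknown = set(field_values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    sql = f"SELECT {', '.join(FIELDS)} FROM employees"
    if field_values:
        sql += " WHERE " + " AND ".join(f"{f} = ?" for f in field_values)
    sql += " ORDER BY name"
    return conn.execute(sql, tuple(field_values.values())).fetchall()


conn = build_db()
print(retrieve(conn, dept="research"))
print(retrieve(conn, dept="research", city="Boston"))
```

Each additional field value narrows the match, which is the formal, field-by-field style of retrieval that distinguishes this field from keyword matching over document text.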
One particular approach to document retrieval is the Scatter/Gather approach (Peter Pirolli, Patricia Schank, Marti Hearst and Christine Diehl, 1996, Scatter/Gather Browsing Communicates the Topic Structure of a Very Large Text Collection, Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 96), 213-220, New York: ACM Press), which uses document clustering to automatically scatter a document collection into a smaller number of coherent document groups. It then presents short summaries of the groups to the user. Using the summaries, the user selects one or more of the groups for further study. The system then gathers these groups together, by a union operation, to form a sub-collection. The clustering operation is again used to scatter the sub-collection into a new set of document groups, which are then presented to the user. The groups become smaller with each iteration.
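The scatter, summarize and gather cycle described above can be sketched as follows. The published Scatter/Gather system used its own fast clustering algorithms; this sketch swaps in a plain k-means over word-count vectors with cosine similarity and deterministic seeding, an assumption made for brevity. The names `scatter`, `summarize` and `gather` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Word-count vector for a document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Summed vector of a group (scale does not matter for cosine similarity)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def scatter(docs, k, iterations=5):
    """Partition document ids into at most k groups using a simple k-means."""
    ids = sorted(docs)
    vecs = {i: vectorize(docs[i]) for i in ids}
    centers = [vecs[i] for i in ids[:k]]  # deterministic seeds: the first k documents
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for i in ids:
            best = max(range(len(centers)), key=lambda c: cosine(vecs[i], centers[c]))
            groups[best].append(i)
        groups = [g for g in groups if g]  # drop empty groups
        centers = [centroid(vecs[i] for i in g) for g in groups]
    return groups


def summarize(docs, group, n=3):
    """Short summary of a group: its n most frequent words (ties alphabetical)."""
    counts = centroid(vectorize(docs[i]) for i in group)
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def gather(docs, groups, selected):
    """Union of the selected groups, returned as a new sub-collection."""
    keep = set().union(*(groups[s] for s in selected))
    return {i: docs[i] for i in keep}


docs = {
    "d1": "python code compiler",
    "d2": "python code interpreter",
    "d3": "garden flower soil",
    "d4": "garden flower seed",
}
groups = scatter(docs, k=2)
for idx, g in enumerate(groups):
    print(idx, g, summarize(docs, g))
```

A further round would call `scatter` on the dictionary returned by `gather`, so each iteration works on a smaller sub-collection.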
A variant of Scatter/Gather is Multi-Modal Scatter/Gather, or MMSG (Francine Chen, Ullas Gargi, Les Niles, and Hinrich Schütze, 1999, Multi-Modal Browsing of Images in Web Documents, Proceedings of the SPIE Conference on Document Recognition and Retrieval). This process has primarily been used for browsing images. MMSG extends Scatter/Gather through its use of features in different spaces, that is, different modalities. In MMSG, users browse a collection by iteratively specifying a feature, which is then used for clustering to form partitions. An expand operation may add images or clusters to the current set based on similarity in one feature dimension.
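The two operations that distinguish MMSG, partitioning on a user-chosen feature and expanding the current set by similarity in one feature dimension, can be sketched as follows. The feature names (`color`, `texture`), the vectors, the single nearest-seed assignment pass used in place of full clustering, and the Euclidean distance threshold in `expand` are all illustrative assumptions rather than details of the published system.

```python
import math


def partition(items, collection, feature, seeds):
    """One clustering pass in a single modality: each item joins its nearest seed."""
    groups = {s: [] for s in seeds}
    for i in sorted(items):
        nearest = min(
            seeds, key=lambda s: math.dist(collection[i][feature], collection[s][feature])
        )
        groups[nearest].append(i)
    return groups


def mean_vector(vectors):
    """Component-wise mean of a non-empty sequence of equal-length vectors."""
    vectors = list(vectors)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def expand(current, collection, feature, radius):
    """Add every item whose `feature` vector lies within `radius` of the current set's mean."""
    center = mean_vector(collection[i][feature] for i in current)
    added = {
        i
        for i, feats in collection.items()
        if i not in current and math.dist(feats[feature], center) <= radius
    }
    return set(current) | added


# Hypothetical two-dimensional features for three images.
images = {
    "a": {"color": (1.0, 0.0), "texture": (0.0, 1.0)},
    "b": {"color": (0.9, 0.1), "texture": (1.0, 0.0)},
    "c": {"color": (0.0, 1.0), "texture": (0.0, 0.9)},
}
print(partition(images, images, "color", ["a", "c"]))
print(partition(images, images, "texture", ["a", "c"]))
print(expand({"a"}, images, "color", 0.2), expand({"a"}, images, "texture", 0.2))
```

Starting from the same image, the two modalities group and expand differently, which is the effect MMSG exposes by letting the user choose the feature at each step.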
A further approach is the SenseMaker system (Michelle Q. Wang Baldonado and Terry Winograd, 1997, SenseMaker: An Information-Exploration Interface Supporting the Contextual Evolution of a User's Interests, Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 97), 11-18, New York: ACM Press), which supports users in exploring information and interactively organizing document collections around contextual dimensions. SenseMaker provides a single interface for viewing heterogeneous collections from multiple sources. It enables users to view a collection from a variety of perspectives, such as by author or by topical unit. It provides two hybrid strategies for assisting the information seeker: structure-based searching and structure-based filtering. In structure-based searching, SenseMaker users extend document collections by formulating queries that characterize their content and then adding documents from additional sources that match those queries. Structure-based filtering allows a user to limit a collection to the sets the user has selected according to certain properties.
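Structure-based searching and filtering can be sketched over documents represented as property dictionaries. How a query "characterizes" a collection is an assumption here: the sketch derives it from the values of one property (for example, the authors already present) and then pulls matching documents from other sources. The function names and data are hypothetical and do not reproduce SenseMaker's interface.

```python
from collections import defaultdict


def view_by(collection, prop):
    """One perspective on a collection: document ids grouped by a property value."""
    groups = defaultdict(list)
    for doc in collection:
        groups[doc[prop]].append(doc["id"])
    return dict(groups)


def structure_based_search(collection, sources, prop):
    """Extend a collection with documents from other sources whose `prop` value
    already occurs in the collection (the assumed query characterizing its content)."""
    wanted = {doc[prop] for doc in collection}
    seen = {doc["id"] for doc in collection}
    extended = list(collection)
    for source in sources:
        for doc in source:
            if doc[prop] in wanted and doc["id"] not in seen:
                extended.append(doc)
                seen.add(doc["id"])  # avoid duplicates held by several sources
    return extended


def structure_based_filter(collection, prop, selected_values):
    """Limit a collection to the groups the user selected by property value."""
    return [doc for doc in collection if doc[prop] in selected_values]


mine = [{"id": "d1", "author": "Smith"}, {"id": "d2", "author": "Lee"}]
web = [{"id": "w1", "author": "Smith"}, {"id": "w2", "author": "Kim"}]
library = [{"id": "d1", "author": "Smith"}, {"id": "l1", "author": "Lee"}]

extended = structure_based_search(mine, [web, library], "author")
print(view_by(extended, "author"))
print([d["id"] for d in structure_based_filter(extended, "author", {"Lee"})])
```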
Yet another example of document retrieval is the Presto system (Paul Dourish, W. Keith Edwards, Anthony LaMarca and Michael Salisbury, 1999, Presto: An Experimental Architecture for Fluid Interactive Document Spaces, ACM Transactions on Computer-Human Interaction, 6(2)). Presto is intended to provide an alternative to hierarchies of folders (or directories) for organizing collections of documents. It is implemented for e-mail collections and document collections. A central concept of this system is that documents can be assigned properties and retrieved according to property values. For example, author could be a property, and Joe Smith would be a value for the author property. The values associated with the properties are computed by document services. Presto also provides a means for assigning properties to collections. It further provides a means for specifying the documents to be included in a collection according to three components: a query, an inclusion list and an exclusion list. The query defines which documents are wanted according to the values of their properties; in other words, all documents matching the query at any given moment are members of the collection. The inclusion and exclusion lists then serve to modify the result of the query. In the user interface, documents are added to the inclusion list by dragging them into the collection, and are removed by dragging them out of the collection.
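The three-part collection specification described above (a query plus inclusion and exclusion lists) can be sketched as a small class whose membership is recomputed on demand, so that documents which come to match the query later join automatically. The class and method names are hypothetical, not Presto's API; keeping the two lists disjoint on each drag is an assumption that makes the order in which they are applied irrelevant.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# Hypothetical representation: a document is a mapping of property name -> value.
Document = Dict[str, str]


@dataclass
class FluidCollection:
    query: Callable[[Document], bool]
    inclusion: Set[str] = field(default_factory=set)
    exclusion: Set[str] = field(default_factory=set)

    def members(self, documents: Dict[str, Document]) -> Set[str]:
        """Current members: query matches, plus inclusions, minus exclusions."""
        matched = {doc_id for doc_id, props in documents.items() if self.query(props)}
        return ((matched | self.inclusion) - self.exclusion) & set(documents)

    def drag_in(self, doc_id: str) -> None:
        """Dragging a document into the collection puts it on the inclusion list."""
        self.inclusion.add(doc_id)
        self.exclusion.discard(doc_id)

    def drag_out(self, doc_id: str) -> None:
        """Dragging a document out of the collection puts it on the exclusion list."""
        self.exclusion.add(doc_id)
        self.inclusion.discard(doc_id)


docs = {
    "m1": {"author": "Joe Smith"},
    "m2": {"author": "Ann Lee"},
    "m3": {"author": "Joe Smith"},
}
by_joe = FluidCollection(query=lambda p: p.get("author") == "Joe Smith")
print(sorted(by_joe.members(docs)))
by_joe.drag_in("m2")
by_joe.drag_out("m3")
docs["m4"] = {"author": "Joe Smith"}  # a new document that matches the query
print(sorted(by_joe.members(docs)))
```

Because membership is evaluated when asked for rather than stored, `m4` appears in the second result without any explicit action, matching the "at any given moment" behavior described above.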
A further system is the Dynamic Query system (Ben Shneiderman, 1994, Dynamic Queries for Visual Information Seeking, IEEE Software, 11(6), pp. 70-77), which is directed to the manipulation of interfaces to databases. Typically, there is a single display of data and a set of controls, such as sliders, which function as query controls that can be manipulated to determine the selection of data to be retrieved. The goals of a Dynamic Query system are defined as: giving a visual presentation of a query's components; providing a visual presentation of results; providing rapid, incremental and reversible control of a query; selecting information of interest by pointing rather than typing; and providing immediate and continuous feedback to the user about the results of changes to the query.
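Several of these goals (sliders as query controls, rapid and incremental changes, reversibility, and immediate feedback after every change) can be sketched in a few lines. The `RangeSlider` and `DynamicQuery` classes, the undo history, and the callback standing in for a redrawn display are hypothetical illustrations, not Shneiderman's implementation.

```python
from dataclasses import dataclass


@dataclass
class RangeSlider:
    """A two-ended slider constraining one numeric field."""
    name: str
    low: float
    high: float

    def admits(self, record):
        return self.low <= record[self.name] <= self.high


class DynamicQuery:
    def __init__(self, records, sliders, on_change):
        self.records = records
        self.sliders = {s.name: s for s in sliders}
        self.on_change = on_change  # stands in for redrawing the display
        self.history = []           # previous slider positions, for undo
        self.refresh()

    def results(self):
        """Records admitted by every slider (the sliders are ANDed together)."""
        return [r for r in self.records if all(s.admits(r) for s in self.sliders.values())]

    def refresh(self):
        self.on_change(self.results())

    def move(self, name, low, high):
        """Incremental change: adjust one slider and give feedback immediately."""
        slider = self.sliders[name]
        self.history.append((name, slider.low, slider.high))
        slider.low, slider.high = low, high
        self.refresh()

    def undo(self):
        """Reversible control: restore the previous slider position."""
        if self.history:
            name, low, high = self.history.pop()
            slider = self.sliders[name]
            slider.low, slider.high = low, high
            self.refresh()


homes = [
    {"price": 200, "bedrooms": 2},
    {"price": 350, "bedrooms": 3},
    {"price": 500, "bedrooms": 4},
]
shown = []
dq = DynamicQuery(
    homes,
    [RangeSlider("price", 0, 1000), RangeSlider("bedrooms", 0, 10)],
    on_change=shown.append,
)
dq.move("price", 0, 400)
dq.move("bedrooms", 3, 10)
dq.undo()
print([len(view) for view in shown])
```

Every slider movement triggers a full refresh here; a real visual interface would redraw the display in the callback, and might update results incrementally for speed.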
It is considered that none of the above-mentioned or other existing processes and/or systems describes a system and/or device that permits sufficiently robust conversational capability within the exploratory information retrieval process.