The present invention relates to content retrieval and more particularly to a system for managing access to content.
Researchers have pursued a variety of approaches to integrating natural language processing with document retrieval systems. The central idea in the prior art literature is that some, perhaps shallow, variant of the syntactic and semantic analysis performed by general-purpose natural language processing systems can provide information useful for improving the indexing, and thus the retrieval, of documents.
For more information regarding such research, additional reference may be made to the following documents:
Marti Hearst. 1992. Direction-Based Text Interpretation as an Information Access Refinement. In [Jacobs1992] (see below);
David Lewis. 1992. Text Representation for Intelligent Text Retrieval: A Classification-Oriented View. In [Jacobs1992] (see below);
Karen Sparck Jones. 1992. Assumptions and Issues in Text-Based Retrieval. In [Jacobs1992] (see below);
Paul Jacobs (ed.) 1992. Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Lawrence Erlbaum Associates, Hillsdale, N.J.; and
Christos Faloutsos and Douglas Oard. 1996. A Survey of Document Retrieval and Filtering Methods. Technical Report, Information Filtering Project, University of Maryland, College Park, Md.
The general goal of a document retrieval system is to consult a large database of documents and return a subset of documents ordered by decreasing likelihood of being relevant to a particular topic. In a routing task, a document retrieval system returns a number of documents it judges most likely to be relevant to a query out of a database of a vast number of documents. A system performs well if the proportion of relevant articles among those returned is high relative to the proportion of relevant articles in the corpus, and if the relevant articles are ranked ahead of the irrelevant ones in its ordering. For more information regarding a typical document retrieval system, reference may be made to Donna Harman. 1996. Overview of the Fourth Text Retrieval Conference (TREC-4). In Proceedings of TREC-4.
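The two performance criteria just described can be illustrated with a minimal Python sketch. The function names and the use of document identifiers are assumptions made for illustration only; they do not come from the cited evaluation.

```python
def precision(returned, relevant):
    """Fraction of the returned documents that are relevant."""
    if not returned:
        return 0.0
    hits = sum(1 for doc in returned if doc in relevant)
    return hits / len(returned)


def lift_over_base_rate(returned, relevant, corpus_size):
    """Precision of the returned set divided by the proportion of
    relevant documents in the whole corpus (hypothetical measure)."""
    base_rate = len(relevant) / corpus_size
    if base_rate == 0:
        return 0.0
    return precision(returned, relevant) / base_rate


def relevant_ranked_first(returned, relevant):
    """True if no irrelevant document precedes a relevant one."""
    seen_irrelevant = False
    for doc in returned:
        if doc in relevant:
            if seen_irrelevant:
                return False
        else:
            seen_irrelevant = True
    return True
```

A value of `lift_over_base_rate` well above 1 would indicate that the returned set is richer in relevant documents than the corpus as a whole.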
The goal of an information extraction system, on the other hand, is to consult a corpus of documents, usually smaller than those involved in document retrieval tasks, and extract pre-specified items of information. Such a task might be defined, for instance, by specifying a template schema instances of which are to be filled automatically on the basis of a linguistic analysis of the texts in the corpus. For more information regarding a typical information extraction system, reference may be made to Ralph Grishman and Beth Sundheim. 1995. Design of the MUC-6 Evaluation. In Proceedings of the 6th Message Understanding Conference, ARPA, Columbia, Md.
Work in the areas of document retrieval and information extraction has seen some success within each of these separate domains. However, successful integration of the two to create an information indexing and retrieval system has yet to be demonstrated. There is therefore a need for improving such document/content retrieval and information extraction technology.
A system, method and computer program product provide a content exchange system. A natural language request (i.e., query) is received from a user utilizing a local system. A determination is made as to whether the user request can be fulfilled from information stored by the local system. The request is fulfilled from a local data source if the request can be fulfilled locally with information of the local system. If the request cannot be fulfilled locally, the request is fulfilled at a network site. A content directory connected to the network site is examined for selecting one or more network data sites having content potentially satisfying the request. The request is sent to the data site(s), which may be local or remote to the network site and can include websites, databases, etc. Content pertaining to the request is received from the data site(s) and sent to the user. As an option, details of the request, ultimate data sources, and intermediate processing can be logged for collecting a fee.
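The request flow summarized above can be sketched as follows. This is a simplified illustration, not the claimed implementation: the data sources are modeled as plain callables, the content directory as a mapping from site name to a set of topic terms, and all names (`fulfill_request`, `directory`, `sites`, `log`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A data source takes a query string and returns matching content items.
DataSource = Callable[[str], List[str]]


def fulfill_request(
    query: str,
    local_source: DataSource,
    directory: Dict[str, Set[str]],
    sites: Dict[str, DataSource],
    log: Optional[List[Tuple[str, str]]] = None,
) -> List[str]:
    """Answer from the local source if possible, else via the directory."""
    local_results = local_source(query)
    if local_results:
        # The request can be fulfilled locally.
        if log is not None:
            log.append((query, "local"))
        return local_results

    # Otherwise examine the content directory for candidate data sites
    # (assumed here: a site qualifies if it shares a term with the query).
    terms = set(query.lower().split())
    selected = [name for name, topics in directory.items() if terms & topics]

    results: List[str] = []
    for name in selected:
        results.extend(sites[name](query))
        if log is not None:
            # Optional logging of the ultimate data source, e.g. for billing.
            log.append((query, name))
    return results
```

The optional `log` list stands in for the record of requests and data sources that could later support fee collection.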
The present invention provides several methods that can be used to determine whether a query can be handled locally. According to one method, the network site determines whether the user request can be fulfilled from information stored by the local system. According to another method, the local system sends content for fulfilling the request to the network site, where the results are compared and, optionally, ranked. The network site can also be used to determine whether the user request can be fulfilled from information stored by the local system.
In a preferred embodiment of the present invention, the content directory includes term frequency data, where the request is compared to the term frequency data for selecting the data site or sites. Items of the content can be ranked according to relevance to the request.
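One way such a comparison against term frequency data might work is sketched below in Python. The scoring formula (the share of a site's term occurrences that match query terms) and the names `score_site` and `select_sites` are assumptions chosen for illustration; the text does not specify a particular formula.

```python
from typing import Dict, List


def score_site(query: str, term_freqs: Dict[str, int]) -> float:
    """Share of a site's term occurrences accounted for by query terms."""
    total = sum(term_freqs.values())
    if total == 0:
        return 0.0
    terms = query.lower().split()
    return sum(term_freqs.get(term, 0) for term in terms) / total


def select_sites(
    query: str,
    directory: Dict[str, Dict[str, int]],
    max_sites: int = 2,
) -> List[str]:
    """Return up to max_sites site names, best-scoring first."""
    scored = [(score_site(query, tf), name) for name, tf in directory.items()]
    # Drop sites with no matching terms at all.
    scored = [(score, name) for score, name in scored if score > 0]
    # Sort by descending score, breaking ties by site name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:max_sites]]
```

The same score could also be applied to individual content items to rank them by relevance to the request.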
In an embodiment of the present invention, the request is parsed for determining a meaning of the request. The determined meaning is used during examination of the content directory. In another embodiment of the present invention, a request is made for clarification information from the user. Such information is used to limit the responses.
Additional content can be pushed to the user. Such content can be selected based on user activity including the request. An example of such content is advertising. A cookie can be generated. Such a cookie can be used to record user preferences or avoid duplication of advertising.
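As a hedged illustration of using a cookie to avoid duplicating advertising, the following Python sketch stores the identifiers of advertisements already shown in a single cookie. The cookie name `seen_ads`, the `|` separator, and the function names are hypothetical; advertisement identifiers are assumed not to contain `|`.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "seen_ads"  # hypothetical cookie name


def read_seen_ads(cookie_header):
    """Parse the identifiers of already-shown ads from a Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME not in cookie:
        return []
    return [ad for ad in cookie[COOKIE_NAME].value.split("|") if ad]


def pick_ad(candidates, cookie_header):
    """Pick the first ad not yet shown and return it with an updated cookie.

    Returns (None, cookie_header) when every candidate was already shown.
    """
    seen = read_seen_ads(cookie_header)
    for ad in candidates:
        if ad not in seen:
            cookie = SimpleCookie()
            cookie[COOKIE_NAME] = "|".join(seen + [ad])
            return ad, cookie[COOKIE_NAME].OutputString()
    return None, cookie_header
```

Each call returns the cookie string to send back to the user, so that the next request carries the updated list of advertisements already seen.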
The content for fulfilling the request can be filtered based on a transaction history of the user. Preferably, the user's requests and/or content selections (including the responses selected) are monitored to generate the transaction history.
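A minimal Python sketch of history-based filtering follows. The text does not state which filtering rule is applied, so the rule used here (suppressing items the user has already selected) is purely an assumption, as are the class and function names.

```python
class TransactionHistory:
    """Records a user's requests and the content items the user selected."""

    def __init__(self):
        self.requests = []     # queries issued, in order
        self.selected = set()  # identifiers of content items chosen

    def record_request(self, query):
        self.requests.append(query)

    def record_selection(self, item_id):
        self.selected.add(item_id)


def filter_by_history(items, history):
    """Drop items the user already selected (assumed filtering rule),
    preserving the original order of the remaining items."""
    return [item for item in items if item not in history.selected]
```

Other rules, such as favoring items similar to past selections, could be built on the same recorded history.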