1. Field of Invention
This invention relates to information retrieval systems. More particularly, the invention relates to optimized source selection for query searching in a distributed information system, e.g., the Internet.
2. Description of the Prior Art
As public interest in the Internet has grown, the number of available sources or databases has increased in response. The amount of data available to users has increased dramatically, and the result is information overload for the user. Because of this overload, a query or request for information from the Internet could be very time-consuming if every source were searched for responses to the query. Moreover, the information processing resources needed to support such queries would be prohibitive, and the elapsed time to complete a query would be so long that user interest in the Internet would diminish. Also, the search results obtained would be little, if any, better than those of a query limited to the sources the user judges most likely to contain relevant information or documents. What is needed to overcome information overload in a distributed information system, e.g., the Internet, is an automated system and method of information retrieval that optimally selects the sources or databases most likely to provide the best response to a user query.
Prior art related to source selection in an information retrieval system includes the following:
An article entitled "An Intelligent Database Assistant" by G. Jakobson et al., published by the Institute of Electrical and Electronics Engineers in IEEE Expert (USA), Vol. 1, No. 2, pp. 65-79, Summer 1986, discloses an intelligent database assistant called FRED. The assistant uses artificial intelligence techniques and gives users substantial help in data selection, query formulation and data interpretation. FRED provides querying through a cooperative natural language dialogue, automatic database selection, automatic query generation and portable access to different database systems.
An article entitled "Database Selector for Network Use: A Feasibility Study", published in the Proceedings of the ASIS Annual Meeting 1977, Vol. 14, "Information Management in the 1980's", Chicago, Ill., USA, Sep. 26-October 1977, by Henry Williams et al., discloses an automatic database selector that operates on user query terms and ranks databases by their applicability to the query. A test version of the Database Selector consists of a file containing terminology from 20 major databases, programs for data management and file generation, programs for query processing, and a mathematical model for normalizing the variability found across multiple natural language databases (differing numbers of years' worth of files, controlled versus uncontrolled terminology, hierarchical and multilevel vocabularies, etc.). A database selector can help users and searchers determine whether a file is appropriate for a query, and can help database processors and producers with database comparisons, vocabulary comparisons and vocabulary compatibility problems.
An article entitled "The Selection of On-Line Databases for U.K. Company Information", published in the Journal of Librarianship and Information Science, Vol. 27, No. 3, pp. 159-70, September 1995, by G. Tseng et al., discloses an expert system designed to help novice and end-user online searchers identify which database to use for particular types of company information. A Company Information Database Advisor (CIDA) assists in selecting online databases for a range of types of U.K. company information: basic company details, company financial information, and company ownership and shareholdings. Although CIDA is designed to locate information about U.K. companies, the databases it recommends are both national and international. The paper presents the database selection criteria identified during the project for five of the original CIDA company information categories. It also outlines the specific functions for which CIDA was designed and indicates significant factors influencing the choice of databases for certain types of company information.
An article entitled "Adaptive User Models for Intelligent Information Filtering", published in Intelligent Systems, Proceedings of the Third Golden West International Conference, Las Vegas, Nev., Jun. 6-8, 1994, by K. J. Mock et al., discloses an intelligent information filtering system that reduces a searcher's burden by automatically eliminating incoming data predicted to be irrelevant. These predictions are learned by adapting an internal user model based on user interaction. The report examines three techniques for filtering information: global hill climbing, genetic algorithms, and preliminary work with neural networks using radial basis functions.
An article entitled "Internet Categorization and Search: A Self-Organizing Approach", published in the Journal of Visual Communication and Image Representation (USA), Vol. 7, No. 1, Academic Press, pp. 88-102, March 1996, by H. Chen et al., discloses a concept-based categorization and search capability for World Wide Web (WWW) servers based on selected machine learning algorithms. The method addresses the Internet search problem by first categorizing the contents of Internet documents. A multilayer neural network clustering algorithm employing a self-organizing map categorizes Internet home pages according to their content. The resulting category hierarchy partitions vast Internet services into subject-specific categories and databases, and improves Internet keyword searching and/or browsing.
None of the prior art discloses or suggests optimizing source selection by generating a model for classifying sources and then using that model to predict the sources most likely to contain documents that satisfy a user query.