1. Technical Field
The present invention relates to the field of information retrieval and presentation.
2. Description of the Related Art
Smarter, automated research tools are required for next-generation search engines. Increasingly, businesses, students, researchers, and the general public rely on the information available from network-accessible resources, or “on-line” sources, typically accessed via conventional search engines. The tremendous volume of information available through on-line sources is both an advantage for conducting research and a disadvantage, in that a significant amount of irrelevant data must be filtered out to identify the pertinent data. Compounding this is the problem of information reliability. On-line sources, often updated and managed by private individuals, can contain outdated, misleading, and/or otherwise incorrect information. In consequence, discerning desired information within the significant amount of data available can be a daunting and time-prohibitive task.
Conventional search engines accept queries and return lists of potentially relevant on-line sources such as Internet sites, databases, and/or Web pages. While most conventional search engines perform Boolean logic searches using keywords, others can process natural language queries. Typically, the list of results is ordered according to the internal prioritization rules of the search engine used. Other conventional search engines, however, order the list of results according to a predefined outline. While returning an ordered list of sources is sufficient for narrowly tailored, simple queries, it is impractical for many research tasks such as generic queries into complex fields. In such instances, the returned list of potentially relevant sites can be unmanageably large. For example, a search on the term “protein” can result in hundreds of Web site matches and millions of Web page matches.
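By way of illustration only, the conventional Boolean keyword search described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular search engine: the function names `build_index` and `boolean_and_search`, the sample pages, and the term-count prioritization rule are all hypothetical and chosen only to show how a list of matches is produced and ordered.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase keyword to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def boolean_and_search(index, pages, query):
    """Return ids of pages containing ALL query keywords, best match first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Boolean AND: intersect the sets of pages that contain each keyword.
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    # Hypothetical prioritization rule (an assumption for this sketch):
    # rank by how often the query keywords occur in the page.
    def score(page_id):
        words = pages[page_id].lower().split()
        return sum(words.count(t) for t in terms)

    # Ties are broken by page id so that the ordering is deterministic.
    return sorted(matches, key=lambda p: (-score(p), p))


# Hypothetical sample data.
pages = {
    "a": "protein folding basics",
    "b": "protein structure and protein folding in cells",
    "c": "cooking with protein powder",
}
index = build_index(pages)
print(boolean_and_search(index, pages, "protein folding"))
print(boolean_and_search(index, pages, "protein"))
```

Even in this small example, the generic query “protein” matches every page, while the narrower query “protein folding” excludes one of them; at the scale of the Web, the same behavior yields the unmanageably large result lists noted above.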
When confronted with such a large number of references, users often must visit several of the sites from the returned list, browse the sites, and attempt to use the information gleaned from the references to formulate a more tightly focused search. The new search produces a different list of references, which leads to more site browsing and further focused searches. The conventional search techniques described above can be frustrating and suffer from several disadvantages.
One such disadvantage is the inordinate amount of time that can be expended before relevant information is discovered. While some researchers succeed in arriving at a manageable list of references using the aforementioned strategy, this success usually comes only after lengthy searching. Other researchers become discouraged while searching sites and eventually abandon their search with little gain. Another disadvantage is that users must cross-reference multiple on-line sources to be assured of the validity and accuracy of a given source. Also, when visiting the references and investigating any secondary references, it can be difficult to keep track of which references were visited, which were not visited, and which references contained meaningful information. In consequence, users tend to visit Web pages multiple times, leading to inefficient use of network bandwidth and further wasted time. Finally, in order to reduce the significant number of on-line sources identified through a conventional search engine, a query often must be so narrowly tailored that critical information can be excluded.
A number of solutions have been proposed that attempt to limit the list of on-line sources generated by search engines. For example, one technique is to allow users to customize the behavior of existing search engines through user-defined plug-in programs. Another technique relies on statistical induction to extract probable classifications from highly structured, normalized databases. Yet another technique attempts to produce an effective list of information sources using predefined filters and trained neural networks. The solutions proposed thus far, however, focus upon improving the returned list of potential sites. Users still must recursively browse a myriad of information sources before obtaining desired information.