The Internet is a worldwide network of computers with a multitude of sites providing a vast amount of information. A major part of the Internet is called the World Wide Web (WWW). It comprises the sites on the Internet which operate in accordance with the hypertext transfer protocol (HTTP), commonly called Web sites. To access information on the WWW, a Web browser operating on a computer coupled to the Internet allows a user to request and receive Web pages from the WWW. Each Web page represents a document formatted in the Hypertext Markup Language (HTML), which directs the Web browser on how to display the text, graphics, and hyperlinks of the Web page. Hyperlinks represent graphical regions of a Web page which, when selected by a user, direct the Web browser to the addresses of other Web pages.
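The relationship between a hyperlink's graphical region and the address it references can be illustrated with a minimal sketch, assuming a hypothetical page and address; the snippet below uses Python's standard `html.parser` module to extract the addresses that a page's hyperlinks point to.

```python
# Sketch only: the page markup and URL below are hypothetical examples,
# not taken from the patent text.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href addresses of <a> hyperlink tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # an <a href="..."> tag defines a hyperlink region
            self.links.extend(value for name, value in attrs if name == "href")

page = '<html><body><a href="http://example.com/next">Next page</a></body></html>'
extractor = LinkExtractor()
extractor.feed(page)
```

Selecting the rendered "Next page" region would direct the browser to the extracted address.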
The Web sites may be considered as representing numerous on-line resources. At present, productive use of such on-line resources by the computer user is hampered by the huge amount of information present on the WWW. An excessive amount of time is required to locate useful data, and the dynamic and transient nature of such on-line data often means that information is lost, overlooked or quickly outdated. The result is that on-line users often spend more time searching for information than actually using it. Traditional solutions to this problem include online indexes. Online indexes are usually included in popular search engines on the Internet, such as Alta Vista or Lycos. A user can access the site of a search engine and input a query, and then receive a list of addresses of Web pages which could be relevant to the query. The databases of such indexes are continually updated, but generally offer only a first-level filter on information, thus requiring users to search manually for relevant data. Furthermore, due to the great number of Web sites having Web pages, such indexes often include 35% or less of the number of Web pages available on the WWW. An index/retrieval system having a search engine is described, for example, in U.S. Pat. No. 5,748,954.
To build the individual entries in the indexes of Web search engines, software robots or agents are often used to search individual Web pages along the Internet to locate Web pages to include in the index. The software robots are typically called Web crawlers, wanderers or spiders, since they continuously search Web pages linked to other Web pages. The process of crawling the WWW is slow and time-consuming due to the expansive number of sites on the Internet, and includes rules which necessarily limit the number of terms to be used. Web crawlers for on-line indexes have very limited intelligence, and are focused on identifying search terms to be used in the index to be cross-referenced to Web pages. Moreover, although companies providing Web search engines may use Web crawlers to develop their indexes, a typical computer user does not have access to Web crawlers, and must rely on querying search engines on the Internet to locate Web pages potentially relevant to his or her needs.
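The link-following behavior that gives crawlers their name can be sketched as a bounded traversal; in this hedged example an in-memory link graph stands in for live HTTP fetches and HTML parsing, and all URLs are hypothetical.

```python
# Sketch of a Web crawler's core loop (assumed behavior, not the patent's
# system). LINKS substitutes for fetching a page and parsing its hyperlinks.
from collections import deque

LINKS = {
    "http://site.example/":  ["http://site.example/a", "http://site.example/b"],
    "http://site.example/a": ["http://site.example/b"],
    "http://site.example/b": ["http://site.example/"],
}

def crawl(seed, limit=100):
    """Breadth-first traversal of pages reachable from the seed page.
    The limit models the rules that necessarily bound a crawl's extent."""
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(seen) < limit:
        page = frontier.popleft()
        for link in LINKS.get(page, []):  # follow each hyperlink on the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return seen

visited = crawl("http://site.example/")
```

Each visited page would then be scanned for index terms, which is where real crawlers spend most of their effort.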
Other approaches for locating information on the Internet include directories and catalogs. Online directories, such as Web-based Yahoo, compile information on popular topics or areas with human aid, but are highly subjective and often too general for many information seekers. Online catalogs are lists through which a user can scroll and select a Web page of interest to review. Such online catalogs are also compiled with human assistance but have no associated search engines.
Web-based intelligent agents with neural networks have been developed to search the Internet. For example, Autonomy Inc. of the United Kingdom has developed Agentware software which uses agents, neural networks and pattern matching to identify Web pages to provide categorization and cross-referencing of digital information. However, such Web-based intelligent agent technology often requires constant supervision for operation. Queries to be used by agents are stated in simplistic abbreviated form. Further, such agents either do not learn or rely on only a single machine learning mechanism, and often are limited to text-based query tasks. They are unable to initiate actions autonomously or operate autonomously. These agents further do not evolve into new agents which can potentially improve the ability to classify Web pages without user intervention, and their ability to be trained by user feedback or other knowledge inputs is highly circumscribed. Web agents with the ability to learn are described, for example, in L. Chen & K. Sycara, 1998, "WebMate: A personal agent for browsing and searching", Proceedings of Autonomous Agents 98, pp. 132-138; T. Joachims, D. Freitag & T. Mitchell, 1998, "WebWatcher: A tour guide for the World Wide Web", Proceedings of IJCAI 97; and M. Pazzani, J. Muramatsu & D. Billsus, 1996, "Syskill & Webert: Identifying interesting Web sites", Proceedings of the AAAI Conference.
Some existing Web agent systems can deploy multiple agents for the same core query, as provided by the MetaBot search engine, but there is usually no inter-agent communication or inter-agent learning. Multiple Web agents are used only as a means of speeding the recovery of data, not as a means of improving the retrieval performance of the system.
To facilitate searching the WWW for information, meta-searching programs have been developed to query multiple Web search engines and combine the results of the searches. This can provide a more complete search of the WWW than can be provided by any single Web search engine. The company Agent Technologies Inc. has developed software called Copernic98Plus having the capability to search multiple content-specific sites and to query more than a hundred search engines simultaneously using smart agents. Meta-searching programs, however, are limited to operating on the results of searches from Web search engines and do not utilize Web crawling to locate documents.
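The combining step of meta-searching can be sketched as merging the ranked result lists of several engines while removing duplicates; in this illustrative example the engine functions are invented stand-ins, not real search-engine APIs.

```python
# Hedged sketch of meta-searching (not Copernic98Plus itself): query several
# engines for the same terms and merge their result lists. The engines and
# URLs below are hypothetical.

def engine_a(query):
    return ["http://x.example/1", "http://x.example/2"]

def engine_b(query):
    return ["http://x.example/2", "http://x.example/3"]

def meta_search(query, engines):
    """Combine each engine's results, deduplicating while preserving the
    order in which addresses are first returned."""
    seen, merged = set(), []
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

combined = meta_search("intelligent agents", [engine_a, engine_b])
```

Because the merge operates only on returned result lists, such a program can never surface a page that no underlying engine has indexed, which is the limitation noted above.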
It is thus desirable to provide a system which allows a user to retrieve desired information from the WWW at his or her computer by combining the search capability of Web crawling with the meta-searching of multiple Web search engines, using agents which learn and evolve as the search progresses.