The exponential growth of data collections, private intranets and the public Internet has produced a daunting labyrinth of increasingly numerous information sources. Searching these sources is often a chore. For example, almost any type of product is now available somewhere on a communication network, but most users cannot find what they seek, and even expert users waste copious time and effort searching for appropriate on-line stores or other product information sources.
One problem is simply that the number of available sources has grown beyond what any single user can comprehend. A second problem is that this growth in available information has been accompanied by a commensurate growth in software utilities and methods to manage, access, and present it. Each utility has a different and often unique interface and set of commands and capabilities, and is appropriate for a different set of users and a different set of information types and sources. Thus, the sheer diversity of available utilities creates problems for users comparable to those created by the information explosion. Users now face the twin problems of deciding which tool to use and which information source to query.
In the past, efforts have been made to provide users with automatic, computer-assisted services that can help solve these twin problems of the network revolution. For example, AI researchers have created several prototype software agents that help users with e-mail and netnews filtering (Pattie Maes et al., 1993, Learning interface agents, Proceedings of AAAI-93), agents that assist with World Wide Web browsing (H. Lieberman, 1995, Letizia: An agent that assists web browsing, Proc. 15th Int. Joint Conf. on A.I. pp. 924-929; Robert Armstrong et al., 1992, Webwatcher: A learning apprentice for the world wide web, Working Notes of the AAAI Spring Symposium: Information Gathering from Heterogeneous, Distributed Environments, pp. 6-12, Stanford University, AAAI Press), agents that schedule meetings (Lisa Dent et al., 1992, A personal learning apprentice, Proc. 10th Nat. Conf. on A.I., pp. 96-103; Pattie Maes, 1994, Agents that reduce work and information overload, Comm. of the ACM 37(7):31-40, 146; Tom Mitchell et al., 1994, Experience with a learning personal assistant, Comm. of the ACM 37(7):81-91), and agents that perform internet-related tasks (O. Etzioni et al., 1994, A softbot-based interface to the internet, CACM 37(7):72-75).
Increasingly, the information such agents need to access is available on the World Wide Web. Unfortunately, even a domain as standardized as the WWW has turned out to pose significant problems for automatic software agents. For one thing, although Web pages are universally written in Hypertext Markup Language ("HTML"), this language merely defines how information is displayed and makes no attempt to indicate its meaning or semantic content. Currently, no accepted "semantic markup language" for the Web exists, nor is one likely to be adopted universally. The broader Internet, being less standardized than the Web, can be expected to pose even greater problems.
Thus, the advent of intranets, the Internet, and the World Wide Web has posed several fundamental problems for the automatic services or agents designed to assist users in finding relevant information. First, no one such service has heretofore provided sufficient additional value to replace the use of a Web browser having access to existing on-line directories or indices such as Yahoo or Lycos. Second, such services have not yet been able to understand and competently parse relevant information from the responses returned by a wide variety of Internet and Web on-line information sources. Third, existing services and agents have not been easy to adapt to the ever-increasing number of sources with their ever-changing response formats. This difficulty stems from the individualized, hand-coded interface that existing agents use for each Internet service and Web site (Yigal Arens et al., 1993, Retrieving and integrating data from multiple information sources, International Journal on Intelligent and Cooperative Information Systems 2(2):127-158; O. Etzioni et al., 1994, A softbot-based interface to the internet, CACM 37(7):72-75; B. Krulwich, 1995, Bargain finder agent prototype, Technical report, Andersen Consulting; Alon Y. Levy et al., 1995, Data model and query evaluation in global information systems, Journal of Intelligent Information Systems, Special Issue on Networked Information Discovery and Retrieval 5(2); Mike Perkowitz et al., 1995, Category translation: Learning to understand information on the internet, Proc. 15th Int. Joint Conf. on A.I.). Preferably, a service or agent should be able to access a new or changed Internet on-line source and automatically learn how to retrieve relevant information from it.