The exponential growth of private intranets and the public Internet has produced a daunting labyrinth of increasingly numerous documents, databases and utilities. Almost any type of information is now available somewhere, but most users cannot find what they seek, and even expert users waste considerable time and effort searching for appropriate information sources. A first problem is simply that the number of available information sources is now too large for any single user to comprehend. A second problem, accompanying this growth in available information and information sources, is a commensurate growth in the software interfaces and methods used to manage, access, and present this information. Sources are managed by different organizations; hence agents, whether human or automated, must adhere to remotely defined formats. The information sources are potentially slow and expensive, so users must balance the cost of each access against its estimated benefit. The information sources are dynamic; hence an agent must recognize when an existing source's contents, protocol or performance changes, as well as when new sources come online and existing sources leave. Many sources represent legacy systems in the sense that they do not support a comprehensive query interface such as SQL; in these cases an agent needs to expend additional effort to determine the best way to answer an information-gathering request.
Artificial intelligence and database researchers have addressed these problems by constructing integrated information gathering systems that automatically query multiple, relevant information sources to satisfy a user's information request. See, e.g., [9, 5, 12, 15, 18, 16, 25, 29, 10]. These systems raise the level of the user interface, since they allow the user to specify the information of interest without specifying where it is stored or how to access the relevant sources [9]. Several researchers in the database community are concerned with the integration of heterogeneous databases. Prominent projects include the Information Manifold ([14]) and the Tsimmis project ([5, 25]). Generally, however, the Tsimmis project assumes that information integration is done manually rather than automatically. Other work that mentions automatic integration provides no suggestion of the necessary methods ([18]).
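The idea of specifying what information is wanted without specifying where it is stored can be illustrated with a minimal sketch. The code below is not taken from any of the cited systems; the names `Source`, `gather`, `staff_db` and `library`, and the sample records, are hypothetical and chosen only for illustration. It assumes that each source advertises the attributes it can return and that a request is answered by every source covering all requested attributes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, str]


@dataclass
class Source:
    """Hypothetical description of one information source."""
    name: str
    provides: Set[str]                      # attributes this source can return
    fetch: Callable[[], Iterable[Record]]   # source-specific access method


def gather(requested: Set[str], sources: List[Source]) -> List[Record]:
    """Query every source that can supply all requested attributes.

    The caller names only the attributes of interest; selecting the
    relevant sources and invoking their access methods happens here.
    """
    results: List[Record] = []
    for source in sources:
        if requested <= source.provides:    # source covers the whole request
            for record in source.fetch():
                results.append({attr: record[attr] for attr in requested})
    return results


# Two illustrative sources with made-up contents.
sources = [
    Source("staff_db", {"name", "phone", "office"},
           lambda: [{"name": "Ada", "phone": "555-0100", "office": "B12"}]),
    Source("library", {"title", "author"},
           lambda: [{"title": "Planning", "author": "Lee"}]),
]

# The request mentions attributes only, not sources.
answer = gather({"name", "phone"}, sources)
```

In this sketch only `staff_db` is consulted, because it is the only source whose advertised attributes cover the request. A realistic system would also have to combine partial answers from several sources and weigh access costs, which this sketch deliberately omits.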
Most prior work on AI planning systems ([1]) assumes that execution of an operator instance has a causal effect on the world, which leads to more complex methods and slower planning than are useful for automatic information access. Several planning systems have been designed for information gathering, for example, the XII planner ([9, 12]) and the Sage planner ([2, 15]). However, neither of these planners can represent information sources whose output translates into partially specified sentences in an information domain model, because they cannot handle unbound variables with sufficient generality. Nor can they represent an incomplete source that returns a variable number of tuples. These systems typically use cumbersome and inflexible representations of information domains and sources. Further, most of the planners described above suffer from significant combinatorial explosion and require domain-specific search control for anything but small problems.