Client-server computing has been the dominant computing paradigm for the last several years. With the advent of the World Wide Web, client-server computing has taken the form of remote site servers that host databases of information, and client-side applications (such as web browsers) that query those remote site servers and present the results to a user. Most of the Web operates this way today.
A new class of applications has recently emerged that helps database owners make their information more accessible from the Web. For example, a design engineer may want price and availability information for components being considered for a circuit board, and manufacturers of such components have a strong incentive to employ applications that connect their databases to the Web. However, each manufacturer typically offers a different query interface, and many require several levels of navigation before the desired query page is reached. In addition, each manufacturer often presents the results of user queries in its own format. As a result, the design engineer in this example would typically visit several manufacturers' web sites, query each one, and save the results manually for later (and possibly tedious) comparison.
As an improvement over the above approach, vertical portals and electronic marketplaces have sprung up, offering consolidated information to simplify the search. Examples include Chipcenter (www.chipcenter.com), E2open (www.e2open.com), Questlink (www.questlink.com), and Free Trade Zone (www.freetradezone.com). These tools periodically crawl, extract, and index information from the various web data sources (e.g., component manufacturers and distributors) and then publish the information for various web clients. This approach has several disadvantages: the information at the portal is often stale, the data sources may shut out the portal from collecting the information, and centralized remote site servers carry inherent reliability and scalability costs.
Agent-based systems have been designed to address some of the above concerns. Examples include the Jango shopping agent of Excite.com (www.jango.com) and the Pricing Agent of Half.com (www.half.com). In these systems, in response to a user request, the central server spawns agents that crawl to the remote data sources and collect information, and the server then presents that information to the user. The client computer plays only a minimal role in these systems, since most of the work happens on the server side. Moreover, the agents run with the server's identity and can therefore easily be shut out by the data sources.
The following U.S. patents describe related systems: U.S. Pat. No. 6,038,668 to Richard R. Chipman et al. (hereafter Chipman) and U.S. Pat. No. 6,108,686 to Henry R. Williams, Jr. (hereafter Williams). Chipman describes the use of a predefined common language (e.g., HTML) and format for organizing information placed on a network of computers. A portal maintains a list of HTML pages at each supplier's site that comply with the predefined constraints. Chipman contemplates that each industry sector will have at least one governing portal from which all other portals in that industry derive their common vocabulary, taxonomy, or ontology. For example, one might employ DTDs (Document Type Definitions) with XML-based systems so that all participants use standard forms for purchases. However, this system has several shortcomings. Vendors may be unwilling to cooperate with the scheme, since it requires a great deal of labor to organize information for a so-called governing portal that would tend to promote selling on price alone. Large vendors in particular may prefer to present their data as they see fit. The Chipman system is not flexible enough to accommodate what non-compliant vendors may want to do; that is, it cannot handle anything less than strict compliance with the predefined constraints. Chipman discloses that consumers can supply data to the portal for re-supply to other consumers, but there is no guarantee that such data will be accurate. Finally, Chipman provides no means of acquiring rules that describe how a client can gather information from a remote site; individual consumers must either acquire the information directly or determine such rules manually.
Williams enables a user to define a unique set of search rules for locating information and retrieving documents. The search rules (which are subject- or keyword-based) are provided to a search agent that then automatically accesses content in remote databases according to those rules. The Williams system stores the acquired information in a local database using the same organizational structure in which the information was stored in the remote database. Users can then run queries against the local database to manually extract features of interest.