The World Wide Web (Web) connects numerous client and server computers all over the world via the Internet. Users of the client computers employ Web browsers to locate Web pages. Pages are located by names called Uniform Resource Locators (URLs). Specialized servers called search engines maintain indices of the content of Web pages. Users pose textual queries through their browsers, and in response the search engines return result sets of URLs that identify Web pages satisfying the queries. Usually, the result sets are rank ordered according to relevance. The URLs in the result sets can then be used to retrieve the identified Web pages, and other pages connected to them by "hot links."
However, some users are interested in more than just the content of the Web pages. They also want to know how Web pages are connected to each other, that is, to explore the connectivity information embedded within the Web for practical, commercial, or other reasons.
The connectivity information provided by the search engines is only a byproduct. The connectivity representation in the search engines serves a single purpose: to provide answers to queries. Even an unsophisticated user can easily follow a single trail of links, but extracting a more global view of connectivity is tedious. For example, determining all pages two links removed from a particular page may require thousands of queries and a substantial amount of processing by the user. Without a separate representation of the Web, it is very difficult to provide linkage information. In fact, most search engines do not provide access to any type of connectivity information.
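As an illustration of why a separate link representation helps, the following minimal sketch assumes a small in-memory table of outgoing links. The `LINKS` table, the page names, and the function `pages_at_distance` are hypothetical and are not taken from any search engine. Given such a table, finding the pages exactly two links removed from a start page is a short traversal rather than a long series of queries.

```python
# Hypothetical outgoing-link table: page name -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["A"],
    "E": [],
}


def pages_at_distance(links, start, distance):
    """Return the pages whose shortest link path from `start` has length `distance`."""
    seen = {start}
    frontier = {start}
    for _ in range(distance):
        next_frontier = set()
        for page in frontier:
            for target in links.get(page, []):
                # Skip pages already reached by a shorter path.
                if target not in seen:
                    seen.add(target)
                    next_frontier.add(target)
        frontier = next_frontier
    return frontier


two_links_away = pages_at_distance(LINKS, "A", 2)
```

In this sketch each step of the traversal is a dictionary lookup. With only a query interface, each lookup would instead be a separate query against the search engine.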
Linkage information between Web pages is a valuable resource for Web visualization and page ranking. Several ongoing research projects use such information. Most connectivity information is obtained from ad-hoc Web "crawlers" that build relatively small databases of local linkage information. Such a database is built either on the fly or statically. In the on-the-fly approach, each new page reached is parsed to locate its links, and the linked neighboring pages are retrieved in turn until the required connectivity information is gathered. In the static approach, the database is essentially rebuilt from scratch whenever updates are required. For example, the Linkalert service provided by Lycos (see "http://www.lycos.com/linkalert/Overview.htm") uses static databases specifically designed to offer linkage information for particular Web sites. Both approaches are inefficient and clumsy to use, and neither extends to the entire Web or to a large number of clients. Consequently, their implementations generally perform poorly, are limited in scope, or both.
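The on-the-fly approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the `PAGES` dictionary stands in for the network (no real pages are fetched), all links are assumed to be absolute URLs, and the names `LinkExtractor`, `extract_links`, and `crawl_on_the_fly` are hypothetical. Link parsing uses the `HTMLParser` class from the Python standard library.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href values of anchor tags found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical stand-in for the Web: URL -> page source.
PAGES = {
    "http://example.com/a": '<a href="http://example.com/b">b</a> '
                            '<a href="http://example.com/c">c</a>',
    "http://example.com/b": '<a href="http://example.com/c">c</a>',
    "http://example.com/c": "<p>no links</p>",
}


def crawl_on_the_fly(start, max_depth, fetch=PAGES.get):
    """Parse each newly reached page for links, following them up to max_depth."""
    graph = {}
    frontier = [start]
    for _ in range(max_depth + 1):
        next_frontier = []
        for url in frontier:
            if url in graph:
                continue  # already parsed
            html = fetch(url)
            if html is None:
                continue  # page not available in this sketch
            graph[url] = extract_links(html)
            next_frontier.extend(graph[url])
        frontier = next_frontier
    return graph


local_graph = crawl_on_the_fly("http://example.com/a", 1)
```

Every call rebuilds the local graph by fetching and parsing pages again, which is why this style of gathering linkage information scales poorly.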
It is desired to provide a "connectivity server" that is easy to use and efficient. The server should maintain accurate linkage information for a significant portion of the Web. The server must support a large number of client users desiring many different types of connectivity information. In addition, the system must dynamically update the connectivity information so that the linkage information stays current.