The World Wide Web (hereinafter “web”) is now widely accepted as a tool for information dissemination and gathering, business transactions, and recreation. The web is not a single entity at a single location; it is a distributed collection of web servers, each connected to at least one data network from a group of interconnected data networks that collectively form the “Internet”. Those desiring to provide information or services to web users place that information on web servers, in a format such that web users can retrieve that information over an Internet connection.
Although web servers (“hosts”) and web users (“clients”) can in principle communicate using other protocols, most web server transactions today use the HyperText Transfer Protocol (HTTP). HTTP is a request/response protocol, i.e., the client sends resource requests to the server and the server responds to those requests by transmitting the requested resources back to the client. A resource may be a file containing text, graphics, audio, video, etc., the output of a script running on the server, a dynamically generated query result, or anything else that the host server can generate and the client can understand.
An HTTP transaction appears straightforward on the surface, but requires a great deal of behind-the-scenes effort. FIG. 1 illustrates a typical web communication scenario. The web client 22 shown in FIG. 1 can be, e.g., a web browser running on a personal computer, a personal digital assistant (PDA), a web-enabled wireless phone, or some other web appliance. Web client 22 has access to some means for establishing and maintaining a data connection with an ISP (Internet Service Provider) gateway 32 via PSTN (Public-Switched Telephone Network) 30. This means can be, e.g., a traditional analog modem that can be connected to a wired analog phone line or analog wireless channel, or a digital modem that can communicate over a DSL (Digital Subscriber Line) connection, an ISDN (Integrated Services Digital Network) connection, or a digital wireless channel. The modem can be, e.g., built into client 22, connected directly to a port on the client, or reachable across a local network shared by the client and the modem.
When a user of client 22 desires to reach a host on network 40, client 22 must first establish a session with ISP gateway 32, if not already connected. This generally requires client 22 to access a line on PSTN 30, dial a service number for gateway 32, establish physical-layer modem communications between the gateway and the client, and then establish a link-layer protocol such as PPP (Point-to-Point Protocol) between the gateway and the client. At some point, the user will generally supply logon information such as account name and password before the gateway will continue the logon process. PPP may also be used to communicate to the client a dynamically assigned IP (Internet Protocol) address that the client will be known by during the session. Once the session is established, client 22 and ISP gateway 32 can communicate using IP packets, or packets of other types, encapsulated via PPP.
With an established session, the ISP can allow web client 22 general access to web servers reachable through data network 40. In one example, a user wishes to search router product literature on the Cisco Systems website, i.e., information supplied by host 46. Host 46 in this example has a domain name www.cisco.com and an IP address of 198.133.219.25. The user of web client 22, however, may know only the domain name www.cisco.com, only the company name Cisco Systems, or neither.
When the user knows the domain name for the web page that they need, the user can enter the Uniform Resource Locator (URL) (e.g., http://www.cisco.com/<resource_path>) for that web page at their web client 22. In this example URL, the service name field “http” signifies that HTTP is to be used to retrieve the resource, the field www.cisco.com identifies the domain name of the hosting web server, and the optional field resource_path identifies the specific resource on the web server that is requested. But before the resource can be requested, client 22 must establish a TCP (Transmission Control Protocol) connection to a port (generally port 80) on the server identified with IP address 198.133.219.25.
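The decomposition of a URL into its fields can be illustrated with a minimal Python sketch. It relies on the standard-library function urllib.parse.urlsplit; the helper name split_url and the resource path products/routers.html are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (scheme, host, port, path) for an HTTP URL."""
    parts = urlsplit(url)
    # Port 80 is the conventional default for HTTP when the URL names none.
    port = parts.port if parts.port is not None else 80
    # An empty path is treated as a request for the root resource.
    path = parts.path or "/"
    return parts.scheme, parts.hostname, port, path

# Hypothetical resource path, used only as an example.
print(split_url("http://www.cisco.com/products/routers.html"))
# ('http', 'www.cisco.com', 80, '/products/routers.html')
```

The host name returned here is still a domain name; it must be translated to an IP address before a TCP connection can be opened.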
Since the client typically will not know the IP address associated with www.cisco.com, it sends a DNS (Domain Name System) query to a specialized server whose IP address the client does know, primary DNS name server 42. Client 22 will typically either store a hand-entered IP address for its primary DNS name server, or such an address will be supplied by the ISP during logon. Client 22 uses the DNS protocol to query DNS name server 42 as to the IP address for the domain name www.cisco.com. DNS name server 42 maintains a database that cross-references domain names with IP addresses and/or other DNS name servers. DNS name server 42 may know the answer to this query, but more typically, it will search out the answer on another DNS name server that it can reach, in this case DNS name server 44. DNS name server 44 is an authoritative name server for the domain cisco.com, and can supply the requested IP address (e.g., 198.133.219.25).
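The lookup just described, in which DNS name server 42 does not hold the answer itself and consults DNS name server 44, can be modeled with a toy in-memory sketch in Python. No network traffic is sent and this is not the DNS wire protocol; the labels dns42 and dns44, the table layout, and the function name resolve are assumptions made purely for illustration.

```python
# Toy in-memory model of the lookup; table contents are illustrative assumptions.
SERVERS = {
    "dns42": {  # primary name server: holds no record, only a referral
        "referrals": {"cisco.com": "dns44"},
        "records": {},
    },
    "dns44": {  # name server that holds the record for www.cisco.com
        "referrals": {},
        "records": {"www.cisco.com": "198.133.219.25"},
    },
}

def resolve(name, server="dns42", max_hops=5):
    """Follow referrals starting at `server` until an address for `name` is found."""
    for _ in range(max_hops):
        table = SERVERS[server]
        if name in table["records"]:
            return table["records"][name]
        # Look for a referral whose domain is a suffix of the queried name.
        for domain, next_server in table["referrals"].items():
            if name == domain or name.endswith("." + domain):
                server = next_server
                break
        else:
            return None  # neither a record nor a referral: the lookup fails
    return None  # gave up after max_hops referrals

print(resolve("www.cisco.com"))  # 198.133.219.25
```

The hop limit is a design choice of the sketch that prevents an endless loop if two tables were to refer to each other.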
Client 22 uses the retrieved IP address to open a TCP connection to port 80 on the domain server www.cisco.com. Client 22 then sends an appropriate HTTP message, such as
GET /<resource_path> HTTP/1.1,
across the connection. This message will prompt a return message from host 46, which, if successful, will contain the resource resource_path.
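The request and the first line of the reply can be sketched in Python as follows. The function names build_get_request and parse_status_line are hypothetical, as is the example path; the Host header is included because HTTP/1.1 requires it, and the Connection: close header is an optional choice of this sketch.

```python
def build_get_request(host, resource_path):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET /{resource_path.lstrip('/')} HTTP/1.1",
        f"Host: {host}",      # required by HTTP/1.1
        "Connection: close",  # ask the server to close the connection after replying
        "",                   # together with the next entry, yields the
        "",                   # blank line that ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_status_line(response):
    """Return (version, status_code, reason) from raw response bytes."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

# Hypothetical resource path, used only as an example.
request = build_get_request("www.cisco.com", "products/routers.html")
print(request.decode("ascii").splitlines()[0])
# GET /products/routers.html HTTP/1.1
```

In practice the request bytes could be written to a TCP connection opened with, e.g., socket.create_connection((ip_address, 80)), and the bytes read back could be passed to parse_status_line to determine whether the request succeeded.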
When the user does not know a URL or a domain name associated with the web information that they are seeking, other steps will be required in addition to those described above. For instance, search engine host 48 represents a commercial search engine of the type that builds extensive databases linking domain names and URLs to keywords. By accessing search engine host 48, a user can enter a request such as “Cisco router” and obtain entries for URLs and domain names matching the request. Since the search and result screens are themselves web pages, accesses to search engine host 48 are themselves HTTP operations requiring URLs, domain names, IP addressing, and at least one DNS name server request.