1. Field of the Invention
The present invention concerns the creation of a database that identifies geographic locations of devices connected to a network, such as devices communicating over the Internet.
2. Description of the Related Art
The Internet is a decentralized global network of millions of computers. Each computer connected to the Internet is independent and may be capable of operating as a host computer (host) that primarily provides data over the Internet or a client computer (client) that primarily receives data over the Internet. A host computer may receive a data request from any other computer on the Internet and respond to the request by transmitting any of various types of data, such as hypertext markup language (HTML) code, back to the client. A client computer may send data requests to various hosts on the Internet and then download data in response. Typically, host computers are used by information providers for various commercial, educational, or governmental purposes and are dedicated host computers (servers or Web servers).
Ordinarily, the client computers are used by individuals to connect to the Internet via an Internet Service Provider (ISP) or, more generically, a network service provider (NSP). ISPs are companies that provide access to the Internet, typically for a fee. For example, a client computer may establish a dial-in connection to an ISP over an ordinary telephone line. ISPs are also called IAPs (Internet Access Providers).
Each host and client on the Internet is identified by a unique Internet Protocol (IP) address which is a series of numbers, such as 24.130.64.154. Because the IP address, in its numeric form, is difficult to memorize and use, a domain name may be assigned to a host and, therefore, associated with the numeric IP address. For example, a server having an address of 24.130.64.154 may be associated with domain name server.npeponis.com. It is noted that multiple IP addresses may be associated with the same domain name and, similarly, many domain names may be associated with the same IP address or addresses. A domain name server (DNS) performs the task of converting the domain names to IP addresses. Most frequently, separate domain names are not permanently assigned to individual clients but, rather, blocks of IP addresses are assigned to the ISPs that serve those clients.
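The relationship between an individual IP address and a block of addresses assigned to an ISP can be illustrated with a minimal Python sketch using the standard ipaddress module. The block 24.130.64.0/24, the ISP name, and the names ISP_BLOCKS, find_isp, and to_int are hypothetical and are chosen only so that the block contains the example address given above.

```python
import ipaddress

# Hypothetical address blocks assigned to ISPs (illustrative values only).
ISP_BLOCKS = {
    "example-isp": ipaddress.ip_network("24.130.64.0/24"),
}

def find_isp(address: str):
    """Return the name of the ISP whose block contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, block in ISP_BLOCKS.items():
        if ip in block:
            return name
    return None

def to_int(address: str) -> int:
    """Return the 32-bit integer form of a dotted-quad IPv4 address."""
    return int(ipaddress.ip_address(address))
```

For example, find_isp("24.130.64.154") returns the hypothetical name "example-isp", while an address outside every listed block returns None.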
FIG. 1 illustrates a client 101 communicating with a server 103. In the instant example, the client 101 first connects to its local ISP 105 (e.g., using a modem via a dial-in connection). For purposes of the current connection only, ISP 105 assigns one of its IP addresses to client 101. Upon completion of this connection, client 101 may begin communicating over the Internet. For example, the client 101 may send a request for file main.html to the server 103 having the domain name server.npeponis.com. Such a request might be initiated, for example, by the user typing http://server.npeponis.com/main.html in the address field of a web browser running on client computer 101 and then pressing the "Enter" key. Alternatively, such a request might be initiated by the user simply clicking on a graphic, image or text item that serves as a hyperlink to that address. In response, the browser sends out one or more data packets (or datagrams) addressed to IP address 24.130.64.154 (possibly, after having obtained that IP address from a DNS), with such data packets including a request to retrieve file main.html.
Communication between two entities on the Internet is conducted in accordance with certain protocols. The most commonly used protocols are the Internet Protocol (IP), which is a connectionless-mode communications protocol, and Transmission Control Protocol (TCP), which is a connection-oriented protocol. In accordance with TCP/IP, messages are divided into smaller packets. Each such packet includes, in addition to the destination address and data corresponding to at least a portion of the message, an IP address identifying the source of the packet and various other fields necessary for communication in accordance with TCP/IP and other established protocols. Some of these other protocols and fields are described below. As noted above, the IP address for a client computer connecting through an ISP typically is dynamically assigned by the ISP each time the client computer connects to the ISP and then reassigned after the client using it logs off.
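The presence of source and destination addresses in each packet can be illustrated with the following Python sketch, which packs and then parses a minimal 20-byte IPv4 header. The function names and the source address 192.0.2.10 are hypothetical, and the header checksum is left at zero for brevity, so the sketch is illustrative only and does not produce a packet suitable for transmission.

```python
import socket
import struct

IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20 bytes, network byte order

def build_ipv4_header(src: str, dst: str, ttl: int = 64, payload_len: int = 0) -> bytes:
    """Pack a minimal IPv4 header (checksum is NOT computed in this sketch)."""
    version_ihl = (4 << 4) | 5           # IPv4, header length of five 32-bit words
    total_length = 20 + payload_len
    return struct.pack(
        IPV4_HEADER_FORMAT,
        version_ihl, 0, total_length,
        0, 0,                             # identification, flags/fragment offset
        ttl, socket.IPPROTO_TCP, 0,       # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )

def parse_ipv4_header(header: bytes) -> dict:
    """Extract the fields of interest from the first 20 bytes of an IPv4 header."""
    fields = struct.unpack(IPV4_HEADER_FORMAT, header[:20])
    return {
        "version": fields[0] >> 4,
        "ttl": fields[5],
        "protocol": fields[6],
        "source": socket.inet_ntoa(fields[8]),
        "destination": socket.inet_ntoa(fields[9]),
    }
```

A receiving node that parses a header in this way can read the "source" field to learn the address from which the packet was sent.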
Upon receipt of request 102, the server 103 typically first initiates handshaking communications to establish a TCP connection and then responds to the request by sending to the client one or more data packets that together contain the contents of the file main.html. In this manner, communications can occur between two nodes on the Internet, with TCP/IP specifying the protocols for separating each message into data packets, routing the packets between the two nodes, reassembling the packets at the destination, and verifying that each message was properly received.
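The separation of a message into packets and its reassembly at the destination can be sketched in simplified form as follows. This Python sketch is not an implementation of TCP, which additionally provides acknowledgments, retransmission, and flow control; the function names and the packet size are hypothetical.

```python
import random

def split_message(message: bytes, size: int):
    """Divide a message into (byte_offset, chunk) packets of at most `size` bytes."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]

def reassemble(packets):
    """Rebuild the message from packets that may have arrived out of order."""
    return b"".join(chunk for _, chunk in sorted(packets))

# Usage: packets may be reordered in transit yet still reassemble correctly.
message = b"contents of main.html"
packets = split_message(message, 8)
random.shuffle(packets)
restored = reassemble(packets)
```

Because each packet carries its offset within the original message, the destination can restore the original order regardless of the order of arrival.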
Another commonly used protocol is the HyperText Transfer Protocol (HTTP). HTTP is the underlying protocol used by the World Wide Web on the Internet and defines how messages are formatted and transmitted, as well as what actions Web servers and browsers take in response to various commands.
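By way of illustration, a request for the file main.html of the earlier example might be formatted as in the following Python sketch. The use of HTTP version 1.1 and of the "Connection: close" header are assumptions made for the sketch, and build_get_request is a hypothetical helper name.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Format a minimal HTTP/1.1 GET request for the given host and path."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",                      # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("server.npeponis.com", "/main.html")
```

The resulting bytes would be carried as the data portion of one or more TCP/IP packets addressed to the server.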
On the Internet, most data packets, including requests and responses, need to go through several routers before they reach their final destination. Each forwarding of a packet to the next router is termed a "hop". A router (or gateway) is a device that connects one network to another. Each router includes a dynamically updated routing table that is used by the router to identify the next router to which any given packet should be forwarded. Specifically, the receiving router attempts to identify the router that is most likely to be closest (geographically and/or in terms of number of hops) to the packet's ultimate destination.
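One common way in which a routing table can be consulted is longest-prefix matching, in which the most specific table entry that contains the destination address determines the next hop. The following Python sketch assumes that technique, which is not stated in the description above; the class name, prefixes, and router names are hypothetical.

```python
import ipaddress

class RoutingTable:
    """A toy routing table that selects the next hop by longest-prefix match."""

    def __init__(self):
        self._routes = []  # list of (network, next_hop) entries

    def add_route(self, prefix: str, next_hop: str) -> None:
        self._routes.append((ipaddress.ip_network(prefix), next_hop))

    def next_hop(self, destination: str):
        """Return the next hop for the most specific matching prefix, or None."""
        ip = ipaddress.ip_address(destination)
        matches = [(net.prefixlen, hop) for net, hop in self._routes if ip in net]
        if not matches:
            return None
        return max(matches, key=lambda match: match[0])[1]

table = RoutingTable()
table.add_route("0.0.0.0/0", "router-A")        # default route
table.add_route("24.130.0.0/16", "router-B")
table.add_route("24.130.64.0/24", "router-C")
```

With these hypothetical entries, a packet addressed to 24.130.64.154 would be forwarded to router-C, while a packet matching only the default route would be forwarded to router-A.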
In the example of FIG. 1, client 101 sends a request 102 to server 103. The request is delivered to the server 103 via routers 105, 107, 109, 111, 113, and 115. As indicated by the ellipsis 117, the request may go through other routers as well. In other words, the request may make many hops before reaching the intended server 103. As noted above, the precise path taken by request 102 will be determined by the individual routers along the way. In the event that a receiving router determines that it is unable to forward a packet closer to its final destination, it will send a message to that effect back to the router from which it received the packet. Then, that router will attempt to route the packet through a different router, adjust its routing table accordingly, and send a message to the router from which it received the packet. Such a situation might be temporary (e.g., in the case where a router is temporarily inoperable) or permanent (e.g., where a router has been permanently taken off line). Other communications, such as periodically broadcasting a router's entire routing table, also occur among the routers on the Internet, permitting them to coordinate their routing activities. Propagation of changes in the network topology through the various routers in the network can permit communications to occur fairly reliably, even in the presence of constantly changing network conditions. Among the tools commonly used are the Routing Information Protocol (RIP) and the Internet Control Message Protocol (ICMP).
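The manner in which a router might adjust its routing table upon receiving a neighbor's broadcast table can be sketched as a simplified distance-vector update of the general kind used by RIP. The Python sketch below omits RIP details such as the hop-count limit and timers; the function name, table layout, and router names are hypothetical.

```python
def merge_advertisement(table: dict, neighbor: str, advertised: dict,
                        link_cost: int = 1) -> bool:
    """Update `table` from a neighbor's advertised {destination: hop_count}.

    `table` maps destination -> (hop_count, next_hop). Returns True if any
    entry changed.
    """
    changed = False
    for destination, hops in advertised.items():
        candidate = hops + link_cost
        current = table.get(destination)
        is_new = current is None
        is_shorter = current is not None and candidate < current[0]
        # A changed count from the neighbor already used as next hop is
        # accepted even if it is worse, since the old route no longer exists.
        is_refresh = (current is not None and current[1] == neighbor
                      and candidate != current[0])
        if is_new or is_shorter or is_refresh:
            table[destination] = (candidate, neighbor)
            changed = True
    return changed
```

Repeated application of such an update at each router allows changes in topology to propagate through the network.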
Irrespective of the route through which the request 102 is made or the number of hops taken by the request 102, the preferred end result is the receipt of the request 102 by the host 103 and the response by server 103 sending the requested data file via the Internet. Like the request, the data file is divided among appropriately sized (e.g., using conventional algorithms to identify an appropriate size) data packets and may travel through several routers to arrive at the client 101. Generally, the route taken by the response 104 will be the same as that taken by the request 102. However, it is possible that the routing may be asymmetric, such as where a client computer transmits packets to an ISP over a conventional telephone line/modem connection but receives packets via a satellite dish, e.g., via the Direct PC network. Asymmetric routing may also occur in certain other cases, such as where a router in the link used for transmitting the request goes down before the response to the request can be transmitted; therefore, the response needs to be re-routed. In addition to asymmetric routing, it is also possible that packets traveling in a single direction (e.g., all request packets) may take different paths (multi-path routing). This may occur, for example, in the event that a router goes down while the request is being made; in addition, one or more routers in the link may be intentionally configured to route packets that are addressed to the same destination to different routers in an attempt to balance the communication load over the Internet. However, at present, both asymmetric routing and multi-path routing are considered to be unusual routing conditions.
The response to the request may contain any of a wide variety of information. However, in many instances, it would be preferable for the response to contain information that is tailored to the specific geographic region of the client 101. For example, it may be preferable for the file 104 sent as the response to the request 102 to include weather information for the geographic region in which the client 101 is located. In another example, it may be preferable for the file 104 sent as the response to the request 102 to include banner or other advertising for businesses located within driving distance of the location of the client 101. In conventional systems, the response file 104 typically may contain these types of information only when the user of the client computer 101 has already supplied information regarding his or her location to the server 103, at least once. Unfortunately, many users may not want to expend the effort necessary to type in address or even zip code information that identifies their geographic locations. Moreover, even those that are willing to do so typically will find it very inconvenient, particularly when such information may have to be supplied for each different Web site that the user visits.
The prior art has included some discussion regarding automatically gathering information concerning the geographic location of Internet clients. However, all of these techniques have certain shortcomings, most notably, relatively long delays and limited access to geographical information pertaining to nodes on the Internet.
For example, U.S. Pat. No. 5,948,061 (the "'061 Patent") to Merriman et al. titled "Method of Delivery, Targeting, and Measuring Advertising Over Networks" (which is incorporated herein by reference as though set forth herein in full) notes that a traceroute operation can be used in obtaining geographic information for a user. In this regard, conventional traceroute operations were originally designed to troubleshoot Internet routing problems (such as routing loops) and generally function by sending out a number of probe packets, all addressed to the same target node, to identify all of the routers that forward packets between the current node and the target node. All of the probe packets are IP packets, each having a Time-To-Live (TTL) field which indicates the maximum number of hops that the IP packet can make before an ICMP Time Exceeded packet is returned.
The following description summarizes the operation of conventional traceroute operations in more detail. In operation, each router decrements the TTL field by 1 and then forwards the packet on (if the TTL value is greater than 0) or sends an ICMP Time Exceeded packet (if the TTL value is 0). Thus, if a probe packet is sent with a TTL value of 1, the first router to receive the packet decrements the TTL field to 0 and sends back a Time Exceeded packet. Because the Time Exceeded packet includes its source's address (i.e., the address of the router that sent it), the current node can identify the IP address of the closest router to it. If that router is not the target node, then the current node will send a probe packet with a TTL value of 2. Upon receipt, the first router decrements the TTL field to 1 and forwards the packet to the next router. The second router then decrements the TTL field to 0 and sends a Time Exceeded packet to the current node. Thus, the current node can identify the IP address of the second router. This conventional traceroute process continues until the target node responds to a probe packet, at which point the entire route will have been traced.
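The TTL-based procedure summarized above can be illustrated by the following Python simulation, which sends no actual packets. The path is modeled as a list of node addresses with the target node last; the function names and the addresses used in the example are hypothetical.

```python
def send_probe(path, ttl):
    """Simulate one probe along `path` (list of node addresses, target last).

    Returns (responder_address, reached_target).
    """
    for hop_index, node in enumerate(path, start=1):
        if hop_index == len(path):
            return node, True        # the target node itself responds
        ttl -= 1                     # each router decrements the TTL field
        if ttl == 0:
            return node, False       # router returns ICMP Time Exceeded
    raise ValueError("path must contain at least the target node")

def traceroute(path, max_ttl=30):
    """Probe with TTL = 1, 2, 3, ... until the target responds."""
    route = []
    for ttl in range(1, max_ttl + 1):
        responder, reached = send_probe(path, ttl)
        route.append(responder)
        if reached:
            break
    return route

example_path = ["10.0.0.1", "10.0.1.1", "10.0.2.1", "24.130.64.154"]
traced = traceroute(example_path)
```

In this simulation, each increase in the TTL value reveals exactly one further router, so the number of probes required grows with the length of the route.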
Although the foregoing description indicates that only a single probe packet is sent at each TTL value, conventional traceroute operations often send a fixed number of packets (e.g., 3) at each TTL value to cope with the problem of lost packets. Alternatively, a traceroute may wait a certain period of time for a response and, if no response is received within that time period, assume that the packet (or the response) is lost and send another probe packet with the same TTL value.
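The retry behavior described above, in which a further probe is sent at the same TTL value when no response is received, can be sketched as follows. The callable `send` is a hypothetical stand-in for the actual transmission and timeout logic, and the limit of three probes follows the example figure given above.

```python
def probe_hop(send, ttl, probes_per_hop=3):
    """Send up to `probes_per_hop` probes at one TTL value.

    `send` is a hypothetical callable that returns the responder's address,
    or None when the probe (or its response) is lost or times out.
    Returns the first response received, or None if every probe was lost.
    """
    for _ in range(probes_per_hop):
        reply = send(ttl)
        if reply is not None:
            return reply
    return None
```

Each lost probe adds a full timeout period to the trace, which contributes to the delays discussed below.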
Conventional traceroute operations can take as long as 12 seconds on average to trace an entire routing path. This additional delay can be significant, particularly when considered in connection with all of the other delays at the server and in routing messages via the Internet. Because Internet users often are impatient with slow responding Web sites, such additional delays might result in loss of visitors to a Web site.
In addition, conventional suggested techniques for geographic positioning, such as the '061 Patent, often rely on telephone directories and other available sources to obtain geographic locations for nodes on the Internet. Such sources may be incomplete and/or not as up-to-date as possible.
The present invention addresses the foregoing problems by dialing into multiple points of presence (POPs) and transmitting a message to a fixed node on the network through each POP.
Thus, in one aspect the invention is directed to populating a database with geographic locations for network devices. A node is provided on a network and a connection is made into a network service provider (NSP) point of presence (POP) to obtain a connection to the network via the NSP. A message is then transmitted to the node over the network connection obtained from the NSP. The message is received at the node and a source network identifier is extracted from the message. The source network identifier is then associated with a known geographic location for the POP in a database. The foregoing steps are then repeated for multiple different POPs.
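The association of a source network identifier with a known POP location can be sketched, purely by way of example, with Python's standard sqlite3 module. The table layout, function names, POP name, and location shown are hypothetical and are not taken from the description; the sketch covers only the storage step and assumes that the source address has already been extracted from the received message.

```python
import sqlite3

def open_database(path=":memory:"):
    """Open (or create) a database holding source-address/location pairs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS ip_location ("
        " source_ip TEXT PRIMARY KEY, pop_name TEXT, location TEXT)"
    )
    return db

def record_message(db, source_ip, pop_name, pop_location):
    """Associate the source address seen at the node with the POP's known location."""
    db.execute(
        "INSERT OR REPLACE INTO ip_location VALUES (?, ?, ?)",
        (source_ip, pop_name, pop_location),
    )
    db.commit()

def lookup(db, source_ip):
    """Return the stored location for a source address, or None if unknown."""
    row = db.execute(
        "SELECT location FROM ip_location WHERE source_ip = ?", (source_ip,)
    ).fetchone()
    return row[0] if row else None
```

Calling record_message once for each POP that is dialed corresponds to repeating the foregoing steps for multiple different POPs.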
By populating a database in the foregoing manner, the present invention typically can obtain more current information regarding nodes on the network than is possible with conventional techniques. Such a database can then be used, for example, in connection with identifying geographic locations for clients accessing a website.
In a further aspect, the invention is directed to populating a database with geographic locations for network devices. A node is provided on a network and a connection is made into a network service provider (NSP) point of presence (POP) to obtain a connection to the network via the NSP. A message is then transmitted to the node over the network connection obtained from the NSP. The message is received at the node and a source network identifier is extracted from the message. The route over the network between the node and the POP is then probed to obtain network identifiers for routers along the route. The foregoing steps are then repeated for multiple different POPs.
By generating a database in the foregoing manner, the present invention often can identify important information regarding the topology of the probed network in a more efficient manner than is possible with conventional techniques. Once network identifiers for routers are identified in this manner, such network identifiers can be looked up in a database in an attempt to identify geographic locations for such routers, thereby providing a geographic map of nodes on the network. In addition, using information obtained from probing various routes along the network, information concerning routing patterns on the network often can be derived. Such information may be even further enhanced by providing multiple nodes on the network, sending similar messages to each of such nodes, and then probing the route from the current POP to each of such nodes.
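As a minimal sketch of how probed routes might be combined with a location lookup, the following Python functions pair each router identifier with a location where one is known and identify routers common to several probed routes. The function names, addresses, and locations are hypothetical, and the intersection of routes is offered only as one simple example of deriving routing-pattern information.

```python
def map_route(route, router_locations):
    """Pair each router identifier on a probed route with its location, if known."""
    return [(router, router_locations.get(router)) for router in route]

def shared_routers(routes):
    """Return the routers that appear on every probed route."""
    sets = [set(route) for route in routes]
    return set.intersection(*sets) if sets else set()

# Hypothetical lookup data and probed routes from two different POPs.
locations = {"10.0.1.1": "Example City"}
route_a = ["10.0.0.1", "10.0.1.1", "10.0.9.9"]
route_b = ["10.0.5.1", "10.0.1.1", "10.0.9.9"]
```

Routers for which no location is found are retained with a location of None, so that they can be resolved later from another source.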
The foregoing summary is intended merely to provide a quick understanding of the general nature of the present invention. A more complete understanding of the invention can only be obtained by reference to the following detailed description of the preferred embodiments in connection with the accompanying drawings.