Internet protocol (IP) geographic location, or IP geolocation, is the practice of deducing or estimating the physical location of a device associated with a particular IP address. In other words, IP geolocation is the practice of pinning an IP address to a location on Earth with a desired degree of specificity. Techniques for estimating or deducing the geographic location of a particular IP address include inferring the geographic location from (1) the domain name system (DNS) names of the corresponding internet host or local network nodes; (2) latency measurements between the IP address and a set of devices distributed across known geographic locations; and (3) a combination of partial IP-to-location mapping information and border gateway protocol (BGP) prefix information. For more information on these techniques, see, e.g., U.S. Pat. No. 7,711,846, which is incorporated herein by reference in its entirety.
Unfortunately, IP geolocation estimates tend to be inaccurate—and sometimes wildly so—because they are based on observations of logical relationships among IP addresses, routing protocols, and applications instead of the physical relationships among cables, routers, servers, access devices, etc. Although the logical relationships are often related to the physical relationships, they are not necessarily tied together. For example, IP addresses that are next to each other in internet space are not necessarily next to each other geographically and vice versa: Brazil and Peru border each other geographically, but not in Internet space. In addition, a change in a device's physical location may not necessarily correspond to a change in the device's location in internet space or vice versa. Consider a router that announces a particular prefix via BGP. By announcing the prefix, the router establishes one or more logical Internet relationships that remain fixed from a logical standpoint even if the router moves in physical space.
Moreover, the addresses in a prefix don't need to be in one place. End-user networks often have a single geographic scope, but infrastructure IP addresses, such as those assigned to the routers, switches, and firewalls of a wide-area network, can be dispersed throughout the provider's area of operation, which can be global in scope. Hence, consecutive infrastructure IP addresses can be physically located in distant cities, even when they are routed to the rest of the Internet as a single prefix.
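The dispersal described above can be illustrated with a minimal sketch. The prefix below is the 192.0.2.0/24 documentation range, and the router placements are hypothetical values chosen only for illustration; they are not drawn from any real network.

```python
import ipaddress

# Documentation prefix (RFC 5737), standing in for a provider's infrastructure prefix.
PREFIX = ipaddress.ip_network("192.0.2.0/24")

# Hypothetical placements of consecutive infrastructure addresses (illustrative only).
ROUTER_CITY = {
    ipaddress.ip_address("192.0.2.1"): "London",
    ipaddress.ip_address("192.0.2.2"): "Singapore",
    ipaddress.ip_address("192.0.2.3"): "Sao Paulo",
}


def cities_in_prefix(prefix, placements):
    """Return the sorted set of cities whose addresses fall inside the prefix."""
    return sorted({city for addr, city in placements.items() if addr in prefix})


if __name__ == "__main__":
    # One routed prefix, several cities: a single per-prefix location cannot be right for all.
    print(cities_in_prefix(PREFIX, ROUTER_CITY))
```

Under these assumptions, any method that assigns one location to the whole prefix is wrong for at least two of the three addresses.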
In addition, the network information used to infer or estimate geolocation can be inaccurate, incomplete, or both. Prefix registration is often self-reported by end users without being checked for validity by regional internet registries. DNS information can be misleading; for example, domains associated with a particular region (e.g., .uk) are not necessarily hosted in that region. Although internet service providers often use city abbreviations in router interface names, the naming conventions vary by provider and aren't always up-to-date. For example, the router interface could be named for the city at the far end of the fiber optic cable to which it is attached. Similarly, BGP information may be inconclusive, especially for those regional providers who announce prefixes that cover extensive geographic areas (e.g., continents).
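As a rough sketch of why router interface names are only weak evidence, the following code extracts city hints from a hostname. The `CITY_CODES` table, the `city_hints` function, and the example hostnames are all hypothetical; real providers use their own conventions, which this sketch does not attempt to model.

```python
import re

# Hypothetical abbreviation table; actual conventions vary by provider.
CITY_CODES = {"nyc": "New York", "lon": "London", "fra": "Frankfurt"}


def city_hints(hostname):
    """Return the cities whose abbreviations appear as tokens in a hostname."""
    hints = []
    for label in hostname.lower().split("."):
        # Split each DNS label on digits and punctuation, keeping only letter runs.
        for token in re.split(r"[^a-z]+", label):
            city = CITY_CODES.get(token)
            if city and city not in hints:
                hints.append(city)
    return hints


if __name__ == "__main__":
    # A name with a single hint, and a name that mentions both ends of a link.
    print(city_hints("xe-0-1-0.cr1.nyc2.example.net"))
    print(city_hints("ae1.lon-fra.example.net"))
```

The second hostname yields two hints, and nothing in the name says which end the interface sits at, which is the ambiguity described above.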
Latency measurements can also be imprecise, often because of delays that artificially inflate the measurement time, which in turn leads to an inflated estimate of the geographic distance between Internet nodes. These delays include but are not limited to serialization delay, which is the time for placing the packet's bits onto the link; queuing delay at the router; and propagation delay, which is equal to the total propagation distance divided by the propagation speed (about 200,000 km/sec for light in optical fiber). If the communication medium (usually optical fiber) follows a meandering path instead of a straight path between two points, the propagation delay will be higher. In practice, many optical fibers follow meandering paths along existing rights-of-way. In other cases, optical fibers follow meandering paths because of geographic constraints (e.g., hills and rivers), economic constraints (e.g., lack of a business relationship between a property owner and an internet service provider), or both. Generally, the longer the latency, the more likely the propagation path is circuitous, and the more likely the measurement is to yield an artificially inflated estimate of the distance between the endpoints.
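The relationship between latency and distance can be sketched as follows, taking propagation delay as distance divided by propagation speed and using the approximate 200,000 km/sec figure given above. The function names and the `overhead_ms` parameter (a stand-in for serialization and queuing delay) are assumptions made for illustration.

```python
FIBER_SPEED_KM_PER_S = 200_000.0  # approximate speed of light in optical fiber


def propagation_delay_ms(distance_km):
    """One-way propagation delay: distance divided by propagation speed."""
    return distance_km / FIBER_SPEED_KM_PER_S * 1000.0


def max_distance_km(rtt_ms, overhead_ms=0.0):
    """Upper bound on the one-way distance implied by a round-trip time.

    overhead_ms is an assumed allowance for serialization and queuing delay.
    The result is only a bound: a meandering fiber path means the endpoints
    are closer together than the distance the signal actually traveled.
    """
    if rtt_ms < 0 or overhead_ms < 0:
        raise ValueError("delays must be non-negative")
    one_way_s = max(rtt_ms - overhead_ms, 0.0) / 2.0 / 1000.0
    return one_way_s * FIBER_SPEED_KM_PER_S


if __name__ == "__main__":
    # A 10 ms round trip bounds the one-way distance at roughly 1,000 km.
    print(max_distance_km(10.0))
    # Any unaccounted delay inflates the bound; subtracting 2 ms tightens it.
    print(max_distance_km(10.0, overhead_ms=2.0))
```

Because every delay other than straight-line propagation only adds to the measured time, the computed value is an upper bound on the distance rather than an estimate of it.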
Incomplete or inaccurate network information and imprecise latency measurements cause the degree of uncertainty associated with the estimate of an IP address's physical location to rise with the degree of specificity of the geolocation estimate. For instance, a particular IP address's planet (Earth) can be deduced with a very high degree of confidence. The confidence level tends to fall when identifying the IP address's continent. The uncertainty tends to increase further for the IP address's country, in part because of variations in each country's size and borders. Confidence in IP geolocation at the metropolitan area/city level tends to be even lower and depends in part on the city's location and proximity to other cities.