Geography plays a fundamental role in everyday life and affects, for example, the products that consumers purchase, shows displayed on TV, and languages spoken. Information concerning the geographic location of a networked entity, such as a network node, may be useful for any number of reasons.
Geographic location may be utilized to infer demographic characteristics of a network user. Accordingly, geographic information may be utilized to direct advertisements or offer other information via a network that has a higher likelihood of being relevant to a network user at a specific geographic location.
Geographic information may also be utilized by network-based content distribution systems as part of a Digital Rights Management (DRM) program or an authorization process to determine whether particular content may validly be distributed to a certain network location. For example, in terms of a broadcast or distribution agreement, certain content may be blocked from distribution to certain geographic areas or locations.
Content delivered to a specific network entity, at a known geographic location, may also be customized according to the known geographic location. For example, localized news, weather, and events listings may be targeted at a network entity where the geographic location of the network entity is known. Furthermore, content may be presented in a local language and format.
Knowing the location of a network entity can also be useful in combating fraud. For example, where a credit card transaction is initiated at a network entity, the location of which is known and far removed from a geographic location associated with an owner of the credit card, a credit card fraud check may be initiated to establish the validity of the credit card transaction.
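The fraud check described above can be sketched as a distance comparison. This is a minimal illustration only: the function names and the 500 km threshold are assumptions chosen for the example, not values taken from any particular system.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def needs_fraud_check(entity_loc, owner_loc, threshold_km=500.0):
    """Return True when the transaction origin is far from the card owner's location.

    threshold_km is a hypothetical policy value, assumed for illustration.
    """
    return haversine_km(*entity_loc, *owner_loc) > threshold_km
```

A transaction originating in London for a card whose owner is associated with New York would exceed the assumed threshold and trigger the check, while one originating a few kilometres from the owner would not.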
There are various ways to determine the geographic location of a network entity with varying levels of accuracy. The information sources that may be used to assist the determination of the geographic location of a network entity also have varying levels of accuracy and trustworthiness. These information sources are highly dynamic and subject to widely varying levels of accuracy and trustworthiness over time. As such, systems and methods for determining the geographic location of a network entity must also be highly adaptable.
Various methods of locating electronic emitters to a point on the earth, or geolocating emitters, have been used for many years. These methods include a range of techniques, from high-frequency direction finding triangulation techniques for finding a ship in distress to quickly locating the origin of an emergency “911” call on a point-to-point wireline telephone system. These techniques can be entirely passive and cooperative, such as when geolocating oneself using the Global Positioning System, or active and uncooperative, such as a military targeting radar tracking its target.
These geolocation techniques may be targeted against a stationary or moving target but most of these direction finding and geolocation techniques start with the assumption they are working with signals in a linear medium. For example, in radio triangulation, several stations each determine the direction from which a common signal was intercepted. Because the assumption can be made that the intercepted signal traveled in a straight line, or at least on a known line of propagation, from the transmitter to each station, lines of bearing can be drawn from each station in the direction from which the signal was intercepted. The point where they cross is the point at which the signal source is assumed to be located.
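The crossing of two lines of bearing can be illustrated with a small sketch. It assumes a flat plane with x pointing east and y pointing north, bearings measured clockwise from north in degrees, and treats each bearing as an infinite line; the function name is hypothetical.

```python
import math


def bearing_intersection(p1, bearing1_deg, p2, bearing2_deg):
    """Return the (x, y) point where two lines of bearing cross, or None if parallel.

    p1 and p2 are station positions in a flat plane. Each bearing is extended
    as an infinite line, so the crossing may lie behind a station.
    """
    d1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    d2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 using Cramer's rule.
    det = d2[0] * d1[1] - d1[0] * d2[1]
    if abs(det) < 1e-12:
        return None  # parallel bearings never cross
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (d2[0] * dy - dx * d2[1]) / det
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])
```

For example, a station at (0, 0) reporting a bearing of 45 degrees and a station at (10, 0) reporting 315 degrees yield a crossing point near (5, 5). The calculation is only meaningful because the signal is assumed to travel in a straight line.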
In addition to the direction of the signal, other linear characteristics can be used to geolocate signals, including propagation time and Doppler shift, but the underlying tenets that support these geolocation methodologies are not applicable to a network environment. Network elements are not connected via the shortest physical path between them; data transiting the network is normally queued and later forwarded depending on network loading, causing the data to effectively propagate at a non-constant speed; and switching elements within the network can cause the data to propagate through non-constant routing. Thus, traditional time-distance geolocation methodologies are not effective in a network environment. Network switching and queuing delays can produce echo distance results several orders of magnitude greater than the actual distance between the computers.
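A short arithmetic sketch shows why a naive time-distance conversion overestimates. It assumes a propagation speed of roughly 200 km per millisecond, which approximates light in optical fiber; the function names and sample figures are illustrative assumptions.

```python
FIBER_KM_PER_MS = 200.0  # assumed propagation speed in fiber, roughly two thirds of c


def max_distance_km(rtt_ms):
    """Upper bound on one-way distance implied by a round-trip echo time."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def inflation_factor(rtt_ms, true_distance_km):
    """Ratio of the echo-derived distance to the actual distance."""
    return max_distance_km(rtt_ms) / true_distance_km
```

Under these assumptions, a 40 ms round trip implies a distance of up to 4000 km. If the two computers are actually 50 km apart, the echo distance is 80 times too large, and heavier queuing or indirect routing widens the gap further.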
In a fully meshed network, every station from which a geolocation is initiated is directly connected to every endpoint from which an “echo timing” is measured. The accuracy of geolocation using round-trip echo timing depends on: the degree to which the network is interconnected or “meshed,” the specific web of connectivity between the stations and endpoints, the number and deployment of stations, the proximity of the stations to the endpoints, and the number and deployment of endpoints chosen.
There are other methods for physically locating a logical network address on the Internet that do not rely on the physics of electronic propagation. One method currently in use for determining the location of a network address relies on network databases. This method of network geolocation looks up the IP address of the host computer to be located, retrieves the physical address of a point of contact for that logical network address from the appropriate registry and then cross-references that physical address to a latitude and longitude.
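The database lookup described above can be sketched as two table lookups. Everything in the tables is made up for illustration: the prefixes are reserved documentation ranges, and the postal addresses and coordinates are placeholders rather than real registry data. The names REGISTRY, GEOCODER, and geolocate are hypothetical.

```python
import ipaddress

# Hypothetical registry: network prefix -> point-of-contact postal address.
REGISTRY = {
    "192.0.2.0/24": "100 Example Street, Springfield",
    "198.51.100.0/24": "200 Sample Avenue, Shelbyville",
}

# Hypothetical geocoding table: postal address -> (latitude, longitude).
GEOCODER = {
    "100 Example Street, Springfield": (39.80, -89.65),
}


def geolocate(ip):
    """Return (lat, lon) for an IP address, or None if either lookup fails."""
    addr = ipaddress.ip_address(ip)
    matches = [ipaddress.ip_network(p) for p in REGISTRY
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None  # no registry entry covers this address
    best = max(matches, key=lambda n: n.prefixlen)  # most specific prefix wins
    contact_address = REGISTRY[str(best)]
    return GEOCODER.get(contact_address)  # None if the address cannot be geocoded
```

The result is the location of the registered point of contact, which is not necessarily the location of the host itself.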
There are a number of shortcomings to this method. First, the level of resolution to which the address is resolved is dependent on the level of resolution of the information in the registry. Second, there is an assumption that the supplied data in the registry correctly and properly identifies the physical location of the logical network address. It is entirely possible that the host associated with the logical address is at a completely different physical location than the physical address given for the technical point of contact in the registry. Third, if the supplied physical address cannot be cross-referenced to a physical location, no geolocation is possible. Geolocation information is often available from network databases, but access to and the veracity of this information are uncertain.
In the past, three other approaches have been used in an attempt to solve the problem of accurate IP address geolocation. The first approach uses distance estimates to the target IP address from multiple beacons, each of which is a network entity having a known location, to “triangulate” the geographic location of the target IP address. The distance estimate is based on traceroute information comprising a round-trip transit time of an Internet packet traveling from each beacon to the target IP address. This approach has several shortcomings. First, it requires the geographic location of each beacon, which is often difficult to obtain. Second, it requires an accurate model relating the round-trip transit time to the distance estimate. An accurate model is difficult to develop because it requires knowing precisely the speed at which signals travel over the Internet, which can vary based on network structure, network congestion, queuing delays, router speed, the curvature of the earth, and routing protocols.
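The beacon-based approach can be sketched for the simplest case of three beacons in a flat plane. This is an illustration under stated assumptions, not the method of any particular system: coordinates are planar kilometres, the distance estimates are taken as given, and the function name is hypothetical. Subtracting the first beacon's circle equation from the other two leaves a 2x2 linear system.

```python
def trilaterate(beacons, distances):
    """Estimate (x, y) from three beacon positions and three distance estimates.

    Returns None when the beacons are collinear and no unique solution exists.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    # Linear system obtained by subtracting circle 1 from circles 2 and 3.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = x2 ** 2 + y2 ** 2 - x1 ** 2 - y1 ** 2 + d1 ** 2 - d2 ** 2
    b2 = x3 ** 2 + y3 ** 2 - x1 ** 2 - y1 ** 2 + d1 ** 2 - d3 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        return None  # collinear beacons
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return (x, y)
```

With exact distances the target is recovered, but any error in converting round-trip transit time to distance shifts the estimate, which is why the accuracy of the time-to-distance model matters so much.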
A second conventional approach uses machine learning to find a model that relates traceroute information to jurisdictional location (e.g., country, state, county) based on training examples, without requiring the geographic location of each beacon. In this approach, the training examples comprise one or more pairs of the actual geographic location of a target IP address and the round-trip transit times from the one or more beacons. A major shortcoming of this approach is that the jurisdictional location is coarse-grained, thus limiting the approach's accuracy.
A third conventional approach finds the latitude and longitude of a “nearest” neighbor in a set of training examples, where the “nearest” neighbor is one whose round-trip transit times are most similar to the round-trip transit times for the target IP address. This approach has several shortcomings. First, it requires a large set of training examples for accuracy. Second, finding the “nearest” neighbor efficiently does not scale up as the number of beacons increases. Storage and retrieval methods such as KD trees can improve efficiency, but as the number of beacons increases, these methods degenerate to exhaustive search for the nearest neighbor. Third, it requires a “nearness” measure, which is difficult to develop because each beacon might require a different “nearness” measure. Finally, it cannot extrapolate or interpolate beyond the set of training examples. Mathematically, this approach is called non-parametric because it does not require a model with one or more parameters.
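A minimal sketch of the nearest-neighbor approach follows. It uses exhaustive search and plain Euclidean distance over the round-trip time vectors as the “nearness” measure; both are assumptions made for the example, and the function name and data layout are hypothetical.

```python
import math


def nearest_neighbor(training, target_rtts):
    """Return the (lat, lon) of the training example with the most similar RTTs.

    training is a list of (rtt_vector, (lat, lon)) pairs, with one RTT per
    beacon in a fixed beacon order. Euclidean distance is an assumed measure.
    """
    def rtt_distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    best = min(training, key=lambda example: rtt_distance(example[0], target_rtts))
    return best[1]
```

Because the function can only return a location already present in the training set, it illustrates the limitation noted above: the approach cannot interpolate between examples or extrapolate beyond them.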
Learning is difficult because the training data can be imprecise, noisy, or missing. For example, the round-trip transit time is typically an overestimate of the actual transit time of an Internet packet. Conventional approaches for implementing a learning model have resulted in models with limited accuracy or models that do not scale up as the size of the corresponding information grows.