The prediction of the characteristics of a user is of importance to a wide variety of applications. As used herein, the term “characteristic” may include demographic characteristics such as gender, race, age, disabilities, mobility, income, home ownership, and employment status; personality characteristics; psychographics; interests; biases; likes; dislikes; values; attitudes; lifestyles; activities; opinions; tastes; usage rates; brand preference; firmographics such as industry, seniority, and functional area; behavioral variables; geographic location; and anything else that can be used to characterize a user. A “geographic location” or “geographic position” may be defined in terms of country/city/state/address, country code/zip code, political region, geographic region designations, latitude/longitude coordinates, spherical coordinates, Cartesian coordinates, polar coordinates, Global Positioning System (GPS) data, cell phone data, directional vectors, proximity waypoints, or any other type of geographic designation system for defining a geographical location or position.
One type of characteristic is the geographic location of the user. For example, knowing the geographic location of the user may be important for fraud detection, localized advertising, authentication, and the like. Various methods of locating electronic emitters to a point on the earth, or geolocating emitters, have been used for many years to find the geographic location of a user. These methods include a range of techniques from high-frequency direction-finding triangulation techniques for finding a ship in distress to locating the origin of an emergency “911” call on a point-to-point wireline telephone system. These techniques may be passive and cooperative, such as when geolocating oneself using the GPS, or active and uncooperative, such as a military targeting radar tracking its target.
These geolocation techniques may be targeted against a stationary or moving target, but most of these direction-finding and geolocation techniques start with the assumption of working with signals in a linear medium. For example, in radio triangulation, several stations each determine the direction from which a common signal was intercepted. Because the intercepted signal can be assumed to travel in a straight line, or at least on a known line of propagation, from the transmitter to each station, lines of bearing can be drawn from each station in the direction from which the signal was intercepted. The point where the lines of bearing cross is the point at which the signal source is assumed to be located.
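The crossing of two lines of bearing can be sketched as follows. This is only an illustration under simplifying assumptions that are not part of the description above: positions are treated as points on a flat plane, bearings are measured in degrees clockwise from north, and the function name `intersect_bearings` is hypothetical.

```python
import math

def intersect_bearings(station_a, bearing_a_deg, station_b, bearing_b_deg):
    """Return the (x, y) point where two lines of bearing cross, or None if parallel.

    Assumptions (illustrative only): flat-plane coordinates, with bearings
    measured clockwise from north (the +y axis).
    """
    ax, ay = station_a
    bx, by = station_b
    # Unit direction vectors for each bearing.
    dax = math.sin(math.radians(bearing_a_deg))
    day = math.cos(math.radians(bearing_a_deg))
    dbx = math.sin(math.radians(bearing_b_deg))
    dby = math.cos(math.radians(bearing_b_deg))
    # Solve a + t*da = b + s*db for t using the 2D cross product.
    denom = dax * dby - day * dbx
    if abs(denom) < 1e-12:
        return None  # parallel lines of bearing never cross
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    # Note: the lines are treated as infinite, so t may be negative
    # (a crossing "behind" station A).
    return (ax + t * dax, ay + t * day)

# Station A at the origin hears the signal to the northeast; station B,
# 10 units to the east, hears it to the northwest.
print(intersect_bearings((0.0, 0.0), 45.0, (10.0, 0.0), 315.0))
```

In this example the two lines cross near the point (5, 5), midway between the stations and 5 units to the north.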
In addition to the direction of the signal, other linear characteristics can be used to geolocate signals, including propagation time and Doppler shift. However, the underlying tenets that support these geolocation methodologies may not be applicable to a network environment. Network elements may not be connected via the shortest physical path between them; data transiting the network may normally be queued and later forwarded depending on network loading, causing the data to effectively propagate at a non-constant speed; and switching elements within the network may cause the data to propagate through non-constant routing. Thus, traditional time-distance geolocation methodologies may not be effective in a network environment. Network switching and queuing delays may produce echo distance results several orders of magnitude greater than the actual distance between the computers.
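The effect of queuing delay on a time-distance estimate can be illustrated with a short calculation. The sketch below assumes a propagation speed of roughly 200 km per millisecond (about two thirds of the speed of light, a common approximation for optical fiber); the function names, the 20 ms delay, and the 10 km distance are hypothetical values chosen only for illustration.

```python
# Assumed propagation speed: roughly two thirds of the speed of light.
SPEED_KM_PER_MS = 200.0

def echo_distance_km(rtt_ms, speed_km_per_ms=SPEED_KM_PER_MS):
    """Distance implied by a round-trip time, assuming a straight path
    traversed at constant speed with no queuing."""
    return (rtt_ms / 2.0) * speed_km_per_ms

def observed_rtt_ms(true_distance_km, queuing_delay_ms, path_stretch=1.0,
                    speed_km_per_ms=SPEED_KM_PER_MS):
    """Round-trip time over a path that may be longer than the direct
    distance (path_stretch >= 1) and that adds queuing delay."""
    propagation_ms = 2.0 * true_distance_km * path_stretch / speed_km_per_ms
    return propagation_ms + queuing_delay_ms

# Two hosts 10 km apart: the estimate is exact with no queuing delay,
# but 20 ms of total queuing delay inflates it to about 2010 km.
print(echo_distance_km(observed_rtt_ms(10.0, 0.0)))
print(echo_distance_km(observed_rtt_ms(10.0, 20.0)))
```

Under these assumed numbers, the echo distance is about 200 times the actual distance, and the error grows further with heavier network loading or longer routing.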
In a fully meshed network, every station from which a geolocation is initiated may be directly connected to every endpoint from which an “echo timing” is measured. The accuracy of geolocation using round-trip echo timing may depend on the degree to which the network is interconnected or “meshed,” the specific web of connectivity between the stations and endpoints, the number and deployment of stations, the proximity of the stations to the endpoints, and the number and deployment of endpoints chosen.
There are other methods for determining a user's physical location from a logical network address on the Internet that do not rely on the physics of electronic propagation. One method currently in use for determining the location of a network address relies on network databases. This method of network geolocation may look up the Internet Protocol (IP) address of the host computer to be located, may retrieve the physical address of a point of contact for that logical network address from the appropriate registry, and may then cross-reference that physical address to a geographic location.
Another approach uses distance estimates to an IP address associated with a user's location (i.e., the “target” IP address) from multiple beacons, each of which has a known location, to “triangulate” the geographic location of the user. A beacon may be considered a network entity having a known location. In this approach, the distance estimate may be based on traceroute information comprising a round-trip transit time of an Internet packet traveling from each of one or more other users to the user.
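One way such distance estimates could be combined is sketched below. This is not presented as the method of any particular system: it is a linearized least-squares multilateration on a flat plane, the distance estimates are taken as given rather than derived from transit times, and the function name `multilaterate` is hypothetical.

```python
def multilaterate(beacons, distances):
    """Estimate (x, y) from beacon positions and distance estimates.

    Assumptions (illustrative only): flat-plane coordinates and at least
    three non-collinear beacons. Each circle equation is subtracted from
    the first to obtain linear equations, which are solved by least squares.
    """
    if len(beacons) < 3 or len(beacons) != len(distances):
        raise ValueError("need at least three beacons, one distance each")
    (x0, y0), d0 = beacons[0], distances[0]
    k0 = x0 * x0 + y0 * y0 - d0 * d0
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), di in zip(beacons[1:], distances[1:]):
        # Linear equation a*x + b*y = c for this beacon.
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = (xi * xi + yi * yi - di * di) - k0
        # Accumulate the normal equations.
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is ambiguous")
    # Solve the 2x2 system by Cramer's rule.
    x = (s_ac * s_bb - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return (x, y)

# Three beacons at known positions, with exact distances to a target at (3, 4).
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [5.0, 65.0 ** 0.5, 45.0 ** 0.5]
print(multilaterate(beacons, distances))
```

With exact distances the estimate recovers the target position; with distances inflated by network delay, the least-squares solution degrades accordingly.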
A second conventional approach uses machine learning to find a model that relates traceroute information to jurisdictional location (e.g., country, state, or county) based on training examples. In this approach, the training examples may comprise one or more pairs of the actual geographic location of a user and the round-trip transit times from the one or more other users. A major shortcoming of this approach is that the jurisdictional location may be coarse-grained, thus limiting the approach's accuracy.
A third conventional approach may find the geographic location of a “nearest” neighbor in a set of training examples, where the “nearest” neighbor may be one whose round-trip transit times are most similar to the round-trip transit times for the user.
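A nearest-neighbor lookup of this kind can be sketched as follows. The sketch assumes that “most similar” means the smallest Euclidean distance between vectors of round-trip transit times, which is only one possible similarity measure; the function name and the sample data are hypothetical.

```python
import math

def nearest_neighbor_location(training_examples, target_rtts):
    """Return the location of the training example whose round-trip times
    are closest (by Euclidean distance, an assumed similarity measure)
    to the target's round-trip times.

    training_examples: list of (rtts, location) pairs, where every rtts
    sequence has the same length as target_rtts.
    """
    best_rtts, best_location = min(
        training_examples,
        key=lambda example: math.dist(example[0], target_rtts),
    )
    return best_location

# Hypothetical training data: round-trip times in milliseconds from three
# measurement points, paired with a (latitude, longitude) location.
examples = [
    ([10.0, 50.0, 80.0], (40.7, -74.0)),
    ([70.0, 20.0, 30.0], (34.0, -118.2)),
]
print(nearest_neighbor_location(examples, [12.0, 48.0, 85.0]))
```

Here the target's transit times most closely match the first training example, so that example's location is returned.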
A fourth conventional approach may predict a user's location based on the locations of a set of friends. For example, the geographic location of a user may be a location that is as close as possible to a set of “friend” locations while being as far as possible from a set of “non-friend” locations.