Users generate contextual signals that often need to be canonicalized before a software system can use them. Examples include IP (Internet Protocol) addresses, Wi-Fi signals and cell tower information, which some software systems need to have converted into GPS locations, or into city, state, country tuples (or the like), before those systems can use them. Another example is a weather application, which depends on being given a user's GPS location. Yet another example is a reverse phone directory service, which, given a phone number, returns information (e.g., name and address) regarding the owner of that number.
In location-based and other such scenarios, there may be multiple data sources that can provide the requested information. For example, multiple data sources can provide a location given an IP address; similar situations exist for Wi-Fi and cell tower mapping information. Because of how and when the data were gathered and assembled, these sources sometimes map the same input signal to conflicting locations. For example, the same IP address may map to Washington, D.C. on one data source and to the Netherlands on another.
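By way of illustration only, the kind of conflict described above can be sketched as a comparison of the answers that several sources give for one signal. The source names, the sample IP address (taken from a documentation-reserved range) and the `find_conflicts` helper are hypothetical and are not drawn from any particular data provider.

```python
def find_conflicts(signal, sources):
    """Group source names by the location each reports for `signal`.

    Returns a dict mapping location -> list of source names when the
    sources disagree, and an empty dict when they agree or have no data.
    """
    answers = {}
    for name, table in sources.items():
        location = table.get(signal)
        if location is not None:
            answers.setdefault(location, []).append(name)
    return answers if len(answers) > 1 else {}


# Hypothetical sources: each maps an IP address string to a location string.
SOURCES = {
    "source_a": {"203.0.113.9": "Washington, D.C.", "198.51.100.4": "Seattle"},
    "source_b": {"203.0.113.9": "Netherlands", "198.51.100.4": "Seattle"},
}

conflicts = find_conflicts("203.0.113.9", SOURCES)
agreement = find_conflicts("198.51.100.4", SOURCES)
```

In this sketch, a non-empty result only signals that the sources disagree; it does not indicate which source is correct.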
While a software service accepts various types of user input, canonicalization of such ambiguous signals impacts the applications that run under the service. This is not only because it is difficult for each application to implement logic to reduce the ambiguity of the signals, but also because the contextual information needs to be consistent between applications. Canonicalization usually requires a large mapping table; however, it is often difficult to evaluate how accurate each such mapping table is. For example, the conversion from an IP address to a location requires a large lookup table that maps ranges of IP addresses to city names, country names and so forth. While the table format is relatively simple, the table is so large that it is essentially impractical to confirm that the mapping of each IP range is correct.
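As a minimal sketch of such a lookup table, the following assumes that each row holds a first address, a last address, a city and a country, and that the ranges do not overlap. The rows, the `lookup` function and the use of binary search are illustrative assumptions; actual mapping tables contain vastly more rows and may be organized differently.

```python
import bisect
import ipaddress

# Hypothetical rows: (first address, last address, city, country).
# The addresses come from ranges reserved for documentation.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "Washington, D.C.", "US"),
    ("198.51.100.0", "198.51.100.255", "Amsterdam", "NL"),
    ("203.0.113.0", "203.0.113.255", "Seattle", "US"),
]

# Convert addresses to integers and sort by range start so that
# binary search can locate the candidate range.
_TABLE = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), city, country)
    for first, last, city, country in RANGES
)
_STARTS = [row[0] for row in _TABLE]


def lookup(ip):
    """Return (city, country) for `ip`, or None if no range contains it."""
    value = int(ipaddress.ip_address(ip))
    # Index of the last range whose start is <= value.
    i = bisect.bisect_right(_STARTS, value) - 1
    if i < 0:
        return None
    _, last, city, country = _TABLE[i]
    return (city, country) if value <= last else None
```

The lookup itself is simple and fast; the difficulty noted above lies in verifying that every row of a table with a very large number of ranges is correct.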