It is generally difficult to determine whether a particular entity is associated with other entities or with URLs on the web. For instance, given entities such as a particular physical object, location, or person, it is not easy to determine which web pages a particular individual has visited or has a presence on, nor is it easy to determine whether a location or physical object is mentioned on a web page. Furthermore, it is difficult to establish whether, and how, those various entities are related to each other.
With regard to people entities, the use of tagging beacons on web pages along with cookies may help to determine whether a specific user has interacted with a web page. By dropping a cookie on the user's machine, placing beacons on web pages, and observing cookies as they hit beacons, companies may track user behavior wherever they can place beacons. Typically, companies are not able to beacon the web pages of another company without an agreement. As such, where no agreement exists to facilitate beaconing a web page, it is virtually impossible to track user behavior on that page. Furthermore, this method of determining user-to-web-page association is limited because the user must maintain the same cookie for the tracking to succeed. That is, the difficulty of tracking user behavior is exacerbated in situations where a single user maintains multiple identities across different services or URLs on the web.
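The cookie-and-beacon approach described above can be illustrated with a minimal sketch. The log format, cookie identifiers, and URLs below are hypothetical and chosen only for illustration; they do not come from any particular tracking system. The sketch groups beacon hits by cookie, which shows why a user who presents two different cookies appears as two unrelated visitors.

```python
from collections import defaultdict

# Hypothetical beacon log: each entry is (cookie_id, page_url).
# "cookie-a" and "cookie-b" are assumed to belong to the same person
# using two browsers, which the log itself cannot reveal.
beacon_hits = [
    ("cookie-a", "https://shop.example/home"),
    ("cookie-a", "https://shop.example/cart"),
    ("cookie-b", "https://news.example/story"),
]


def pages_by_cookie(hits):
    """Group the beaconed pages seen for each cookie identifier."""
    visited = defaultdict(set)
    for cookie_id, url in hits:
        visited[cookie_id].add(url)
    return dict(visited)


history = pages_by_cookie(beacon_hits)
for cookie_id, urls in sorted(history.items()):
    print(cookie_id, sorted(urls))
```

Because the grouping key is the cookie rather than the person, pages without a beacon never appear in the log, and activity under a second cookie is never merged with the first.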
In the case of non-people entities, determining whether a particular entity is associated with a URL on the web generally involves performing information extraction on a web page to determine which entities are relevant within the text of the web page. Certain URLs may refer to a specific concept or entity, and uses or mentions of those URLs on other web pages may indicate the presence of that specific entity on those web pages. For example, a Wikipedia URL might refer to a specific location in the world, and a travel web page may use the Wikipedia URL, indicating the presence of that specific location entity on the travel web page. This method of determining entity-to-web-page association is limited in that a particular entity may be associated with a web page without being explicitly mentioned in the text of the web page, and would therefore be missed by the information extraction process.
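The URL-mention approach described above can be sketched as follows. This is an illustrative example only: the `ENTITY_URLS` lookup, the entity label, and both sample pages are hypothetical, and a real information extraction process would be considerably more involved. The sketch collects the links on a page and reports any that match a known entity URL.

```python
from html.parser import HTMLParser

# Hypothetical lookup from a canonical URL to the entity it denotes.
ENTITY_URLS = {
    "https://en.wikipedia.org/wiki/Paris": "Paris (location)",
}


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def entities_linked_from(html):
    """Return the known entities whose URLs are linked from the page."""
    collector = LinkCollector()
    collector.feed(html)
    return {ENTITY_URLS[url] for url in collector.links if url in ENTITY_URLS}


# A travel page that links to the entity URL explicitly.
travel_page = '<p>Visit <a href="https://en.wikipedia.org/wiki/Paris">the city</a>.</p>'
# A page about the same place that never links to or names it.
implicit_page = "<p>Stroll along the Seine and climb the famous iron tower.</p>"

print(entities_linked_from(travel_page))
print(entities_linked_from(implicit_page))
```

The second page concerns the same location but yields no match, which is the limitation noted above: an entity associated with a page only implicitly is not found by this kind of extraction.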