Computers give people access to large quantities of information with greater ease than before. Even where data sources were “siloed” or kept separate, computers help to break down the walls between them. These data sources may be created, maintained, or modified by different companies or organizations, but sometimes separate data sources exist even within a single company or organization.
The volume of data that can easily be produced from various data sources outstrips our ability to make sense of and use that data. Each data source usually stores its data in a different form, meaning that the information from each source may be encoded using different formats, have different digital identifiers for the same or similar pieces of information, or include other differences. This makes it difficult to understand how a piece of information from one data source relates to a piece of information from another data source.
As one example, it is useful to be able to select content for a person so that it matches their taste. However, each person's digital life has become much more complicated. A person may have information spread across multiple data sources, for example, browsing history stored with one service, purchase history with another, a social networking profile that includes their friends and family, news services they visit, and communications platforms they use to reach out to others. Each of these data sources provides an important but incomplete picture of the person. For example, a social networking site may indicate who a person knows and communicates with, but will generally lack information on the person's viewing history. As another example, a news service may indicate the news preferences of a user, but will not have information on with whom the user shares news articles.
It is often computationally expensive to merge all these data sources together. For example, scouring data from each data source and then reconciling the data across the data sources requires substantial processing power and is time-consuming. To avoid these computationally expensive operations, merging is skipped or done infrequently, resulting in information that is stale and of reduced usefulness.
Therefore, there is a need to strike a balance between the cost of computationally expensive operations and the benefit of having up-to-date information.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.