Users who are searching for data contained in physically or logically separate data sources (such as databases, flat files, XML documents, etc.) typically must issue a separate query to each database or warehouse in which they are interested in finding data. Additionally, the results returned from these separate queries are not correlated, consolidated, or de-conflicted. Therefore, the user must attempt to determine which pieces of information relate to each other and refer to the same real-world object. This would typically be done either in the user's head or with the use of another application, such as Microsoft Excel or Microsoft Word, which requires significant manual work. The work is error-prone and tedious, and it is not readily repeatable. If the number of returned results is significant, the user may choose not to perform this manual process at all, since the amount of information is too much for a human to correlate. Moreover, the required information correlation would simply take too much time to do by hand or with tools that were never intended for such work.
Query Brokers, Federated Queries, and Distributed Queries are not new concepts. The problem is that none of these gives a comprehensive view of an entity's history, since none attempts to correlate entities across different data sources. The challenge is to correlate entities across the disparate systems in a domain-meaningful way, while also being able to query additional information from those systems and correlate associations across them, without permanently storing data either on the user's workstation or on a centralized server. Typically, Query Brokers and other distributed designs return a list of matches to a user's query, but do not attempt to determine whether pieces of information from different systems actually relate to the same real-world object, because doing so is difficult and carries the potential for incorrect correlations. This work is left to the user to do “in their head”.
There are two ways to solve the problem. The first is a brute-force approach in which all relevant pieces of data are copied from the source systems to a destination system, and consolidation or de-duplication is then performed on the destination system after all pieces of information have been copied. This will work in theory; however, computers, networks, disk drives, etc. may all be too slow for this to be practically achievable for large sets of data. Alternatively, the data could be stored in a warehouse, but not all source systems lend themselves to being replicated into a central warehouse, and the storage requirements for such a system may be extremely large.
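As a rough illustration only, the brute-force approach described above might be sketched as follows. The source names, record fields, and the matching rule (normalized name plus date of birth) are all hypothetical and are not taken from any particular system; the point is simply that every record must be copied to the destination before any consolidation can begin.

```python
from collections import defaultdict

def copy_all(sources):
    """Copy every record from every source into one destination list.

    `sources` is a hypothetical mapping of source name -> list of dict records.
    The full copy is the expensive step: transfer and storage grow with the
    total number of records across all sources.
    """
    destination = []
    for source_name, records in sources.items():
        for record in records:
            # Tag each copied record with the source it came from.
            destination.append({**record, "_source": source_name})
    return destination

def consolidate(destination, key_fields):
    """Group copied records that share the same normalized key fields.

    Assumption: records with equal (trimmed, lowercased) key fields refer to
    the same real-world object. Real matching rules would be domain-specific.
    """
    groups = defaultdict(list)
    for record in destination:
        key = tuple(str(record.get(f, "")).strip().lower() for f in key_fields)
        groups[key].append(record)
    return dict(groups)

# Hypothetical data held in two separate sources.
sources = {
    "hr_db": [
        {"name": "Ann Lee", "dob": "1980-01-02", "dept": "Sales"},
    ],
    "badge_file": [
        {"name": "ann lee ", "dob": "1980-01-02", "badge": "B-17"},
        {"name": "Bob Ray", "dob": "1975-06-30", "badge": "B-42"},
    ],
}

merged = consolidate(copy_all(sources), key_fields=("name", "dob"))
for key, records in merged.items():
    print(key, sorted(r["_source"] for r in records))
```

Even in this toy form, nothing can be consolidated until every source has been read in full, which is why the approach becomes impractical as the data sets grow.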