A typical customer setting can involve multiple data sources. These data sources are often interconnected. In addition, new sources may be constantly added and existing sources can evolve with business needs.
Customers may, for example, deploy multiple, distributed Resource Description Framework (RDF) data stores containing linked data. A minimal requirement for meaningful data analytics in this setting is the ability to perform query processing over the distributed data stores. Existing query evaluation approaches over distributed stores may require a central repository (e.g., a warehouse) to collect all the queried data, or may rely on federated solutions whose configuration and maintenance add an extra layer of complexity unrelated to individual products and their stores. The former approach requires shipping large quantities of data to the repository. Furthermore, such a centralized approach may violate security or privacy constraints that might be in place in the distributed stores, and may also create the problem of keeping the repository up to date as new data are inserted into the respective stores. The latter approach, which relies on a federation, assumes that a global schema of some sort is created over the distributed stores and is used during query evaluation. Known federated systems lack automation to maintain the federation. A federated system can require a large amount of manual work to establish mappings, create global schemas, and maintain these mappings and schemas as sources change or new sources are added to the federation.