Enterprises frequently maintain portions of their information assets at multiple geographically dispersed locations. For example, one facility may need to capture its data locally because that data is used most frequently by that facility. Sometimes the data relevant to all the facilities of an enterprise are centrally located at a remote, external facility. Each facility may have its own data needs, and some of its data may never be moved to the centrally located facility. Moreover, in some cases, it is more efficient for the data to be distributed across facilities while acting together as one data warehouse under a single Relational Database Management System (RDBMS). That is, there are a variety of reasons why an enterprise will have data assets geographically distributed and still need to be capable of issuing queries that access the entire set of data assets from multiple facilities.
However, transferring large amounts of data over network connections is a performance bottleneck. The data may need to be transferred for a variety of reasons, such as to perform a join operation on two database tables while processing a query, where each table is located on a different, geographically dispersed server reachable over a network. In such a case, each server is missing one of the tables needed for the join operation. For the query to process properly, one of the servers needs to transfer its table to the other server, which holds the remaining table, so that the join operation can be processed there.
Presently, there is no sufficiently intelligent mechanism for selecting the direction in which a table should be transferred over the network. Typically, an enterprise just deploys a default rule that moves one table in a predefined direction over the network. However, depending upon the type and quantity of data in the table being transferred, this default rule can be a mistake that unnecessarily taxes network bandwidth and resources and significantly delays the results returned from a query to an end user.
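As an illustration only, the cost of a fixed default direction can be sketched by comparing the bytes each choice would move. The sketch below is not a mechanism described in this document. The names TableLocation, default_rule_bytes, and size_aware_choice are hypothetical, the table sizes are invented for the example, and the estimate assumes that rows multiplied by average row width approximates transfer volume. It considers only the quantity of data, not its type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableLocation:
    """Hypothetical description of a table and the server that holds it."""
    name: str
    server: str
    row_count: int
    avg_row_bytes: int

    @property
    def size_bytes(self) -> int:
        # Assumption: rows * average row width approximates bytes on the wire.
        return self.row_count * self.avg_row_bytes


def default_rule_bytes(left: TableLocation, right: TableLocation) -> int:
    """Fixed default rule: always ship the left table to the right table's server."""
    return left.size_bytes


def size_aware_choice(left: TableLocation, right: TableLocation):
    """Ship whichever table is smaller; return (table, destination server, bytes)."""
    if left.size_bytes <= right.size_bytes:
        return left, right.server, left.size_bytes
    return right, left.server, right.size_bytes


# Invented example figures: a large fact table and a small lookup table.
orders = TableLocation("orders", "server_a", row_count=50_000_000, avg_row_bytes=200)
regions = TableLocation("regions", "server_b", row_count=500, avg_row_bytes=100)

moved, destination, cost = size_aware_choice(orders, regions)
print(f"default rule moves {default_rule_bytes(orders, regions)} bytes")
print(f"size-aware choice moves {moved.name} to {destination}: {cost} bytes")
```

With these invented figures, the fixed rule ships the large table while the size-aware choice ships the small one, which is the kind of imbalance the preceding paragraph describes.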
Some existing solutions attempt to alleviate the situation with elaborate caching techniques, but the caches have to be continually updated and flushed to stay up to date with the ever-changing data.
Therefore, there is a need to more efficiently transfer data in a geographically dispersed database environment when processing queries that rely on geographically dispersed data tables.