Distributed applications generally use a plurality of databases to supply requisite application data. Under one common data distribution model, a central data repository, generally a database that is physically remote from the computers by which users access the distributed application, contains a complete set of application data. However, in order to provide users with more efficient access to application data, a database local to a user contains a duplicate set of some or all of the requisite application data. Accordingly, this approach requires synchronizing the remote central database with a plurality of local databases on a regular basis. For example, many applications require a periodic (e.g., nightly or daily) download, or replication, of data from a central data repository to local databases. This data replication consumes processing and network resources and, if interrupted or corrupted, may introduce errors into the distributed application. In addition, more complex data distribution models are often used, in which the central data repository is itself spread across two or more physical databases, meaning that the central data repository must itself be synchronized or updated before it can be synchronized with the local databases.
Thus, present data distribution models suffer from significant shortcomings. Particularly in cases where local users accessing data in a distributed application most often require a particular subset of application data, replication of all data from a central data repository populates and updates the local database with more data than is necessary and is therefore inefficient for at least two reasons. First, replication of extra data consumes processing and network resources, as noted above. Second, storing extra data in the local database causes the local database to consume more resources, such as storage space and processing resources, than is necessary. On the other hand, if a complete data set is not stored in the local database, then distributed users must seek some application data from a remote data repository, which may not be available or, at a minimum, can likely be accessed only inefficiently and by consuming extra processing and network resources. Particularly where data is distributed across more than one remote database, it can be difficult and inefficient to locate desired data.
Certain distributed applications have addressed the foregoing problems in application-specific ways. For example, it may be known that a certain subset of application data is the data sought in approximately ninety (90) percent of the instances in which local users of the application access data. Therefore, only this subset of the application data may be maintained in a local database without unduly sacrificing efficiency or incurring undue overhead. Alternatively or in addition, it may be known that a certain subset of application data is stored in a particular remote database and may be accessed in a particular manner. Accordingly, the distributed application may employ a specific routine known to efficiently access such data in a remote database. However, these approaches require particular knowledge of a distributed application as well as the data it uses and often also require knowledge of a specific computing environment.
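By way of illustration only, the application-specific approach described above may be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular application: the class name `SubsetLocalStore`, the key format, the in-memory dictionaries standing in for the local and central databases, and the counters are all hypothetical and are introduced solely for purposes of the example.

```python
from typing import Callable, Dict, Optional


class SubsetLocalStore:
    """Hypothetical local database that holds only a frequently used subset."""

    def __init__(
        self,
        local_subset: Dict[str, str],
        remote_lookup: Callable[[str], Optional[str]],
    ) -> None:
        # Copy of the subset that local users are known to request most often.
        self.local_subset = dict(local_subset)
        # Application-specific routine for reaching the remote repository.
        self.remote_lookup = remote_lookup
        # Counters showing how often each path is taken.
        self.local_hits = 0
        self.remote_calls = 0

    def get(self, key: str) -> Optional[str]:
        """Return the value for key, preferring the local subset."""
        if key in self.local_subset:
            self.local_hits += 1
            return self.local_subset[key]
        # Not held locally: fall back to the (assumed slower) remote routine.
        self.remote_calls += 1
        return self.remote_lookup(key)


# Hypothetical central repository, represented here by an in-memory dict.
central = {"cust:1": "Alice", "cust:2": "Bob", "cust:3": "Carol"}

# Only the frequently requested records are kept locally.
store = SubsetLocalStore(
    local_subset={"cust:1": "Alice", "cust:2": "Bob"},
    remote_lookup=central.get,
)

print(store.get("cust:1"))  # served from the local subset
print(store.get("cust:3"))  # served by the remote routine
```

In this sketch, both the choice of which records to hold locally and the routine used to reach the remote repository are fixed in advance, which reflects the particular knowledge of the application and its data that such approaches require.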
To overcome the foregoing shortcomings in retrieving data in distributed applications, it would be desirable to have a generic solution that provides retrieval of data from a plurality of separately maintained databases. It would further be desirable to avoid regular database synchronizations, and the substantial overhead and inefficiencies that result therefrom. Further, it would be desirable to reduce the overall data storage capacity required for a distributed application, and thereby to reduce the costs associated with storing application data.