Databases are used to store large quantities of computer data. To increase the usefulness of a database, it is desirable that data can be stored and retrieved easily and quickly, that the data is secure, and that ambiguities regarding which data is current are addressed where multiple copies or versions of data are stored. It is also desirable to limit the amount of time required to perform updates of stored data. In addition, to reduce latency for near-real-time applications, it is desirable to limit the amount of data that must be transferred across communications networks to data sources and/or database clients that are distributed across multiple geographic sites.
One approach to providing a database architecture, illustrated in FIG. 1A, is to provide a single centralized database 104 that serves multiple data sources 108a-b and multiple data clients 112a-b. Although a centralized architecture avoids issues regarding identifying current versions of data, the scalability and availability of such an arrangement are poor. In particular, such arrangements are slow and cumbersome when used with large volumes of data, and/or in real-time or other high-availability applications, especially when the data sources and data clients are geographically dispersed.
Another approach is to provide multiple databases, for example as illustrated in FIG. 1B. According to such an approach, multiple databases 104a-b each receive data from multiple data sources 108a-b and provide requested data to multiple data clients 112a-b. Because the multiple databases 104a-b each receive identical sets of data from the multiple data sources 108a-b, redundancy is provided. In addition, availability with respect to serving requests for data from data clients 112a-b is improved as compared to single-database solutions. However, the use of multiple databases 104a-b requires complex application logic in order to resolve data redundancy issues. In addition, scalability is poor, particularly where there are large numbers of users.
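As an illustration of the application logic that such an arrangement can require, the following minimal sketch models two databases that are each written by the data sources and shows how the copies can diverge when one database misses an update. The names Record, Database, write_to_all and read_current, the per-record version counter, and the "highest version wins" rule are assumptions made for this illustration only and are not taken from FIG. 1B.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    value: str
    version: int  # hypothetical per-record version counter


@dataclass
class Database:
    name: str
    rows: dict = field(default_factory=dict)

    def write(self, key, record):
        self.rows[key] = record


def write_to_all(databases, key, record, unreachable=()):
    """A data source sends the same record to every reachable database."""
    for db in databases:
        if db.name in unreachable:
            continue  # this copy silently falls behind
        db.write(key, record)


def read_current(databases, key):
    """Application-side reconciliation: pick the copy with the highest version."""
    candidates = [db.rows[key] for db in databases if key in db.rows]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.version)


db_a = Database("104a")
db_b = Database("104b")

# Both databases receive the first update.
write_to_all([db_a, db_b], "sensor-1", Record("10.0", 1))
# The second update misses database 104b, so the two copies now disagree.
write_to_all([db_a, db_b], "sensor-1", Record("12.5", 2), unreachable={"104b"})

current = read_current([db_a, db_b], "sensor-1")
```

In this sketch every data client must query every database and compare versions before it can trust a value, which is one form the additional application logic might take.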
Yet another approach, illustrated in FIG. 1C, is to provide multiple databases 104a-b for multiple data sources 108a-b, and to copy data from each of the databases 104a-b to a single, central database 116 that serves multiple data clients 112. This approach provides a centralized location from which a complete set of data may be accessed by data clients 112. In addition, this approach provides redundancy. To further increase availability, multiple central databases 116 can be used. However, such arrangements have poor scalability. For example, in order to ensure that each central database 116 has a complete and up-to-date set of data, large amounts of data must be transferred from each database 104 to each central database 116. Large data transfers increase latency between the data sources and the data clients, limiting the suitability of this arrangement for real-time applications.
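The scaling concern can be made concrete with a rough calculation. The sketch below assumes, for illustration only, that every refresh copies the full contents of each database 104 to each central database 116; the function name full_copy_transfer_bytes and the database sizes are hypothetical and do not come from FIG. 1C.

```python
def full_copy_transfer_bytes(source_db_sizes, num_central_dbs):
    """Bytes moved per refresh if each source database is copied in full
    to every central database (an assumed, worst-case copy policy)."""
    return sum(source_db_sizes) * num_central_dbs


# Illustrative sizes only: two source databases of 50 GB and 80 GB.
sizes = [50 * 10**9, 80 * 10**9]

one_central = full_copy_transfer_bytes(sizes, 1)    # 130 GB per refresh
three_central = full_copy_transfer_bytes(sizes, 3)  # 390 GB per refresh
```

Under this assumption the transferred volume grows with the product of the total stored data and the number of central databases, so adding central databases for availability multiplies the network load.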
Conventional database structures have been incapable of providing both high availability and high scalability with low latency. Accordingly, a need remains for a database architecture that can be scaled to support many data sources and many data users, while at the same time providing good availability and allowing updates involving large volumes of data to be performed with low latency.