Regarding database technologies, the concept of distributed databases has been widespread for some time in order to address scalability issues. A distributed database may be regarded as a plurality of databases physically or logically distributed, likely under control of a central database management system, and wherein storage devices are not all necessarily attached to a common CPU. Thus, the distributed database might be built up with multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers. Generally speaking, the distribution of database instances is not necessarily a consequence of data distribution itself, but may also serve the purpose of data replication in order to obtain highly available systems.
When considering a database system distributed in different physical locations, one has to take into account the different nature of the applications allowed to access such database system in terms of their connections to particular database instances and respective distances, as well as in terms of the data distribution amongst said particular database instances. In this respect, and depending on particular models of data distribution to apply, one may distinguish between local applications, which are connected to a specific database instance having all required data and which do not require data from remote database instances, and global applications, which are connected to any database instance and which require data from remote database instances.
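The distinction between local and global applications can be sketched as a simple routing rule: a local application resolves a key in its own instance, whereas a global application must reach a remote instance. All class names, instance names, and keys below are illustrative assumptions, not taken from the text.

```python
from dataclasses import dataclass, field

@dataclass
class DatabaseInstance:
    name: str
    keys: set = field(default_factory=set)  # data held by this instance

@dataclass
class DistributedDatabase:
    instances: list

    def route(self, app_instance: DatabaseInstance, key: str) -> DatabaseInstance:
        # Local access: the application's own instance already holds the data.
        if key in app_instance.keys:
            return app_instance
        # Global access: the data must be fetched from a remote instance.
        for instance in self.instances:
            if key in instance.keys:
                return instance
        raise KeyError(key)

db_a = DatabaseInstance("A", {"subscriber-1"})
db_b = DatabaseInstance("B", {"subscriber-2"})
system = DistributedDatabase([db_a, db_b])

print(system.route(db_a, "subscriber-1").name)  # "A": local access
print(system.route(db_a, "subscriber-2").name)  # "B": global access to a remote instance
```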
Even where local and global applications concurrently coexist to access the distributed database system, and particularly where the local and global applications carry out communication functions between network nodes of a telecommunication network, the distributed database system is generally required to accomplish: a transparent distribution, so that the applications interact with the distributed database system as if it were one compact logical system; and transparent transactions, so that each transaction maintains a database integrity across the plurality of distributed databases.
The transparent distribution, where the plurality of databases is distributed in different locations, requires a similar performance for local applications requesting data from a closely located database and for global applications requesting data from a remotely located database. This is achieved in a traditional distributed database system by the usage of memory caches in areas closely located with the requester applications, each memory cache temporarily saving data usable by the closely located applications.
On the other hand, where memory caches are provided in areas closely located with the requester applications, the integrity of the database system to be maintained by each transparent transaction, as one compact logical system, requires that all the memory caches be updated each time a transaction modifies data in any particular memory cache.
In other words, where a database system with a two-layer distribution is provided, that is, with a master database, which may be distributed in a number of database instances or may simply be a centralized instance, and with a plurality of slave databases acting as memory caches and provided in areas closely located with the requester applications, there is a need for a sort of cache management logic that takes care of managing data in the slave databases, managing consistency between cached data in the slave databases and master data in the master database, and managing consistency between different caches in different instances of the slave database.
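The two-layer arrangement described above can be sketched, in simplified form, as a master holding authoritative data, slave caches serving nearby reads, and a cache management logic keeping them consistent. The invalidate-on-write policy and all names here are illustrative assumptions, not prescribed by the text.

```python
class MasterDatabase:
    """Authoritative master data (possibly itself distributed)."""
    def __init__(self):
        self.data = {}

class SlaveCache:
    """A slave database acting as a memory cache close to its applications."""
    def __init__(self, master):
        self.master = master
        self.cache = {}

    def read(self, key):
        # Serve from the local cache; on a miss, fetch from the master.
        if key not in self.cache:
            self.cache[key] = self.master.data[key]
        return self.cache[key]

class CacheManager:
    """Cache management logic: keeps the slave caches consistent with the
    master and, transitively, with each other (illustrative sketch)."""
    def __init__(self, master):
        self.master = master
        self.slaves = []

    def add_slave(self):
        slave = SlaveCache(self.master)
        self.slaves.append(slave)
        return slave

    def write(self, key, value):
        self.master.data[key] = value   # master data stays authoritative
        for slave in self.slaves:
            slave.cache.pop(key, None)  # invalidate stale cached copies

master = MasterDatabase()
manager = CacheManager(master)
s1, s2 = manager.add_slave(), manager.add_slave()
manager.write("msisdn-1", "profile-v1")
s1.read("msisdn-1")                     # cache miss, fetched from the master
manager.write("msisdn-1", "profile-v2")
print(s1.read("msisdn-1"))              # stale copy was invalidated: "profile-v2"
```

Invalidation on write is only one possible policy; a refresh-on-write policy would correspond to the snarfing model discussed later.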
Nowadays, different mechanisms are known to address these three previous issues. For instance, the issue of managing data to be cached in the slave database may be addressed by cache algorithms like a so-called “Least Recently Used”, a so-called “Least Frequently Used”, and the like; whereas managing consistency between cached data in the slave databases and master data in the master database, as well as between different caches in different instances of the slave database, may be addressed by cache coherence models. Regarding the cache coherence models, the most widely known are the so-called “directory-based”, “snooping” and “snarfing”. These three models, where applied to the two-layer database architecture presented above, have different consequences.
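As an illustration of the first issue, a minimal "Least Recently Used" eviction policy for a slave cache may be sketched as follows; the capacity and keys are arbitrary assumptions for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal Least Recently Used cache: evicts the entry that has gone
    unused the longest once capacity is exceeded."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)        # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # capacity exceeded: evicts "b"
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

A "Least Frequently Used" variant would track an access counter per entry instead of recency order.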
Regarding the directory-based cache coherence model, there is a directory entry for each data block to be cached, which contains information about the caching state of the data block in the system and the locations of the slave databases caching said data block. By checking the state and the locations, one can determine which instances of the slave database need to be updated for an operation in order to maintain coherence.
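The directory lookup just described can be sketched as follows; the entry layout, state name and location names are illustrative assumptions.

```python
class Directory:
    """One directory entry per data block: its caching state plus the set
    of slave locations holding a copy."""
    def __init__(self):
        self.entries = {}  # block_id -> {"state": str, "sharers": set}

    def record_read(self, block_id, location):
        # A slave location fetching a block is registered as a sharer.
        entry = self.entries.setdefault(
            block_id, {"state": "SHARED", "sharers": set()})
        entry["sharers"].add(location)

    def locations_to_update(self, block_id, writer):
        # On a write, every other location caching the block must be
        # updated (or invalidated) to maintain coherence.
        entry = self.entries.get(block_id, {"sharers": set()})
        return entry["sharers"] - {writer}

directory = Directory()
directory.record_read("block-7", "slave-madrid")
directory.record_read("block-7", "slave-stockholm")
print(sorted(directory.locations_to_update("block-7", "slave-madrid")))
# ['slave-stockholm']
```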
Regarding the snooping cache coherence model, at each slave database location, there is a monitor that is aware of changes in data cached in other locations of the slave database. Where these changes take place, the monitor removes the cached data.
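A sketch of this snooping behaviour follows, with a shared bus standing in for whatever channel lets each monitor observe remote writes; all names are illustrative assumptions.

```python
class Bus:
    """Shared channel that the snooping monitors listen on."""
    def __init__(self):
        self.monitors = []

    def broadcast_write(self, origin, key):
        for monitor in self.monitors:
            if monitor is not origin:
                monitor.on_remote_write(key)

class SnoopingSlave:
    def __init__(self, bus):
        self.cache = {}
        self.bus = bus
        bus.monitors.append(self)

    def write(self, key, value):
        self.cache[key] = value
        self.bus.broadcast_write(self, key)  # let other monitors snoop the change

    def on_remote_write(self, key):
        self.cache.pop(key, None)            # remove the now-stale cached data

bus = Bus()
s1, s2 = SnoopingSlave(bus), SnoopingSlave(bus)
s1.write("k", "v1")
s2.cache["k"] = "v1"    # s2 also caches the block
s1.write("k", "v2")     # s2 snoops the remote write and drops its copy
print("k" in s2.cache)  # False
```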
Regarding the snarfing cache coherence model, at each slave database location, there is a monitor that is aware of changes in the master database and updates the cached data whenever such a change occurs.
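The contrast with snooping can be sketched as below: the monitor refreshes the cached copy with the new master value rather than removing it. Restricting the refresh to blocks actually cached locally is an assumption of this sketch, as are all names.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.monitors = []

    def write(self, key, value):
        self.data[key] = value
        for monitor in self.monitors:
            monitor.on_master_change(key, value)

class SnarfingSlave:
    def __init__(self, master):
        self.cache = {}
        master.monitors.append(self)

    def on_master_change(self, key, value):
        # Unlike snooping, the cached copy is refreshed, not removed
        # (only if this location actually caches the block).
        if key in self.cache:
            self.cache[key] = value

master = Master()
slave = SnarfingSlave(master)
master.write("k", "v1")  # not cached locally: ignored
slave.cache["k"] = "v1"
master.write("k", "v2")  # the monitor snarfs the new value
print(slave.cache["k"])  # "v2"
```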
These three models, and corresponding implementation mechanisms, are very inefficient where the distributed database system is used in a Wide Area Network (hereinafter WAN) and is shared in a telecommunication system by a number of possibly different subscriber register front-ends, such as Home Subscriber Server (HSS) front-ends and Home Location Register (HLR) front-ends. In such a scenario, the distributed database system is expected to provide almost real-time responses as well as real-time coherence whilst the slave database locations are geographically separated by long distances; however, the WAN delays adversely affect continuous replications and updates.