Distributed databases solve the scalability problem of storing a vast volume of data when the storage capacity of a single database does not suffice. In this regard, a distributed database is a database in which the data is not stored at a single physical location. Instead, it is spread across a network of database nodes that are geographically dispersed and connected via communications links.
FIG. 1 illustrates schematically an example of a distributed database system providing data access to database clients. The distributed database is comprised of a number of database (DB) nodes, and a number of client interface nodes with which database clients interact in order to access the data stored in the distributed database. Each database node has a storage unit for storing data objects, and an interface to one or more of the client interface nodes. The client interface nodes also have an interface with one or more database clients of the distributed database via which they receive requests for database transactions, wherein each database transaction requires one or more data operations (also known as database queries) be performed on data objects that are stored or that are to be stored in the distributed database.
In order to determine which database node of the distributed database stores or is to store a data object to which a data operation relates, each client interface node can be provided with distribution logic that determines which database node should store a data object. Moreover, each client interface node that receives and processes requests for database transactions will typically be required to implement transaction manager functionality that applies the “ACID” properties to a database transaction so as to ensure the proper execution of that database transaction. These “ACID” properties are atomicity, consistency, isolation, and durability.
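By way of illustration only, the distribution logic held by a client interface node could be as simple as a stable hash over a data object's key. The following is a minimal sketch under that assumption; the node names, the `node_for` function and the choice of SHA-256 are hypothetical and are not taken from the system described above.

```python
import hashlib

# Hypothetical node identifiers; the text above does not say how nodes are named.
DB_NODES = ["A", "B", "C"]


def node_for(key: str, nodes=DB_NODES) -> str:
    """Map a data object key to the database node that should store it.

    A stable hash is used so that every client interface node running the
    same logic selects the same database node for the same key.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

A hash of this kind spreads data objects evenly but takes no account of which data objects are accessed together, so related objects can readily end up on different database nodes.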
As described above, a database transaction requested by a database client can require more than one data operation. Consequently, it is often the case that such a database transaction will require data operations be performed on a plurality of data objects, where these data objects are distributed between a number of the database nodes. Such a database transaction is referred to as a distributed database transaction, and will require that a client interface node communicate with more than one database node in order to execute/implement the distributed database transaction. For example, the dot-dash lines on FIG. 1 illustrate a distributed transaction that requires a first data operation be performed on data object x and a second data operation be performed on data object y, where data object x is stored at database node A and data object y is stored at database node B.
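The FIG. 1 example can be expressed as a short sketch: a database transaction is distributed when its data operations touch data objects held at more than one database node. The placement table and the function names below are hypothetical and serve only to illustrate the definition.

```python
# Placement from the FIG. 1 example: data object x at node A, data object y at node B.
PLACEMENT = {"x": "A", "y": "B"}


def nodes_involved(operations, placement=PLACEMENT):
    """Return the set of database nodes touched by a list of (action, object) pairs."""
    return {placement[obj] for _action, obj in operations}


def is_distributed(operations, placement=PLACEMENT):
    """A transaction is distributed if it touches more than one database node."""
    return len(nodes_involved(operations, placement)) > 1


# The transaction of FIG. 1: one operation on x and one operation on y.
fig1_transaction = [("read", "x"), ("write", "y")]
```

Here `is_distributed(fig1_transaction)` evaluates to true because nodes A and B are both involved, whereas a transaction that operates only on data object x would be local to node A.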
Current trends in the use of distributed databases suggest that distributed transactions will become more common. For example, in order to take advantage of the scalable storage capacity provided by a distributed database, there is a trend for services that are required to store and/or make use of large amounts of data to be implemented using a tiered or layered architecture, wherein an application layer implements the actual service provision and processing and a separate database layer is used to store the service data. The application layer is therefore comprised of application servers that act as database clients, whilst the database layer is comprised of the distributed database. An example of a service that can make use of such a tiered or layered architecture is that of a Home Location Register (HLR) or a Home Subscriber Server (HSS) of a telecommunications network, in which “dataless” front-end servers provide the service application logic using data stored in a back-end distributed database. Furthermore, this trend towards tiered or layered service architectures has enabled service providers to aggregate the service data of multiple services within a single distributed database, which can significantly increase the volume of data stored within the distributed database. This increase in the volume of data therefore requires that the distributed database has a greater number of database nodes, which in turn increases the likelihood that distributed database transactions will occur. In addition, the increased usage of mainstream, off-the-shelf hardware by service providers implies that the individual database nodes of a distributed database will have less storage capacity, such that the distributed database will require a greater number of database nodes in order to store the same volume of data, which also increases the likelihood that distributed database transactions will occur.
This increase in the occurrence of distributed database transactions has a negative impact on the performance of a distributed database, due to the increase in the database response times caused by the need for a client interface node to communicate with more than one database node when executing/implementing a database transaction. This is particularly true for databases that make use of the two-phase commit (2PC) protocol or three-phase commit (3PC) protocol to implement the database operations of a distributed database transaction, as these protocols are already relatively expensive in terms of performance and response times.
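The cost of the two-phase commit protocol can be seen from a minimal in-memory sketch: the coordinator (here, the role played by a client interface node) must complete a voting round with every participating database node before it can start a second round that commits or aborts. The `Participant` class and `two_phase_commit` function are hypothetical, and the sketch deliberately omits the logging, timeouts and failure recovery that a real implementation would need.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """A database node taking part in a distributed transaction (illustrative only)."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: the node votes on whether it is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Run both phases and return True if the transaction committed."""
    # Phase 1 (voting): one round trip to every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2 (completion): a second round trip to every participant.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

Even in this simplified form, every distributed transaction incurs two rounds of communication with each database node involved, which is why response times grow as more transactions span multiple nodes.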
In scenarios in which a distributed database stores the data of a single service/application, or of a number of very similar services/applications, it may be possible to reduce the number of distributed database transactions that occur by manually configuring the distribution of data between the database nodes. For example, in the case of a HLR and/or HSS application, the data can be distributed among the plurality of database nodes that form the distributed database based on ranges of numerical user identifiers, such as the International Mobile Subscriber Identity (IMSI) or Mobile Subscriber Integrated Services Digital Network Number (MSISDN). However, such a solution requires experts in the operation of these services/applications to configure the distributed database with an optimised data distribution. Moreover, this kind of approach does not fit well in scenarios in which a distributed database stores the data of multiple applications, especially when there is no particular similarity between the applications, as it is then extremely difficult to determine an optimised data distribution for data that is accessed and/or manipulated by multiple applications.
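A manually configured, range-based distribution of the kind mentioned above might look like the following sketch, in which subscribers are assigned to database nodes by IMSI range. The split points, node names and `node_for_imsi` function are hypothetical; in practice they are exactly the values an expert would have to choose and maintain.

```python
import bisect

# Hypothetical split points: each value is the first IMSI served by the next node.
BOUNDARIES = [240010000333333, 240010000666666]
NODES = ["A", "B", "C"]


def node_for_imsi(imsi: str) -> str:
    """Return the database node whose IMSI range contains the given IMSI."""
    # bisect_right counts how many split points are <= the IMSI,
    # which is the index of the node that owns that range.
    return NODES[bisect.bisect_right(BOUNDARIES, int(imsi))]
```

Because all data for a given subscriber is keyed by the same identifier, a transaction that concerns one subscriber stays on one database node. The scheme offers no such guarantee for an application whose data is not naturally keyed by IMSI or MSISDN.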
Other attempts to reduce the number of distributed database transactions have entailed attempting to determine a set of minimum data objects that are involved in a database transaction and defining this as the smallest unit of logical data partitioning to be implemented within the database. The idea underlying these solutions is to define a logical scope of the data where serializability must be maintained. Nevertheless, the definitions required by this kind of approach are static and assume prior knowledge of the data used by the different applications, which can also vary (e.g. in different releases of an application). Furthermore, with these kinds of solutions, the distributed database needs to be configured according to static client/application related data, in order to distribute the data so as to maintain serializability in this logical data partitioning. However, this does not hold under dynamic, rapidly evolving network usage conditions, and even less so in telecommunications network convergence and data consolidation scenarios (e.g. scenarios where a common back-end data repository provided by a distributed database stores data of a plurality of users that is utilized by different kinds of telecommunication nodes, such as HLRs, HSSs, PCRFs, etc.). In these scenarios, new applications/services are routinely added as and when required (i.e. as clients of the distributed database).