Scalable real-time databases are becoming pervasive in many environments, driven by requirements originating in telecom and internet environments and by the performance and scalability those environments demand. This trend is reinforced by the fact that entire telecommunications networks are being re-architected by standardization groups so that pre-existing telecom network databases are split into two layers: one in charge of implementing the actual functionality and one in charge of storing and managing data. These real-time databases are expected to achieve high capacity and scalability. The same trends apply to IT environments and to recent advances in cloud computing. In the telecom environment, the demand for high throughput and minimum latency can only be met by means of in-memory databases, meaning databases whose data sets are kept in the working memory area.
As a result of the foregoing trends, real-time databases are set to become pervasive and, given the performance and scalability properties required, they must be based on clusters, grids or other scalable architectures.
In any case, in order to provide the necessary resilience, several redundant instances are provided, both local and multiple remote replicas, which are needed for high availability and geographical redundancy. In practice, this can mean four, six or even more replicas in order to withstand local and remote failures of multiple units and entire sites.
Current solutions, such as those known from Kossmann, Kraska, Loesing: “An Evaluation of Alternative Architectures for Transaction Processing in the Cloud”, ACM SIGMOD '10, Proceedings of the 2010 International Conference on Management of Data, are based inter alia on first partitioning data, then assigning those data partitions to N individual master databases, and finally replicating those partitions, one by one and in their entirety, onto an additional set of N replica database servers. This results in 2×N servers, or in 3×N or 4×N servers if resilience against 2, 3 or more simultaneous failures is required.
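The dedicated master/replica arrangement described above can be sketched as follows. This is an illustrative model only and is not taken from the cited reference: the names `Server`, `dedicated_layout` and `partition_of`, the use of a CRC32 hash to map keys to partitions, and the assumption that tolerating f simultaneous failures requires f full replicas per partition are all hypothetical choices made for the example.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    partition: int  # index of the data partition held by this server
    role: str       # "master" or "replica"
    copy: int       # 0 for the master, 1..f for the replicas


def partition_of(key: str, n_partitions: int) -> int:
    """Map a key to one of n_partitions (hash partitioning is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


def dedicated_layout(n_partitions: int, tolerated_failures: int) -> list[Server]:
    """One master per partition plus one full replica server per tolerated failure."""
    servers = []
    for p in range(n_partitions):
        servers.append(Server(p, "master", 0))
        for c in range(1, tolerated_failures + 1):
            # Each replica holds the whole partition and only receives updates.
            servers.append(Server(p, "replica", c))
    return servers


if __name__ == "__main__":
    layout = dedicated_layout(n_partitions=4, tolerated_failures=1)
    print(len(layout), "servers for 4 partitions")
    print("key 'subscriber-1' lives in partition", partition_of("subscriber-1", 4))
```

Under these assumptions every partition is tied to its own fixed group of servers, so the server count is the partition count multiplied by the number of copies.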
As a consequence, replica databases are essentially idle, handling only the replication of data updates and waiting for a failure that will affect a single, unknown instance among them. Complementary approaches target specific niches and are complex to implement, and none of them addresses the basic source of inefficiency, namely having dedicated master and replica databases devoted to the same partitions (see Kossmann et al. mentioned above).
Maintaining such a large number of replicas (1×N, 2×N and so on) implies investment, operational and general infrastructure costs that grow linearly with the number of replicas, so that half, two thirds or a higher fraction of computing resources is devoted to replicas, a fraction that increases when resilience against one, two or a higher number of simultaneous failures is needed.
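The fractions mentioned above follow from simple arithmetic. The sketch below assumes, for illustration only, that resilience against f simultaneous failures is obtained with f dedicated replica servers per partition; the function names `total_servers` and `replica_fraction` are hypothetical.

```python
from fractions import Fraction


def total_servers(n_partitions: int, tolerated_failures: int) -> int:
    """N masters plus f*N dedicated replicas (assumed sizing rule)."""
    return (tolerated_failures + 1) * n_partitions


def replica_fraction(tolerated_failures: int) -> Fraction:
    """Share of all servers that act as replicas: f*N / ((f+1)*N) = f / (f+1)."""
    return Fraction(tolerated_failures, tolerated_failures + 1)


if __name__ == "__main__":
    n = 10  # example number of partitions
    for f in (1, 2, 3):
        print(f"f={f}: {total_servers(n, f)} servers, "
              f"{replica_fraction(f)} of them replicas")
```

With one tolerated failure the replica share is one half, with two it is two thirds, and it approaches the whole system as more simultaneous failures must be tolerated, independently of N.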
Devoting the redundant resources to secondary tasks does not solve the problem, as the system has to be dimensioned to carry high traffic even in a faulty state. Even worse, extra capacity is needed to support recovery tasks in the partition replicas, e.g. synchronisation and replica consistency assurance while a failed server is recovering under simultaneous incoming traffic.
To summarize, the bigger the system, the more partitions are required, and the more reliable the system is required to be, the more replicas are required for each partition. All replicas sit in an idle state, waiting for a failure that affects a single, unknown replica, and must be dimensioned for extra tasks during recovery. These inefficiencies increase quadratically as factors such as scalability, number of partitions and number of replicas grow simultaneously.
In short, telecommunication database systems rely on working-memory storage in order to comply with latency and throughput requirements, and the above-mentioned idle resources imply very high costs and sub-optimal use of capacity.