The present invention relates to a method or apparatus for data management and, more particularly, but not exclusively, to such a method or apparatus that uses a distributed architecture.
Distributed Data Repository
Most mission-critical data repositories are built as distributed systems that run on several computing servers inter-connected by a data network, i.e. Distributed Data Repositories. Examples of distributed data repositories are: file systems, directories and databases. Mission-critical data repositories are built as distributed systems mainly to provide high availability and high scalability.
1. High availability is provided such that no single computing server failure compromises the availability of the data repository as a whole, including each and every data element.
2. High scalability is provided in two different dimensions: (1) amount of data, and (2) read/write transaction rate (throughput). In either case, a distributed data repository is “Highly Scalable” if more computing servers can be added to the system to support larger amounts of data and/or a higher transaction rate. Scalability of mission-critical distributed data repositories also requires “Online Scalability”, meaning that the system can scale while continuing to provide data management services.
Real-Time Event Processing
When distributed data repositories serve data applications that perform real-time event processing, they are also expected to support high responsiveness.
3. High responsiveness is provided for real-time data repositories such that each read and each write transaction is guaranteed, with very high probability, to be completed within a pre-defined amount of time. In real-time data repositories, the high availability and online scalability requirements are also expected to preserve the continuous high responsiveness of the system during failures and during scalability events.
Examples of real-time event processing data applications are: telecom call control, the mobile telephony Home Location Register (HLR), the IP Multimedia Subsystem's (IMS) Home Subscriber Server (HSS), and online banking and trading systems.
Mission-critical real-time data repository systems are expected to be highly available, highly scalable and highly responsive. Supporting the combination of these requirements is very challenging. The responsiveness requirement may suggest allocating and devoting a dedicated computing resource to a transaction to make sure it is completed within the required amount of time. This strategy makes typical pipeline and time-sharing processing scheduling less effective for accelerating the transaction rate, as responsiveness may be adversely affected.
The high availability requirement, on the other hand, would typically suggest storing every mission-critical data item on a highly available storage device (e.g. RAID, a Redundant Array of Independent Disks), which means that every write transaction needs to be written to disk before it is committed and completed. Otherwise, the data will not be available in case the writing computing element fails. This strategy reduces the transaction rate achieved even when running on large computing servers with many CPUs (SMP, or Symmetric Multi-Processing, servers).
In many cases, mission-critical data repositories are accessed simultaneously by several different computing entities (“clients”) for read/write transactions, and therefore distributed data repositories also need to provide system-wide consistency. A data repository is considered “consistent” (or “sequentially consistent”) if, from the point of view of each and every client, the sequence of changes in each data element's value is the same.
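The consistency property, and the write serialization it implies, can be sketched with a toy single-node store (illustrative Python only; the class and method names are assumptions for illustration and do not appear in any system described herein):

```python
import threading

class SequentiallyConsistentStore:
    """Toy store: writes to each element are serialized through one lock,
    so every client observes the same sequence of values per element."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}
        self._history = {}  # element -> ordered list of committed values

    def write(self, key, value):
        with self._lock:  # serialization point for all writes
            self._values[key] = value
            self._history.setdefault(key, []).append(value)

    def read(self, key):
        with self._lock:  # reads wait until pending writes have committed
            return self._values.get(key)

    def history(self, key):
        with self._lock:
            return list(self._history.get(key, []))

store = SequentiallyConsistentStore()
for v in (1, 2, 3):
    store.write("x", v)
# Every client inspecting element "x" sees the same committed sequence.
print(store.history("x"))  # -> [1, 2, 3]
```

The single lock is exactly the limiting factor discussed below: it serializes transactions even when they touch independent data elements.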
In most implementations of distributed data repositories supporting many concurrent clients that perform write transactions, the consistency requirement is also a limiting factor for system scalability in terms of transaction rate. This is because write transactions need to be serialized, and read transactions typically have to be delayed until pending write transactions have been completed. The serialization of read/write transactions typically occurs even when different transactions access different data elements (i.e. independent transactions), due to the way the data is organized within the system (e.g. on the same disk, in the same memory, etc.).
“Shared All” Distributed Cache Coherency Architectures
Traditional distributed data repositories (such as Oracle Real Application Clusters and others) use highly available storage (typically using RAID technology) to store mission-critical data while maintaining coherent local caches of in-memory copies of the data. This “shared all” distributed cache coherency architecture is capable of providing flexible active-active N+M high availability, such that all computing nodes can be utilized to share all the data processing load. In case of one or more node failures, the surviving nodes can be utilized to take over the data processing handled by the failed nodes.
The “shared all” distributed cache coherency architecture is illustrated in FIG. 1. The architecture is capable of providing scalability of the read transaction rate, i.e. adding more nodes to the system can increase the read transaction rate. However, “shared all” distributed cache coherency architectures typically exhibit no, or even negative, write transaction rate scalability, due to the need to coordinate each write among all local caches. As more nodes are added to the system, it takes longer to commit and complete each write transaction while maintaining cache coherency among all local caches. This growing commit delay makes the “shared all” distributed cache coherency architecture unsuitable for supporting applications that require real-time transaction processing when a large portion of the transactions are write transactions. The responsiveness requirements cannot be met at a high write transaction rate, which is problematic because the real-time event processing applications mentioned above commonly have high write transaction rates. Therefore, the “shared all” distributed cache coherency architecture is not suitable for such applications when deployed at large scale.
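The write-scalability problem can be modeled with a minimal sketch, assuming (for illustration only) a fixed coherency-message cost per node; the class names and the per-node delay figure are hypothetical and not taken from any described system:

```python
class SharedAllNode:
    """One node with its own local cache of the shared data."""
    def __init__(self):
        self.cache = {}

class SharedAllCluster:
    """Toy model of a "shared all" cluster: every write must reach every
    node's local cache before committing, so the modeled commit latency
    grows linearly with the number of nodes."""

    def __init__(self, n_nodes, per_node_delay_ms=1.0):
        self.nodes = [SharedAllNode() for _ in range(n_nodes)]
        self.per_node_delay_ms = per_node_delay_ms  # assumed coherency cost

    def write(self, key, value):
        # Propagate the write to every local cache; the returned value is
        # the modeled commit latency (one coherency message per node).
        for node in self.nodes:
            node.cache[key] = value
        return len(self.nodes) * self.per_node_delay_ms

small = SharedAllCluster(n_nodes=4)
large = SharedAllCluster(n_nodes=16)
print(small.write("x", 1))  # -> 4.0 ms (modeled)
print(large.write("x", 1))  # -> 16.0 ms: adding nodes slows each write
```

Under this simplified model, quadrupling the node count quadruples the per-write commit latency, which is the negative write scalability described above.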
“Shared Nothing” Data Partitioning Architecture
Other distributed data repositories (such as IBM DB2 UDB and MySQL) use a “shared nothing” data partitioning architecture such as that illustrated in FIG. 2. In the “shared nothing” architecture, a distributed data repository system is partitioned into several independent distributed data repository sub-systems, each of which manages a different part of the data. In the “shared nothing” data partitioning architecture, each partition can be viewed as a “shared all” distributed cache coherency sub-system, each with its own highly available storage.
The “shared nothing” data partitioning architecture overcomes the write rate scalability problem, since the more independent partitions the system has, the more independent write transactions can be performed concurrently on different partitions in a non-blocking way. Therefore, write commit responsiveness can also be well addressed by such an architecture.
The key characteristic of the “shared nothing” data partitioning architecture is that the computing resource partitioning is tightly coupled to the data partitioning. This means that computing resources are statically assigned to each data partition. When the system write rate grows, the only way to scale the system up is to re-partition it into more partitions and allocate more computing resources to the new partitions. This scaling process typically requires re-distributing data between partitions and cannot be done without harming the system's ability to continue providing a highly responsive online database service; re-partitioning would therefore typically require planned down-time of the whole system. As a result, online scalability cannot be achieved in a “shared nothing” data partitioning architecture. Moreover, to fully utilize the potential concurrency of the “shared nothing” architecture, the client application typically needs to be aware of the way the data is partitioned, which means that re-partitioning events may also require changes in the client application itself. This makes the “shared nothing” architecture very expensive to manage and maintain.
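Why re-partitioning forces bulk data movement can be seen with a simple modulo-hash routing sketch, one common (assumed, illustrative) way a “shared nothing” client maps keys to partitions:

```python
import hashlib

def partition_of(key, n_partitions):
    """Modulo-hash routing: the client hashes the key and takes the
    result modulo the partition count to find the owning partition.
    (Illustrative scheme, not from any specific described system.)"""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_partitions

# Simulate scaling a cluster from 4 partitions to 5.
keys = [f"subscriber-{i}" for i in range(1000)]
before = {k: partition_of(k, 4) for k in keys}
after = {k: partition_of(k, 5) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
# With modulo hashing, most keys change owner when the partition count
# changes, so scaling out requires re-distributing most of the data.
print(f"{moved} of {len(keys)} keys must move")
```

The sketch also shows why the client application must know the partitioning scheme: the routing function itself changes whenever the partition count does.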
“In-Memory” Data Repository Architecture
Other data repository architectures have emerged that focus on reducing transaction latency, both to provide better responsiveness and to provide a better overall transaction rate. This is done by keeping all the data in the sufficiently large memory of a single machine and by performing all database operations directly in this memory. The latency of accessing the computer's working memory may be orders of magnitude shorter than that of accessing storage devices such as disks. Therefore, by managing all data in memory, the data repository gains much shorter transaction latency and, therefore, also a higher transaction rate.
Mission-critical in-memory data repositories typically duplicate the system into two or more identical instances, such that in-memory data is continuously synchronized between the duplicate instances via the local network (as in the cache coherency mechanisms of the “shared all” architecture). Network-based data commit increases the latency of completing write transactions and therefore also decreases the write transaction rate. However, network-based data synchronization enables fault tolerance.
As a variation of the above, it is possible to provide two or more data repositories for redundancy, with updating between the repositories.
Reference is made to FIG. 3, which illustrates an in-memory repository with fault tolerance.
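The synchronous replication that gives such a system its fault tolerance can be sketched as follows (a minimal illustrative model; the `Primary`/`Replica` names and single-acknowledgement protocol are assumptions, not the design of any particular product):

```python
class Replica:
    """Standby instance holding an in-memory copy of the data."""
    def __init__(self):
        self.memory = {}

    def apply(self, key, value):
        self.memory[key] = value
        return True  # acknowledgement sent back over the network

class Primary:
    """Toy synchronous in-memory replication: a write commits only after
    every replica acknowledges it, so the data survives a primary
    failure at the cost of a network round-trip per write."""

    def __init__(self, replicas):
        self.memory = {}
        self.replicas = replicas

    def write(self, key, value):
        self.memory[key] = value
        # Simulated network round-trips; these add to commit latency.
        acks = [r.apply(key, value) for r in self.replicas]
        return all(acks)  # committed only when all replicas acknowledge

replica = Replica()
primary = Primary([replica])
committed = primary.write("balance", 100)
# The replica holds an up-to-date copy, so a primary failure loses no data.
print(committed, replica.memory["balance"])  # -> True 100
```

The acknowledgement wait in `write` is the network-based commit latency described above: fault tolerance is gained, but each write is slowed by the synchronization round-trip.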
In-memory data repositories cannot scale beyond the capacity and the write transaction rate provided by a single computing server. The only way to scale the capacity and/or the write transaction rate of an “in-memory” data repository system is to add more memory to the computing system or, in case the memory capacity of the computing system is maxed out, to move the system to a larger computing system with more memory and more CPUs (i.e. a larger SMP server). Both scalability strategies require planned down-time of the system and therefore do not comply with the high-availability and online-scalability requirements. Neither capacity nor write transaction rate can be scaled simply by adding more computing servers.
In-memory data repositories typically require co-location of the application and the database to achieve maximum performance and low latency. This raises the actual cost of the database, as in-memory databases are typically priced per CPU, yet the CPU is not dedicated to the database alone but is shared with the application. Therefore, a real application that consumes CPU and memory resources significantly reduces the actual price-performance of the database. On the other hand, separating the in-memory database and the application onto separate machines, which allows maximum utilization of the money spent on the database, often negates the performance gained by using an in-memory database in the first place.
There is thus a widely recognized need for, and it would be highly advantageous to have, a system that combines the advantages and avoids the disadvantages of the above described systems.