The present invention relates to a device and method for scalable block data storage using content addressing and, more particularly, but not exclusively, to such a device and method optimized for RAM data storage devices.
Storage systems in general, and block-based storage systems specifically, are a key element in modern data centers and computing infrastructure. These systems are designed to store and retrieve large amounts of data. A block of data is stored by providing a data block address together with the data block content, and is retrieved by providing the data block address, which returns the data block content stored at that address.
Storage solutions are typically partitioned into categories based on a use case and application within a computing infrastructure, and a key distinction exists between primary storage solutions and archiving storage solutions. Primary storage is typically used as the main storage pool for computing applications during application run-time. As such, the performance of primary storage systems is very often a key challenge and a major potential bottleneck in overall application performance, since storage and retrieval of data consumes time and delays the completion of application processing. Storage systems designed for archiving applications are much less sensitive to performance constraints, as they are not part of the run-time application processing.
In general, computer systems and the data under their management tend to grow over the system lifetime. Growth can be exponential, and in both primary and archiving storage systems the exponential capacity growth typical of modern computing environments presents a major challenge, as it results in increased cost, space, and power consumption for the storage systems required to support ever-increasing amounts of information.
Existing storage solutions, and especially primary storage solutions, rely on address-based mapping of data, as well as on address-based functionality of the storage system's internal algorithms. This is only natural, since computing applications always rely on address-based mapping and identification of the data they store and retrieve. However, a completely different scheme, in which data is mapped and managed internally within the storage system based on its content instead of its address, has many substantial advantages. For example, it improves storage capacity efficiency, since duplicate block data occupies only the actual capacity of a single instance of that block. As another example, it improves performance, since duplicate block writes do not need to be executed internally in the storage system. Existing storage systems, whether primary storage systems or archiving storage systems, are incapable of supporting the combination of content-based storage, with its numerous advantages, and ultra-high performance. This is because the implementation of a content-based storage scheme faces several challenges:
(a) intensive computational load which is not easily distributable or breakable into smaller tasks,
(b) an inherent need to break large blocks into smaller blocks in order to achieve content addressing at fine granularity, where this block fragmentation dramatically degrades the performance of existing storage solutions,
(c) an inability to maintain sequential location of data blocks within the storage system, since mapping is no longer address based, and such inability causes dramatic performance degradation with traditional spinning disk systems,
(d) the algorithmic and architectural difficulty of distributing the tasks associated with content-based mapping over a large number of processing and storage elements while maintaining a single content-addressing space over the full capacity range of the storage system.
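The content-based mapping described above can be illustrated with a minimal, single-node sketch. It is not the implementation of the invention. The class name `ContentAddressedStore`, the 4096-byte `BLOCK_SIZE`, the use of SHA-256 as the content digest, and the `physical_writes` counter are all assumptions chosen for illustration. The sketch shows an address-to-digest table and a digest-to-block table, so that duplicate content occupies a single stored instance and a duplicate write stores no new block. It does not address challenges (a) to (d), in particular distribution over many processing and storage elements.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fine-granularity block size, in bytes


class ContentAddressedStore:
    """Toy in-memory block store that maps addresses to content digests."""

    def __init__(self):
        self.address_to_digest = {}  # logical block address -> content digest
        self.digest_to_block = {}    # content digest -> the single stored copy
        self.refcount = {}           # content digest -> number of addresses using it
        self.physical_writes = 0     # blocks actually stored (hypothetical metric)

    def write(self, address, content):
        """Store one block; duplicate content is referenced, not stored again."""
        digest = hashlib.sha256(content).hexdigest()
        old = self.address_to_digest.get(address)
        if old == digest:
            return  # this address already holds identical content
        if digest not in self.digest_to_block:
            self.digest_to_block[digest] = content
            self.refcount[digest] = 0
            self.physical_writes += 1
        self.refcount[digest] += 1
        self.address_to_digest[address] = digest
        if old is not None:
            self._release(old)

    def _release(self, digest):
        """Drop one reference and free the block when no address uses it."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.digest_to_block[digest]

    def write_range(self, start_address, data):
        """Split a large write into BLOCK_SIZE pieces, one address per piece."""
        for offset in range(0, len(data), BLOCK_SIZE):
            self.write(start_address + offset // BLOCK_SIZE,
                       data[offset:offset + BLOCK_SIZE])

    def read(self, address):
        """Return the content stored at the given address."""
        return self.digest_to_block[self.address_to_digest[address]]


store = ContentAddressedStore()
block_a = b"A" * BLOCK_SIZE
block_b = b"B" * BLOCK_SIZE
store.write(0, block_a)
store.write(1, block_a)  # duplicate content: no new block is stored
store.write(2, block_b)
print(store.physical_writes)        # 2 blocks stored for 3 logical writes
print(store.read(1) == block_a)     # True
```

In this sketch the application still reads and writes by address, as it always does; only the internal mapping is content based. Splitting a large write into small blocks in `write_range` mirrors the fine-granularity requirement in (b), and it also shows why sequential placement, as in (c), is lost: where a block is stored depends on its digest, not on its address.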
A number of issues arise with respect to such devices, including performance, lifetime, resilience to failure of individual devices, overall speed of response and the like.
Such devices may be used in highly demanding circumstances where failure to process data correctly can be extremely serious, or where large scales are involved, and where the system has to be able to cope with sudden surges in demand.
One challenge is to avoid performance bottlenecks and allow performance scalability that is independent of user data access patterns.
A second challenge is to support inline, highly granular block-level deduplication without degrading the read/write performance of the storage. The result should be scalable in both performance and capacity, with deduplication applied over the full capacity space.
A further challenge is to address the write/erase cycle limitations of flash-based SSDs, whose lifetime depends on the number of write/erase cycles they undergo.