Object storage systems are generally known. These systems store data objects referenced by an object identifier, in contrast to file systems, such as the one described in US2002/0078244, which store files referenced by an inode, and block-based systems, which store data blocks referenced by a block address. Object storage systems generally compare favorably with both in terms of scalability and flexibility. This can allow object storage systems to surpass the maximum storage capacity limits of file systems in a flexible way, such that, for example, storage capacity can be added or removed as needs dictate, while reducing degradation in performance as the system grows. As a result, object storage systems are often selected for large-scale storage systems.
Such large-scale storage systems generally distribute the stored data objects over multiple storage elements, such as hard disks, or over multiple components, such as storage nodes comprising a plurality of such storage elements. However, as the number of storage elements in such a distributed object storage system increases, so does the probability that one or more of these storage elements will fail. To cope with this issue, distributed object storage systems generally use some level of redundancy, which allows the system to tolerate a failure of one or more storage elements without data loss.
In its simplest form, redundancy at the object store level is achieved by replication, which means storing multiple copies of a data object on multiple storage elements of the distributed object storage system. When one of the storage elements storing a copy of the data object fails, the data object can still be recovered from another storage element holding a copy. However, replication can be costly in terms of system cost and overhead. For instance, in order to survive two concurrent storage element failures in a distributed object storage system, at least two replica copies of each data object are required in addition to the original, which results in a storage capacity overhead of 200% (e.g., storing 1 gigabyte (GB) of data objects requires 3 GB of storage capacity).
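The overhead figure above follows from simple arithmetic, which may be sketched as follows. The function name `replication_overhead` is hypothetical, and the sketch assumes that every replica is a full copy and that tolerating a given number of concurrent failures requires that many replicas in addition to the original.

```python
from typing import Tuple


def replication_overhead(failures_tolerated: int, data_gb: float) -> Tuple[float, float]:
    """Return (raw capacity in GB, overhead in percent) for full-copy replication.

    Assumption: surviving `failures_tolerated` concurrent storage element
    failures requires that many replica copies in addition to the original.
    """
    if failures_tolerated < 0:
        raise ValueError("failures_tolerated must be non-negative")
    copies = 1 + failures_tolerated           # original plus replicas
    raw_gb = copies * data_gb                 # total capacity consumed
    overhead_pct = 100.0 * failures_tolerated # extra capacity relative to the data
    return raw_gb, overhead_pct


raw_gb, overhead_pct = replication_overhead(failures_tolerated=2, data_gb=1.0)
print(f"{raw_gb} GB raw capacity, {overhead_pct}% overhead")
```

With two tolerated failures and 1 GB of data objects, the sketch yields 3 GB of raw capacity and 200% overhead, matching the example in the text.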
One existing redundancy scheme utilizes redundant array of independent disks (RAID) systems. While some RAID implementations are more efficient than replication as far as storage capacity overhead is concerned, they are generally inflexible. For example, these RAID implementations often require a specific form of synchronization of the different storage elements and require the storage elements to be of the same type. Additionally, in the case of drive failure, these implementations require immediate replacement, followed by a costly and time-consuming rebuild process.
In some object storage systems, data object replication is handled by an internal process in response to user and system actions. Replication processes are designed to comply with a defined quality of service, expressed as a maximum time boundary for completing the replication, based on the round trip time between the original site and the replication site and on bandwidth availability.
In object storage systems that comply with Simple Storage Service (S3) protocols, the replication solution usually comprises multiple GET-PUT processes, ideally run in parallel. These processes are targeted to complete replication of a batch of objects within a certain amount of time, generally well under the maximum time boundary defined by the quality of service. However, because the systems rely on fixed resources and managed parallelism, meeting the target timeline may be a challenge. There are typically tradeoffs between the allocation of system resources and the amount of time the system takes to complete each object replication task. The functionality of object storage systems may be improved by reducing the amount of time required for replication and maximizing the efficient use of available system resources.
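By way of illustration only, a batch replication built from parallel GET-PUT processes may be sketched as follows. The `InMemoryStore` class and the `replicate_batch` function are hypothetical stand-ins and do not represent any particular S3 client library; the fixed worker count models the fixed resources and managed parallelism noted above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


class InMemoryStore:
    """Hypothetical stand-in for an S3-compatible object store."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def keys(self) -> List[str]:
        return list(self._objects)


def replicate_batch(source: InMemoryStore, target: InMemoryStore,
                    keys: Iterable[str], workers: int = 4) -> int:
    """Replicate each object with one GET followed by one PUT.

    A fixed-size thread pool runs the GET-PUT pairs in parallel.
    Returns the total number of bytes copied.
    """
    def get_put(key: str) -> int:
        data = source.get(key)   # GET from the original site
        target.put(key, data)    # PUT to the replication site
        return len(data)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(get_put, keys))


source, target = InMemoryStore(), InMemoryStore()
source.put("object-a", b"xx")
source.put("object-b", b"yyy")
copied = replicate_batch(source, target, source.keys(), workers=2)
print(copied, sorted(target.keys()))
```

In this sketch, the number of workers is the only tunable resource, so the time to complete the batch depends on how the objects happen to be distributed among the workers.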
A specific challenge for managing parallel processing of multiple GET-PUT processes occurs when a batch of objects for replication includes objects of varying size. To partially address this, many systems include logic for handling larger data objects by dividing them into multiple parts, where each part is within a certain size range acceptable to the system. The parts of these multipart data objects can then be allocated across parallel replication processes. Data objects that already fall within the size range acceptable to the system are handled as single part data objects. It should be understood that even though this process reduces the range of part sizes being replicated, it is still a range and each part may be a different size. Even the parts within a multipart data object may have varying part sizes. In many systems, once a batch of objects is identified for replication, the objects are allocated among the parallel replication processes in that system in first-in, first-out (FIFO) order out of the batch of object identifiers. However, this approach is unlikely to yield the best overall completion time for the batch.
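The effect of allocation order on batch completion time may be illustrated with a small simulation. The function names, the part size limit, and the object sizes below are hypothetical, and the sketch assumes that transfer time is proportional to part size. The largest-first ordering is a well-known scheduling heuristic included purely as a point of comparison; it is not presented here as the approach of any particular system.

```python
import heapq
from typing import List


def split_into_parts(object_size: int, max_part_size: int) -> List[int]:
    """Divide an object into part sizes no larger than max_part_size."""
    if object_size <= 0 or max_part_size <= 0:
        raise ValueError("sizes must be positive")
    full_parts, remainder = divmod(object_size, max_part_size)
    return [max_part_size] * full_parts + ([remainder] if remainder else [])


def batch_completion_time(part_sizes: List[int], workers: int) -> int:
    """Assign parts, in the given order, to whichever worker frees up first.

    Assumption: transfer time is proportional to part size, so sizes double
    as durations. Returns the time at which the last worker finishes.
    """
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for size in part_sizes:
        earliest = heapq.heappop(finish_times)        # next idle worker
        heapq.heappush(finish_times, earliest + size)
    return max(finish_times)


# Hypothetical batch: four small objects followed by one large object.
object_sizes = [10, 10, 10, 10, 100]
fifo_parts = [part for size in object_sizes
              for part in split_into_parts(size, max_part_size=40)]

fifo_time = batch_completion_time(fifo_parts, workers=2)
largest_first_time = batch_completion_time(sorted(fifo_parts, reverse=True), workers=2)
print(fifo_parts, fifo_time, largest_first_time)
```

For this batch, the FIFO order leaves one worker idle while the other finishes the last large part, giving a completion time of 80 units, whereas the largest-first order finishes in 70 units, the minimum possible for 140 units of work on two workers.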
Therefore, there still exists a need for improved parallel replication of batches of data objects of different sizes to improve replication time and/or use of available object storage system resources.