1. The Field of the Invention
The present invention generally relates to a distributed data storage system. Typically, such distributed storage systems are targeted at storing large amounts of data, such as objects or files, in a distributed and fault tolerant manner with a predetermined level of redundancy. The present invention relates more particularly to a distributed object storage system.
2. The Related Technology
The advantages, in terms of scalability and flexibility, of object storage systems, which store data objects referenced by an object identifier, over file systems, such as for example US2002/0078244, which store files referenced by an inode, and over block based systems, which store data blocks referenced by a block address, are well known. Object storage systems are in this way able to surpass the maximum storage capacity limits of file systems in a flexible way, such that for example storage capacity can be added or removed as needed, without degrading performance as the system grows. This makes such object storage systems excellent candidates for large scale storage systems.
Such large scale storage systems are required to distribute the stored data objects in the object storage system over multiple storage elements, such as for example hard disks, or multiple components such as storage nodes comprising a plurality of such storage elements. However, as the number of storage elements in such a distributed object storage system increases, the probability of failure of one or more of these storage elements increases equally. To cope therewith it is required to introduce a level of redundancy into the distributed object storage system. This means that the distributed object storage system must be able to cope with a failure of one or more storage elements without data loss. In its simplest form redundancy is achieved by replication, which means storing multiple copies of a data object on multiple storage elements of the distributed object storage system. In this way, when one of the storage elements storing a copy of the data object fails, this data object can still be recovered from another storage element holding a copy. Several schemes for replication are known in the art. In general replication is costly as far as storage capacity is concerned. In order to survive two concurrent failures of a storage element of a distributed object storage system, at least two replica copies of each data object are required in addition to the original, which results in a storage capacity overhead of 200%, which means that for storing 1 GB of data objects a storage capacity of 3 GB is required. Another well-known scheme is referred to as RAID systems, of which some implementations are more efficient than replication as far as storage capacity overhead is concerned. However, RAID systems often require a form of synchronisation of the different storage elements, require them to be of the same type and, in the case of drive failure, require immediate replacement, followed by a costly and time consuming rebuild process.
Therefore known systems based on replication or known RAID systems are generally not configured to survive more than two concurrent storage element failures. It has therefore been proposed to use distributed object storage systems that are based on erasure encoding, such as for example described in WO2009135630 or US2007/0136525. Such a distributed object storage system stores the data object in encoded sub blocks that are spread amongst the storage elements in such a way that for example a concurrent failure of six storage elements out of a minimum of sixteen storage elements can be tolerated with a corresponding storage overhead of 60%, which means that 1 GB of data objects requires a storage capacity of only 1.6 GB.
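The overhead figures quoted above for replication and for erasure encoding can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes an erasure code in which the data object can be rebuilt from any set of sub blocks on the surviving storage elements, which is what the sixteen-element, six-failure example implies but does not state outright.

```python
from fractions import Fraction


def replication_overhead(failures_tolerated: int) -> Fraction:
    """Extra capacity, as a fraction of the payload, when each tolerated
    failure is covered by one additional full copy of the data object."""
    if failures_tolerated < 0:
        raise ValueError("failures_tolerated must be non-negative")
    return Fraction(failures_tolerated)


def erasure_overhead(total_elements: int, failures_tolerated: int) -> Fraction:
    """Extra capacity, as a fraction of the payload, assuming the object is
    recoverable from any (total_elements - failures_tolerated) elements."""
    needed = total_elements - failures_tolerated
    if needed <= 0:
        raise ValueError("at least one storage element must survive")
    return Fraction(total_elements, needed) - 1


def required_capacity_gb(payload_gb: int, overhead: Fraction) -> Fraction:
    """Raw capacity needed to store payload_gb at the given overhead."""
    return payload_gb * (1 + overhead)


if __name__ == "__main__":
    rep = replication_overhead(2)   # two replicas besides the original
    ec = erasure_overhead(16, 6)    # six failures out of sixteen elements
    print(f"replication: {float(rep):.0%} overhead, "
          f"{float(required_capacity_gb(1, rep))} GB per 1 GB stored")
    print(f"erasure encoding: {float(ec):.0%} overhead, "
          f"{float(required_capacity_gb(1, ec))} GB per 1 GB stored")
```

Under these assumptions the erasure encoded configuration tolerates three times as many concurrent failures as the replicated one while using roughly half the raw capacity.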
Current erasure encoding based distributed object storage systems for large scale data storage are well equipped to efficiently store and retrieve large data objects. However, when small data objects need to be stored or retrieved, the latency generated by the encoding technology can become too large, especially if small data objects need to be stored in large quantities. The same holds for large data objects when only a smaller section of the data needs to be retrieved, because in traditional distributed object storage systems the data object can only be retrieved in its entirety, and retrieval can only start after the data object has been entirely stored.
Therefore, there still exists a need for a simple configuration facility that is able to cope with small data objects and data fragments of large data objects in a more efficient manner.