The advantages of object storage systems, which store data objects referenced by an object identifier, over file systems, such as for example US2002/0078244, which store files referenced by an inode, or over block-based systems, which store data blocks referenced by a block address, are well known in terms of scalability and flexibility. In this way, object storage systems are able to surpass the maximum storage capacity limits of file systems in a flexible way, such that, for example, storage capacity can be added or removed as needed without degrading performance as the system grows. This makes such object storage systems excellent candidates for large-scale storage systems.
Such large-scale storage systems are required to distribute the stored data objects over multiple storage elements, such as for example hard disks, or over multiple components, such as storage nodes each comprising a plurality of such storage elements. However, as the number of storage elements in such a distributed object storage system increases, so does the probability that one or more of these storage elements fails. To cope with this, a level of redundancy must be introduced into the distributed object storage system: it must be able to survive the failure of one or more storage elements without data loss. In its simplest form, redundancy is achieved by replication, that is, by storing multiple copies of a data object on multiple storage elements of the distributed object storage system. When one of the storage elements storing a copy of the data object fails, the data object can then still be recovered from another storage element holding a copy. Several replication schemes are known in the art. In general, replication is costly in terms of storage capacity. In order to survive two concurrent storage element failures, for example, at least two replica copies are required in addition to each data object, i.e. three copies in total, resulting in a storage capacity overhead of 200%: storing 1 GB of data objects requires 3 GB of storage capacity. Another well-known scheme is RAID, some implementations of which are more efficient than replication in terms of storage capacity overhead. However, RAID systems often require a form of synchronisation between the different storage elements, require them to be of the same type and, in the case of a drive failure, require immediate replacement followed by a costly and time-consuming rebuild process.
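The replication arithmetic above can be checked with a short sketch. It assumes that every copy is placed on a distinct storage element, so that tolerating f concurrent failures needs f + 1 copies; the function names are hypothetical and chosen only for illustration.

```python
def replication_overhead(tolerated_failures: int) -> float:
    """Storage overhead of plain replication as a fraction of the stored data.

    Assumption: surviving f concurrent storage element failures requires
    f + 1 copies on distinct storage elements, i.e. f copies beyond the
    first, so the overhead is f * 100%.
    """
    copies = tolerated_failures + 1
    return float(copies - 1)


def replication_capacity_gb(data_gb: float, tolerated_failures: int) -> float:
    """Raw storage capacity needed to store data_gb with replication."""
    return data_gb * (tolerated_failures + 1)


print(f"{replication_overhead(2):.0%}")     # 200%
print(replication_capacity_gb(1.0, 2))      # 3.0
```

The overhead grows linearly with the number of tolerated failures, which is why replication becomes expensive when more than two concurrent failures must be survived.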
Consequently, known systems based on replication or on known RAID schemes are generally not configured to survive more than two concurrent storage element failures. It has therefore been proposed to use distributed object storage systems based on erasure encoding, such as for example described in WO2009135630, US2007/0136525 or US2008/313241. Such a distributed object storage system stores the data object in encoded sub fragments that are spread amongst the storage elements in such a way that, for example, a concurrent failure of six storage elements out of a minimum of sixteen storage elements can be tolerated with a corresponding storage overhead of 60%, meaning that 1 GB of data objects requires only 1.6 GB of storage capacity.
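The 60% figure can be reproduced under the assumption of an idealised maximum distance separable (MDS) erasure code, such as Reed-Solomon, in which an object is split into k data fragments and encoded into n fragments, any k of which suffice for reconstruction. This is an illustrative assumption; the cited systems may use other erasure codes. The parameter name `spread_width` (n) is hypothetical.

```python
from fractions import Fraction


def erasure_overhead(spread_width: int, tolerated_failures: int) -> Fraction:
    """Storage overhead of an idealised MDS erasure code (assumption).

    The object is split into k = spread_width - tolerated_failures data
    fragments and encoded into spread_width fragments, any k of which
    suffice to rebuild it. Raw capacity is spread_width / k times the
    object size, so the overhead is spread_width / k - 1.
    """
    k = spread_width - tolerated_failures
    if k <= 0:
        raise ValueError("need at least one data fragment")
    return Fraction(spread_width, k) - 1


overhead = erasure_overhead(16, 6)      # 16 / 10 - 1 = 3/5
print(f"{float(overhead):.0%}")         # 60%
print(float(1 + overhead) * 1.0)        # 1.6 (GB needed for 1 GB of data)
```

For comparison, tolerating the same six failures by replication would require seven copies, i.e. a 600% overhead.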
Current erasure-encoding-based distributed object storage systems for large-scale data storage are well equipped to efficiently store and retrieve large data objects. However, when small data objects need to be stored or retrieved, the storage cost of such systems starts to diverge from the theoretical optimum, especially if small data objects need to be stored in large quantities.
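One way this divergence can arise is sketched below, assuming, purely for illustration, that every encoded fragment occupies a whole number of fixed-size allocation units on its storage element. The 4096-byte block size, the 16/10 fragment split and the function names are hypothetical; per-fragment metadata, which would add further overhead, is not modelled.

```python
import math


def stored_bytes(object_size: int, spread_width: int, data_fragments: int,
                 block_size: int = 4096) -> int:
    """Raw bytes consumed when each fragment is rounded up to whole blocks.

    Assumption for illustration: each of the spread_width fragments holds
    ceil(object_size / data_fragments) bytes and occupies at least one
    block_size-byte allocation unit on its storage element.
    """
    fragment = math.ceil(object_size / data_fragments)
    blocks_per_fragment = max(1, math.ceil(fragment / block_size))
    return spread_width * blocks_per_fragment * block_size


def ideal_bytes(object_size: int, spread_width: int, data_fragments: int) -> float:
    """Theoretical optimum: object size times spread_width / data_fragments."""
    return object_size * spread_width / data_fragments


# Ratio of actual to ideal raw storage for small, medium and large objects.
for size in (1_000, 100_000, 10_000_000):
    actual = stored_bytes(size, 16, 10)
    ideal = ideal_bytes(size, 16, 10)
    print(f"{size:>10} B object: {actual / ideal:.2f}x the ideal")
```

With these assumed parameters, a 1,000-byte object consumes roughly 41 times its ideal encoded size, while a 10 MB object stays within 1% of it, so the penalty only becomes significant when many small objects are stored.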
Therefore, there still exists a need for an improved distributed object storage system that is able to handle small data objects more efficiently.