The advantages of object storage systems, which store data objects referenced by an object identifier, over file systems, such as for example US2002/0078244, which store files referenced by an inode, or block-based systems, which store data blocks referenced by a block address, are well known in terms of scalability and flexibility. Object storage systems are able to surpass the maximum storage capacity limits of file systems in a flexible way, such that for example storage capacity can be added or removed as needed without degrading performance as the system grows. This makes such object storage systems excellent candidates for large-scale storage systems.
Such large-scale storage systems are required to distribute the stored data objects over multiple storage elements, such as for example hard disks, or multiple components such as storage nodes comprising a plurality of such storage elements. However, as the number of storage elements in such a distributed object storage system increases, the probability of failure of one or more of these storage elements increases equally. To cope with this, it is required to introduce a level of redundancy into the distributed object storage system. This means that the distributed object storage system must be able to cope with a failure of one or more storage elements without data loss. In its simplest form redundancy is achieved by replication, which means storing multiple copies of a data object on multiple storage elements of the distributed object storage system. In this way, when one of the storage elements storing a copy of the data object fails, this data object can still be recovered from another storage element holding a copy. Several schemes for replication are known in the art; in general, however, replication is costly as far as storage capacity is concerned. In order to survive two concurrent storage element failures, at least two replica copies of each data object are required in addition to the original, which results in a storage capacity overhead of 200%: storing 1 GB of data objects requires a storage capacity of 3 GB. Another well-known scheme is RAID, of which some implementations are more efficient than replication as far as storage capacity overhead is concerned. However, RAID systems often require a form of synchronisation of the different storage elements, require them to be of the same type, and in the case of a drive failure require immediate replacement followed by a costly and time-consuming rebuild process.
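The replication arithmetic above can be checked with a minimal sketch. The function name replication_cost and its parameters are hypothetical and serve only as illustration; the sketch assumes one original copy plus one additional replica per storage element failure to be tolerated.

```python
def replication_cost(failures_tolerated, data_gb=1):
    """Return (overhead_ratio, raw_capacity_gb) for plain replication.

    Assumption: one original copy plus one extra replica for each
    concurrent storage element failure that must be survived.
    """
    replicas = failures_tolerated               # extra copies beyond the original
    overhead_ratio = replicas                   # extra capacity relative to the data size
    raw_capacity_gb = data_gb * (1 + replicas)  # original plus all replicas
    return overhead_ratio, raw_capacity_gb


overhead, raw = replication_cost(failures_tolerated=2, data_gb=1)
print(f"overhead = {overhead:.0%}, raw capacity = {raw} GB")
```

With two tolerated failures and 1 GB of data objects, this reproduces the 200% overhead and 3 GB of storage capacity stated above.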
Therefore, known systems based on replication or known RAID systems are generally not configured to survive more than two concurrent storage element failures. For this reason it has been proposed to use distributed object storage systems that are based on erasure encoding, such as for example described in WO20091356300 or US2007/0136525. Such a distributed object storage system stores the data object in encoded sub blocks that are spread amongst the storage elements in such a way that, for example, a concurrent failure of six storage elements can be tolerated with a corresponding storage overhead of 60%, which means that 1 GB of data objects requires a storage capacity of only 1.6 GB.
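The overhead figure for erasure encoding can be illustrated with a short sketch. The encoding parameters are not given in the text; the values k = 10 and n = 16 below are assumptions chosen only because they reproduce the stated figures, and the sketch further assumes an encoding in which any k of the n sub blocks suffice to recover the data object. The function name erasure_cost is hypothetical.

```python
from fractions import Fraction


def erasure_cost(k, n, data_gb=1):
    """Return (failures_tolerated, overhead, raw_capacity_gb).

    Assumption: the data object is encoded into n sub blocks, each on a
    different storage element, and any k of them suffice for recovery.
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    failures_tolerated = n - k                # sub blocks that may be lost
    overhead = Fraction(n - k, k)             # extra capacity relative to the data size
    raw_capacity_gb = data_gb * Fraction(n, k)
    return failures_tolerated, overhead, raw_capacity_gb


# Hypothetical parameters: k = 10 and n = 16 are not taken from the text.
tolerated, overhead, raw = erasure_cost(k=10, n=16)
print(f"tolerates {tolerated} failures, overhead = {float(overhead):.0%}, "
      f"raw capacity = {float(raw)} GB")
```

Under these assumed parameters six concurrent failures are tolerated at 60% overhead, whereas the same tolerance with plain replication would need six additional replicas, an overhead of 600%.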
In order to reduce power consumption and increase reliability of the distributed object storage system, some form of monitoring of the hardware is required. In prior art systems a central monitoring facility periodically connects to the storage elements and requests status information such as fan speeds, temperature, disk error rates, etc. The central facility then analyses all this data and tries to determine whether certain actions are to be taken, such as proactive replication of the data of a storage element that is about to fail. However, for very large distributed object storage systems this approach does not scale well, and the time it would take to poll all of the storage elements would lead to a very low monitoring frequency.
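The scaling limitation of central polling can be made concrete with a rough estimate. All numbers below (100,000 storage elements, 0.5 seconds per status request, 10 parallel connections) are hypothetical and not taken from any described system, and the function name poll_cycle_seconds is an illustrative assumption.

```python
import math


def poll_cycle_seconds(num_elements, seconds_per_poll, parallel_connections=1):
    """Estimate the time for a central facility to poll every storage element once.

    Assumption: each status request takes a fixed time and the facility
    keeps a fixed number of connections busy in parallel.
    """
    rounds = math.ceil(num_elements / parallel_connections)
    return rounds * seconds_per_poll


# Hypothetical figures, chosen for illustration only.
cycle = poll_cycle_seconds(num_elements=100_000, seconds_per_poll=0.5,
                           parallel_connections=10)
print(f"one full polling cycle takes about {cycle / 60:.0f} minutes")
```

Under these assumptions each storage element is checked only about once every 83 minutes, and the cycle time grows linearly with the number of storage elements.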