The present invention relates to digital data storage systems and methods, and more particularly to those providing fault-tolerant storage.
It is known in the prior art to provide redundant disk storage in a pattern according to any one of various RAID (Redundant Array of Independent Disks) protocols. Typically, disk arrays using a RAID pattern are complex structures that require management by experienced information technologists. Moreover, in many array designs using a RAID pattern, if the disk drives in the array are of non-uniform capacities, the design may be unable to use any capacity on a drive that exceeds the capacity of the smallest drive in the array.
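By way of illustration only, the capacity limitation noted above can be expressed as simple arithmetic. The following sketch assumes an array design that uses, on each drive, only as much capacity as the smallest drive provides; the function name and the drive sizes are hypothetical and are not taken from any particular RAID implementation.

```python
def unused_capacity(drive_sizes_gb):
    """Return the total capacity (in GB) that exceeds the smallest drive.

    Assumes a design that uses only min(drive_sizes_gb) on every member drive.
    """
    smallest = min(drive_sizes_gb)
    return sum(size - smallest for size in drive_sizes_gb)


# Hypothetical array of three drives with non-uniform capacities.
sizes = [250, 500, 1000]
wasted = unused_capacity(sizes)  # 0 + 250 + 750
```

Under these assumptions, more than half of the raw capacity of the example array would go unused.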
One problem with a standard RAID system is that it is possible for disk-surface corruption to occur on an infrequently used area of the disk array. In the event that another drive fails, it is not always possible to determine that corruption has occurred. In such a case, the corrupted data may be propagated and preserved when the RAID array rebuilds the failed drive.
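The propagation of corrupted data during a rebuild may be illustrated with a minimal sketch. The sketch assumes a single-parity scheme in which the parity block is the bytewise exclusive-OR of the data blocks; this scheme, the helper name xor_blocks, and the block contents are assumptions chosen for illustration and do not describe every RAID pattern.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (assumed single-parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks on three drives, plus parity on a fourth drive.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# Undetected corruption in a rarely read block on the second drive.
d1_corrupt = b"BBXB"

# The third drive fails; its block is rebuilt from the surviving blocks.
rebuilt_clean = xor_blocks([d0, d1, parity])            # what a sound array would yield
rebuilt_corrupt = xor_blocks([d0, d1_corrupt, parity])  # what the corrupted array yields
```

In this sketch the rebuild completes without any error indication, yet the rebuilt block differs from the original contents of the failed drive, so the corruption is carried forward into the rebuilt data.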
In many storage systems, a spare storage device will be maintained in a ready state so that it can be used in the event another storage device fails. Such a spare storage device is often referred to as a “hot spare.” The hot spare is not used to store data during normal operation of the storage system. When an active storage device fails, the failed storage device is logically replaced by the hot spare, and data is moved or otherwise recreated onto the hot spare. When the failed storage device is repaired or replaced, the data is typically moved or otherwise recreated onto the (re-)activated storage device, and the hot spare is brought offline so that it is ready to be used in the event of another failure. Maintenance of a hot spare disk is generally complex, and so is generally handled by a skilled administrator. A hot spare disk also represents an added expense.
Generally speaking, when the host filesystem writes a block of data to the storage system, the storage system allocates a storage block for the data and updates its data structures to indicate that the storage block is in use. From that point on, the storage system considers the storage block to be in use, even if the host filesystem subsequently ceases to use its block.
The host filesystem generally uses a bitmap to track its used disk blocks. Shortly after volume creation, the bitmap will generally indicate that most blocks are free, typically by having all bits clear. As the filesystem is used, the host filesystem will allocate blocks solely through use of its free block bitmap.
When the host filesystem releases some blocks back to its free pool, it simply clears the corresponding bits in its free block bitmap. On the storage system, this is manifested as a write to a cluster that happens to contain part of the host's free block bitmap, and possibly a write to a journal file, but almost certainly no input/output (I/O) to the actual cluster being freed. If the host filesystem were running in an enhanced security mode, there might be I/O to the freed block because the host overwrites the current on-disk data so as to reduce the chance of the stale cluster contents being readable by an attacker, but there is no way to identify such writes as being part of a deletion process. Thus, the storage system has no way to distinguish a block that the host filesystem has in use from one that it previously used and has subsequently marked free.
This inability of the storage system to identify freed blocks can lead to a number of negative consequences. For example, the storage system could significantly over-report the amount of storage being used and could prematurely run out of storage space.
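The divergence between the host filesystem's view and the storage system's view may be illustrated with the following minimal sketch. The class names, the placement of the free block bitmap in block 0, and the one-byte-per-block bitmap encoding are assumptions made solely for illustration; journaling and secure-erase writes are omitted.

```python
class StorageSystem:
    """Hypothetical block store that sees only writes, never host metadata semantics."""

    def __init__(self):
        self.in_use = set()  # storage blocks the storage system considers in use

    def write(self, block, data):
        # Once written, a block is considered in use from that point on.
        self.in_use.add(block)

    def used_blocks(self):
        return len(self.in_use)


class HostFilesystem:
    """Hypothetical host filesystem that tracks used blocks with a bitmap."""

    BITMAP_BLOCK = 0  # assumed location of the free block bitmap

    def __init__(self, storage, num_blocks):
        self.storage = storage
        self.bitmap = [False] * num_blocks     # clear bit means the block is free
        self.bitmap[self.BITMAP_BLOCK] = True  # the bitmap itself occupies a block
        self._flush_bitmap()

    def _flush_bitmap(self):
        self.storage.write(self.BITMAP_BLOCK, bytes(self.bitmap))

    def allocate(self, data):
        block = self.bitmap.index(False)  # first free block per the bitmap
        self.bitmap[block] = True
        self.storage.write(block, data)
        self._flush_bitmap()
        return block

    def free(self, block):
        # Only the bitmap is rewritten; no I/O reaches the freed block itself.
        self.bitmap[block] = False
        self._flush_bitmap()

    def used_blocks(self):
        return sum(self.bitmap)


storage = StorageSystem()
fs = HostFilesystem(storage, num_blocks=8)
blocks = [fs.allocate(b"data") for _ in range(3)]
fs.free(blocks[0])
fs.free(blocks[1])

host_view = fs.used_blocks()          # bitmap block plus one data block
storage_view = storage.used_blocks()  # every block ever written
```

In this sketch the host filesystem reports two blocks in use after the two releases, while the storage system still counts four, because the only write it observed during each release was directed at the block holding the bitmap.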