1. Field of the Invention
This invention relates to computer systems and, more particularly, to data storage subsystems within computer systems.
2. Description of the Related Art
Computer systems frequently include data storage subsystems for storing data. In particular, computer systems that include multiple clients interconnected by a network increasingly share one or more data storage subsystems via that network. The data storage subsystems may include or be further coupled to storage consisting of one or more disk storage devices, tape drives, or other storage media. A computer system may also include one or more servers in which metadata describing the contents of the included storage devices is maintained.
Data storage subsystems may store data with some redundancy to allow for recovery from storage errors. There are a variety of techniques to store data redundantly, including erasure coding techniques such as Reed-Solomon encodings and RAID (Redundant Array of Independent Disks) using a variety of layouts, such as RAID-1, RAID-5, or RAID-6. These RAID layouts may be implemented within an object-based file system in which each independent storage device is treated as a disk. Each client device may convey data to the storage devices via a network. Unfortunately, some way of arbitrating write access requests from multiple clients may be needed to avoid introducing inconsistencies into the redundant data. One arbitration approach is to require each client to obtain a lock before accessing a storage location. However, this approach requires that each client be responsible for, and trusted to perform, all of the functions involved in sequencing writes using the lock mechanism. For example, in the case of RAID-5 or RAID-6, these functions may include reading old data and old parity, computing new parity, logging the new data and new parity, and writing the new data and new parity to their respective storage locations, which together constitute part or all of a row in the RAID layout. In addition, a client may be required to retrieve information from the metadata server (MDS) for each write to an individual location in the RAID layout. Performing these functions increases write latency and adds complexity and significant computational and storage overhead to each client.
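The parity computation in the RAID-5 write sequence described above can be illustrated with a minimal sketch. It assumes single-parity XOR over equal-sized blocks held in memory; the function names (xor_blocks, small_write_parity, full_row_parity) are hypothetical and chosen only for illustration, and locking, logging, and device I/O are deliberately omitted.

```python
from typing import Sequence


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Return the bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_row_parity(blocks: Sequence[bytes]) -> bytes:
    """Compute RAID-5 style parity over every data block in a row."""
    parity = bytes(len(blocks[0]))  # all-zero block of the same size
    for block in blocks:
        parity = xor_blocks(parity, block)
    return parity


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Compute new parity for a write to one block (read-modify-write).

    Only the old data and old parity are read; the other blocks in the
    row are not touched: new_parity = old_parity XOR old_data XOR new_data.
    """
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


# Hypothetical row of three data blocks plus one parity block.
row = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = full_row_parity(row)

# A client overwrites the second block using the read-modify-write path.
new_block = b"\xff\x00\xff\x00"
new_parity = small_write_parity(row[1], parity, new_block)
row[1] = new_block

# If one block is lost, XORing the survivors with parity rebuilds it.
rebuilt = full_row_parity([row[0], row[2], new_parity])
```

In this sketch the new data and new parity are two separate values that must both reach storage. If only one of them is written, or if two clients interleave their updates to the same row, the parity no longer matches the data, which is why the writes must be sequenced.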
In addition to the above considerations, data storage subsystems are designed to minimize the loss of data that may occur when one or more devices fail. Although RAID layouts are intended to provide high availability and fault tolerance, there may be periods of increased vulnerability to device failure during complex read or write operations if clients are responsible for maintaining the redundancy. Clients may not be trustworthy or may lack sufficient resources to handle errors caused by device failures in a data storage subsystem. Rather than burden the client with tasks needed to store data redundantly, including handling device failures, some object-based file systems may assume that clients are not trusted and rely on individual object storage devices to cooperatively manage redundancy. However, even in such cooperative systems, there exists a need for device failures to be handled in a manner that allows for continuing read and write operations without loss of data and without burdening the system's clients. There exists a further need to be able to resynchronize a failed device when and if it recovers from the failure, or to fully synchronize a replacement device if a failed device does not recover soon enough, without reducing the availability of storage.
In view of the above, an effective system and method for managing device failures in object-based data storage subsystems that account for these issues are desired.