The need to store digital files, documents, images and other data continues to increase rapidly, and with it the demand for data storage. In addition, recent legislation affecting the management of electronic records, such as the Sarbanes-Oxley Act in the United States, has increased the need for data storage. As the demand for data storage has increased, the space or volume occupied by storage systems has become an important issue. In particular, data storage having high capacity, high density, and space efficiency has become increasingly desirable.
In order to provide increased storage space, storage devices with ever greater storage capacities are being developed. However, the storage needs of even small enterprises can easily exceed the storage capacity of a single data storage device. In addition, in order to safeguard data, systems providing data redundancy that include multiple storage devices are necessary.
Systems that provide at least some integration of individual storage devices, such as JBOD (just a bunch of disks), SBOD (switched bunch of disks) or RAID (redundant array of independent disks) systems, have been developed. Such systems are typically deployed within enclosures to present an integrated component to the user. In order to facilitate serviceability and packaging, such systems may include sleds or carriers to which a number of storage devices are mounted. The sled may also provide interconnections to allow the attached storage devices to be operatively connected to a controller and/or a data bus. By providing storage devices attached to sleds, the removal of storage devices from, and their insertion into, a system enclosure can be facilitated. Accordingly, each sled and its associated storage devices can comprise an individual field replaceable unit (FRU) within a data storage system.
When there is a storage device failure within the data storage system, the FRU containing the failed device is replaced and returned to the manufacturer for servicing. Where an FRU that has been removed due to a failure includes a number of storage devices, all of the storage devices are usually treated as faulty, even though only one storage device may in fact be faulty. As a result, fully operational storage devices may be permanently removed from service. The waste that results from this practice can be calculated as the number of good drives on a sled divided by the number of bad drives, expressed as a percentage. For example, if there are two drives on a sled, there is a 100% waste if only one drive on that sled is actually bad. As a further example, if there are four drives on a sled and one drive is bad, there is a 300% waste.
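The waste calculation described above can be illustrated with a minimal Python sketch. The function name `waste_percentage` and its parameters are hypothetical, introduced here only for illustration; the sketch assumes that every drive on a removed sled is discarded and that at least one drive on the sled is actually bad.

```python
def waste_percentage(total_drives: int, bad_drives: int) -> float:
    """Return good drives discarded per bad drive, as a percentage.

    Hypothetical helper: assumes the whole sled is discarded and that
    at least one, and at most all, of its drives are actually bad.
    """
    if bad_drives <= 0 or bad_drives > total_drives:
        raise ValueError("bad_drives must be between 1 and total_drives")
    good_drives = total_drives - bad_drives
    # Waste is the ratio of good drives to bad drives, scaled to percent.
    return 100.0 * good_drives / bad_drives


# The two examples given in the text.
print(waste_percentage(2, 1))  # two drives, one bad
print(waste_percentage(4, 1))  # four drives, one bad
```

Under these assumptions the two calls reproduce the 100% and 300% figures given in the text.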
If not all of the storage devices on a sled are to be treated as defective, a full failure analysis must be performed for each drive on the sled. This task can consume large amounts of manpower and other resources. Furthermore, even a full analysis cannot guarantee that the faulty drive will actually be detected. For example, failures can be transient, related to the data storage system with which the data storage device was associated, and/or related to the particular environment in which the data storage system operates. As a result, the practice of discarding all drives on a sled in response to detecting a failure associated with one of the interconnected drives has continued.