A storage system may include one or more storage devices into which information may be entered, and from which information may be obtained. A storage operating system executed on the storage system may functionally organize the system by, e.g., invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer.
Storage systems commonly have a storage operating system, e.g., to respond to input/output requests and/or to perform housekeeping tasks associated with the storage systems on which they operate. The storage operating system of the storage system may implement a high-level module, e.g., a file system, to logically organize the information stored on the disks as a hierarchical structure of directories, files and/or blocks. One type of file system is a write-anywhere file system. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL®) file system available from Network Appliance, Inc., of Sunnyvale, Calif.
The storage system may be managed by a plurality of computing devices, referred to herein as “nodes.” In many conventional storage systems, an entire hard disk or solid state drive (SSD) is the smallest unit of capacity that can be provisioned to a node. In many systems that do not share storage devices (e.g., hard disks or SSDs), a single “owning” node may generally handle device failures. In these systems, the process of failing a disk may involve several steps, e.g.: detecting a device error and deciding its severity; preventing further I/O to the failing disk while the error is processed; deciding whether any other related disk has also failed; recording the error in various system logs; failing the disk within a redundant array of independent disks (RAID) arrangement, thereby causing a sick disk copy (SDC) or reconstruction to start; and, for severe (e.g., persistent) errors, recording the error persistently in case the system power cycles and the disk comes back healthy, and lighting the fault LED, or other indicator of error, on a drive enclosure. The final steps of recording the error persistently and lighting the fault LED on the drive enclosure may be signals to the system administrator to replace the disk.
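By way of illustration only, the failure-handling sequence described above for a single owning node might be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular storage operating system: the names `Disk`, `OwningNode`, `classify` and `fail_disk`, the numeric error-code threshold, and the in-memory log and failure registry are all hypothetical and are introduced solely for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List, Set


class Severity(Enum):
    TRANSIENT = "transient"
    PERSISTENT = "persistent"


@dataclass
class Disk:
    """Hypothetical model of one unshared storage device."""
    name: str
    healthy: bool = True
    io_blocked: bool = False
    failed_in_raid: bool = False
    fault_led_on: bool = False


@dataclass
class OwningNode:
    """Hypothetical single node that owns, and fails, its disks."""
    system_log: List[str] = field(default_factory=list)
    # Stand-in for a record that survives a power cycle.
    persistent_failures: Set[str] = field(default_factory=set)

    def classify(self, error_code: int) -> Severity:
        # Assumption: codes >= 100 represent severe (persistent) errors.
        if error_code >= 100:
            return Severity.PERSISTENT
        return Severity.TRANSIENT

    def fail_disk(self, disk: Disk, error_code: int,
                  related: Iterable[Disk] = ()) -> List[str]:
        """Run the failure sequence and return the steps performed."""
        steps = []

        # 1. Detect the error and decide its severity.
        severity = self.classify(error_code)
        steps.append("classify")

        # 2. Prevent further I/O to the failing disk.
        disk.healthy = False
        disk.io_blocked = True
        steps.append("block_io")

        # 3. Decide whether any related disk has also failed.
        also_failed = [d.name for d in related if not d.healthy]
        steps.append("check_related")

        # 4. Record the error in the system log.
        self.system_log.append(
            f"{disk.name}: {severity.value}; related failed: {also_failed}")
        steps.append("log")

        # 5. Fail the disk within RAID; the RAID layer would then start
        #    a sick disk copy (SDC) or a reconstruction.
        disk.failed_in_raid = True
        steps.append("raid_fail")

        # 6. For severe errors only: record persistently, light the LED.
        if severity is Severity.PERSISTENT:
            self.persistent_failures.add(disk.name)
            disk.fault_led_on = True
            steps.extend(["persist", "fault_led"])

        return steps
```

In this sketch every step runs on the one owning node, so the log, the persistent failure record and the fault indicator are all updated from a single place. That is the property that becomes difficult to preserve once more than one node may observe an error on the same device.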
Thus, prior systems that manage a plurality of storage devices with a single master node risk catastrophic failure if the master node fails or becomes unavailable. There exists a need for more efficient management of one or more storage devices. In particular, there exists a need for a system that facilitates storage device management redundancy while harmonizing behavior across the entire system.