Technical Field
The present disclosure relates to clustered storage systems and, more specifically, to storage of metadata relating to configuration of one or more Redundant Array of Independent Disks (RAID) groups (i.e., RAID-related metadata) within a clustered storage system.
Background Information
Traditionally, metadata related to one or more Redundant Array of Independent Disks (RAID) groups (i.e., RAID-related metadata) is stored as a RAID label on each storage device, e.g., hard disk drive (HDD) and/or solid state drive (SSD), of a storage system. The RAID label is typically organized as a topology tree structure that identifies, inter alia, the RAID group (i.e., a logical grouping of storage devices within an aggregate that are operated cooperatively) to which the storage device belongs, as well as a generation count of the storage device. The RAID-related metadata is also typically replicated across all RAID labels of an aggregate (i.e., a collection of storage devices). As a result, if a storage device fails, the RAID labels on all other storage devices of the RAID group are updated (i.e., modified) to indicate that the failed storage device is no longer part of the RAID group. However, RAID label consistency problems may arise in a clustered storage system (“cluster”) when two or more storage systems (“nodes”) attempt to operate the same storage devices, e.g., in high availability or failover redundancy environments.
For example, one problem that may arise from storing RAID labels on storage devices occurs when a first node of a cluster updates the RAID labels (e.g., increments the generation count) of the storage devices of a RAID group while a second node of the cluster reads those RAID labels (e.g., during failover or boot-up). Here, the second node may detect that some of the storage devices in the RAID group have an incremented generation count that differs from the non-incremented generation count of the other storage devices. This inconsistency in RAID label “versions” (i.e., differing generation counts) may lead the second node to incorrectly designate the RAID group as degraded. Typically, only the node of the cluster that owns a storage device is allowed to read (and modify) its RAID label, e.g., to form a RAID group. The problem may therefore arise because ownership of a storage device, e.g., by a node in the cluster, is difficult to maintain consistently across all storage devices of the aggregate, particularly when ownership is transferred among nodes of the cluster in response to a node failure (i.e., when another node assumes ownership of the storage device).
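The race described above can be illustrated with a short Python sketch. It is illustrative only: the names RaidLabel, update_labels and assess_group are hypothetical, the label is reduced to a RAID group name and a generation count, and the reading node is assumed to apply the simplest possible rule, treating any mismatch in generation counts as a degraded RAID group.

```python
from dataclasses import dataclass


@dataclass
class RaidLabel:
    """Hypothetical, simplified RAID label: owning RAID group plus generation count."""
    device: str
    raid_group: str
    generation: int


def update_labels(labels, upto=None):
    """Writer (first node): increment the generation count one device at a time.

    `upto` stops the update partway through, modelling the window in which a
    concurrent reader can observe a mix of old and new labels.
    """
    for i, label in enumerate(labels):
        if upto is not None and i >= upto:
            break
        label.generation += 1


def assess_group(labels):
    """Reader (second node): assumed naive rule, 'degraded' on any generation mismatch."""
    generations = {label.generation for label in labels}
    return "optimal" if len(generations) == 1 else "degraded"


labels = [RaidLabel(f"d{i}", "rg0", generation=7) for i in range(4)]
update_labels(labels, upto=2)   # first node has rewritten only d0 and d1
print(assess_group(labels))     # second node reads now; prints: degraded
update_labels(labels[2:])       # first node finishes with d2 and d3
print(assess_group(labels))     # prints: optimal
```

The RAID group itself is healthy throughout; only the timing of the read makes it appear degraded, which is the inconsistency in label versions described above.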
Another problem that arises from storing RAID labels on storage devices involves removing a failed storage device from a RAID group or, more generally, from an aggregate. This problem is particularly acute when the storage device goes offline and subsequently comes back online. That is, the storage device may temporarily malfunction and go offline, but may later come back online as part of the aggregate and be reused (i.e., re-designated) as, e.g., a spare. Tracking the status (offline/online) of the storage device in its own RAID label is problematic because the storage device cannot be written while it is offline.
Alternatively, tracking the storage device's status in the RAID labels of other storage devices, e.g., by updating the RAID labels on all storage devices in the aggregate except the failed device, may lead to the generation count inconsistency described above. The failed storage device retains a generation count (i.e., non-incremented) that is older than the generation count (i.e., incremented) of the updated RAID labels on all the other storage devices. According to a typical algorithm, storage devices having the same generation count are included in the aggregate (i.e., RAID group), whereas a storage device with a different (e.g., older) generation count is excluded from the aggregate. If the storage system crashes while the failed storage device is offline, that storage device would typically be excluded from the aggregate when the system is restored, even as a spare once it comes back online, because its generation count is inconsistent with those of the other storage devices.
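The typical assimilation rule and the crash scenario can be sketched in Python as follows. The names RaidLabel and assimilate are hypothetical, and the sketch assumes that the newest generation count defines the aggregate membership; it is a simplified model of the rule described above, not the assimilation logic of any particular storage system.

```python
from dataclasses import dataclass


@dataclass
class RaidLabel:
    """Hypothetical, simplified RAID label: device name plus generation count."""
    device: str
    generation: int


def assimilate(labels):
    """Assumed rule: devices whose labels carry the newest generation count are
    included; every other device is excluded outright, not even kept as a spare."""
    newest = max(label.generation for label in labels)
    included = [l.device for l in labels if l.generation == newest]
    excluded = [l.device for l in labels if l.generation != newest]
    return included, excluded


# Five devices at generation 12; d3 fails and goes offline.
labels = [RaidLabel(f"d{i}", 12) for i in range(5)]

# The labels on the surviving devices are rewritten to record the failure.
# d3 is offline, so its own label cannot be written and keeps generation 12.
for label in labels:
    if label.device != "d3":
        label.generation += 1

# The system crashes, d3 comes back online, and the aggregate is reassembled.
included, excluded = assimilate(labels)
print(included)  # ['d0', 'd1', 'd2', 'd4']
print(excluded)  # ['d3']
```

Because the offline device never sees the incremented count, nothing in its own label distinguishes "returned and reusable as a spare" from "stale", so the rule simply drops it.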
One solution to the RAID label consistency problems may be to maintain status information of the storage devices in an area on the storage devices, e.g., a registry, that is separate from the RAID labels. Yet this solution essentially requires maintaining multiple configuration sources (i.e., the registry and the RAID labels), which is inefficient. Accordingly, there is a need to obtain RAID-related metadata for an aggregate and to assimilate changes to the aggregate (i.e., RAID group) in a cluster of nodes with failover redundancy, without relying on RAID labels on the storage devices of the aggregate.