Modern enterprises are investing significant resources to preserve and provide access to data. Data protection is a growing concern for businesses of all sizes. Users are looking for a solution that will help to verify that critical data elements are protected, and for storage configurations that preserve data integrity and provide a reliable and safe switch to redundant computing resources in case of an unexpected disaster or service disruption.
To accomplish this, storage systems may be designed as fault tolerant systems spreading data redundantly across a set of storage nodes and enabling continuous operation when a hardware failure occurs. Fault tolerant data storage systems may store data across a plurality of disk drives and may include duplicate data, parity or other information that may be employed to reconstruct data if a drive fails. Data storage formats, such as RAID (Redundant Array of Independent Disks), may be employed to protect data from internal component failures by making copies of data and rebuilding lost or damaged data. As the likelihood of two concurrent failures increases with the growth of disk array sizes and increasing disk densities, data protection may be implemented, for example, with the RAID 6 data protection scheme well known in the art.
Common to all RAID 6 protection schemes is the use of two parity data portions per several data groups (e.g. using groups of four data portions plus two parity portions in a (4+2) protection scheme, using groups of sixteen data portions plus two parity portions in a (16+2) protection scheme, etc.), the two parities being typically calculated by two different methods. Under one well-known approach, all n consecutive data portions are gathered to form a RAID group, to which two parity portions are associated. The members of a group as well as their parity portions are typically stored in separate drives. Under a second approach, protection groups may be arranged as two-dimensional arrays, typically n*n, such that data portions in a given row or column of the array are stored in separate disk drives. In addition, a parity data portion may be associated with every row and with every column of the array. These parity portions are stored in such a way that the parity portion associated with a given column or row in the array resides in a disk drive where no other data portion of the same column or row also resides. Under both approaches, whenever data is written to a data portion in a group, the parity portions are also updated using well-known approaches (e.g. XOR or Reed-Solomon). Whenever a data portion in a group becomes unavailable, either because of a general disk drive malfunction or because of a local problem affecting the portion alone, the data can still be recovered with the help of one parity portion, via well-known techniques. Then, if a second malfunction causes data unavailability in the same group before the first problem has been repaired, data can nevertheless be recovered using the second parity portion and related, well-known techniques.
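By way of non-limiting illustration, the two-parity recovery described above can be sketched as follows for a (4+2) group. The sketch assumes one common construction, not necessarily the one used by any particular system: the first parity P is a plain XOR of the data portions, and the second parity Q is a Reed-Solomon style sum over GF(2^8) with reduction polynomial 0x11d and generator 2. All function names (compute_parity, recover_one, recover_two) are hypothetical and chosen for this example only.

```python
# Illustrative sketch only: P is XOR parity, Q is a weighted sum over GF(2^8).
# The field polynomial (0x11d) and generator (2) are assumptions of this sketch.

GF_EXP = [0] * 512   # antilog table, doubled so gf_mul needs no modulo
GF_LOG = [0] * 256   # log table (entry 0 is unused)
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_div(a, b):
    """Divide a by a non-zero field element b."""
    if a == 0:
        return 0
    return GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255]


def compute_parity(portions, skip=()):
    """Return (P, Q) over equal-length data portions, ignoring indices in skip."""
    size = len(next(p for i, p in enumerate(portions) if i not in skip))
    p, q = bytearray(size), bytearray(size)
    for idx, portion in enumerate(portions):
        if idx in skip:
            continue
        coeff = GF_EXP[idx]              # generator raised to the portion index
        for j, byte in enumerate(portion):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_one(portions, lost, p):
    """Rebuild one lost data portion using only the XOR parity P."""
    partial_p, _ = compute_parity(portions, skip=(lost,))
    return bytes(x ^ y for x, y in zip(p, partial_p))


def recover_two(portions, a, b, p, q):
    """Rebuild two lost data portions (indices a != b) using both P and Q."""
    partial_p, partial_q = compute_parity(portions, skip=(a, b))
    pxy = [x ^ y for x, y in zip(p, partial_p)]   # equals Da ^ Db
    qxy = [x ^ y for x, y in zip(q, partial_q)]   # equals g^a*Da ^ g^b*Db
    ga, gb = GF_EXP[a], GF_EXP[b]
    denom = ga ^ gb
    # (g^a ^ g^b) * Da = qxy ^ g^b * pxy, so divide to isolate Da.
    da = bytes(gf_div(qv ^ gf_mul(gb, pv), denom) for pv, qv in zip(pxy, qxy))
    db = bytes(pv ^ dv for pv, dv in zip(pxy, da))
    return da, db


# Usage: a (4+2) group that loses one portion, then two portions.
data = [b"abcd", b"efgh", b"ijkl", b"mnop"]
P, Q = compute_parity(data)
assert recover_one([data[0], None, data[2], data[3]], 1, P) == data[1]
assert recover_two([None, data[1], None, data[3]], 0, 2, P, Q) == (data[0], data[2])
```

In this sketch a single unavailable portion is rebuilt from P alone, while two unavailable portions in the same group require both P and Q, which mirrors the use of the second parity portion described above.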
While the RAID array may provide redundancy for the data, damage or failure of other components within the subsystem may render data storage and access unavailable.
Fault tolerant storage systems may be implemented in a grid architecture including modular storage arrays, a common virtualization layer enabling organization of the storage resources as a single logical pool available to users, and common management across all nodes. Multiple copies of data, or parity blocks, should exist across the nodes in the grid, creating redundant data access and availability in case of a component failure. Serial-Attached-SCSI (SAS) techniques are becoming increasingly common in fault tolerant grid storage systems.
The problems of fault tolerant grid storage systems have been recognized in the prior art, and various systems have been developed to provide a solution, for example:
US Patent Application No. 2009/094620 (Kalvitz et al.) discloses a storage system including two RAID controllers, each having two SAS initiators coupled to a zoning SAS expander. The expanders are linked by an inter-controller link and create a SAS ZPSDS. The expanders have PHY-to-zone mappings and zone permissions to create two distinct SAS domains such that one initiator of each RAID controller is in one domain and the other initiator is in the other domain. The disk drives are dual-ported, and each port of each drive is in a different domain. Each initiator can access every drive in the system, half directly through the local expander and half indirectly through the other RAID controller's expander via the inter-controller link. Thus, a RAID controller can continue to access a drive via the remote path in the remote domain if the drive becomes inaccessible via the local path in the local domain.
US Patent Application No. 2008/201602 (Agarval et al.) discloses a method and apparatus for transactional fault tolerance in a client-server system. In one example, output data generated by execution of a service on a primary server during a current epoch between a first checkpoint and a second checkpoint is buffered. A copy of an execution context of the primary server is established on a secondary server in response to the second checkpoint. The output data as buffered is released from the primary server in response to establishment of the copy of the execution context on the secondary server.
US Patent Application No. 2007/174517 (Robillard et al.) discloses a data storage system including first and second boards disposed in a chassis. The first board has disposed thereon a first Serial Attached Small Computer Systems Interface (SAS) expander, a first management controller (MC) in communication with the first SAS expander, and management resources accessible to the first MC. The second board has disposed thereon a second SAS expander and a second MC. The system also has a communications link between the first and second MCs. Primary access to the management resources is provided in a first path which is through the first SAS expander and the first MC, and secondary access to the first management resources is provided in a second path which is through the second SAS expander and the second MC.
US Patent Application No. 2006/010227 (Atluri et al.) discloses a system for providing secondary data storage and recovery services for one or more networked host nodes. The system includes a server application for facilitating data backup and recovery services; a first data storage medium accessible to the server application; a second data storage medium accessible to the server application; at least one client application for mapping write locations allocated by the first data storage medium to write locations represented in a logical view of the first data storage medium; and at least one machine instruction enabling direct read capability of the first data storage medium by the server application for purposes of subsequent time-based storage of the read data into the secondary data storage medium.