A SAN is a dedicated high-speed network that interconnects a shared pool of storage devices and presents it to multiple servers (hosts). The SAN can be used to enhance the performance of the storage devices by making the shared pool of storage devices appear to each of the multiple servers as a single locally attached device. A VSAN is a logical partition in the SAN. The VSAN can allow data traffic to be isolated within specific portions of the SAN so that the system can be scaled out and easily configured. A primary objective of the VSAN is the management of subscribers, who can be added or relocated without changing the physical layout of the storage devices. Another objective of the VSAN can be to provide data redundancy, which can minimize the risk of data loss.
A conventional VSAN/SAN storage array (VSAN/SAN cluster) comprises a brick or a “BOX” as the basic building block. The VSAN/SAN cluster can comprise multiple bricks/BOXes. Each brick/BOX of the VSAN/SAN cluster consists of two array controllers and one driver yard (DY) or disk array enclosure (DAE). The array controllers receive input/output (I/O) requests from a host, and each of the array controllers comprises multi-core CPUs for processing the I/O tasks. The DY/DAE contains multiple solid-state drives (SSDs) for storing data and a few backup units (parity drives). In the conventional architecture of the VSAN/SAN cluster, if a failure is detected on an SSD in the DY/DAE of a brick/BOX, data from the failing SSD is backed up to an available parity drive in the DY/DAE of the same brick/BOX.
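By way of a non-limiting illustration, the conventional brick/BOX described above can be modeled as in the following minimal sketch. The class names `ParityDrive` and `Brick`, the method `handle_ssd_failure`, and all identifiers such as `"ssd-0"` are hypothetical and chosen only for illustration; they are not taken from any actual array controller implementation. The sketch assumes that a parity drive holds the backup of at most one SSD at a time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParityDrive:
    """A backup unit in the DY/DAE (hypothetical model)."""
    drive_id: str
    holds_backup_of: Optional[str] = None  # id of the SSD backed up here, if any

    @property
    def available(self) -> bool:
        return self.holds_backup_of is None


@dataclass
class Brick:
    """One brick/BOX: two array controllers plus one DY/DAE."""
    brick_id: str
    controllers: List[str]             # the two array controllers
    ssds: Dict[str, bytes]             # SSD id -> stored data (toy stand-in)
    parity_drives: List[ParityDrive]   # backup units in the same DY/DAE
    backups: Dict[str, bytes] = field(default_factory=dict)

    def handle_ssd_failure(self, ssd_id: str) -> Optional[str]:
        """Back up a failing SSD to an available parity drive in this brick.

        Returns the id of the parity drive used, or None when every
        parity drive in this brick's DY/DAE is already in use.
        """
        data = self.ssds[ssd_id]
        for drive in self.parity_drives:
            if drive.available:
                drive.holds_backup_of = ssd_id
                self.backups[drive.drive_id] = data
                return drive.drive_id
        return None  # the search never leaves this brick


# Usage: a brick with two SSDs and a single parity drive.
brick = Brick(
    brick_id="brick-0",
    controllers=["ctrl-A", "ctrl-B"],
    ssds={"ssd-0": b"a", "ssd-1": b"b"},
    parity_drives=[ParityDrive("parity-0")],
)
first = brick.handle_ssd_failure("ssd-0")   # backed up to the only parity drive
second = brick.handle_ssd_failure("ssd-1")  # no parity drive left in this brick
```

In this sketch the search for a backup target is confined to the parity drives of the brick that owns the failing SSD, which mirrors the conventional architecture.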
There are several technical problems with such conventional systems, as explained below. First, when a failure is detected on an SSD in the DY/DAE of a brick/BOX of the VSAN/SAN cluster and all the parity drives in the DY/DAE of that brick/BOX are already in use, there is no system or method for backing up data from the failing SSD to an available parity drive in a different brick/BOX of the VSAN/SAN cluster. Further, there is no system or method that enables the brick/BOX to communicate with a different brick/BOX of the VSAN/SAN cluster and facilitate backup of the data. Additionally, in the conventional architecture of the VSAN/SAN cluster there is no provision to transfer data from the DY/DAE of one brick/BOX to the DY/DAE of another brick/BOX in order to maintain high data availability. As a result, data from the SSD can be permanently lost if the data cannot be backed up before the SSD fails. Hence, there is a need for a novel method and system for improving fault tolerance in the VSAN/SAN.
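The first problem identified above can be illustrated at the cluster level with the following sketch. The dictionary `cluster` and the functions `conventional_backup_target` and `free_parity_elsewhere` are hypothetical names introduced only for this example, and the two-brick state is an assumed scenario rather than measured data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cluster state: brick id -> availability of each parity drive
# in that brick's DY/DAE (True means the parity drive is still available).
cluster: Dict[str, List[bool]] = {
    "brick-0": [False, False],  # every local parity drive already in use
    "brick-1": [True, True],    # free parity drives in a different brick
}


def conventional_backup_target(
    state: Dict[str, List[bool]], failing_brick: str
) -> Optional[Tuple[str, int]]:
    """Conventional behavior: consider only the failing SSD's own brick."""
    for index, free in enumerate(state[failing_brick]):
        if free:
            return (failing_brick, index)
    return None


def free_parity_elsewhere(
    state: Dict[str, List[bool]], failing_brick: str
) -> List[str]:
    """List the other bricks that still have an available parity drive."""
    return [
        brick_id
        for brick_id, drives in state.items()
        if brick_id != failing_brick and any(drives)
    ]


# An SSD fails in brick-0 while all of brick-0's parity drives are in use.
target = conventional_backup_target(cluster, "brick-0")
stranded = free_parity_elsewhere(cluster, "brick-0")
```

Under these assumptions `target` is `None` even though `stranded` names a brick with available parity drives, so the data on the failing SSD has no backup destination in the conventional architecture.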