Large amounts of computer data are typically held in distributed storage systems. Such systems offer several advantages: the ability to add storage capacity as user requirements increase, data reliability through redundancy, and flexibility in servicing and replacing failed components. Distributed storage systems have been implemented in various forms, such as Redundant Arrays of Independent Disks (RAID) systems. RAID systems are described, for example, in the paper entitled “A Case for Redundant Arrays of Inexpensive Disks (RAID)” by Patterson et al.
In the paper entitled “The TickerTAIP Parallel RAID Architecture,” Cao et al. describe a disk array (RAID) system that includes a number of worker nodes and originator nodes. Each worker node has several disks connected through a bus. The originator nodes provide connections to the computer clients. When a disk or a node fails, the system reconstructs lost data by using the partial redundancy provided in the system. The described method is specific to RAID storage systems. Furthermore, the method does not address the problem of a distributed storage system composed of a very large number of independent storage arrays. It also assumes that the communication interconnect is reliable and therefore does not deal with the wide range of problems arising from an unreliable interconnect.
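As a generic illustration of how redundancy allows lost data to be reconstructed, the following minimal sketch uses byte-wise XOR parity, a technique commonly used in RAID systems. It is not taken from the TickerTAIP paper; the function name `xor_blocks` and the sample blocks are assumptions made for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, imagined as stored on three different disks.
data = [b"node", b"disk", b"raid"]

# The parity block is the XOR of all data blocks and lives on a fourth disk.
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. XOR-ing the surviving data
# blocks with the parity block yields the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Because XOR is its own inverse, any single missing block in the group can be recovered this way, but the loss of two blocks from the same group cannot.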
U.S. Pat. No. 6,438,661 describes a method and system for restoring lost metadata in a cache memory. The metadata provides information on user data maintained in a storage device. The method determines whether metadata tracks in the cache were modified, indicates in a non-volatile memory that the metadata tracks were modified, and restores the metadata tracks. Restoring the metadata tracks includes accessing the data tracks associated with them, staging the accessed data tracks into the cache, and processing the data tracks to restore the metadata tracks. The described method does not restore data that is lost due to a failed node in a distributed storage system.
U.S. Patent Application 20020062422 describes another method for restoring metadata in a storage system in which data flows are written into the system as segments. The method scans the metadata in each segment to identify the last segment written from each flow. It then restores the metadata using the metadata in the segments excluding the metadata for the identified last segments. The described method is not applicable to a distributed multi-node storage system and does not address the problem of restoring data after a node fails.
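The scan-and-exclude procedure summarized above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the cited application: the `Segment` record, its `seq`, `flow`, and `metadata` fields, and the function `restore_metadata` are all hypothetical, and metadata is modeled as a simple dictionary.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    seq: int        # assumed: position of the segment in write order
    flow: str       # assumed: identifier of the data flow that wrote it
    metadata: dict  # assumed: metadata entries carried by the segment

def restore_metadata(segments):
    """Rebuild metadata from all segments except the last one of each flow."""
    # Pass 1: scan every segment to find the last segment written by each flow.
    last_by_flow = {}
    for seg in segments:
        if seg.flow not in last_by_flow or seg.seq > last_by_flow[seg.flow]:
            last_by_flow[seg.flow] = seg.seq

    # Pass 2: replay metadata in write order, skipping those last segments.
    restored = {}
    for seg in sorted(segments, key=lambda s: s.seq):
        if seg.seq == last_by_flow[seg.flow]:
            continue
        restored.update(seg.metadata)
    return restored

segments = [
    Segment(1, "a", {"x": 1}),
    Segment(2, "b", {"y": 2}),
    Segment(3, "a", {"x": 3}),
    Segment(4, "b", {"y": 4}),  # last segment of flow "b": excluded
    Segment(5, "a", {"z": 5}),  # last segment of flow "a": excluded
]
result = restore_metadata(segments)
```

In this sketch, later segments overwrite earlier entries with the same key, which is a modeling choice of the example rather than something stated in the cited application.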
In the paper entitled “Petal: Distributed Virtual Disks,” Lee et al. describe a distributed storage system composed of a collection of virtual disks. The described method keeps a copy of the metadata on every node. This becomes a problem because every node must be involved in, and updated for, any change to the metadata. Furthermore, the system offers only limited runtime adaptation during failures.
In the paper entitled “Serverless Network File Systems,” Anderson et al. describe a serverless distributed file system. The described method applies specifically to file systems. In addition, it requires the active participation of clients for its correctness and data restoration, and thus relies on commitments from those clients.
Therefore, there remains a need for a distributed storage system and method for restoring data affected by the failure of a storage node in the system or of a disk in a storage node without the drawbacks described above.