As computer networking and interconnection systems have steadily advanced in capabilities, reliability, and throughput, and as distributed computing systems based on networking and interconnection systems have correspondingly increased in size and capabilities, enormous progress has been made in developing theoretical understanding of distributed computing problems, in turn allowing for development and widespread dissemination of powerful and useful tools and approaches for distributing computing tasks within distributed systems. Early in the development of distributed systems, large mainframe computers and minicomputers, each with a multitude of peripheral devices, including mass-storage devices, were interconnected directly or through networks in order to distribute processing of large, computational tasks. As networking systems became more robust, capable, and economical, independent mass-storage devices, such as independent disk arrays, interconnected through one or more networks with remote host computers, were developed for storing large amounts of data shared by numerous computer systems, from mainframes to personal computers. Recently, as described below in greater detail, development efforts have begun to be directed towards distributing mass-storage systems across numerous mass-storage devices interconnected by one or more networks.
As mass-storage devices have evolved from peripheral devices separately attached to, and controlled by, a single computer system to independent devices shared by remote host computers, and finally to distributed systems composed of numerous, discrete, mass-storage units networked together, problems associated with sharing data and maintaining shared data in consistent and robust states have dramatically increased. Designers, developers, manufacturers, vendors, and, ultimately, users of distributed systems continue to recognize the need for extending already developed distributed-computing methods and routines, and for new methods and routines, that provide desired levels of data robustness and consistency in larger, more complex, and more highly distributed systems.
Recently, a new distributed data-storage system architecture, referred to as the “federated array of bricks” (“FAB”) architecture has been developed. The FAB architecture, is described, in detail, below. The FAB architecture presents new problems with regard to managing and recovering redundancy under various failure conditions and failure modes. Designers, developers, manufacturers, and vendors of mass-storage systems developed according to this new architecture have recognized a need for efficient methods for redundancy recovery upon failure of individual mass-storage devices within component data-storage systems of the distributed data-storage system.