Data-storage systems have steadily evolved, over the past 50 years, from low-capacity and relatively slow devices directly interconnected with host computers to complex, extremely fast, high-capacity, and high-bandwidth stand-alone data-storage systems that can be concurrently accessed over high-bandwidth communication systems by many different remote host computers. FIG. 1 illustrates one type of distributed computing environment in which stand-alone data-storage systems provide data storage and data retrieval to remote host computers. In FIG. 1, two host computers 102-103 access three different data-storage systems 106-108 via a high-bandwidth communications network 110. Each data-storage system, such as data-storage system 106, includes a processing component 112 that interfaces to the high-bandwidth communications network 110 and that also interfaces to an internal communications medium, such as a high-speed bus 114 that links the processing component 112 with individual storage devices 116-119. The processing component 112 of a data-storage system provides a data-storage interface to remote host computers 102-103, comprising commands that the remote host computers can send to data-storage systems for execution. These commands allow host computers to read data stored within data-storage systems, to write data to data-storage systems, to inquire about the capacities and configurations of data-storage systems, and to configure data-storage systems. Similarly, individual storage devices 116-119 provide a data-storage interface that allows the processing component 112 of a data-storage system to read data from, to write data to, to inquire about the contents and configuration of, and to configure individual storage devices.
In many currently available distributed computing systems, the Small Computer System Interface (“SCSI”) is employed both as the data-storage interface provided to remote host computers by data-storage systems and as the data-storage interface provided by individual storage devices to the processing component of a data-storage system. In certain of these systems, SCSI commands are embedded in a higher-level network protocol, such as Fibre Channel, for exchange of commands and responses between host computers and data-storage systems over a high-bandwidth network. SCSI commands and responses are exchanged between the processing component of a data-storage system and individual storage devices via internal buses, such as SCSI bus 114, that interconnect the individual storage devices with the processing component. In general, although multiple remote host computers may concurrently access a particular data-storage system, such as data-storage system 106 in FIG. 1, commands from multiple remote sources are funneled through a single processing component, such as processing component 112, which greatly simplifies handling of the many different concurrent-access issues that may arise.
Complex, multi-processor, stand-alone data-storage systems, such as high-end disk arrays, have more recently become commercially available. FIG. 2 is a block diagram of an exemplary complex, multi-processor data-storage system. The data-storage system 202 includes two different network controllers 204-205 interconnected with two different high-bandwidth network media 206-207, two different processors 208-209, both interconnected with both network controllers 204 and 205, and two different memories 210 and 211, at least one of which, 211, is shared by both processors 208 and 209. Both processors 208 and 209 are interconnected through multiple internal buses to a number of internal data-storage systems 214-219, each equivalent to the stand-alone data-storage systems discussed above with reference to FIG. 1. The complex, multi-processor data-storage system shown in FIG. 2 may be concurrently accessed over multiple high-bandwidth communications media by numerous remote host computers. In this complex data-storage system, there are far more concurrency and data-distribution problems than in the simpler data-storage systems discussed above with reference to FIG. 1. For example, unlike the simpler data-storage systems, the more complex data-storage system shown in FIG. 2 needs to coordinate concurrent and simultaneous processing of commands by the two different processors. However, techniques developed for parallel-processing computer systems can be used to coordinate activities of multiple processors, and to share and coordinate access to common state information employed by the multiple processors to execute commands received from remote host computers.
For example, shared state information and shared command queues may be stored in the shared memory 211, with access by the multiple processors to the shared state information and shared command queues coordinated by hardware semaphores and various semaphore-based access-control techniques, locking techniques, and other techniques developed to handle problems arising from contention for shared resources by multiple processing entities. Thus, even in the complex, multi-processor data-storage system of FIG. 2, a commonly shared memory or other shared components may serve as a kind of funnel through which concurrent and simultaneous execution of commands can be channeled, providing a means for simplifying issues arising from contention for, and sharing of, state information and for synchronizing simultaneous task execution.
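The funneling role of a shared memory can be illustrated with a minimal sketch. The sketch is not drawn from any particular data-storage system: the class `SharedMemory`, the functions `processor` and `run`, and the command strings are hypothetical names chosen for illustration, and a software lock stands in for the hardware semaphores mentioned above. Two threads, standing in for processors 208 and 209, draw commands from one shared queue and update one shared counter, with every access to the shared structures guarded by the lock.

```python
import threading
from collections import deque


class SharedMemory:
    """Hypothetical stand-in for a memory shared by two processors."""

    def __init__(self, commands):
        self.lock = threading.Lock()            # stands in for a hardware semaphore
        self.command_queue = deque(commands)    # shared command queue
        self.commands_executed = 0              # shared state information
        self.log = []                           # (command, processor name) pairs


def processor(name, shared):
    """Dequeue and 'execute' commands until the shared queue is empty."""
    while True:
        # Only one processor at a time may inspect or modify the queue.
        with shared.lock:
            if not shared.command_queue:
                return
            command = shared.command_queue.popleft()

        # Command execution itself (omitted here) happens outside the lock,
        # so the two processors can work simultaneously.

        # Shared state is updated under the same lock.
        with shared.lock:
            shared.commands_executed += 1
            shared.log.append((command, name))


def run(commands, processor_names=("processor_208", "processor_209")):
    """Run one thread per processor name against a single shared memory."""
    shared = SharedMemory(commands)
    threads = [
        threading.Thread(target=processor, args=(name, shared))
        for name in processor_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared


if __name__ == "__main__":
    shared = run([f"READ block {i}" for i in range(100)])
    print(shared.commands_executed, "commands executed")
```

Because every command passes through the single locked queue, each command is dequeued exactly once regardless of how the two threads interleave, which is the simplification that the shared memory provides.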
As the needs for ever greater storage capacities, higher bandwidths, and increased fault tolerance continue to grow, driven by ever increasing processor and networking capabilities and by expanding capabilities of, and demands on, computer applications, new strategies for designing, constructing, and managing complex, distributed, highly parallel data-storage systems have emerged. A particularly attractive strategy and design for high-end data-storage systems involves distributing a data-storage system over many separate, intercommunicating data-storage systems, or nodes. FIG. 3 illustrates one example of a distributed data-storage system. In FIG. 3, three different data-storage systems 302-304, such as the data-storage systems discussed above with reference to FIG. 2, are interconnected with one another by one or more interconnections to two different high-bandwidth interconnection media 306 and 308. Additional data-storage systems 310-313 are interconnected with two of the previously mentioned data-storage systems 302 and 304 via three additional interconnection media 314-316. Data-storage systems 310 and 311 are interconnected with each other, and with data-storage system 302, through a single interconnection medium 314, while data-storage system 302 is directly interconnected with data-storage systems 310-313 through multiple interconnection media 314 and 315, and is interconnected with data-storage systems 303 and 304 through one or both of the high-bandwidth interconnection media 306 and 308. All seven data-storage systems 302-304 and 310-313 together form a single distributed data-storage system 318 that provides a network-addressable, uniform, cohesive, and well-behaved command-based data-storage interface to a number of remote host computers that intercommunicate with the distributed data-storage system 318 via one or both of the high-bandwidth interconnection media 306 and 308.
In many cases, the data-storage interface provided by a distributed data-storage system, such as distributed data-storage system 318 in FIG. 3, needs to appear and behave identically to a data-storage interface provided by conventional, non-distributed data-storage systems such as those described with reference to FIGS. 1 and 2, to avoid changes to applications and operating systems of remote host computers that access the distributed data-storage system. In the case of a distributed data-storage system, many profound issues with respect to concurrent and simultaneous processing of commands by the separate, component data-storage systems that together compose the distributed data-storage system are generally encountered. For example, state information that describes the current state of the distributed data-storage system may be accessed by all or a large fraction of the component data-storage systems. However, the state information may also be updated during command processing, with each update generally carried out by one of the component data-storage systems.
If only a single, central copy of the state information is maintained within the distributed data-storage system, then all but one of the component data-storage systems employ network communications in order to access the state information. Because some portion of the state information may be accessed for all or a large subset of the different types of commands executed by data-storage systems, a single, central copy of the state information may lead to extremely high communications overheads and to unacceptable latency in command execution, as well as to a serious single point of failure that can defeat high-availability operation of the distributed data-storage system. If, by contrast, the state information is replicated and distributed among the component data-storage systems, then great care needs to be taken to update all of the replicated copies of the state information when any single copy of the state information is updated by local processing of a command on one of the component data-storage systems. Update propagation is non-trivial, and may lead to high communications overheads and large command-processing latencies. Many other problems abound in complex, distributed computing systems, such as distributed data-storage systems. For this reason, designers, manufacturers, retailers, and users of distributed data-storage systems, and other distributed computing systems, have recognized the need for distributed computing systems and distributed-computing-system designs that address distributed state information problems without introducing unacceptable overheads and performance degradation.
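The communications trade-off between a single, central copy and replicated copies of the state information can be made concrete with a small message-counting sketch. The model is deliberately simplified and rests on assumptions not stated above: a remote access to the central copy is assumed to cost two messages (request and reply), a replicated update is assumed to cost one message per remote replica, and concurrent updates, message ordering, acknowledgments, and failures, which are the sources of the real difficulty, are ignored. The names `CentralState`, `ReplicatedState`, and `run_workload` are hypothetical.

```python
class CentralState:
    """One copy of the state, held by a single home node (assumed model)."""

    def __init__(self, home_node=0):
        self.state = {}
        self.home_node = home_node
        self.messages_sent = 0

    def _charge(self, node):
        # Every node except the home node pays a request and a reply.
        if node != self.home_node:
            self.messages_sent += 2

    def read(self, node, key):
        self._charge(node)
        return self.state.get(key)

    def update(self, node, key, value):
        self._charge(node)
        self.state[key] = value


class ReplicatedState:
    """One copy of the state per node; updates are pushed to every replica."""

    def __init__(self, node_count):
        self.replicas = [dict() for _ in range(node_count)]
        self.messages_sent = 0

    def read(self, node, key):
        # Reads are served from the local replica and cost no messages.
        return self.replicas[node].get(key)

    def update(self, node, key, value):
        self.replicas[node][key] = value
        for other, replica in enumerate(self.replicas):
            if other != node:
                replica[key] = value        # propagate to a remote replica
                self.messages_sent += 1


def run_workload(store, node_count, reads_per_node, updates_per_node):
    """Have every node perform the same number of updates and reads."""
    for node in range(node_count):
        for i in range(updates_per_node):
            store.update(node, f"node{node}-key{i}", i)
        for _ in range(reads_per_node):
            store.read(node, "node0-key0")
    return store.messages_sent


if __name__ == "__main__":
    nodes = 7  # the number of component systems in FIG. 3
    print("central:   ", run_workload(CentralState(), nodes, 10, 1))
    print("replicated:", run_workload(ReplicatedState(nodes), nodes, 10, 1))
```

Under these assumptions, a read-heavy workload favors replication, since reads are local, while an update-heavy workload favors the central copy, since each replicated update fans out to every other node. Neither choice removes the overhead; each merely moves it, which is the problem described above.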