An independent storage system is a computer in which data is spread or “striped” across multiple disk drives. In many implementations, data is stored along with parity or redundant information such that any data lost as a result of disk failures can be automatically reconstructed. Independent storage systems, or storage nodes, are self-contained. If multiple storage systems must be connected to add capacity and/or throughput, the connection is made through the network to form what is called a storage cluster. There are many implementations and methods of storage clusters, all of which rely on complex management software to distribute storage across the storage nodes. In theory, adding storage nodes should increase throughput. In practice, because of network overhead and the added block or file management, performance does not increase linearly and saturates after a fairly small number of storage nodes have been added, owing to the new bottlenecks that traditional clusters impose. When storage nodes are clustered, a meta-data controller is needed to re-direct each client request to the storage node that contains the requested data.
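The striping and parity reconstruction described above can be illustrated with a minimal sketch. It assumes a single stripe protected by one XOR parity block, which is one common scheme and not necessarily the one used by any particular storage system; the function names `stripe_with_parity` and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, num_data_disks: int, block_size: int):
    """Split data into one block per data disk and append an XOR parity block.

    Assumption for this sketch: the data fits in a single stripe.
    """
    capacity = num_data_disks * block_size
    if len(data) > capacity:
        raise ValueError("data does not fit in a single stripe")
    padded = data.ljust(capacity, b"\0")  # zero-pad the final block
    blocks = [padded[i * block_size:(i + 1) * block_size]
              for i in range(num_data_disks)]
    return blocks + [xor_blocks(blocks)]


def reconstruct(stripe, failed_index: int) -> bytes:
    """Rebuild the block of a single failed disk from the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


stripe = stripe_with_parity(b"storage cluster!", num_data_disks=4, block_size=4)
recovered = reconstruct(stripe, failed_index=2)  # pretend disk 2 was lost
```

Because XOR is its own inverse, the same routine rebuilds either a lost data block or the lost parity block, but only one failure per stripe can be tolerated in this scheme.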
To illustrate several of the motivations behind the present invention, prevalent prior art architectures used within existing storage systems will be described. In a first prior art architecture, the storage cluster includes multiple array controller cards (“array controllers”) that couple an array of disk drives to a local main memory and micro-processor through a local Peripheral Component Interconnect Express (PCIe) bus. The array controllers plug into the PCIe expansion slots of the storage computer and communicate with the micro-processor and the main memory via the local PCIe bus. A network interconnect card (NIC) is used to connect client computers to the storage system. The NIC, which can have one or more ports able to connect to a standard network, behaves as a storage target for the client computers. Data is transferred from the network through the NIC to the main memory. The data, now residing in the main memory, is then transferred to one or more of the array controllers through the local PCIe bus.
In a second prior art architecture, a front-end network switch provides a means of communication between the client computers and the storage cluster. The storage cluster requires the addition of meta-data controllers, which reside on storage nodes within the storage cluster, in order to re-direct data requests from the client computers to the node that contains the requested data. This management adds considerable overhead: not only does it create network chatter between nodes as they coordinate, but it can also create usage bottlenecks if multiple clients request data that resides in the same storage node.
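The redirection step and the resulting hot spot can be modeled with a short sketch. It is only an in-memory illustration under stated assumptions: the class `MetadataController`, the function `node_load`, and the node and file names are hypothetical and do not correspond to any particular product.

```python
from collections import Counter


class MetadataController:
    """Hypothetical meta-data controller: maps each file name to its node."""

    def __init__(self, placement):
        self.placement = dict(placement)
        self.lookups = 0  # every client request costs one extra lookup

    def locate(self, name):
        self.lookups += 1
        return self.placement[name]


def node_load(controller, requests):
    """Count how many client requests are redirected to each storage node."""
    return Counter(controller.locate(name) for name in requests)


controller = MetadataController({"video.mov": "node0", "audio.wav": "node1"})
# Three clients ask for the same file, so all three land on the same node.
load = node_load(controller, ["video.mov", "video.mov", "video.mov", "audio.wav"])
```

In this model every request incurs a lookup before any data moves, and requests for popular data concentrate on a single node regardless of how many nodes the cluster contains.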
A third prior art architecture is a recent development that uses PCIe-to-PCIe bridges to access common storage from several storage systems. However, this solution does not constitute a cluster, because the storage systems do not work as a global storage pool. Instead, they work as independent storage systems that share common storage resources, with limited expansion capabilities and performance.
Therefore, it is the object of the present invention to provide a storage cluster comprising a unified data bus and a plurality of storage systems, the plurality of storage systems being interconnected by the unified data bus, allowing client computers to transfer data through a much faster pipe. By extending the bus architecture across the plurality of storage systems, the processor of each of the storage systems becomes an element of a global system that can use the resources of the other storage systems without having to perform multiple data transfers. By connecting the storage systems to the unified data bus, the management software is drastically simplified, avoiding many issues such as ownership and special circumstances such as moving data and over-spillage from one storage system to the next. Furthermore, to avoid multiple data transfers, a multi-target remote direct memory access (RDMA) transfer can produce n data copies without any extra overhead by sending the data to all storage systems simultaneously in one transfer.
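The difference between n separate copies and one multi-target transfer can be sketched as follows. This is a toy in-memory model, not an RDMA implementation: the class `UnifiedBus` and the two replication functions are hypothetical names, and the model simply assumes that one bus write can address several targets at once, as the paragraph above describes.

```python
class UnifiedBus:
    """Toy model of a shared bus: one write may address several targets."""

    def __init__(self, node_names):
        self.memory = {name: {} for name in node_names}
        self.transfers = 0  # number of bus transfers issued so far

    def write(self, targets, key, payload):
        """A single transfer that delivers the payload to every target."""
        self.transfers += 1
        for target in targets:
            self.memory[target][key] = payload


def replicate_unicast(bus, targets, key, payload):
    """Conventional approach: one transfer per copy."""
    for target in targets:
        bus.write([target], key, payload)


def replicate_multi_target(bus, targets, key, payload):
    """Multi-target approach: all copies are made in one transfer."""
    bus.write(targets, key, payload)


nodes = ["node0", "node1", "node2"]
unicast_bus, multi_bus = UnifiedBus(nodes), UnifiedBus(nodes)
replicate_unicast(unicast_bus, nodes, "block-7", b"payload")
replicate_multi_target(multi_bus, nodes, "block-7", b"payload")
```

Both buses end up holding the same three copies, but the unicast path issues one transfer per node while the multi-target path issues a single transfer.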