1. Field of the Invention
This invention relates to data storage systems and, more particularly, to storage array interconnection topology.
2. Description of the Related Art
Computer systems are placing an ever-increasing demand on data storage systems. Many of the data storage systems in use today employ data storage arrays. The interconnection solutions for many large storage arrays are based on bus architectures such as, for example, small computer system interconnect (SCSI) or fibre channel (FC). In these architectures, multiple storage devices, such as disks, may share a single set of wires, or a loop in the case of FC, for data transfers.
Such architectures may be limited in terms of performance and fault tolerance. Since all the devices share a common set of wires, only one data transfer may take place at any given time, regardless of whether or not all the devices have data ready for transfer. Also, if a storage device fails, it may be possible for that device to render the remaining devices inaccessible by corrupting the bus. Additionally, in systems that use a single controller on each bus, a controller failure may leave all the devices on its bus inaccessible.
There are several existing solutions available, which are briefly described below. One solution is to divide the devices into multiple subsets utilizing multiple independent buses for added performance. Another solution suggests connecting dual buses and controllers to each device to provide path fail-over capability, as in a dual loop FC architecture. An additional solution may have multiple controllers connected to each bus, thus providing a controller fail-over mechanism.
In a large storage array, component failures may be expected to be fairly frequent. Because of the higher number of components in such a system, the probability that a component will fail at any given time is higher and, accordingly, the mean time between failures (MTBF) for the system is lower. The conventional solutions described above may not be adequate for such a system. To illustrate, in the first solution described above, the independent buses may ease the bandwidth constraint to some degree, but the devices on each bus may still be vulnerable to a single controller failure or a bus failure. In the second solution, a single malfunctioning device may still potentially render all of the buses connected to it, and possibly the rest of the system, inaccessible. This same failure mechanism may also affect the third solution, since the presence of two controllers does not prevent a single device failure from forcing the bus into some random state.
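The relationship between component count and system MTBF can be illustrated with a minimal sketch. It assumes, purely for illustration, that components fail independently at constant rates and that any single component failure counts as a system failure; neither assumption, nor the function name `system_mtbf`, comes from the text above.

```python
import math

def system_mtbf(component_mtbfs):
    """Estimate system MTBF in the same time unit as the inputs.

    Assumption (not from the source): failures are independent with
    constant rates, so the individual failure rates simply add.
    """
    total_rate = math.fsum(1.0 / m for m in component_mtbfs)
    return 1.0 / total_rate

# Hypothetical figures: 100 components, each with an MTBF of 100,000 hours.
print(system_mtbf([100_000.0] * 100))
```

Under these assumptions, the system MTBF shrinks in proportion to the number of components, which is the effect the paragraph above describes.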
Various embodiments of a storage array using a torus interconnection topology are disclosed. In one embodiment, a storage system including a path-redundant torus interconnection fabric is coupled to a plurality of nodes. The torus interconnection fabric may be configured to connect the plurality of nodes in an array including N rows and M columns, where N and M are positive integers. The array may be configured such that a first node in a first row of the N rows is connected to a second node in the first row and a first node in a first column of the M columns is connected to a second node in the first column. Also, an ending node in the first row is connected to the first node in the first row and an ending node in the first column is connected to the first node in the first column. In addition, a first portion of the plurality of nodes is configured to communicate with a plurality of storage devices such as disk drives. In other embodiments, the storage devices may be tape drives or random access memories configured as cache memories. A second portion of the plurality of nodes may be configured to communicate with a host.
In some embodiments, each node of the plurality of nodes may be configured to communicate with each other node of the plurality of nodes by routing messages bi-directionally. In an alternative embodiment, each node of the plurality of nodes is configured to communicate with each other node of the plurality of nodes by routing messages uni-directionally.
In an embodiment, a storage system including a path-redundant torus interconnection fabric is coupled to a plurality of nodes. The torus interconnection fabric is configured to logically connect the plurality of nodes in an array comprising a plurality of node rows and a plurality of node columns. The torus interconnection fabric is also configured to provide a communication path between each node in the array and at least four neighboring nodes. For each node at an end of one of the node rows or one of the node columns, the torus interconnection fabric is configured to provide a communication path to a node at the opposite end of the respective node row or node column. Each one of a first portion of the plurality of nodes comprises at least one mass storage device.
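The wrap-around connectivity described above can be sketched with modular arithmetic. This is only an illustration: the (row, column) addressing scheme and the function name `torus_neighbors` are assumptions made for this example, not part of the disclosure.

```python
def torus_neighbors(row, col, n_rows, n_cols):
    """Return the four neighbors of node (row, col) in an n_rows x n_cols torus.

    The modulo operation makes a node at the end of a row or column
    connect to the node at the opposite end of that row or column.
    """
    return [
        ((row - 1) % n_rows, col),  # neighbor in the previous row
        ((row + 1) % n_rows, col),  # neighbor in the next row
        (row, (col - 1) % n_cols),  # neighbor in the previous column
        (row, (col + 1) % n_cols),  # neighbor in the next column
    ]

# The corner node of a 4 x 4 array wraps to the far ends of its row and column.
print(torus_neighbors(0, 0, 4, 4))  # [(3, 0), (1, 0), (0, 3), (0, 1)]
```

Note that when a dimension has only one or two nodes, some of the four entries coincide, so the sketch is most meaningful for arrays with at least three rows and three columns.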
In an embodiment, a method of interconnecting a plurality of nodes in an array including N rows and M columns, where N and M are positive integers, using a path-redundant torus interconnection fabric is recited. In one embodiment, a first node in a first row of the N rows is connected to a second node in the first row and a first node in a first column of the M columns is connected to a second node in the first column. Additionally, an ending node in the first row is connected to the first node in the first row and an ending node in the first column is connected to the first node in the first column. A first portion of the plurality of nodes is configured to communicate with a plurality of storage devices.
In an embodiment, a method for routing communications within a storage system comprising an array of nodes interconnected by a torus fabric is recited. In one embodiment, a communication from a source node is sent to a destination node using a first communication path. A failure in the first communication path may be detected, preventing the communication from reaching the destination node. The communication from the source node is resent to the destination node using a second communication path independent from the first communication path. The second communication path wraps either from an end of a node row of the array to the opposite end of the node row or from an end of a node column of the array to the opposite end of the node column.
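The resend behavior described above might be sketched as follows. This is a hypothetical illustration, not the routing algorithm of the disclosure: the path-selection rule, the representation of failures as a set of broken links, and the names `ring_path`, `route`, `path_ok`, and `send` are all assumptions made for the example.

```python
def ring_path(start, end, size, step):
    """Indices visited going from start to end around a ring, moving by step (+1 or -1)."""
    path = [start]
    while path[-1] != end:
        path.append((path[-1] + step) % size)
    return path

def route(src, dst, n_rows, n_cols, step, along_row_first):
    """Build a path of (row, col) nodes from src to dst on the torus."""
    (r0, c0), (r1, c1) = src, dst
    rows = ring_path(r0, r1, n_rows, step)
    cols = ring_path(c0, c1, n_cols, step)
    if along_row_first:
        # Travel along the source row, then along the destination column.
        return [(r0, c) for c in cols] + [(r, c1) for r in rows[1:]]
    # Travel along the source column, then along the destination row.
    return [(r, c0) for r in rows] + [(r1, c) for c in cols[1:]]

def path_ok(path, failed_links):
    """A path is usable if none of its hops crosses a failed link."""
    return all(frozenset((a, b)) not in failed_links for a, b in zip(path, path[1:]))

def send(src, dst, n_rows, n_cols, failed_links):
    """Try a first path; on failure, retry on a second path in the opposite ring direction."""
    first = route(src, dst, n_rows, n_cols, +1, along_row_first=True)
    if path_ok(first, failed_links):
        return first
    second = route(src, dst, n_rows, n_cols, -1, along_row_first=False)
    return second if path_ok(second, failed_links) else None

# Hypothetical failure of the link between nodes (0, 1) and (0, 2) in a 4 x 4 array.
failed = {frozenset({(0, 1), (0, 2)})}
print(send((0, 0), (0, 2), 4, 4, failed))  # [(0, 0), (0, 3), (0, 2)]
```

In this example, the second path reaches the destination by wrapping from one end of the node row to the opposite end. The two candidate paths in the sketch share no intermediate nodes, which is one way to interpret "independent"; whether the second path wraps depends on the positions of the source and destination.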