1. Field of the Invention
Embodiments of the present invention relate, in general, to allocation of data buffer memory and more particularly to a non-blocking switching fabric with efficient allocation of data buffer memory from remote memory independent of the location of control processing.
2. Relevant Background
Typical current computer system configurations consist of one or more microprocessor and Input/Output (“I/O”) complexes connected through internal high-speed busses. This connection occurs via I/O adapter cards, commonly termed Host Bus Adapters (HBAs) or Network Interface Cards (NICs). Examples of I/O busses that can be used to connect the microprocessor complex to HBAs and NICs are InfiniBand and Peripheral Component Interconnect Express (“PCIe”) switches, as shown in prior art FIG. 1.
Microprocessor complexes in many such systems provide several I/O busses (devices) that may either be connected directly to an HBA or connected to several HBAs through an I/O switch. As illustrated in FIG. 1, an I/O switch 110 forms a tree of devices owned by one microprocessor complex 120 at the root of the tree. The microprocessor complex 120 of FIG. 1 is connected directly to a single HBA 130 as well as the switch 110 which is in turn coupled to two other HBA devices 140, 150 and a NIC 160, the NIC 160 providing access to a network 170. Two arrays 180, 190 are connected to the microprocessor complex 120 via the PCIe switch 110/HBA 150 path or directly through a single HBA 130. Currently, no mechanism exists with standard PCI devices and switches to share an I/O device with multiple microprocessor complexes.
As many computer systems, and the storage devices connected to them, are expected to maintain a very high state of availability, it is typically expected that the systems continue to run even when a component fails. The current approach to achieving high availability is to provide redundant components and paths. For example, a system may have two microprocessor complexes, each of which can access all of the I/O devices. Should one of the microprocessors fail, the other can continue processing, allowing the applications to continue running albeit at a decreased level of performance.
Within a storage appliance, it remains necessary to have at least two independent storage processors in order to achieve high availability. In FIG. 2, two such independent storage processors 230, 240 are shown, as is known in the prior art, with two separate instances of an operating system, one running on each processor 230, 240. There is also a pair of inter-processor links 280 that provide communication between the two processors 230, 240 and can optionally include switches 290 and additional links to other storage processors for capacity expansion. These links can be Ethernet, InfiniBand or of other proprietary designs. The system shown in FIG. 2 allows each host 210, 220 equal access to each array 250, 260, 270 via one or both of the storage processors 230, 240, as depicted by the lines L1 through L9.
For a number of reasons, many of the offered I/O requests and associated data may have to be processed by two or more of the storage processors 230, 240, requiring data to travel across the inter-processor links 280. This can occur with larger configurations because a given host 210, 220 and array 250, 260, 270 may not be connected to the same set of storage processors 230, 240. Even when they are, the direct link may be a secondary one, and hence the requests will still have to travel across an inter-processor link 280. Additionally, some applications frequently modify and reference data states that can be difficult and expensive to distribute between storage processors. In such cases, only a standby copy exists on other storage processors, and all requests that need that application must be forwarded over the inter-processor links to the active application instance. Requests that must visit two or more storage processors encounter additional forwarding delays, require buffer allocation in each storage processor, and can use substantial inter-processor link bandwidth.
Storage networking protocols, such as Fibre-Channel and others as known in the prior art, allow a number of hosts to share a number of storage devices, thus increasing configuration flexibility and potentially lowering system cost. However, in such systems intelligent switches are needed to allocate the shared storage between the hosts and allow efficient transfer of data between any storage device 370 and any host. In FIG. 3, such a switch is shown that includes an I/O switching fabric to provide full connectivity, microprocessor complexes to perform data service operations, and Host Bus Adapters 350 to interface to the Fibre-Channel network. Such an I/O switching fabric 340 allows data to be sent between any of the HBAs and any of the microprocessor complexes 310, 320, and could be of any appropriate type, such as InfiniBand, PCI Express, or Ethernet. One exemplary switching fabric is described in co-pending U.S. patent application Ser. No. 11/466,734 entitled “Cross-Coupled Peripheral Component Interconnect Express Switch,” the entirety of which is incorporated herein by this reference. The combination of HBAs, I/O switching fabric, and microprocessor complexes forms an intelligent data switching system and can provide data routing and transformation services to the connected hosts.
In constructing such an I/O switch, it is desirable to minimize the costs while maximizing achieved bandwidth. Cost can be minimized by maintaining a constant cross-sectional bandwidth in the switch interconnection network, as is illustrated in the example shown in FIG. 3. In FIG. 3, eight HBAs 350 each connect to the switch through a link with one unit of bandwidth (1B), while each of the two microprocessor complexes 310, 320 connects to the switch through a link with four units of bandwidth (4B). The cross-sectional bandwidth between the HBAs 350 and the switching fabric 340 (eight links of 1B) and the cross-sectional bandwidth between the switching fabric 340 and the microprocessor complexes 310, 320 (two links of 4B) each sum to 8B, meeting the constant cross-sectional bandwidth criterion.
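The bandwidth arithmetic of the FIG. 3 example can be checked with a short sketch. The link counts and per-link bandwidths are taken from the example above; the function and variable names are hypothetical and serve only as illustration.

```python
# Figures from the FIG. 3 example, expressed in units of B.
# All names here are illustrative assumptions, not part of any described system.
HBA_COUNT = 8          # eight HBAs 350
HBA_LINK_BW = 1        # each HBA link carries 1B
COMPLEX_COUNT = 2      # microprocessor complexes 310, 320
COMPLEX_LINK_BW = 4    # each complex link carries 4B


def cross_section(link_count, link_bw):
    """Aggregate bandwidth across one side of the switching fabric."""
    return link_count * link_bw


hba_side = cross_section(HBA_COUNT, HBA_LINK_BW)
complex_side = cross_section(COMPLEX_COUNT, COMPLEX_LINK_BW)

# The criterion is met when both sides of the fabric carry the same total.
is_constant = hba_side == complex_side
print(f"HBA side: {hba_side}B, complex side: {complex_side}B, constant: {is_constant}")
```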
To maximize performance of a switching fabric 340 such as that illustrated in FIG. 3, it is necessary to distribute data traffic and data service operations evenly between the two (or more) microprocessor complexes 310, 320. For data traffic, full switch bandwidth is only achieved if all paths are equally utilized. For instance, should all the data traffic happen to move through microprocessor complex one 310, the switch-to-processor link would be oversubscribed and the delivered bandwidth could be cut in half regardless of the capability of microprocessor complex one 310. Similarly, if data service requests preferentially flow to microprocessor complex one 310, data service processing may be limited by the available computing power in that complex. In both cases, distributing the data flow and data processing services evenly between the two microprocessor complexes 310, 320 allows the highest possible performance of the system.
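The effect of uneven traffic distribution can be illustrated with a simplified model. It assumes that each switch-to-processor link delivers at most its rated capacity and that traffic offered beyond that capacity is not delivered; the function name and the traffic shares are hypothetical, while the 8B offered load and 4B link capacity follow the FIG. 3 example.

```python
def delivered_bandwidth(offered, shares, link_capacity):
    """Total bandwidth delivered when `offered` units of traffic are split
    across processor links according to `shares`, with each link capped at
    `link_capacity`. A simplified model: excess traffic is simply not delivered.
    """
    return sum(min(offered * share, link_capacity) for share in shares)


OFFERED = 8        # 8B offered by the eight HBA links
LINK_CAPACITY = 4  # 4B per switch-to-processor link

# Traffic split evenly between the two microprocessor complexes.
balanced = delivered_bandwidth(OFFERED, [0.5, 0.5], LINK_CAPACITY)

# All traffic directed to microprocessor complex one.
skewed = delivered_bandwidth(OFFERED, [1.0, 0.0], LINK_CAPACITY)

print(f"balanced: {balanced}B, skewed: {skewed}B")
```

Under these assumptions the balanced split delivers the full 8B, while the skewed split saturates one 4B link and delivers half of the offered load.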
A non-blocking mechanism for buffer allocation between processor-memory modules is lacking, yet is necessary to achieve balanced data flow and to assist in the balancing of data processing. That balance can be achieved by mechanisms that evenly distribute data movement between a plurality of microprocessor complexes, that allow data services to be assigned so that data is processed independently of the buffering and transfer of the associated data, and that efficiently allocate buffering for the data.