Information handling systems which incorporate multiple processors in the system design have the potential to increase system information processing capacity and to increase overall system performance. However, a substantial portion of this potential increase in processing capacity is not realized using current multi-processor information handling system design architectures, which incorporate a single memory subsystem that is accessible by each of the processors via a single multi-processor bus.
As is the case with single processor information handling systems, each processor in a multi-processor system operates on information (instructions and data) which resides in memory. Each processor can request information from the memory subsystem, and such requests are conducted via the multi-processor bus. High speed processors can process more information per unit time than slower processors. In addition, newer processors require more information per request than older generation processors. In designing multi-processor information handling systems, it is therefore desirable to minimize accesses to the multi-processor bus and the memory subsystem. As more information is requested from the memory subsystem, the desired increases in system level performance diminish because the processors place increased demands on the multi-processor bus and the memory subsystem.
The system level performance of a multi-processor information handling system is dependent upon its "system bandwidth", which is a function of the multi-processor bus bandwidth and the memory subsystem bandwidth and is a measure of the information transfer capability of the system expressed in megabytes per second (MB/sec). The system bandwidth is the amount of information that can be transferred per unit time between the memory subsystem and the requesting processors over the multi-processor bus. The system performance of multi-processor systems utilizing high-speed processors is constrained by the system bandwidth. The ability of multi-processor systems to effectively utilize the increased processing capacity of additional processors diminishes as the multi-processor bus utilization approaches 100% (i.e., as the multi-processor bus becomes saturated), as the memory subsystem utilization approaches 100% (i.e., as the memory subsystem becomes saturated), or as both the multi-processor bus and the memory subsystem become saturated.
Before describing the details of the present invention, a description of a generic prior art multi-processor information handling system may be helpful in understanding the advantages of the present invention. A typical multi-processor information handling system design incorporates a single multi-processor bus which allows two or more processors to access a single memory subsystem. In this type of architecture, the multi-processor bus is shared among the processors such that only one processor has control of the multi-processor bus at any given time. To perform a memory operation, the processor must control both the multi-processor bus and the memory subsystem.
The duration of a memory operation is defined as the time from the acquisition of the multi-processor bus until the time the multi-processor bus is released after the memory operation has been completed. During a typical memory access, a processor generates a request to gain access to the multi-processor bus to perform a memory operation with a memory subsystem. The transaction request is either a READ memory access wherein the processor is requesting to read information from the memory subsystem or a WRITE memory access wherein the processor is requesting to write information to the memory subsystem. This memory operation is referred to as a "connected transaction operation". During a connected transaction operation, information is transferred between the requesting processor and the memory subsystem over the multi-processor bus. In operation, a requesting processor makes a request and arbitrates for access to the multi-processor bus. Upon being granted access to the multi-processor bus, the requesting processor controls both the multi-processor bus and the memory subsystem for the duration of the memory operation.
The requesting processor maintains exclusive control over both the multi-processor bus and the memory subsystem for performing a memory operation. For example, a processor generates a request and that request in turn arbitrates to use the multi-processor bus. When the multi-processor bus becomes available, the request is granted access to the multi-processor bus and maintains exclusive control over the multi-processor bus and the memory subsystem for the duration of a memory operation defined in the request (e.g., a READ memory access). Thus, the request maintains exclusive control over the multi-processor bus and the memory subsystem during the entire memory operation, and a later request will not be accommodated until the prior request has been completed. For a READ memory access, a portion of the memory operation is associated with the wait period during which the multi-processor bus is idle, waiting for the memory subsystem to respond to the requesting processor with information. For a WRITE memory access, a portion of the memory operation is associated with the wait period during which the multi-processor bus is idle, waiting for the memory subsystem to respond to the requesting processor with an acknowledgement.
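The serialization described above can be illustrated with a small scheduling model. This is a minimal sketch, not part of the described system: the `ConnectedTransaction` type, the `schedule` function, and the cycle counts in the usage lines are hypothetical, and each request is assumed to hold the bus from the moment it is granted until its memory operation completes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConnectedTransaction:
    """One memory operation under the connected transaction model (hypothetical)."""
    name: str
    arrival: int      # bus cycle at which the request begins arbitrating
    bus_cycles: int   # cycles that carry address, data, or acknowledgement
    wait_cycles: int  # cycles the bus is held but idle, waiting for the memory subsystem


def schedule(requests: list[ConnectedTransaction]) -> tuple[list[tuple[str, int, int]], int]:
    """Grant the bus in arrival order. Each grant lasts for the entire memory
    operation, so a later request starts only after the prior one completes."""
    bus_free_at = 0
    timeline = []
    held_idle = 0
    for req in sorted(requests, key=lambda r: r.arrival):  # stable sort: ties keep input order
        start = max(req.arrival, bus_free_at)
        end = start + req.bus_cycles + req.wait_cycles
        timeline.append((req.name, start, end))
        held_idle += req.wait_cycles
        bus_free_at = end  # the bus is released only when the operation completes
    return timeline, held_idle


# Two requests arriving together; the cycle counts are illustrative only.
timeline, held_idle = schedule([
    ConnectedTransaction("A", arrival=0, bus_cycles=5, wait_cycles=8),
    ConnectedTransaction("B", arrival=0, bus_cycles=5, wait_cycles=8),
])
print(timeline)   # → [('A', 0, 13), ('B', 13, 26)]
print(held_idle)  # → 16
```

Because the bus is held through the memory wait, request B cannot start until cycle 13 even though request A used the bus to carry address or data for only 5 of those 13 cycles.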
The system bandwidth of a multi-processor information handling system is determined by the time required to deliver an information block (generally a cache line or a cache line multiple). The time required to deliver an information block is measured from the moment a transaction request acquires the multi-processor bus until either the moment the requested information is received by the requesting processor for a READ memory access or the moment an acknowledgement is received by the requesting processor for a WRITE memory access. The system bandwidth is limited by, and a function of, the multi-processor bus clock rate, the memory subsystem clock rate, and the memory bandwidth. The multi-processor bus clock rate is the reciprocal of the time required for one cycle of the multi-processor bus, and is expressed in megahertz (MHz). The memory subsystem clock rate is the reciprocal of the time required for one cycle of the memory subsystem, and is also expressed in megahertz. The memory bandwidth is the memory subsystem clock rate multiplied by the number of information bytes in a memory access, divided by the number of clock cycles required to perform a single memory access. For purposes of calculating the memory bandwidth, it is observed that the memory subsystem clock rate can be different from the multi-processor bus clock rate.
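The clock rate and memory bandwidth definitions above can be expressed as two small functions. This is a sketch under stated assumptions: the function names and the 20 ns, 32 byte, and 8 cycle values in the usage lines are hypothetical, and MB/sec is taken as 10^6 bytes per second, so that MHz multiplied by bytes gives MB/sec directly.

```python
def clock_rate_mhz(cycle_time_ns: float) -> float:
    """Clock rate is the reciprocal of the cycle time: 1 / (t ns) = 1000 / t MHz."""
    return 1_000.0 / cycle_time_ns


def memory_bandwidth_mb_per_s(mem_clock_mhz: float, bytes_per_access: int,
                              cycles_per_access: int) -> float:
    """Memory clock rate x bytes per access / clock cycles per access.
    MHz x bytes = 10**6 bytes per second, i.e. MB/sec in decimal units."""
    return mem_clock_mhz * bytes_per_access / cycles_per_access


# Hypothetical memory subsystem: 20 ns cycle, 32 byte access taking 8 cycles.
mem_mhz = clock_rate_mhz(20.0)
print(mem_mhz)                                    # → 50.0
print(memory_bandwidth_mb_per_s(mem_mhz, 32, 8))  # → 200.0
```

The memory clock rate is passed in separately rather than derived from the bus clock because, as noted above, the two rates can differ.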
For a READ memory access, the time required to deliver an information block is the time required for the processor to use the multi-processor bus to request memory access, plus the time required for the memory subsystem to perform the memory operation, plus the time required for the information to be returned to the processor via the multi-processor bus. For a WRITE memory access, the delivery time is the time required for the processor to use the multi-processor bus to request memory access, plus the time required for the memory subsystem to generate the acknowledgement, plus the time required for the acknowledgement to be returned to the processor via the multi-processor bus. The system bandwidth is the number of bytes in the information block divided by this delivery time.
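Written as formulas, the block delivery times and the resulting system bandwidth take the following form. The symbols are introduced here only for illustration: t_req is the bus request time, t_mem the memory operation time, t_ack the acknowledgement generation time, t_ret the return time over the bus, and B the number of bytes in the information block.

```latex
\begin{aligned}
T_{\mathrm{READ}}  &= t_{\mathrm{req}} + t_{\mathrm{mem}} + t_{\mathrm{ret}} \\
T_{\mathrm{WRITE}} &= t_{\mathrm{req}} + t_{\mathrm{ack}} + t_{\mathrm{ret}} \\
\text{system bandwidth} &= \frac{B}{T}
\end{aligned}
```

Here t_ret denotes the return of the requested information for a READ and of the acknowledgement for a WRITE. With B in bytes and T in microseconds, B/T is expressed in MB/sec.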
A multi-processor information handling system using this prior art connected transaction bus memory operation can achieve only a small increase in system bandwidth by doubling the multi-processor bus transfer width. For example, for a READ memory operation, a typical move time for a 32 byte line of information on a 64 bit multi-processor bus is 5 multi-processor bus cycles (1 cycle for the address strobe plus 4 cycles for the information transfer), and a typical move time for a 32 byte line of information on a 128 bit multi-processor bus is 3 multi-processor bus cycles (1 cycle for the address strobe plus 2 cycles for the information transfer). An aggressive memory subsystem design is expected to require 8 memory cycles to perform the memory access. Therefore, treating each memory cycle as one multi-processor bus cycle, a 64 bit multi-processor bus is expected to require 13 multi-processor bus cycles for a READ memory operation, and a 128 bit multi-processor bus is expected to require 11 multi-processor bus cycles for a READ memory operation. Under these operating conditions, a READ connected transaction bus memory operation is expected to result in a system bandwidth of 123 MB/sec for a 64 bit multi-processor bus operating at 50 MHz (32 bytes × 50 MHz / 13 cycles) and 145 MB/sec for a 128 bit multi-processor bus operating at 50 MHz (32 bytes × 50 MHz / 11 cycles). Thus, by doubling the multi-processor bus transfer width from 64 bits to 128 bits, only an 18% increase in system bandwidth is expected.
By way of further example, for a WRITE memory operation, a typical move time for a 32 byte line of information on a 64 bit multi-processor bus is 6 multi-processor bus cycles (1 cycle for the address strobe plus 4 cycles for the information transfer plus 1 cycle for the acknowledgement signal), and a typical move time for a 32 byte line of information on a 128 bit multi-processor bus is 4 multi-processor bus cycles (1 cycle for the address strobe plus 2 cycles for the information transfer plus 1 cycle for the acknowledgement signal). With the same aggressive memory subsystem design requiring 8 memory cycles to perform the memory access, a 64 bit multi-processor bus is expected to require 14 multi-processor bus cycles for a WRITE memory operation, and a 128 bit multi-processor bus is expected to require 12 multi-processor bus cycles for a WRITE memory operation. Under these operating conditions, a WRITE connected transaction bus memory operation is expected to result in a system bandwidth of 114 MB/sec for a 64 bit multi-processor bus operating at 50 MHz and 133 MB/sec for a 128 bit multi-processor bus operating at 50 MHz. Thus, by doubling the multi-processor bus transfer width from 64 bits to 128 bits, only a 17% increase in system bandwidth is expected. The calculations and results differ for POSTED WRITE operations, which are not addressed by this example.
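The READ and WRITE figures in the example above can be reproduced with a short calculation. This is a minimal sketch that assumes, as the cycle totals imply, that each memory cycle takes one 50 MHz multi-processor bus cycle, and that MB/sec means 10^6 bytes per second; the constant and function names are illustrative only.

```python
LINE_BYTES = 32     # cache line size used in the example
BUS_MHZ = 50        # multi-processor bus clock rate
MEMORY_CYCLES = 8   # memory access time, counted in bus cycles (assumes equal clock rates)


def move_cycles(bus_width_bits: int, write: bool) -> int:
    """Address strobe + data transfer cycles (+ 1 acknowledgement cycle for a WRITE)."""
    data_cycles = LINE_BYTES // (bus_width_bits // 8)
    return 1 + data_cycles + (1 if write else 0)


def connected_bandwidth(bus_width_bits: int, write: bool) -> float:
    """MB/sec when the bus is held for the whole connected transaction."""
    total_cycles = move_cycles(bus_width_bits, write) + MEMORY_CYCLES
    return LINE_BYTES * BUS_MHZ / total_cycles


for op, write in (("READ", False), ("WRITE", True)):
    narrow = connected_bandwidth(64, write)
    wide = connected_bandwidth(128, write)
    gain = (wide / narrow - 1) * 100
    print(f"{op}: {narrow:.0f} MB/sec -> {wide:.0f} MB/sec (+{gain:.0f}%)")
# READ: 123 MB/sec -> 145 MB/sec (+18%)
# WRITE: 114 MB/sec -> 133 MB/sec (+17%)
```

Because the 8 memory cycles stay fixed while only the transfer cycles shrink, the wider bus removes just 2 of 13 cycles for a READ and 2 of 14 cycles for a WRITE, which is why the gain stays below 20%.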
The incorporation of additional memory into the single memory subsystem of the prior art multi-processor design architecture only increases the memory subsystem capacity and does not increase the system bandwidth. The mere addition of memory to the memory subsystem does not change any of the parameters that determine system bandwidth and does not create additional paths for the flow of information between the processors and the single memory subsystem. Thus, while adding memory to the memory subsystem does increase the memory subsystem capacity, it does not improve overall multi-processor system bandwidth since the system can still only process one memory request at a time.
As described above, doubling the transfer width of the multi-processor bus generally yields only a small increase in the system bandwidth, and adding memory to the memory subsystem only increases the memory subsystem capacity and does not increase the system bandwidth. Therefore, the effective scalability of multi-processor information handling systems is limited using the prior art system design architecture.