This invention relates generally to multi-processor computer systems in which transactions may be completely received over two or more clock cycles, and more particularly to allocating resources for such transactions.
There are many different types of multi-processor computer systems. A symmetric multi-processor (SMP) system includes a number of processors that share a common memory managed by a memory transaction manager. SMP systems provide scalability. As needs dictate, additional processors can be added. SMP systems usually range from two to thirty-two or more processors. One processor generally boots the system and loads the SMP operating system, which brings the other processors online. Without partitioning, there is only one instance of the operating system and one instance of the application in memory. The operating system uses the processors as a pool of processing resources, all executing simultaneously, where each processor either processes data or is in an idle loop waiting to perform a task. SMP systems increase in speed whenever processes can be overlapped.
A massively parallel processor (MPP) system can use thousands of processors or more. MPP systems use a different programming paradigm than the more common SMP systems. In an MPP system, each processor contains its own memory and copy of the operating system and application. Each subsystem communicates with the others through a high-speed interconnect. To use an MPP system effectively, an information-processing problem should be breakable into pieces that can be solved simultaneously. For example, in scientific environments, certain simulations and mathematical problems can be split apart and each part processed at the same time.
A non-uniform memory access (NUMA) system is a multi-processing system in which memory is separated into distinct banks. NUMA systems are similar to SMP systems. In SMP systems, however, all processors access a common memory at the same speed. By comparison, in a NUMA system, memory on the same processor board, or in the same building block, as the processor is accessed faster than memory on other processor boards, or in other building blocks. That is, local memory is accessed faster than distant shared memory. NUMA systems generally scale better to higher numbers of processors than SMP systems. The term building block is used herein in a general manner, and encompasses a separable grouping of processor(s), other hardware, such as memory, and software that can communicate with other building blocks.
One particular type of NUMA system is the NUMA-quad (NUMA-Q) system. A NUMA-Q system is a NUMA system in which the fundamental building block is the quad, or the quad building block (QBB). Each quad can contain up to four processors, a set of memory arrays, a memory transaction manager, and an input/output (I/O) processor (IOP) that, through two host bus adapters (HBAs), accommodates two to eight I/O buses. An internal switch in each QBB allows all processors equal access to both local memory and the I/O buses connected to the local I/O processor. An application running on a processor in one QBB can thus access the local memory of its own QBB, as well as the shared memory of the other QBBs. More generally, a quad refers to a building block having at least a collection of up to four processors and an amount of memory.
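The composition of a quad described above can be sketched as a simple data structure. This is an illustrative model only; the class and field names are assumptions for the sketch and are not drawn from any actual NUMA-Q implementation, though the component limits follow the description above.

```python
from dataclasses import dataclass, field

@dataclass
class Quad:
    """Hypothetical model of a quad building block (QBB)."""
    processors: list = field(default_factory=list)  # up to four processors
    memory_mb: int = 0    # local memory arrays
    io_buses: int = 2     # two to eight I/O buses, via two host bus adapters

    def add_processor(self, proc):
        # A quad is defined as holding at most four processors.
        if len(self.processors) >= 4:
            raise ValueError("a quad holds at most four processors")
        self.processors.append(proc)
```

In this sketch, an attempt to add a fifth processor is rejected, reflecting the four-processor limit of the quad building block.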
A difficulty with multi-processor systems, as well as with single-processor systems, is that transactions may be multiplexed over the physical interfaces of processors, such that they are not completely received by the transaction managers in a single clock cycle. Manufacturers and designers of processors typically attempt to minimize the number of pins on their integrated circuits (ICs), usually because of cost constraints, which can necessitate the multiplexing of information sent to the transaction managers. For many types of transactions, this means that the transactions cannot be completely sent by the processors in a single clock cycle, but rather are sent over two clock cycles. A transaction can be generally and non-restrictively defined as a request from a transaction generator, such as another processor, an application-specific IC (ASIC), and so on. The request may ask that the transaction manager perform a command on a resource, such as a read command, a write command, and so on.
Because a transaction may not be completely received in a single clock cycle, the transaction manager may not be able to precisely determine the resources to which the transaction relates, and thus the resources that the transaction manager should allocate. The resources may include queues, buffers, memories, and so on. The transaction manager may thus have to wait an extra clock cycle, until it completely receives the transaction, before it can determine the resources the transaction needs, and thus the resources it should allocate for the transaction. This can unnecessarily slow the system down, and furthermore may result in a reduction of transaction bandwidth. Alternatively, the transaction manager may have extra resources permanently allocated to it in case a given received transaction needs them, but this can lead to a scarcity or underutilization of resources.
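The tradeoff described above can be made concrete with a toy accounting of the two strategies. The cycle counts and pool sizes here are illustrative assumptions, not measurements of any real system or a description of the invention itself.

```python
def latency_wait(beats=2):
    """Cycles before allocation when the manager waits for the full transaction."""
    # Allocation can begin only after the last beat arrives.
    return beats

def latency_eager():
    """Cycles before allocation when worst-case resources are pre-reserved."""
    # Allocation is immediate, on the first beat, because resources
    # are already held regardless of what the transaction turns out to need.
    return 1

def waste_eager(pool_size, typical_use):
    """Resources held idle when the pool is sized for the worst case."""
    return pool_size - typical_use
```

For a two-beat transaction, waiting costs an extra cycle of allocation latency (`latency_wait() == 2` versus `latency_eager() == 1`), while eager pre-allocation avoids the latency at the cost of idle resources whenever typical usage falls below the worst case.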
For these and other reasons, there is a need for the present invention.