Multiprocessor computer systems have long been valued for the high performance they offer by utilizing multiple processors, none of which is individually capable of the same level of performance as the multiprocessor system as a whole. In such systems, tasks are divided among the processors so that each processor performs a part of the system's computation. More than one task can therefore be carried out at a time, with each task or thread running on a separate processor, or a single task can be broken into pieces that are assigned to individual processors. Multiprocessor systems incorporate many methods of dividing tasks among their processors, but all benefit from the ability to perform computations on more than one processor simultaneously. Traditionally, multiprocessor systems were large mainframes or supercomputers with several processors mounted in the same physical unit.
With multiple processors and multiple computational processes within a multiprocessor system, a mechanism is needed to allow processors to share access to data and to share the results of their computations. Centralized memory systems use a single central bank of memory that all processors can access at roughly the same speed. Other systems distribute memory among individual processors or groups of processors; these provide faster access to memory that is local to each processor or group, but access to data held by other processors takes somewhat longer than in centralized memory systems.
Shared address memory systems allow multiple processors to access the same memory, whether distributed or centralized, and to communicate with other processors via data stored in the shared memory. Cache memory can be used to provide faster access to data each processor is likely to need and to reduce requests on the system bus for the same commonly used data from multiple processors.
A cache in a shared address system typically caches memory from any of the shared memory locations, whether local or remote to the processor requesting the data. The cache associated with each processor or group of processors in a distributed shared memory system likely maintains copies of data from memory local to a number of other processor nodes. Information about each block of memory is kept in a directory, which tracks data such as which caches have copies of the block, whether a cached copy is dirty, and other related data. The directory is used to maintain cache coherency, that is, to ensure that the system can determine whether the data in each cache is valid. The directory is also used to keep track of which caches hold data that is to be written, and facilitates granting exclusive write access to one processor or I/O device. After write access has been granted and a memory location is updated, the cached copies are marked as dirty.
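The directory mechanism described above can be sketched as follows. This is a minimal illustrative model, not the implementation of any particular system; the class and method names (`DirectoryEntry`, `grant_write`, and so on) are assumptions made for clarity.

```python
class DirectoryEntry:
    """Tracks, for one memory block, which caches hold a copy and
    whether an exclusive (modified) copy exists. Illustrative only."""

    def __init__(self):
        self.sharers = set()   # IDs of caches holding a clean copy
        self.owner = None      # cache holding the dirty (modified) copy
        self.dirty = False

    def read(self, cache_id):
        """A cache requests a shared (read) copy of the block."""
        if self.dirty:
            # The owner's modified copy must be written back before
            # the block can be shared again.
            self.sharers.add(self.owner)
            self.owner = None
            self.dirty = False
        self.sharers.add(cache_id)

    def grant_write(self, cache_id):
        """Grant exclusive write access to one cache or I/O device;
        all other cached copies become stale and must be invalidated."""
        stale = self.sharers | ({self.owner} if self.owner is not None else set())
        stale.discard(cache_id)
        self.sharers = set()
        self.owner = cache_id
        self.dirty = True
        return stale   # copies the protocol must invalidate


entry = DirectoryEntry()
entry.read(0)
entry.read(1)
stale = entry.grant_write(2)   # caches 0 and 1 now hold stale copies
```

In this sketch the directory answers the two questions the text raises: which caches hold valid copies, and which single cache (if any) holds the block exclusively for writing.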
As such multiprocessor systems continue to grow in size, the number of requests by the processors for data in the various memories also increases. Accordingly, such systems are vulnerable to becoming congested. Further, such systems are prone to request starvation, wherein the length of time needed to service requests for data is such that the performance of applications executing on the system is adversely affected. Moreover, such large multiprocessor systems may become difficult to manage with regard to fairness in servicing the requests.
In a typical system, a request priority scheme is employed, wherein the requests for data are assigned priorities. Such a system also includes a mechanism to trap excessively NACKed requests (i.e., those requests that receive negative acknowledgements (NACKs), indicating that the requests cannot yet be serviced). In response to a NACK, the processor requesting data from memory may transmit another request for the data. In this system, the priority of a request is a step function of the number of NACKs received in response to the request. In the absence of such a priority scheme, the chance of a request being serviced depends on the frequency of its visits to the servicing node (the node servicing requests for the data). Therefore, requests from processors in relatively close proximity to the servicing node are favored over requests from processors that are more distant. Furthermore, the NACK-based priority scheme amplifies this natural priority, making the servicing of more distant requests still more difficult. Such an imbalance may be further aggravated when the system is congested.
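The step-function priority described above can be illustrated with a small sketch. The particular thresholds and number of priority levels here are assumed values for illustration; the text does not specify them.

```python
def priority(nack_count, steps=(0, 4, 16, 64)):
    """Return a priority level that rises in steps as a request
    accumulates NACKs: the request reaches level k once its NACK
    count reaches steps[k]. Thresholds are illustrative assumptions."""
    level = 0
    for k, threshold in enumerate(steps):
        if nack_count >= threshold:
            level = k
    return level

# A request starts at the lowest priority and climbs only at the
# step boundaries, e.g. priority(0) == 0, priority(5) == 1,
# priority(20) == 2, priority(100) == 3.
```

Because the priority rises only with the NACK count, a nearby request that is retried (and NACKed) frequently climbs the steps faster than a distant one, which is the amplification effect the text describes.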
This typical system guarantees starvation-free operation (i.e., that no request goes unprocessed for an excessive amount of time) only if no more than one highest-priority request for given data in a memory is outstanding at a time. However, this restriction is routinely violated. Thus, in addition to handicapping remote requests relative to requests more local to the servicing node, such systems produce excessively NACKed requests. These NACKed requests are typically trapped and handed over to software, which injects them back into the system. Accordingly, such a system degrades fairness among requests, does not provide starvation-free operation, and is costly in terms of time and resources.
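The trap-and-reinject behavior described above can be sketched as follows. This is a schematic model under assumed names and an assumed trap threshold; the actual mechanism is implemented in hardware with a software handler.

```python
from collections import deque

NACK_TRAP_THRESHOLD = 8   # assumed value for illustration

def service_round(hardware_queue, software_trap, can_service):
    """One pass over the pending requests. A request that cannot yet
    be serviced receives a NACK and retries; a request NACKed beyond
    the threshold is trapped and handed over to software."""
    retries = deque()
    while hardware_queue:
        req = hardware_queue.popleft()
        if can_service(req):
            continue                    # serviced; leaves the system
        req["nacks"] += 1               # negative acknowledgement
        if req["nacks"] >= NACK_TRAP_THRESHOLD:
            software_trap.append(req)   # trapped for software handling
        else:
            retries.append(req)         # retried by the requester
    hardware_queue.extend(retries)

def software_reinject(hardware_queue, software_trap):
    """Software injects trapped requests back into the system,
    at the cost of additional time and resources."""
    while software_trap:
        hardware_queue.append(software_trap.pop())
```

The sketch makes the cost visible: a trapped request must leave the hardware path entirely and be reinjected by software, adding latency and consuming resources without guaranteeing the request is serviced on its next pass.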