1. Field of the Invention
The present invention relates to the prioritization and arbitration of multiple elements in a system, including least recently used (LRU) and first-in-first-out (FIFO) prioritization schemes, a reservation scheme for overriding prioritization, and an arbitration scheme, including split transactions and pipelined arbitration, for multiple microprocessors sharing a single host bus.
2. Description of the Related Art
The personal computer industry is evolving quickly due to the increasing demand for faster and more powerful computers. Historically, computer systems have been developed as single microprocessor, sequential machines that process one instruction at a time. However, single microprocessor computer systems are reaching their performance limits, and as a result a major area of research in computer system architecture is parallel processing, or multiprocessing. A multiprocessing computer system includes multiple microprocessors that work in parallel on different problems or on different parts of the same problem. Incorporating several microprocessors in a computer system introduces many design problems that are not present in single microprocessor architectures.
One difficulty in multiprocessor computer systems is that all of the microprocessors often share a single host bus and only one microprocessor can access or control the bus at any given time. Another difficulty is that many of the microprocessors may request control of the host bus at the same time. Therefore, some type of arbitration scheme is necessary to determine which microprocessor will take control of the host bus, when, and how that microprocessor takes control from the microprocessor or other device previously having control.
A complication that is encountered in multiprocessor computer systems is the maintenance of cache coherency when each microprocessor includes its own local cache memory. For simplicity, the system comprising the microprocessor and its local cache memory and cache support logic will be referred to as a central processing unit (CPU). Cache memory was developed in order to bridge the gap between fast microprocessor cycle times and slow memory access times. A cache is a small amount of very fast, relatively expensive, zero wait state memory that is used to store a copy of frequently accessed code and data from main memory. A CPU can operate out of its cache and thereby reduce the number of wait states that must be interposed during memory accesses. When a microprocessor requests data from the memory and the data resides in the local cache, then a cache "hit" takes place, and the data from the memory access can be returned to the microprocessor from the local cache without incurring wait states. If the data is not in the cache, then a cache read "miss" takes place, and the memory request is forwarded to the system and the data is retrieved from main memory, as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from main memory is provided to the microprocessor and is also written into the cache due to the statistical likelihood that this data will be requested again by the microprocessor.
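The hit/miss behavior described above can be illustrated with a minimal sketch. It assumes a hypothetical cache of unlimited capacity with no replacement policy, and the class name `ReadAllocateCache` is illustrative only; it simply shows that a hit is served locally while a read miss is fetched from main memory and also written into the cache.

```python
class ReadAllocateCache:
    """Hypothetical cache model: unlimited capacity, no replacement policy."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict mapping address -> data
        self.lines = {}                 # cached copies, keyed by address
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Cache hit: data comes from the local cache, no host bus access.
            self.hits += 1
            return self.lines[address]
        # Read miss: forward the request to main memory.
        self.misses += 1
        data = self.main_memory[address]
        # Also write the data into the cache, expecting it to be reused.
        self.lines[address] = data
        return data


memory = {0x100: 7, 0x104: 9}
cache = ReadAllocateCache(memory)
for addr in (0x100, 0x100, 0x104):
    cache.read(addr)
print(cache.hits, cache.misses)  # prints "1 2"
```

The second read of address 0x100 is the only hit; both first-time reads go to main memory.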
The development of cache memory has facilitated multiprocessor computer systems because each CPU requires access to the host bus less frequently, which makes the computer system more efficient. CPUs operating out of their local caches in a multiprocessing environment have a much lower individual "bus utilization." This reduces the system bus bandwidth used by each CPU, making more bandwidth available for other CPUs and bus masters. However, each CPU may change the data within its own local cache, which creates the need to update main memory, since other CPUs also access main memory and would otherwise receive obsolete (stale) data. Therefore, one difficulty encountered in multiprocessing architectures is the maintenance of cache coherency, so that when one CPU alters the data within its local cache, the altered data is reflected back to main memory.
In a multiprocessor computer system using a single bus architecture, system communications take place through a shared bus, which allows each CPU to monitor other CPU bus requests by watching or snooping the bus. Each CPU has a cache system which monitors activity on the shared bus and the activity of its own microprocessor and decides which block of data to keep and which to discard in order to reduce bus traffic. A request by a CPU to modify a memory location that is stored in more than one cache requires bus communication in order for each copy of the corresponding line to be marked invalid or updated to reflect the new value.
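The snooping behavior described above can be sketched for the invalidate case, assuming a hypothetical write-invalidate policy (the text also allows the alternative of updating other copies). The names `SharedBus` and `SnoopingCache` are illustrative, and main memory and bus arbitration are deliberately left out.

```python
class SharedBus:
    """Hypothetical single shared host bus that every cache snoops."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, address, source):
        # Every other cache watching the bus sees the write and drops its copy.
        for cache in self.caches:
            if cache is not source:
                cache.snoop_invalidate(address)


class SnoopingCache:
    def __init__(self, name, bus):
        self.name = name
        self.lines = {}  # address -> data
        self.bus = bus
        bus.attach(self)

    def snoop_invalidate(self, address):
        self.lines.pop(address, None)  # mark the line invalid (here: remove it)

    def write(self, address, value):
        # Announce the write on the bus before updating the local copy.
        self.bus.broadcast_invalidate(address, source=self)
        self.lines[address] = value


bus = SharedBus()
cpu_a, cpu_b = SnoopingCache("A", bus), SnoopingCache("B", bus)
cpu_a.lines[0x40] = cpu_b.lines[0x40] = 1  # both caches hold the same line
cpu_a.write(0x40, 2)
print(0x40 in cpu_b.lines)  # prints "False"
```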
In a write-back scheme, a cache location is updated with the new data on a CPU write hit, and main memory is generally updated only when the modified data block must be replaced by a new data block. Multiprocessor cache systems that employ a write-back scheme generally use some type of ownership protocol to maintain cache coherency. In such a protocol, any copy of data in a cache must be identical to (or actually be) the owner of that location's data.
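The write-back policy can be sketched as follows. This assumes a hypothetical cache that allocates a line on a write miss (write-allocate, an assumption not stated in the text) and uses a dirty bit to stand in for ownership; `WriteBackCache` and `CacheLine` are illustrative names, not the invention's structures.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    data: int
    dirty: bool = False  # True once this cache holds the only current copy


class WriteBackCache:
    """Hypothetical write-back cache with write-allocate on a write miss."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict mapping address -> data
        self.lines = {}                 # address -> CacheLine

    def write(self, address, value):
        line = self.lines.get(address)
        if line is None:
            # Write miss: allocate the line first (an assumption of this sketch).
            line = CacheLine(self.main_memory[address])
            self.lines[address] = line
        # On a hit (or after allocation), update only the cache; main memory
        # is now stale and this cache acts as the owner of the location.
        line.data = value
        line.dirty = True

    def evict(self, address):
        # Main memory is updated only when the modified block is replaced.
        line = self.lines.pop(address)
        if line.dirty:
            self.main_memory[address] = line.data


memory = {0x200: 1}
cache = WriteBackCache(memory)
cache.write(0x200, 5)
print(memory[0x200])  # prints "1": memory not yet updated
cache.evict(0x200)
print(memory[0x200])  # prints "5": written back on replacement
```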
The arbitration scheme should include a mechanism for an "owner" cache to interrupt the current controller of the single host bus if the current controller attempts to access data in main memory that has been modified by the owner cache. The arbitration scheme therefore should include a mechanism for one CPU to temporarily interrupt the CPU currently controlling the host bus, so that the interrupted CPU can resume as bus master when the temporary interruption is over.
A multiprocessor computer system usually includes an input/output (I/O) bus, such as the Industry Standard Architecture (ISA) bus or the Extended ISA (EISA) bus, and also performs direct memory access (DMA) and random access memory (RAM) refresh operations. The EISA bus is not directly connected to the host bus; instead, an EISA bus controller (EBC) is connected between the host bus and the EISA bus. The EBC must occasionally gain access to and control of the host bus to facilitate transfers of data between the CPUs and I/O devices, such as ISA or EISA bus masters connected to the EISA bus, and to return data from an I/O device or other system resource through the host bus to one of the CPUs. Bus masters must also have access to the host bus when a bus master installed on the I/O bus directs an activity to main memory. The DMA and RAM refresh operations also require access to the host bus. Bus masters, DMA and RAM refresh require higher priority than the CPUs in the multiprocessor system. The arbitration scheme used in a multiprocessor system must therefore give higher priority to DMA, RAM refresh and EISA requests to control the host bus, without disturbing the relative priorities of the CPUs.
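The requirement that system requests win without disturbing the CPUs' relative priorities can be sketched with a hypothetical two-tier arbiter. The fixed order refresh, DMA, EISA among the high-priority requesters and the least-recently-granted ordering among CPUs are both assumptions made for illustration; `TwoTierArbiter` is not the invention's arbitration logic.

```python
class TwoTierArbiter:
    """Hypothetical host bus arbiter: system requesters always beat CPUs,
    and CPUs are ordered among themselves least-recently-granted first."""

    # Assumed fixed order among the high-priority requesters, highest first.
    SYSTEM_ORDER = ("refresh", "dma", "eisa")

    def __init__(self, num_cpus):
        # cpu_order[0] is the CPU that was granted least recently.
        self.cpu_order = list(range(num_cpus))

    def grant(self, system_requests, cpu_requests):
        for requester in self.SYSTEM_ORDER:
            if requester in system_requests:
                # Granting a system requester leaves the CPU ordering untouched.
                return requester
        for cpu in self.cpu_order:
            if cpu in cpu_requests:
                # The winning CPU moves to the lowest CPU priority.
                self.cpu_order.remove(cpu)
                self.cpu_order.append(cpu)
                return f"cpu{cpu}"
        return None


arbiter = TwoTierArbiter(num_cpus=4)
print(arbiter.grant(set(), {2, 3}))     # prints "cpu2"
print(arbiter.grant({"dma"}, {0, 3}))   # prints "dma"
print(arbiter.cpu_order)                # prints "[0, 1, 3, 2]"
```

Keeping the two tiers separate means a DMA or refresh grant never touches `cpu_order`, which is one simple way to meet the "without disturbing" requirement.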
Prioritization schemes can be implemented in multiprocessor computer systems to prioritize among several CPUs requesting control of a single host bus at the same time. Prioritization schemes are also useful for determining which blocks of data within a cache, or which of the cache "ways," are to be replaced, since a lower priority cache way is less likely to be used by a CPU. In general, the problem to be solved by a prioritization scheme is how to efficiently prioritize a plurality of elements in a system where all elements should have symmetric access to system resources, such as the host bus. Prior-art daisy-chaining and round-robin priority schemes had inherent latency and fairness problems when elements were not installed or were not requesting.
Two of the most commonly implemented prioritization schemes are the first-in-first-out (FIFO) and least recently used (LRU) schemes. In a FIFO scheme, priority is given to the element that requested the host bus or system resources first. FIFO schemes are generally fair when prioritizing among several CPUs in a multiprocessor system. However, a FIFO scheme used to replace cache ways may be inefficient, because a frequently used cache way is still replaced when it is the oldest element. The LRU scheme gives priority to the element that held the highest priority least recently. It is based on the reasonable assumption that the least recently used element is the one that should have the highest priority in the future. The LRU policy thus avoids giving low priority to a very active element, as can occur in a FIFO scheme.
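The difference between FIFO and LRU replacement can be shown with a small simulation of one cache set. The 2-way set and the access trace are made up for illustration and are not taken from the invention; the only difference between the two policies is whether a hit refreshes an element's position.

```python
def count_misses(accesses, ways, policy):
    """Count misses in one cache set; resident[0] is always the next victim."""
    if policy not in ("fifo", "lru"):
        raise ValueError(f"unknown policy: {policy}")
    resident = []
    misses = 0
    for tag in accesses:
        if tag in resident:
            if policy == "lru":
                # Under LRU a hit makes the way most recently used.
                resident.remove(tag)
                resident.append(tag)
            # Under FIFO a hit changes nothing: age is set only by the fill.
            continue
        misses += 1
        if len(resident) == ways:
            resident.pop(0)  # replace the oldest (FIFO) or least recently used (LRU) way
        resident.append(tag)
    return misses


# "A" is a very active element; "B" and "C" alternate around it.
trace = ["A", "B", "A", "C", "A", "B", "A", "C"]
print(count_misses(trace, 2, "fifo"), count_misses(trace, 2, "lru"))  # prints "6 5"
```

FIFO evicts the heavily used "A" once it becomes the oldest fill, costing an extra miss; LRU keeps it resident because every hit refreshes it.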
FIFO prioritizers available in the prior art were too large to implement efficiently. The pseudo-LRU algorithms found in the prior art are inherently unsuitable for multiprocessor systems, since they violate the symmetry requirement by allowing higher utilization of elements on less populated branches of the pseudo-LRU tree structure.
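The asymmetry of tree pseudo-LRU can be demonstrated with one common binary-tree formulation; the prior-art designs referred to may differ in detail. In this hypothetical setup, four leaf slots exist but only three elements are installed: slots 0 and 1 share the left branch, and slot 2 is alone on the right. With all three requesting continuously, the root alternates between branches, so the lone element receives half of all grants instead of a symmetric one third.

```python
from collections import Counter


def plru_grant(bits, requesting, num_leaves):
    """Grant one requester using a binary-tree pseudo-LRU.

    bits[node] == 0 prefers the left subtree, 1 the right subtree; nodes are
    heap-indexed from 1. num_leaves must be a power of two.
    """
    if not requesting:
        return None
    node, lo, hi = 1, 0, num_leaves  # node covers leaves lo .. hi-1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_req = any(i in requesting for i in range(lo, mid))
        right_req = any(i in requesting for i in range(mid, hi))
        # Follow the tree bit only when both subtrees have requesters.
        go_right = (bits[node] == 1) if (left_req and right_req) else right_req
        # Point this node away from the branch just granted.
        bits[node] = 0 if go_right else 1
        if go_right:
            node, lo = 2 * node + 1, mid
        else:
            node, hi = 2 * node, mid
    return lo


def simulate(requesting, num_leaves, rounds):
    bits = [0] * num_leaves  # index 0 unused; internal nodes are 1 .. num_leaves-1
    return Counter(plru_grant(bits, requesting, num_leaves) for _ in range(rounds))


# Slots 0 and 1 share the left branch; slot 2 is alone on the right (slot 3 empty).
print(sorted(simulate({0, 1, 2}, 4, 8).items()))  # prints "[(0, 2), (1, 2), (2, 4)]"
```

When all four slots are populated the same code grants each element twice in eight rounds, so the imbalance comes entirely from the unpopulated leaf.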