1. Field of the Invention
The invention relates to resources shared by multiple processors and more particularly to resolving simultaneous requests to use a resource.
2. Discussion of Related Art
Processors have attained widespread use throughout many industries. A goal of any processor is to process information quickly. One technique used to increase the speed with which a processor processes information is to give the processor an architecture that includes a fast local memory called a cache. Another technique is to give the processor an architecture with multiple processing units.
A cache is used by the processor to temporarily store instructions and data. A cache which stores both instructions and data is referred to as a unified cache; a cache which stores only instructions is an instruction cache and a cache which stores only data is a data cache. Providing a processor architecture with either a unified cache or an instruction cache and a data cache is a matter of design choice.
A factor in the performance of the processor is the probability that a processor-requested data item is already in the cache. When a processor attempts to access an item of information, it is either present in the cache or not. If present, a cache "hit" occurs. If the item is not in the cache when requested by the processor, a cache "miss" occurs. When designing a cache system, it is desirable to achieve a high cache hit rate, or "hit ratio", that is, a high fraction of accesses that hit.
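As a small worked illustration of the hit ratio (hits divided by total accesses), the Python sketch below counts how many accesses in an address trace fall in lines already resident in the cache. The function name `hit_ratio`, the 64-byte line size, and the fixed set of resident lines are illustrative assumptions; a real cache would also load and replace lines as misses occur.

```python
def hit_ratio(trace, cached_lines, line_size=64):
    """Fraction of byte addresses in `trace` whose line is in `cached_lines`.

    Hypothetical helper for illustration: the cache contents are held fixed,
    so no line is ever loaded or replaced on a miss.
    """
    if not trace:
        return 0.0
    # An access hits when the line number (address // line_size) is resident.
    hits = sum(1 for addr in trace if addr // line_size in cached_lines)
    return hits / len(trace)


# Lines 0 and 2 are resident (bytes 0-63 and 128-191 with 64-byte lines).
trace = [0, 8, 70, 130, 200, 10]
print(hit_ratio(trace, {0, 2}))  # 4 of 6 accesses hit
```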
After a cache miss occurs, the information requested by the processor must be retrieved from memory and brought into the cache so that the processor can access it. Retrieving an item of information from the main memory of the system after a cache miss is usually expensive and time-consuming. To maximize the number of cache hits, data that the processor is likely to reference in the near future is stored in the cache. Two common strategies for maximizing cache hits are storing the most recently referenced data and storing the most commonly referenced data.
In most existing systems, a cache is subdivided into sets of cache line slots. When each set contains only one line, each main memory line can be stored in only one specific line slot in the cache; this is called direct mapping. In contrast, each set in most modern processors contains a number of lines. Because each set contains several lines, a main memory line mapped to a given set may be stored in any of the lines, or "ways", in the set.
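The mapping of a memory line to a set can be illustrated by splitting a byte address into a tag, a set index, and an offset within the line. The sketch below assumes a hypothetical 8-line cache with 64-byte lines, organized either as 8 sets of 1 way (direct mapped) or 4 sets of 2 ways; the name `split_address` is illustrative. Hardware typically uses power-of-two sizes so that the split reduces to extracting bit fields.

```python
def split_address(addr, line_size, num_sets):
    """Split a byte address into (tag, set_index, offset)."""
    offset = addr % line_size          # byte position within the line
    line_number = addr // line_size    # which main memory line
    set_index = line_number % num_sets # the only set this line may occupy
    tag = line_number // num_sets      # distinguishes lines sharing a set
    return tag, set_index, offset


# The same 8-line cache organized two ways; addresses 0 and 512 both map to set 0.
for num_sets, label in [(8, "direct mapped, 1 way"), (4, "2-way")]:
    print(label, [split_address(a, 64, num_sets) for a in (0, 512)])
# direct mapped, 1 way [(0, 0, 0), (1, 0, 0)]
# 2-way [(0, 0, 0), (2, 0, 0)]
```

In the direct-mapped organization the two lines compete for the single slot of set 0 and evict each other; with two ways per set, both can be resident at once.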
When a cache miss occurs, the line of memory containing the missing item is loaded into the cache, replacing another cache line. This process is called cache replacement. In a direct mapping system, each line from main memory can be placed in only a single line slot in the cache. This direct mapping approach simplifies the cache replacement process, but tends to limit the hit ratio because of the inflexible line mapping. In contrast, more flexible line mapping, and therefore a higher hit ratio, can be achieved by increasing the level of associativity. Increased associativity means that the number of lines per set is increased so that each line in main memory can be placed in any of the line slots ("ways") within its set. During cache replacement, one of the lines in the set must be replaced. The method for deciding which line in the set is replaced after a cache miss is called a cache replacement policy.
Several conventional cache replacement policies for selecting a datum in the cache to overwrite include random, Least-Recently Used (LRU), pseudo-LRU, and Not-Most-Recently-Used (NMRU). Random is the simplest cache replacement policy to implement, since the line to be replaced in the set is chosen at random. The LRU method is more complex, as it requires a logic circuit to track the processor's actual accesses to each line in the set. According to the LRU algorithm, if a line has not been accessed recently, it is unlikely to be accessed again soon, and it is therefore a good candidate for replacement. Another replacement policy, NMRU, keeps track of the most recently accessed line. This most recently accessed line is not chosen for replacement, since the principle of spatial locality says that there is a high probability that, once an information item is accessed, other nearby items in the same line will be accessed in the near future. The NMRU method requires a logic circuit to keep track of the most recently accessed line within a set. In all cache replacement policies, the line selected for replacement may be referred to as a "candidate."
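The random, LRU, and NMRU policies can be contrasted in a short Python sketch of a single cache set. The class `CacheSet` and its methods are hypothetical names for illustration, and pseudo-LRU is not modeled. For brevity the set keeps a full recency order, which is more state than NMRU hardware needs, since NMRU only has to remember the most recently used line.

```python
import random


class CacheSet:
    """One set of a set-associative cache holding up to `ways` line tags."""

    def __init__(self, ways, policy="lru", rng=None):
        self.ways = ways
        self.policy = policy
        self.lines = []  # resident tags, ordered least -> most recently used
        self.rng = rng if rng is not None else random.Random(0)

    def access(self, tag):
        """Access `tag`; return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.append(tag)  # now the most recently used line
            return True
        if len(self.lines) == self.ways:
            self.lines.remove(self.choose_candidate())  # cache replacement
        self.lines.append(tag)
        return False

    def choose_candidate(self):
        """Pick the replacement candidate according to the policy."""
        if len(self.lines) == 1:
            return self.lines[0]  # a 1-way (direct-mapped) set has no choice
        if self.policy == "lru":
            return self.lines[0]  # least recently used line
        if self.policy == "nmru":
            return self.rng.choice(self.lines[:-1])  # any line except the MRU
        if self.policy == "random":
            return self.rng.choice(self.lines)
        raise ValueError(f"unknown policy: {self.policy}")


s = CacheSet(ways=2, policy="lru")
print([s.access(t) for t in ["A", "B", "A", "C", "B"]])
# [False, False, True, False, False]
```

When "C" misses, LRU evicts "B" because "A" was touched more recently; "B" then misses in turn.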
Once a candidate is selected, further processing must occur in the cache to preserve memory coherency. If the value of the candidate has been altered in the cache since it was retrieved from memory, then the candidate is "dirty" and a memory incoherency exists. Before the dirty candidate can be replaced with the new information requested by the processor, its current value must be written back to memory. This operation is called a "write back" operation. Such a scheme reduces bus traffic, because multiple changes to a cache line need to be written to memory only once, when the line is about to be replaced; a drawback of the write back operation, however, is delay. That is, access to the cache is slowed or even halted during a write back operation.
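The write back step can be sketched in Python as follows. This is a minimal illustration, not a hardware design: the `Line` record, the `write` and `replace` helpers, and the use of a dictionary as a stand-in for main memory are all assumptions. It shows that repeated stores only mark the line dirty, and that memory is updated once, when the dirty candidate is replaced.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    data: bytes
    dirty: bool = False  # set when the cached copy diverges from memory


def write(line, data):
    """Store to a cached line; memory is not updated yet (write-back policy)."""
    line.data = data
    line.dirty = True


def replace(candidate, new_tag, memory):
    """Evict `candidate` and fill with the line for `new_tag` from `memory`.

    Returns (new_line, wrote_back). A dirty candidate is written back first;
    this extra memory transfer is the source of the delay described above.
    """
    wrote_back = candidate.dirty
    if wrote_back:
        memory[candidate.tag] = candidate.data
    return Line(new_tag, memory[new_tag]), wrote_back


memory = {1: b"old1", 2: b"mem2"}  # hypothetical stand-in for main memory
line = Line(1, memory[1])
write(line, b"new1")    # two stores to the same line...
write(line, b"newer1")  # ...cost only one eventual write back
new_line, wrote_back = replace(line, 2, memory)
print(memory[1], new_line.data, wrote_back)  # b'newer1' b'mem2' True
```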
A method and computer system are provided for resolving simultaneous requests from multiple processing units to load from or store to the same shared resource. When the colliding requests come from two different processing units, the first processing unit is granted access to the shared resource for a predetermined number of sequential collisions, and the second processing unit is granted access for the following number of sequential collisions. The shared resource can be a fill buffer, where a collision involves simultaneous attempts to store in the fill buffer. The shared resource can be a shared write back buffer, where a collision involves simultaneous attempts to store in the shared write back buffer. The shared resource can be a data cache unit, where a collision involves simultaneous attempts to load from the same data space in the data cache unit. A collision can also involve an attempt to load from and an attempt to store to the same resource; in that case, the unit that attempts to load is favored over the unit that attempts to store.
In one embodiment, a shared resource receives access requests from a plurality of processing units. One such processing unit is selected to be a preferred unit that may access the shared resource. For each processing unit, a retry indicator is generated. For the preferred unit, the retry indicator indicates that no retry is necessary, since the preferred unit is permitted to access the shared resource. For all processing units except the preferred unit, the retry indicator contains a value indicating that a retry is necessary. The preferred processor is selected according to a repeating selection pattern of P segments, where each processor is selected as the preferred processor during one of the segments. In one embodiment, this repeating selection pattern can be altered programmably.
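The retry-indicator scheme can be modeled with a short Python sketch. It is an illustration under stated assumptions, not the circuit described here: the class name `Arbiter`, its `program` and `arbitrate` methods, and the convention that `True` means "retry" are hypothetical, and every unit is assumed to take part in each collision. The pattern is a list of preferred units that repeats; the default gives each of the P units one segment, and `program` stands in for programmably altering the pattern.

```python
from itertools import cycle


class Arbiter:
    """Resolves same-cycle collisions among `num_units` processing units."""

    def __init__(self, num_units, pattern=None):
        self.num_units = num_units
        # Default repeating pattern: one segment per unit (0, 1, ..., P-1).
        self.program(pattern if pattern is not None else list(range(num_units)))

    def program(self, pattern):
        """Replace the repeating selection pattern."""
        if not pattern or any(not 0 <= u < self.num_units for u in pattern):
            raise ValueError("pattern must be a non-empty list of unit indices")
        self._pattern = cycle(pattern)

    def arbitrate(self):
        """Resolve one collision; return a retry indicator per unit.

        The preferred unit gets False (no retry); all others get True.
        """
        preferred = next(self._pattern)
        return [unit != preferred for unit in range(self.num_units)]


arb = Arbiter(3)
for _ in range(4):
    print(arb.arbitrate())
# [False, True, True]
# [True, False, True]
# [True, True, False]
# [False, True, True]
```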
In one embodiment, the repeated selection pattern includes a segment that selects a first processing unit as the preferred processor during N sequential colliding access requests, and then selects a second processing unit as the preferred processor during each of M sequential colliding access requests that occur after the N sequential colliding requests occur. In one embodiment, M and N equal two.
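For two processing units, the N/M segment can be modeled as a counter of sequential collisions taken modulo N + M. The Python function below is a hypothetical sketch (the name `preferred_sequence` and the unit numbering 0 and 1 are assumptions) that lists the preferred unit for each collision, with N and M defaulting to two as in the embodiment above.

```python
def preferred_sequence(num_collisions, n=2, m=2):
    """Preferred unit (0 or 1) for each of `num_collisions` sequential collisions.

    Unit 0 is preferred for N consecutive collisions, unit 1 for the next M,
    and the pattern repeats. Only colliding requests advance the count.
    """
    return [0 if k % (n + m) < n else 1 for k in range(num_collisions)]


print(preferred_sequence(6))            # [0, 0, 1, 1, 0, 0]
print(preferred_sequence(5, n=3, m=1))  # [0, 0, 0, 1, 0]
```

Counting only colliding requests, rather than clock cycles, means that a unit's turn is not consumed by cycles in which no collision occurs.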
The shared resource that receives the colliding access requests may be a data register in a fill buffer, a data cache unit, or a write back buffer. Each of the same-cycle colliding requests may be an attempted load operation or an attempted store operation.
When the colliding requests include one store operation and one load operation, the processing unit requesting the load operation is selected as the preferred processor.
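The load-over-store rule can be layered on top of the repeating pattern, as in the hypothetical Python sketch below. The function name `pick_preferred`, the string operation codes, and the fallback argument `pattern_choice` are assumptions. The source does not say whether a load-versus-store collision also advances the pattern counter, so this sketch leaves that decision to the caller.

```python
def pick_preferred(op0, op1, pattern_choice):
    """Choose the preferred unit (0 or 1) for a two-unit same-cycle collision.

    op0 and op1 are "load" or "store". A load is favored over a store;
    otherwise the unit chosen by the repeating selection pattern wins.
    """
    if op0 == "load" and op1 == "store":
        return 0
    if op1 == "load" and op0 == "store":
        return 1
    return pattern_choice


preferred = pick_preferred("store", "load", pattern_choice=0)
retry = [unit != preferred for unit in (0, 1)]
print(preferred, retry)  # 1 [True, False]
```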
In one embodiment, the selection of the preferred processor is performed by an arbitration protocol unit. The arbitration protocol unit includes selection logic that repeatedly performs the selection pattern described above; in one embodiment, M and N equal two. The selection logic may be programmably altered. The retry signals are generated by a retry signal generator.
In one embodiment, the method described above is performed in a computer system. In one embodiment, a computer system includes an arbitration circuit that arbitrates same-cycle colliding access requests. The arbitration circuit includes selection logic. In one embodiment of the selection logic, M and N equal two. In one embodiment, the computer system includes a retry signal generator, as described above.