1. Field of the Invention
The field of the invention relates to resources shared by multiple processing units and more particularly to a shared write back buffer within a data cache unit that is shared by multiple processing units.
2. Discussion of Related Art
Processors have attained widespread use throughout many industries. A goal of any processor is to process information quickly. One technique used to increase the speed with which the processor processes information is to provide the processor with an architecture that includes a fast local memory called a cache. Another technique is to provide a processor architecture with multiple processing units.
A cache is used by the processor to temporarily store instructions and data. A cache which stores both instructions and data is referred to as a unified cache; a cache which stores only instructions is an instruction cache and a cache which stores only data is a data cache. Providing a processor architecture with either a unified cache or an instruction cache and a data cache is a matter of design choice.
A factor in the performance of the processor is the probability that a processor-requested data item is already in the cache. When a processor attempts to access an item of information, it is either present in the cache or not. If present, a cache "hit" occurs. If the item is not in the cache when requested by the processor, a cache "miss" occurs. It is desirable when designing a cache system to achieve a high cache hit rate, or "hit ratio".
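As an illustration only, the hit ratio described above can be computed as the number of hits divided by the total number of accesses. The following minimal sketch assumes a fixed set of cached items with no replacement; the function names `hit_ratio` and `count_hits` are hypothetical and are not part of the described system.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Return the fraction of accesses that were served from the cache."""
    total = hits + misses
    if total == 0:
        # No accesses yet: the ratio is undefined, so report 0.0 by convention.
        return 0.0
    return hits / total


def count_hits(accesses, cached):
    """Count hits and misses for a fixed set of cached items.

    Assumption: the cache contents do not change during the trace.
    """
    hits = sum(1 for item in accesses if item in cached)
    return hits, len(accesses) - hits
```

For example, a trace of four accesses in which three items are found in the cache gives a hit ratio of 0.75.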
After a cache miss occurs, the information requested by the processor must then be retrieved from memory and brought into the cache so that it may be accessed by the processor. A search for an item of information that is not stored in the cache after a cache miss usually results in an expensive and time-consuming effort to retrieve the item of information from the main memory of the system. To maximize the number of cache hits, data that is likely to be referenced in the near future operation of the processor is stored in the cache. Two common strategies for maximizing cache hits are storing the most recently referenced data and storing the most commonly referenced data.
In most existing systems, a cache is subdivided into sets of cache line slots. When each set contains only one line, each main memory line can be stored in only one specific line slot in the cache. This is called direct mapping. In contrast, each set in most modern processors contains a number of lines. Because each set contains several lines, a main memory line mapped to a given set may be stored in any of the lines, or "ways", in the set.
When a cache miss occurs, the line of memory containing the missing item is loaded into the cache, replacing another cache line. This process is called cache replacement. In a direct mapping system, each line from main memory is restricted to be placed in a single line slot in the cache. This direct mapping approach simplifies the cache replacement process, but tends to limit the hit ratio due to the lack of flexibility with line mapping. In contrast, flexibility of line mapping, and therefore a higher hit ratio, can be achieved by increasing the level of associativity. Increased associativity means that the number of lines per set is increased so that each line in main memory can be placed in any of the line slots ("ways") within the set. During cache replacement, one of the lines in the set must be replaced. The method for deciding which line in the set is to be replaced after a cache miss is called a cache replacement policy.
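The relationship between associativity and line placement can be sketched as follows. This is a simplified illustration, not a description of any particular cache: the constants `LINE_SIZE` and `NUM_LINES` and the modulo mapping are assumptions chosen for the example.

```python
LINE_SIZE = 32  # bytes per cache line (assumed for illustration)
NUM_LINES = 8   # total line slots in the cache (assumed for illustration)


def set_index(address: int, ways: int) -> int:
    """Map a byte address to a set, given the number of ways per set."""
    num_sets = NUM_LINES // ways
    line_number = address // LINE_SIZE
    return line_number % num_sets


def candidate_slots(address: int, ways: int):
    """List the (set, way) slots in which the addressed line may be placed."""
    s = set_index(address, ways)
    return [(s, w) for w in range(ways)]
```

With `ways=1` (direct mapping) `candidate_slots` returns a single slot, so no replacement decision is needed. With `ways=2` the same address may be placed in either of two slots, and a replacement policy must choose between them.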
Several conventional cache replacement policies for selecting a datum in the cache to overwrite include Random, Least-Recently Used (LRU), Pseudo-LRU, and Not-Most-Recently-Used (NMRU). Random is the simplest cache replacement policy to implement, since the line to be replaced in the set is chosen at random. The LRU method is more complex, as it requires a logic circuit to keep track of actual access of each line in the set by the processor. According to the LRU algorithm, if a line has not been accessed recently, chances are that it will not be accessed any more, and therefore it is a good candidate for replacement. Another replacement policy, NMRU, keeps track of the most recently accessed line. This most recently accessed line is not chosen for replacement, since the principle of spatial locality says that there is a high probability that, once an information item is accessed, other nearby items in the same line will be accessed in the near future. The NMRU method requires a logic circuit to keep track of the most recently accessed line within a set. In all cache replacement policies, the line selected for replacement may be referred to as a "candidate."
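The Random, LRU, and NMRU policies described above can be modeled for a single set as follows. This is a software sketch under stated assumptions, whereas a real cache implements the tracking in logic circuits. The names `pick_random`, `LRUSet`, and `NMRUSet` are hypothetical, and the NMRU sketch assumes that the candidate is chosen at random among the lines that are not the most recently used.

```python
import random


def pick_random(ways: int, rng=random) -> int:
    """Random policy: any way in the set may be the candidate."""
    return rng.randrange(ways)


class LRUSet:
    """Tracks the full access order of the ways in one set."""

    def __init__(self, ways: int):
        self.order = list(range(ways))  # front of list = least recently used

    def touch(self, way: int) -> None:
        """Record an access by moving the way to the most-recent position."""
        self.order.remove(way)
        self.order.append(way)

    def candidate(self) -> int:
        return self.order[0]


class NMRUSet:
    """Tracks only the most recently used way in one set."""

    def __init__(self, ways: int):
        if ways < 2:
            raise ValueError("NMRU needs at least two ways to choose from")
        self.ways = ways
        self.mru = 0

    def touch(self, way: int) -> None:
        self.mru = way

    def candidate(self, rng=random) -> int:
        # Assumption: choose at random among all ways except the MRU way.
        return rng.choice([w for w in range(self.ways) if w != self.mru])
```

The sketch shows the cost difference noted above: `LRUSet` keeps state for every way in the set, whereas `NMRUSet` keeps a single value per set.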
Once a candidate is selected, further processing must occur in the cache in order to ensure the preservation of memory coherency. If the contents of the candidate have been altered in the cache since it was retrieved from memory, then the candidate is "dirty" and a memory incoherency exists. Before the contents of the dirty candidate can be replaced with the new information requested by the processor, the current contents of the dirty candidate must be written to memory. This operation is called a "write back" operation. While the implementation of such a scheme allows reduced bus traffic because multiple changes to a cache line need be written to memory only when the cache line is about to be replaced, a drawback to the write back operation is delay. That is, access to the cache is slowed or even halted during a write back operation.
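The write back operation described above can be illustrated with a minimal sketch. The `CacheLine` class and the `replace_line` function are hypothetical names, main memory is modeled as a dictionary keyed by line address, and the write to memory is performed directly, which corresponds to the blocking behavior that causes the delay noted above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CacheLine:
    tag: Optional[int] = None  # address of the memory line held, if any
    data: Any = None
    dirty: bool = False        # True if altered since it was loaded


def replace_line(line: CacheLine, new_tag: int, new_data: Any, memory: dict) -> bool:
    """Replace the candidate line; write it back first if it is dirty.

    Returns True if a write back to memory was performed.
    """
    wrote_back = False
    if line.tag is not None and line.dirty:
        # The cache copy is newer than memory, so memory must be updated
        # before the line is overwritten.
        memory[line.tag] = line.data
        wrote_back = True
    line.tag, line.data, line.dirty = new_tag, new_data, False
    return wrote_back
```

A clean candidate is simply overwritten, while a dirty candidate incurs the additional memory write before the new line can be installed.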
A shared write back buffer stores data from a data cache that is to be written back to memory. The shared write back buffer includes a plurality of ports, each port being associated with one of a plurality of processing units. In one embodiment, the ports receive address data originating from one of the plurality of processing units. All processing units in the plurality share the write back buffer. The shared write back buffer further includes a data bank that includes a plurality of data registers. Each data register stores data provided through the input ports. The write back buffer also includes an address bank that includes a plurality of address registers. Each address register stores an address associated with the data provided through the input ports. In one embodiment, the address bank further includes a plurality of full indicators. The write back buffer includes a single output port for providing the data to the associated addresses in memory.
In one embodiment, the write back buffer further comprises a data selector circuit and an address selector circuit. The data selector circuit selects, for each storage data, one of the data registers to receive the storage data. The address selector circuit selects, for each address data, one of the address registers to receive the address inputs.
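The organization summarized above can be modeled in software as a rough sketch. The class name `SharedWriteBackBuffer`, its method names, and the selection rule (the first register whose full indicator is clear) are assumptions made for illustration and are not taken from the described embodiments. Main memory is modeled as a dictionary.

```python
class SharedWriteBackBuffer:
    """Software model of a write back buffer shared by several ports."""

    def __init__(self, num_ports: int = 2, num_entries: int = 2):
        self.num_ports = num_ports
        self.data_bank = [None] * num_entries     # data registers
        self.address_bank = [None] * num_entries  # address registers
        self.full = [False] * num_entries         # full indicators

    def _select_entry(self):
        """Stand-in for the selector circuits: first entry that is not full."""
        for i, is_full in enumerate(self.full):
            if not is_full:
                return i
        return None

    def write(self, port: int, address: int, data) -> bool:
        """Accept a write back from any port; return False if the buffer is full."""
        if not 0 <= port < self.num_ports:
            raise ValueError("unknown port")
        entry = self._select_entry()
        if entry is None:
            return False
        # The same entry index is used in both banks so that each address
        # stays associated with its data.
        self.data_bank[entry] = data
        self.address_bank[entry] = address
        self.full[entry] = True
        return True

    def drain_one(self, memory: dict) -> bool:
        """Single output port: write one full entry to memory and free it."""
        for i, is_full in enumerate(self.full):
            if is_full:
                memory[self.address_bank[i]] = self.data_bank[i]
                self.full[i] = False
                return True
        return False
```

In this model, either port may fill either register, and a write is refused only when every full indicator is set. The drain order here is by register index rather than by arrival order, which is a simplification of the sketch.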
In one embodiment, the shared write back buffer is included in a computer system, where the computer system also includes a plurality of processing units and a main memory. In one embodiment, the computer system also includes a data selector circuit and an address selector circuit as described above. In one embodiment, the computer system also includes a data cache unit, where the data cache unit includes a write back buffer, directory array, and data array.
The shared write back buffer stores storage data that is to be written to main memory, where the shared write back buffer receives the storage data from any of the plurality of processing units. That is, the processing units share the write back buffer, so that the write back buffer may receive storage data from any and all of the processing units. In one embodiment, the computer system includes two processing units.
In one embodiment, the data bank includes two data registers. In one embodiment, the address bank includes two address registers.
In one embodiment, the address data from a processing unit is forwarded to the storage data inputs from the data array.
In one embodiment, the address data from a processing unit is forwarded to the address inputs from the directory array.
The present invention will be more fully understood in light of the following detailed description taken together with the accompanying drawings.