1. Field of Invention
This invention relates generally to microprocessors and specifically to improving cache coherency performance in a multi-processor system.
2. Description of Related Art
Caches are used in many computer systems to improve performance. A cache is a storage area that is faster to access by a central processing unit (CPU) than main memory. Typically, the first time an address is referenced by the CPU, the corresponding data is fetched from main memory and written into the cache. Once a cache line is loaded, subsequent reads of that same address by the processor can simply access the cache, as opposed to having to access main memory.
A cache typically includes a plurality of cache lines, each of which has an associated cache tag and cache flags. The tag indicates the address in main memory corresponding to the cache line, and the flags indicate the status of the cache line. Typically, tag and status information for a cache is stored in a separate, searchable array such as, for instance, a content addressable memory (CAM) array. When cached data is updated by the CPU, its status is changed to indicate that the data is "dirty". The updated data is typically written back to main memory in a writeback operation.
In a writeback cache scheme, the updated value of the cache line is not sent to main memory until a cache replacement occurs. A cache replacement occurs when the CPU needs to access another memory location that is not in the cache, and thus must free up space in the cache to make room for the new data. A cache controller selects the cache line that is to be used for the new data. The CPU examines the status flags associated with the cache line being replaced and determines whether the cache line has been modified while in the cache. If the cache line has been modified, the updated data must be saved back to main memory in order to maintain data coherency. Conversely, if the cache line being replaced has not been modified, no update of main memory is required, and the selected cache line is replaced by the new data.
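By way of illustration only, the replacement behavior described above may be sketched as follows. This is a minimal software model under stated assumptions, not a description of any particular hardware: the class `WritebackCache`, its direct-mapped organization, the one-word line size, and the use of the full address as the tag are all hypothetical simplifications.

```python
class WritebackCache:
    """Toy direct-mapped writeback cache (hypothetical model, one word per line)."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory      # dict: address -> value, stands in for main memory
        self.lines = {}           # index -> {"tag": address, "data": value, "dirty": bool}
        self.writebacks = 0       # number of lines written back to main memory

    def _fill(self, addr):
        index = addr % self.num_lines
        line = self.lines.get(index)
        if line is not None and line["tag"] == addr:
            return line           # cache hit
        # Cache miss: the line currently at this index is the replacement victim.
        if line is not None and line["dirty"]:
            # Modified data must be saved to main memory before replacement.
            self.memory[line["tag"]] = line["data"]
            self.writebacks += 1
        # A clean victim is simply overwritten by the new data.
        line = {"tag": addr, "data": self.memory.get(addr, 0), "dirty": False}
        self.lines[index] = line
        return line

    def read(self, addr):
        return self._fill(addr)["data"]

    def write(self, addr, value):
        line = self._fill(addr)
        line["data"] = value
        line["dirty"] = True      # main memory is now stale for this address


memory = {0: 10, 4: 40}
cache = WritebackCache(num_lines=4, memory=memory)
cache.write(0, 11)    # the line becomes dirty; main memory is not updated yet
stale = memory[0]     # value still held by main memory at this point
cache.read(4)         # address 4 maps to the same index and replaces dirty address 0
```

In this sketch, main memory holds the old value until the read of address 4 forces the replacement, at which point the dirty line is written back. Replacing a clean line causes no writeback.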
The process of writing updated data to main memory is called a writeback, and typically uses a special buffer called a writeback buffer to temporarily store the updated data from the cache line being replaced, so that the cache line is free to accept the new data when it is fetched from main memory into the cache. During writeback, the dirty cache line selected for replacement is queued in the writeback buffer, and the selected cache line is invalidated and replaced by the new data. The writeback buffer has an associated searchable tag array such as a CAM array to store tag and status information for writeback data queued in the writeback buffer. Data in the writeback buffer is thereafter written to main memory.
If another CPU in a multi-processor system needs data at the same address, it requests the updated data from the first CPU. If the first CPU owns the requested data, i.e., the requested data has been modified by the first CPU but not yet written back to main memory, the first CPU loads the updated data into a copyback buffer, and the cache line is invalidated if necessary. Copyback data queued in the copyback buffer is thereafter provided to the second CPU over the system bus. Tag and status information associated with the copyback data is stored in a searchable tag array associated with the copyback buffer.
Although the writeback and copyback buffers advantageously free cache resources during copyback requests, updated data may be in the main cache, in the writeback buffer, or in the copyback buffer. Accordingly, when a copyback request is received, tag information associated with the request must be compared with tags in the main cache tag array, the writeback tag array, and the copyback tag array to determine whether the requested data is in the snooped CPU. The search overhead required to snoop the main cache, the writeback buffer, and the copyback buffer is expensive, and may consume a significant amount of silicon area. Accordingly, it would be desirable to reduce the amount of search overhead required for such snoop operations.
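The search burden described above may be illustrated by the following sketch, offered under stated assumptions. The function `snoop_prior_art` and its return format are hypothetical, and the tag arrays are modeled as plain sets of addresses. In hardware the arrays would typically be CAMs searched in parallel, so the count returned here stands for the number of searchable arrays involved rather than for elapsed time.

```python
def snoop_prior_art(address, main_tags, writeback_tags, copyback_tags):
    """Return (location, arrays_searched) for a snooped address.

    location is the name of the structure holding the address, or None
    when the snooped CPU does not hold the requested data.
    """
    arrays = (
        ("main cache", main_tags),
        ("writeback buffer", writeback_tags),
        ("copyback buffer", copyback_tags),
    )
    searched = 0
    for name, tags in arrays:
        searched += 1
        if address in tags:
            return name, searched
    return None, searched


# Example contents of the three searchable tag arrays (illustrative values).
main_tags = {0x100, 0x140}
writeback_tags = {0x180}
copyback_tags = set()

found = snoop_prior_art(0x180, main_tags, writeback_tags, copyback_tags)
missing = snoop_prior_art(0x200, main_tags, writeback_tags, copyback_tags)
```

A request that misses in the snooped CPU must be compared against all three arrays before the miss can be reported, which is why each array needs its own search capability.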
Further, in instances where the modified data requested by another CPU is in transition towards the system bus, e.g., between the main cache and the writeback buffer during a writeback operation, it may be difficult to search for and capture the requested data. For example, even if the requested data is located, the requested data may again transition closer to the system bus, e.g., read out of the writeback buffer, before the snoop results are acted upon. To alleviate this difficulty, the writeback operation is typically stalled during snoop operations so that the requested data remains stationary. Unfortunately, stalling the writeback operation degrades performance. Accordingly, it would also be desirable to be able to easily locate and capture modified data without stalling the writeback operation.
A method and apparatus are disclosed that reduce search overhead for snoop operations during, for example, copyback operations. In accordance with the present invention, the main cache of a processor in a multiprocessor computing system is coupled to receive writeback data during writeback operations. In one embodiment, during writeback operations, i.e., in response to a cache miss, dirty data in the main cache is merged with modified data from an associated write cache, and the resultant writeback data line is loaded into a writeback buffer. The writeback data is also written back into the main cache. In some embodiments, further modifications of the writeback data in the main cache are prevented. The writeback data line in the main cache remains valid until read data for the cache miss is returned, thereby ensuring that the read address reaches the system interface for proper bus ordering before the writeback data is replaced. The writeback operation may be paired with the read operation for the cache miss to ensure that upon completion of the read operation, the writeback address has reached the system interface for bus ordering, thereby maintaining cache coherency while allowing requests to be serviced from the main cache.
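The embodiment summarized above may be sketched in software as follows. This is an illustrative model only and makes several assumptions: the names `Line`, `Processor`, `start_miss`, `finish_miss`, and the `locked` flag are hypothetical, line data is modeled as a mapping from byte offset to value, and bus ordering is not modeled.

```python
class Line:
    def __init__(self, tag, data):
        self.tag = tag
        self.data = dict(data)   # byte offset -> value
        self.locked = False      # hypothetical flag: set while a writeback is pending


class Processor:
    def __init__(self):
        self.main_cache = {}         # index -> Line; the only structure that is snooped
        self.writeback_buffer = []   # queued (tag, data) pairs; never snooped

    def start_miss(self, index, write_cache_data):
        """Begin a writeback for the victim line in response to a cache miss."""
        victim = self.main_cache[index]
        # Merge dirty data in the main cache with modified data from the write cache.
        merged = {**victim.data, **write_cache_data}
        self.writeback_buffer.append((victim.tag, dict(merged)))
        victim.data = merged     # the writeback data is also written into the main cache
        victim.locked = True     # some embodiments prevent further modification

    def store(self, index, offset, value):
        line = self.main_cache[index]
        if line.locked:
            return False         # store refused while the writeback is pending
        line.data[offset] = value
        return True

    def snoop(self, tag):
        """Search main cache tags only."""
        for line in self.main_cache.values():
            if line.tag == tag:
                return line.data
        return None

    def drain_writeback(self):
        """Send the oldest queued writeback toward main memory."""
        return self.writeback_buffer.pop(0)

    def finish_miss(self, index, new_tag, new_data):
        """Read data for the miss has returned; only now is the victim replaced."""
        self.main_cache[index] = Line(new_tag, new_data)


cpu = Processor()
cpu.main_cache[0] = Line(tag=0x100, data={0: "a", 1: "b"})
cpu.start_miss(0, write_cache_data={1: "B"})
drained = cpu.drain_writeback()       # the writeback proceeds without stalling
hit_during_miss = cpu.snoop(0x100)    # still serviced from the main cache
store_accepted = cpu.store(0, 0, "z")
cpu.finish_miss(0, new_tag=0x200, new_data={0: "x", 1: "y"})
hit_after_miss = cpu.snoop(0x100)
```

In this sketch the snoop succeeds even after the writeback buffer has been drained, because the merged line remains valid in the main cache until the read data for the miss arrives.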
By maintaining a copy of the writeback data in the main cache during writeback operations, subsequent requests for the data need only be snooped in the main cache, thereby eliminating the search overhead for the writeback buffer. Accordingly, since present embodiments snoop only tag information for the main cache during data requests, the tag array required for snooping is smaller, and less expensive, than prior searchable tag arrays that store tag information for both the main cache and the writeback buffer. In addition, since snoop operations are serviced from the main cache, it is not necessary to stall the writeback operation for snoop operations, which in turn increases performance over prior art systems that stall the writeback operation for such snoop operations.