1. Field of the Invention
The present invention generally relates to cache updating, and more particularly to cache updating in a shared-memory multiprocessor system.
2. Description of the Related Art
Users of data processing systems continue to demand greater performance for handling increasingly complex and difficult tasks. Greater performance from the processors that operate such systems may be obtained through faster clock speeds, so that individual instructions are processed more quickly. However, processing speed has increased much more quickly than the speed of main memory. However fast the processor, transferring information between the processor and memory remains a bottleneck on computer performance. Therefore, cache memories, or caches, are used in many data processing systems to increase performance in a relatively cost-effective manner.
A typical cache comprises a cache data RAM (Random Access Memory), a cache directory RAM, bus buffers, and a cache controller. The cache data RAM is a small, fast memory which is used to store copies of data which could be accessed more slowly from main memory. The cache size is the number of bytes in the cache data RAM alone. The cache directory RAM contains a list of main memory addresses of data stored in corresponding locations of the cache data RAM. Accordingly, for each cache location, an address and data are stored, making the combined cache directory and cache data RAMs behave like a single, wide memory. The bus buffers are controlled in such a way that if the cache can supply a copy of a main memory location (this is called a cache hit), then the main memory is not allowed to send its data to the requesting CPU. If the cache does not contain a copy of the data requested by the CPU (this is called a cache miss), the bus buffers allow the address issued by the CPU to be sent to the main memory. The cache controller implements the algorithm which moves data into and out of the cache data RAM and the cache directory RAM.
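The hit and miss behavior described above can be illustrated with a minimal sketch. It assumes a direct-mapped organization (one possible cache location per address), which the description does not specify; the class name `DirectMappedCache` and its fields are hypothetical and chosen only for illustration.

```python
class DirectMappedCache:
    """Toy model of a cache directory RAM paired with a cache data RAM.

    Assumption: direct-mapped placement (index = address mod number of lines).
    """

    def __init__(self, num_lines, main_memory):
        self.num_lines = num_lines
        self.directory = [None] * num_lines  # directory RAM: address held at each location
        self.data = [None] * num_lines       # data RAM: copy of the data at each location
        self.main_memory = main_memory       # dict mapping address -> value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines
        if self.directory[index] == address:
            # Cache hit: the cache supplies the data; main memory is not consulted.
            self.hits += 1
            return self.data[index]
        # Cache miss: the address goes to main memory and the line is filled.
        self.misses += 1
        value = self.main_memory[address]
        self.directory[index] = address
        self.data[index] = value
        return value
```

In this sketch the directory entry and the data entry at the same index together act as one wide memory word, as the paragraph above describes.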
The benefits of a cache are realized whenever the number of cache hits is maximized relative to the number of cache misses. Despite the added overhead that occurs as a result of a cache miss, as long as the percentage of cache hits (known as the "hit rate") is high, the overall processing speed of the system is increased. One method of increasing the hit rate for a cache is to increase the size of the cache. However, cache memory is relatively expensive and is limited by design constraints, particularly if the cache is integrated with a processor on the same physical integrated circuit.
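The trade-off between hit rate and miss overhead can be made concrete with the usual weighted-average estimate of access time. The sketch below is illustrative only: the function name `average_access_time` and the example timings are assumptions, not figures from this description.

```python
def average_access_time(hit_rate, hit_time, miss_time):
    """Weighted average of hit and miss latencies.

    hit_rate  -- fraction of accesses that hit in the cache (0.0 to 1.0)
    hit_time  -- time to serve an access from the cache
    miss_time -- total time to serve an access on a miss, including overhead
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_time


# Hypothetical timings: 1 cycle on a hit, 100 cycles on a miss.
high = average_access_time(0.95, 1, 100)
low = average_access_time(0.60, 1, 100)
```

With these assumed timings, the average at a 95 percent hit rate is far lower than at 60 percent, which is why a miss can be costly yet the cache still pays off when hits dominate.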
Another method is to chain together multiple caches of varying speeds. A smaller but faster primary cache is chained to a relatively larger but slower secondary cache. Furthermore, instructions and data may be split into separate instruction and data caches. Illustratively, some processors implement a small internal level one (L1) cache with an additional external level two (L2) cache, and so on.
Shared-memory multiprocessor systems present special issues regarding cache implementation and management. In a shared-memory multiprocessor system, all processors can access the main memory. This enables the tasks on all of the processors to efficiently and easily share data with one another. However, this sharing must be controlled to have predictable results. Conventionally, shared-memory multiprocessor systems have hardware that maintains cache coherency and provide software instructions that can be used to control which processor is storing to a particular memory location.
From the very creation of multiprocessor systems, the sharing of data in main memory has limited the scalability of both hardware and software. That is, it has limited the number of processors that could be effectively used in a multiprocessor system. As the number of processors in a multiprocessor system increases, the problem of limited scalability becomes worse. As a result, efficient hardware and software are needed.
Concerning hardware, most shared-memory multiprocessor systems use a snoop-invalidate cache protocol that allows a processor to store data to a memory location only if it has a modified copy of the cache line associated with the memory location. Other copies in other caches with a matching address are invalidated. This prevents multiple processors from storing to the line at once and keeps the system coherent.
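A snoop-invalidate protocol of the kind described above can be sketched as follows. This is a simplified model under stated assumptions: three line states ("modified", "shared", "invalid"), a bus that delivers invalidations immediately, and hypothetical class names `Bus` and `SnoopingCache`. Real protocols have more states and must handle write-back of modified data, which is omitted here.

```python
class Bus:
    """Hypothetical shared bus that forwards invalidations to all other caches."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, sender, address):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(address)


class SnoopingCache:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.lines = {}  # address -> [state, value]
        bus.attach(self)

    def load_shared(self, address, value):
        self.lines[address] = ["shared", value]

    def snoop_invalidate(self, address):
        # Another cache is taking ownership: drop our copy if we have one.
        if address in self.lines:
            self.lines[address][0] = "invalid"

    def store(self, address, value):
        line = self.lines.get(address)
        if line is None or line[0] != "modified":
            # Must own a modified copy before storing: invalidate other copies.
            self.bus.broadcast_invalidate(self, address)
        self.lines[address] = ["modified", value]

    def state(self, address):
        line = self.lines.get(address)
        return line[0] if line else "invalid"
```

In this sketch, a store by one cache leaves every other matching copy invalid, so at most one cache holds a modified copy of a line at any time.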
In a shared-memory multiprocessor system, most writes to main memory by a processor modify only the processor's cache. The main memory will be updated with new data only when the modified cache line is evicted from the cache. Moreover, processors usually read data from main memory, operate on the read data, and write the result back to main memory. It is unlikely that a processor writes data to a main memory address and then reads back the data from the same main memory address. Therefore, in a large system with a large number of processors, the next processor to read and/or write to a memory location is often not the processor whose cache has the cache line associated with the memory location. This requires the cache line to be moved between the caches of different processors. Efficiently moving cache lines to other caches (i.e., cache update) is critical to multiprocessor systems.
On a shared-memory multiple processor system with 16 megabytes of level two (L2) cache per processor, about forty percent of the cache misses are due to reading and/or writing of shared data. Making the cache larger or adding additional levels of cache does not reduce the number of such misses. Instead, the percentage of cache misses due to shared data becomes larger with a larger cache, and the movement of cache lines between caches reduces the performance of multiple processor systems.
Accordingly, there is a need for an apparatus and method in which cache updates are effectively carried out for a shared-memory multiprocessor system.
In one embodiment, a method is used for updating caches in a multiprocessor system having at least first and second processors coupled to a system bus, the first processor having a first cache and the second processor having a second cache. The method comprises, if a cache write hit occurs to a cache line in the first cache of the first processor and the cache line came from the second cache of the second processor, modifying a content of the cache line and broadcasting the modified content of the cache line on the system bus at a predetermined time after the content of the cache line is modified by the first processor.
In another embodiment, a computer system comprises a system bus and at least first and second processors coupled to the system bus, the first processor having a first cache and the second processor having a second cache. If a cache write hit occurs to a cache line in the first cache of the first processor and the cache line came from the second cache of the second processor, the first processor is configured to modify a content of the cache line and broadcast the modified content of the cache line on the system bus at a predetermined time after the content of the cache line is modified by the first processor.
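The write-hit behavior recited in these embodiments can be sketched as follows. This is a minimal illustration under several assumptions not stated above: the "predetermined time" is modeled as a fixed number of cycles, each cache line records which cache it came from in a hypothetical `came_from` field, and the system bus simply logs broadcasts. The names `SystemBus`, `Processor`, `fill`, `write`, and `tick` are illustrative only.

```python
class SystemBus:
    """Hypothetical bus model that records every broadcast it carries."""

    def __init__(self):
        self.broadcasts = []  # (cycle, sender, address, value)

    def broadcast(self, cycle, sender, address, value):
        self.broadcasts.append((cycle, sender, address, value))


class Processor:
    def __init__(self, name, bus, delay_cycles):
        self.name = name
        self.bus = bus
        self.delay_cycles = delay_cycles  # assumed form of the predetermined time
        self.cache = {}    # address -> {"value": ..., "came_from": ...}
        self.pending = []  # (due_cycle, address) broadcasts not yet sent

    def fill(self, address, value, came_from):
        # came_from is the name of the supplying cache, or None for main memory.
        self.cache[address] = {"value": value, "came_from": came_from}

    def write(self, cycle, address, value):
        line = self.cache.get(address)
        if line is None:
            return False  # write miss: outside the scope of this sketch
        line["value"] = value  # modify the content of the cache line
        if line["came_from"] not in (None, self.name):
            # The line came from another processor's cache: schedule a broadcast.
            self.pending.append((cycle + self.delay_cycles, address))
        return True

    def tick(self, cycle):
        # Send every broadcast whose predetermined time has elapsed.
        due = [p for p in self.pending if p[0] <= cycle]
        self.pending = [p for p in self.pending if p[0] > cycle]
        for _, address in due:
            self.bus.broadcast(cycle, self.name, address, self.cache[address]["value"])
```

What the other processors do on observing the broadcast is not modeled here; the sketch only shows the condition that triggers the broadcast and the delay before it is sent.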