The present invention is related to memory systems attached to computer central processing units, and in particular to memory systems attached to central processing units of microprocessors in a shared memory configuration.
Conventional microprocessors access random access memory through address and data buses and control signals. Some microprocessors use a common address/data bus which is time-multiplexed.
When the microprocessor CPU (central processing unit) reads data (which may include instructions) stored in the memory by performing a read operation, the microprocessor typically places an address on the microprocessor address bus (or common address/data bus) and requests a "read" operation via the control signals. Similarly, when the microprocessor writes data to the memory it typically first places an address on its address bus, and requests a "write" operation via its control signals. During subsequent steps of the write operation, the CPU places the data to be written on its data bus (or on the address/data bus in the case of a time-multiplexed address/data bus).
A cache is a small, fast memory logically located between the random access memory and the microprocessor CPU. A cache accelerates reads from the memory by holding the most recently accessed data.
The cache memory is not a random access memory, but rather an associative memory. When presented with an address and data as a result of a microprocessor write operation, the cache associates the address with the data and stores the data in its memory. When presented with an address as the result of a microprocessor read operation, the cache inspects the address to determine whether or not the cache has stored data associated with the address. If such an association exists, the cache "hits" and the data is presented to the microprocessor with no interaction on the part of the random access memory. Alternatively, if no such association exists, the cache "misses" and the random access memory must be read to fill the cache and to deliver the requested data to the microprocessor.
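The hit and miss behavior described above can be illustrated with a minimal sketch. The class name `SimpleCache`, the use of a Python dictionary as the associative store, the unlimited capacity, and the write-through policy are all assumptions made for illustration; the text does not specify any of them.

```python
class SimpleCache:
    """Toy associative cache (hypothetical): maps addresses to data, no capacity limit."""

    def __init__(self, memory):
        self.memory = memory  # backing random access memory, modeled as a dict
        self.lines = {}       # address -> data associations held by the cache
        self.hits = 0
        self.misses = 0

    def write(self, address, data):
        # Associate the address with the data and store it in the cache.
        # Assumption: the write is also passed through to the backing memory.
        self.lines[address] = data
        self.memory[address] = data

    def read(self, address):
        if address in self.lines:
            # Hit: the data is delivered with no access to the backing memory.
            self.hits += 1
            return self.lines[address]
        # Miss: read the backing memory, fill the cache, deliver the data.
        self.misses += 1
        data = self.memory[address]
        self.lines[address] = data
        return data


ram = {0x100: 7}
cache = SimpleCache(ram)
first = cache.read(0x100)   # no association yet, so this read misses
second = cache.read(0x100)  # the association now exists, so this read hits
```

In this sketch the first read fills the cache from the backing memory and the second read is served from the cache alone.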
In the case of a cache miss, caches cause the microprocessor to stall the existing program flow and to perform a cache fill procedure to bring the requested data into the cache. This degrades the overall performance of the program.
For high performance applications, it is desirable to have as much data encached as possible. However, a problem exists when multiple microprocessors and other devices are allowed to read from and write to the random access memory, which then serves as a shared memory (SM). It is possible that two or more devices use information stored in the same location in the shared memory. In such a case, it is important that all devices use this information consistently.
For example, it is possible that one microprocessor can encache a portion of the shared memory in its cache, and subsequently a second microprocessor or other device can overwrite the same location in the shared memory. The first microprocessor must be made aware that its encached copy of the shared memory data is no longer valid, since the data has been modified by another device. This is called the "cache consistency problem."
The shared memory is often used by two or more microprocessors or other processing engines to communicate with each other. An example of such a system is described in U.S. patent application Ser. No. 08/093,397, "Communication Apparatus and Methods," now U.S. Pat. No. 5,515,376, issued on May 7, 1996. In this system, multiple microprocessors and network controllers communicate through a shared memory for the purpose of forwarding packets of information between networks. A network controller writes the packet into a buffer in the shared memory, and writes control information associated with the packet into a descriptor in the shared memory. A microprocessor reads this information in order to process the packet. The network controller writes the information associated with a particular packet only once; therefore, once the writing has been completed, the microprocessor may read and encache this information. However, the network controller may use the same region of the shared memory later to store information for a new packet. At this point, the information stored in the microprocessor's cache is inconsistent with what has been written into the shared memory. The microprocessor must somehow be made to ignore what is stored in its cache and instead to read the new information from the shared memory.
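The descriptor scenario above can be reproduced in a few lines. This is only an illustrative sketch: the address `0x2000`, the string contents, and the helper names `cpu_read` and `controller_write` are hypothetical and do not come from the referenced system.

```python
# Shared memory and the microprocessor's private cache, both modeled as dicts.
shared_memory = {0x2000: "descriptor for packet A"}
cpu_cache = {}


def cpu_read(address):
    # The microprocessor reads through its cache, filling it on a miss.
    if address not in cpu_cache:
        cpu_cache[address] = shared_memory[address]
    return cpu_cache[address]


def controller_write(address, data):
    # The network controller writes directly to shared memory;
    # the microprocessor's cache is not informed of the write.
    shared_memory[address] = data


seen_first = cpu_read(0x2000)                          # encaches packet A's descriptor
controller_write(0x2000, "descriptor for packet B")    # region reused for a new packet
seen_second = cpu_read(0x2000)                         # cache hit returns the stale copy
```

After the controller reuses the region, the second read still returns the descriptor for packet A, although the shared memory now holds the descriptor for packet B.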
One solution to the cache consistency problem is simply not to encache shared information in the first place. For example, the MIPS R3000 family microprocessor architecture [ref. MIPS RISC Architecture, by Gerry Kane, Prentice-Hall, 1988, hereby incorporated herein by reference] specifies certain portions of memory to be cacheable, and other portions to be uncacheable, as indicated by certain high-order bits in the microprocessor's internal virtual address. In systems employing this microprocessor, shared information may be accessed via non-cacheable virtual addresses. However, this solution reduces performance for two reasons, discussed below.
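As an illustration of selecting cacheability through high-order address bits, the following sketch decodes the top three bits of a 32-bit virtual address. The segment names and ranges (kseg0 at 0x80000000, cached; kseg1 at 0xA0000000, uncached; both mapping onto the same low 512 MB of physical memory) follow the commonly documented R3000 memory map and are not stated in the text above; the function names are hypothetical.

```python
def r3000_segment(vaddr):
    """Classify a 32-bit virtual address by its three high-order bits."""
    top3 = (vaddr >> 29) & 0x7
    if top3 < 4:
        return "kuseg"   # 0x00000000-0x7FFFFFFF, translated by the TLB
    if top3 == 4:
        return "kseg0"   # 0x80000000-0x9FFFFFFF, unmapped and cached
    if top3 == 5:
        return "kseg1"   # 0xA0000000-0xBFFFFFFF, unmapped and uncached
    return "kseg2"       # 0xC0000000-0xFFFFFFFF, translated by the TLB


def physical_address(vaddr):
    """Physical address for a kseg0 or kseg1 address: the high-order bits are dropped."""
    assert r3000_segment(vaddr) in ("kseg0", "kseg1")
    return vaddr & 0x1FFFFFFF


def uncached_alias(vaddr):
    """Return the kseg1 (uncached) address that refers to the same physical location."""
    return physical_address(vaddr) | 0xA0000000


cached_view = 0x80001000
uncached_view = uncached_alias(cached_view)
```

Under these assumptions, a program would reach shared information through the address in `uncached_view`, while ordinary private data could still be reached through cached addresses.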
First, a particular piece of shared information may be used multiple times by the program; for example, a packet header may be examined several times by different steps in the packet-forwarding algorithm. Since this piece of information is not cached, it must be read from the shared memory once for each step, which is inefficient. This inefficiency may be partially overcome by explicitly reading the information only once and then storing it in a processor register or in non-shared, and therefore cacheable, memory. However, when written in a high-level-language program, these explicit operations may or may not be preserved by the high-level-language compiler. For example, the compiler may decide that these operations are redundant and remove them, leading to incorrect program operation.
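The cost of repeated uncached reads, and the effect of copying the information once, can be sketched by counting shared-memory accesses. The class `CountingSharedMemory`, the address `HEADER`, and the choice of three algorithm steps are hypothetical values chosen for illustration. The sketch does not model the compiler behavior mentioned above.

```python
class CountingSharedMemory:
    """Toy shared memory (hypothetical) that counts every uncached read."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.reads = 0

    def read_uncached(self, address):
        # Every uncached access goes to the shared memory itself.
        self.reads += 1
        return self.contents[address]


HEADER = 0x3000   # hypothetical address of a packet header word
STEPS = 3         # hypothetical number of steps that examine the header

# Case 1: each step reads the header directly from shared memory.
sm_direct = CountingSharedMemory({HEADER: 0x45})
for _ in range(STEPS):
    value = sm_direct.read_uncached(HEADER)
reads_without_copy = sm_direct.reads

# Case 2: the header is read once and kept in a local (non-shared) variable.
sm_copied = CountingSharedMemory({HEADER: 0x45})
local_copy = sm_copied.read_uncached(HEADER)
for _ in range(STEPS):
    value = local_copy
reads_with_copy = sm_copied.reads
```

With these assumed values the direct approach performs one shared-memory read per step, while the copy-once approach performs a single read in total.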
Second, accesses to non-cacheable memory may not use the most efficient mode of microprocessor bus operation. For example, some MIPS R3000-family microprocessors, such as the R3052 and R3081 from Integrated Device Technology, Inc., use an efficient 4-word burst mode to read cacheable memory locations, but use a less efficient single-word mode to read non-cacheable locations.
Another solution to the cache consistency problem is to allow programs to encache shared information once, but then to explicitly flush (mark invalid) the cached information after it has been used. This guarantees that the cache will "miss" when the processor next attempts to read new information at a shared memory location that was previously encached. Disadvantages of this approach include program inefficiency (extra instructions are needed to flush the cache) and awkwardness (a high-level language may not be able to generate the low-level instructions needed to flush the cache).
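The explicit-flush approach can be sketched as follows. The helper names `cpu_read` and `flush`, the address, and the data values are hypothetical; on a real microprocessor the flush would be performed by low-level cache-control instructions rather than by a dictionary operation.

```python
shared_memory = {0x2000: "packet A"}
cpu_cache = {}
miss_count = 0


def cpu_read(address):
    # Read through the cache, filling it from shared memory on a miss.
    global miss_count
    if address not in cpu_cache:
        miss_count += 1
        cpu_cache[address] = shared_memory[address]
    return cpu_cache[address]


def flush(address):
    # Mark the cached copy invalid (modeled here by removing the entry).
    cpu_cache.pop(address, None)


first = cpu_read(0x2000)             # miss: encaches "packet A"
flush(0x2000)                        # extra step the program must remember to take
shared_memory[0x2000] = "packet B"   # another device reuses the location
second = cpu_read(0x2000)            # miss again: the new data is fetched
```

The call to `flush` is the extra work noted above: it must appear in the program after every use of encached shared information.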
Another solution to the cache consistency problem is called bus snooping. In the bus-snooping method, each microprocessor that shares the memory monitors all other microprocessors to detect memory write operations to locations which the microprocessor has encached. If any other microprocessor performs a write to an encached location, the first microprocessor invalidates its cache entry for that location so that the next read reference to that location will cause a cache miss.
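A minimal sketch of the bus-snooping method follows. The classes `Bus` and `SnoopingProcessor`, the per-address invalidation, and the write-through behavior are assumptions made for illustration and do not describe any particular microprocessor.

```python
class Bus:
    """Toy shared bus (hypothetical): carries writes to memory and to all snoopers."""

    def __init__(self):
        self.memory = {}
        self.snoopers = []

    def write(self, writer, address, data):
        self.memory[address] = data
        # Every other attached processor observes the write.
        for snooper in self.snoopers:
            if snooper is not writer:
                snooper.snoop_write(address)


class SnoopingProcessor:
    def __init__(self, bus):
        self.bus = bus
        self.cache = {}
        self.misses = 0
        bus.snoopers.append(self)

    def read(self, address):
        if address not in self.cache:
            self.misses += 1
            self.cache[address] = self.bus.memory[address]
        return self.cache[address]

    def write(self, address, data):
        # Assumption: writes update the local cache and go through to the bus.
        self.cache[address] = data
        self.bus.write(self, address, data)

    def snoop_write(self, address):
        # Another device wrote this location: invalidate any encached copy.
        self.cache.pop(address, None)


bus = Bus()
p1 = SnoopingProcessor(bus)
p2 = SnoopingProcessor(bus)

p2.write(0x10, "old")
before = p1.read(0x10)   # p1 misses and encaches "old"
p2.write(0x10, "new")    # p1 snoops the write and invalidates its copy
after = p1.read(0x10)    # p1 misses again and reads "new"
```

A device that cannot call `snoop_write`, or that writes to memory without going through `Bus.write`, would defeat the scheme in this sketch.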
Bus snooping has the disadvantage of requiring additional bus-snooping and cache-monitoring logic to be present in each microprocessor, which can increase the cost and/or decrease the performance of the microprocessor. Also, bus snooping may not be supported at all by some classes of commercially available non-microprocessor devices, such as the network controllers mentioned previously.