1. Field of the Invention
This invention relates generally to computer systems having multiple processors with cache memories, and, more particularly, to a method and apparatus for maintaining cache coherency in a computer system having multiple processor buses.
2. Description of the Related Art
To a great degree, the speed of a computer system is determined by its processing bandwidth and the amount of data it can readily retrieve. The processing power is determined, in part, by the number of processors in the computer system and the speed of those processors. Typically, data is read from a fixed storage device (e.g., hard disk) and stored in a main memory device in the computer system for later retrieval by the processor(s). Many transactions in the computer system are directed toward reading from or writing to the main memory. To increase the total capacity of the main memory, it is common to divide the main memory into one or more separate memory banks, each having an associated memory controller. Usually, the memory controllers are coupled to a single, shared memory bus.
In many computer systems, multiple processors are used to increase system performance. One or more of these processors typically have a cache memory. A cache memory maintains a local copy of selected lines of data contained in the main memory for rapid retrieval. Cache memories are typically implemented using fast, static random access memories (SRAM), and main memories are typically implemented in higher-density, but slower, dynamic random access memories (DRAM). Because two or more processors may be involved with executing a single software application, the same memory lines may be stored in the cache memories of different processors.
Past computer systems have included multiple processors and multiple memory controllers coupled to the same shared bus. As the frequency of the bus increases, the number of electrical loads supportable on the bus decreases. To maintain or increase the number of processors while increasing the bus speed, the processors are split across multiple processor buses. Due to the segregation of processors onto separate processor buses, it is necessary to maintain the coherency of the processor cache memories across the buses.
In general, coherency is maintained by identifying the cache state for every cached line in the system. A cache line state may be invalid, shared, or owned. The invalid state indicates that the line is not cached anywhere in the system. The shared state indicates that the line may be present in one or more processor caches. The owned state indicates that the line may be in an exclusive or modified state in one of the caches.
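The three system-level states described above can be illustrated with a minimal sketch. The per-cache state letters ("I", "S", "E", "M") and the names `LineState` and `system_line_state` are assumptions introduced for illustration only; they do not appear in the description.

```python
from enum import Enum


class LineState(Enum):
    """System-level states for a cacheable line, as named in the text."""
    INVALID = "invalid"  # line is not cached anywhere in the system
    SHARED = "shared"    # line may be present in one or more caches
    OWNED = "owned"      # line may be exclusive or modified in one cache


def system_line_state(per_cache_states):
    """Summarize hypothetical per-cache states into one system-level state.

    per_cache_states: iterable of "I", "S", "E", or "M", one entry per
    processor cache (an assumed encoding, not taken from the text).
    """
    states = set(per_cache_states)
    if states & {"E", "M"}:
        # An exclusive or modified copy in any cache makes the line owned.
        return LineState.OWNED
    if "S" in states:
        return LineState.SHARED
    return LineState.INVALID


if __name__ == "__main__":
    for caches in (["I", "I"], ["S", "I"], ["I", "M"]):
        print(caches, "->", system_line_state(caches).value)
```

In this sketch the owned check runs first, so a line held exclusively or in modified form by any one cache is reported as owned regardless of the other entries.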
One technique for maintaining cache coherency involves the use of a cache directory associated with each of the memory controllers. Each memory controller accesses mutually exclusive address ranges. The cache directory stores the status of each of the cacheable memory lines governed by the associated memory controller. Due to the large number of cacheable lines, the cache directory is typically large, and is generally implemented in higher-density, but slower, DRAM. Faster memories, such as SRAM, are cost-prohibitive due to the required capacity of the cache directory. A typical DRAM access may require about 16 clock cycles, while the corresponding SRAM access may take only about 3 clock cycles. As a result of the slower access time, cache directory accesses introduce significant latency to memory accesses.
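The directory technique can be sketched roughly as follows, assuming one directory per memory controller, non-overlapping address ranges, and the approximate access times quoted above. The class `CacheDirectory`, the function `find_directory`, and the example address ranges are hypothetical and serve only to illustrate the lookup path and its cost.

```python
DRAM_ACCESS_CYCLES = 16  # approximate figure quoted in the text
SRAM_ACCESS_CYCLES = 3   # approximate figure quoted in the text


class CacheDirectory:
    """Hypothetical directory for one memory controller's address range."""

    def __init__(self, base, limit, access_cycles=DRAM_ACCESS_CYCLES):
        self.base = base            # first address governed (inclusive)
        self.limit = limit          # end of the range (exclusive)
        self.access_cycles = access_cycles
        self.states = {}            # line address -> state string

    def governs(self, addr):
        return self.base <= addr < self.limit

    def lookup(self, addr):
        """Return (state, cycles spent); lines not recorded are invalid."""
        if not self.governs(addr):
            raise ValueError("address outside this directory's range")
        return self.states.get(addr, "invalid"), self.access_cycles


def find_directory(directories, addr):
    """Pick the single directory whose exclusive range contains addr."""
    for directory in directories:
        if directory.governs(addr):
            return directory
    raise LookupError("no memory controller governs this address")


if __name__ == "__main__":
    # Example ranges are arbitrary assumptions for the demonstration.
    dirs = [CacheDirectory(0x0000, 0x8000), CacheDirectory(0x8000, 0x10000)]
    dirs[1].states[0x9000] = "shared"
    state, cycles = find_directory(dirs, 0x9000).lookup(0x9000)
    print(state, cycles)
    print("extra cycles vs. SRAM:", DRAM_ACCESS_CYCLES - SRAM_ACCESS_CYCLES)
```

With the quoted figures, each directory check held in DRAM costs about 13 more clock cycles than the same check would in SRAM, which is the latency penalty the paragraph describes.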
Another technique uses a local bus snoop filter for each of the processor buses. The local bus snoop filter is checked for each cacheable memory request. Also, the local bus snoop filters associated with each of the other processor buses (i.e., remote bus snoop filters) must be checked. This, in effect, multiplies each snoop request into N-1 snoop requests, where N is the number of processor buses in the computer system. The local bus snoop filter technique is also susceptible to contention when multiple snoop requests are received from local bus snoop filters associated with other processor buses at or near the same time. As a result, coherency checks may have to be placed in a queue and evaluated in order. Because each check may require several clock cycles to complete, the queue may add significant latency to a particular request.
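The fan-out and queuing effects described above can be worked through with a small sketch. The function names and the cycles-per-check value used in the example are assumptions; the text states only that each check may require several clock cycles.

```python
def remote_snoop_count(num_buses):
    """Remote snoop filters checked per cacheable request: N - 1."""
    if num_buses < 1:
        raise ValueError("a system needs at least one processor bus")
    return num_buses - 1


def completion_cycles(checks_ahead, cycles_per_check):
    """Cycles until a check completes when evaluated strictly in order.

    checks_ahead: number of coherency checks already queued in front.
    cycles_per_check: assumed cost of one check (hypothetical value).
    """
    return (checks_ahead + 1) * cycles_per_check


if __name__ == "__main__":
    buses = 4
    print("remote snoops per request:", remote_snoop_count(buses))
    # Assume 4 cycles per check and three requests already queued.
    print("cycles to complete:", completion_cycles(3, 4))
```

Under these assumed numbers, a request arriving behind three queued checks waits four times as long as it would on an idle filter, which illustrates how contention adds latency.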
Another coherency maintenance technique involves the use of a tertiary cache between the local processor buses and the shared memory bus. To be effective, a particular cache level is typically almost an order of magnitude larger than the previous level. Because current microprocessors can support secondary caches on the order of 2 to 4 MB, the tertiary cache would need to be about 16 to 32 MB to be effective. Such large, high-speed memories are prohibitively expensive. Also, software applications with random memory accesses or high data migration would tend to saturate the shared memory bus. Moreover, the high number of loads necessary to support the tertiary cache on a high-frequency bus may generate the same electrical problems that led to the need to segregate the processors.
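The sizing arithmetic above can be reproduced with a short sketch. The scale factor of 8 is an assumption inferred from the quoted figures (2 to 4 MB growing to 16 to 32 MB); the text itself says only "almost an order of magnitude".

```python
# Assumed ratio between adjacent cache levels, chosen to match the
# figures quoted in the text (2 MB -> 16 MB, 4 MB -> 32 MB).
SCALE_FACTOR = 8


def next_level_size_mb(previous_level_mb, scale=SCALE_FACTOR):
    """Estimate the size of the next cache level, in MB."""
    return previous_level_mb * scale


if __name__ == "__main__":
    for secondary_mb in (2, 4):
        tertiary_mb = next_level_size_mb(secondary_mb)
        print(f"{secondary_mb} MB secondary -> about {tertiary_mb} MB tertiary")
```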
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.