1. Field of the Invention
The present invention relates to computer systems which include a cache memory subsystem and, more particularly, to a method and apparatus for facilitating the handling of locked cycles within such systems.
2. Description of Related Art
A cache memory is often included in a computer system and comprises a small, high-speed "scratch-pad" memory which is maintained near an associated processor to speed up the fetch and store time for the data used most frequently by the processor. Since data is swapped between the cache memory and the main memory when needed, consistency must be maintained between the data stored in each. A consistency problem appears when main memory is accessed by devices other than the processor. For this reason, when a device other than the processor writes into the main memory at an address corresponding to an address in the cache memory, that address in the cache memory is rendered invalid by the cache controller until the inconsistency is removed.
In multi-processor computer systems, more than one processor communicates over a common data bus. While each processor may have its own local, or cache, memory and operate independently of the others, all of the processors have access to the main system memory. In such systems, the data stored in main memory must be shared by the several processors. Where common data is needed by several processors, two processors may access and modify the same data packet, resulting in two different versions. A hardware solution to this problem provides a lock signal which a memory arbiter monitors to inhibit any other processor from using that data until the first processor returns the updated version to main memory.
Many prior art electronic data systems include the Intel 80386 microprocessor and the Intel 82385 cache controller. It is useful to examine the interface between such processors and cache controllers in detail in order to understand how the data consistency problem discussed above has conventionally been handled.
By itself, a microprocessor system bus structure consists of the physical microprocessor address, data and control busses. The local address and data busses are buffered and/or latched to become the "system" address and data busses and the local control bus is decoded by bus control logic to generate the various system bus read and write commands.
A cache memory system monitors each one of the microprocessor memory references to see if the address of the required data resides in the cache. If the data does reside in the cache (a "hit"), it is immediately returned to the microprocessor without incurring the wait states necessary to access main system memory. If the data does not reside in the cache (a "miss"), the memory address reference is forwarded to the main memory controller and the data is retrieved from main memory. Since cache hits are serviced locally, a processor operating out of its local cache memory has a much lower "bus utilization," which reduces system bus bandwidth requirements, and makes more bus bandwidth available to the other bus masters of the system. This is significant because, as is well known to those skilled in the art, the bus in the computer, that is, the communications channel between the CPU and the system memory and storage devices, is a principal bottleneck. Virtually all instructions and all data to be processed must travel this route at least once. Thus, in order to maximize system performance, it is essential that the bus be used efficiently.
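The bus-utilization argument can be illustrated with a toy model. The sketch below assumes an idealized cache that never evicts (a Python set stands in for the cache directory) and counts how many references in a hypothetical address trace must go out to the system bus. It is an illustration of the hit/miss principle only, not a model of any particular cache controller.

```python
def count_bus_references(trace):
    """Return (hits, misses) for an idealized, never-evicting cache.

    Hits are serviced locally and use no system bus cycle; misses are
    forwarded to the main memory controller over the bus, after which
    the fetched address is held in the cache.
    """
    cached = set()          # stands in for the cache directory (assumption)
    hits = misses = 0
    for address in trace:
        if address in cached:
            hits += 1       # serviced locally, no wait states, no bus cycle
        else:
            misses += 1     # forwarded to main memory over the system bus
            cached.add(address)
    return hits, misses


if __name__ == "__main__":
    trace = [0x100, 0x104, 0x100, 0x100, 0x104, 0x200]   # hypothetical trace
    hits, misses = count_bus_references(trace)
    print(f"hits={hits} misses={misses} "
          f"fraction of references on the bus={misses / len(trace):.2f}")
```

Only the three misses in this trace occupy the system bus; the remaining references leave it free for other bus masters.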
As should be appreciated by those skilled in the art, adding a cache controller to a computer system separates the microprocessor bus into two distinct busses: the actual microprocessor bus and the cache controller local bus. The cache controller local bus is designed to look like the front end of a microprocessor, providing a cache controller local bus equivalent of every appropriate microprocessor signal. The system interconnects to this "microprocessor-like" front end just as it would to an actual microprocessor. The microprocessor simply sees a fast system bus, and the system sees a microprocessor front end with a low bus bandwidth requirement. The cache subsystem is thus transparent to both. Transparency, in the data communications field, refers to the capability of a communications medium to pass, within specified limits, a range of signals having one or more defined properties. It should be noted that in such systems the cache controller local bus is not simply a buffered version of the microprocessor bus but is distinct from, and able to operate in parallel with, the microprocessor bus. Thus, other bus masters, that is, supervisory systems of one kind or another residing on either the cache controller local bus or the system bus, are free to manage the other system resources while the microprocessor operates out of its cache.
As previously mentioned, a computer or other electronic data processing system can comprise multiple microprocessors. One such system is discussed in detail in the related applications further identified in the cross reference to related application section above. There are, of course, many other examples. As is known to those skilled in the art, cache controllers such as the Intel 82385 can be programmed for either master or slave mode operation. When such a cache controller is programmed in slave mode, it drives its local bus only when it has requested and subsequently been granted bus control. This allows multiple microprocessor/cache controller subsystems to reside on the same cache controller local bus.
In multiple processor systems which share a common memory and access bus structure, certain controls are necessary to preserve memory integrity. For example, in bus cycles such as "read-modify-write" it is necessary that a master have exclusive use of the bus for both the read and write cycles to ensure that another master does not access the same data in main memory while that data is being modified. For this reason, such cycles are configured as "locked cycles" in which the master has exclusive use of the bus until its entire cycle has been completed.
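As a software analogy only (the hardware arbiter is not built this way), the sketch below models each bus master as a thread and the locked bus as a `threading.Lock`. Holding the lock across both the read cycle and the write cycle is what makes each read-modify-write indivisible with respect to the other masters. The shared address and iteration counts are arbitrary, hypothetical values.

```python
import threading

SHARED_ADDRESS = 0x10          # hypothetical main memory location
memory = {SHARED_ADDRESS: 0}   # shared main memory: address -> value
bus_lock = threading.Lock()    # stands in for exclusive (locked) bus ownership


def locked_increment(address, times):
    """Perform `times` locked read-modify-write cycles on `address`."""
    for _ in range(times):
        with bus_lock:                     # master holds the bus for both cycles
            value = memory[address]        # read cycle
            memory[address] = value + 1    # write cycle


def run(masters=4, times=10_000):
    """Let several masters update the same location; return the final value."""
    memory[SHARED_ADDRESS] = 0
    threads = [threading.Thread(target=locked_increment,
                                args=(SHARED_ADDRESS, times))
               for _ in range(masters)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory[SHARED_ADDRESS]


if __name__ == "__main__":
    print(run())
```

Because no master can intervene between another master's read and write, no update is lost and the final value equals `masters * times`.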
Other aspects of microprocessor and cache interfacing relate to cache coherency. Ideally, a cache contains a copy of the most heavily used portions of main memory. To maintain cache "coherency" is to make sure that the data contained in this local copy is identical to the data located at the corresponding addresses in the main memory. In a system where multiple masters can access the same memory there is always a risk that one master will alter the contents of a memory location that is duplicated in the local cache of another master. Such a cache contains "stale" data.
Cache controllers such as the Intel 82385 preserve cache coherency via "bus watching" or "snooping," a technique that neither impacts performance nor restricts memory mapping. A cache controller that is not currently the bus master monitors system bus cycles, and when a write cycle by another master is detected (a snoop), the system address is sampled and used to see if the referenced location is duplicated in the cache. If so (a snoop hit), the corresponding cache entry is invalidated, which forces the associated microprocessor to fetch the up-to-date data from main memory the next time it accesses this modified location.
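A minimal sketch of the snoop-invalidate policy described above, assuming a dictionary-based cache keyed by address; the class and method names are hypothetical. On a snoop hit the cache does not copy the new value; it only invalidates its entry, so the next access misses and fetches current data from main memory.

```python
class SnoopingCache:
    """Toy cache that invalidates entries when another master's write is snooped."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory: address -> value
        self.lines = {}        # cached copies: address -> value

    def read(self, address):
        if address not in self.lines:                  # miss: fetch from memory
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def snoop_write(self, address):
        """Called when a write cycle by another master is seen on the system bus."""
        if address in self.lines:                      # snoop hit
            del self.lines[address]                    # invalidate the stale copy


if __name__ == "__main__":
    memory = {0x40: 1}
    cache = SnoopingCache(memory)
    cache.read(0x40)          # caches the value 1
    memory[0x40] = 2          # another master writes main memory
    cache.snoop_write(0x40)   # the cache observes the write: snoop hit
    print(cache.read(0x40))   # refetches from main memory
```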
In operation, a microprocessor/cache controller system executes a number of cycles. These cycles include memory code and data read cycles, memory write cycles, non-cacheable cycles and local bus cycles. When the microprocessor initiates a memory code or data read cycle, the cache controller compares the high order bits of the microprocessor address bus with the appropriate addresses stored in its on-chip directory. If the cache controller determines that the requested data is in the cache (a "hit"), it issues the appropriate control signals that direct the cache to drive the requested data onto the microprocessor data bus, where it is read by the microprocessor. If the cache controller determines that the requested data is not in the cache (a "miss"), the request is forwarded to the cache controller local bus and the data is retrieved from main memory. As the data returns from main memory, it is directed to the microprocessor and also written into the cache. Concurrently, the cache controller updates the cache directory so that the next time this particular piece of information is requested by the processor, the cache controller will find it in the cache and return it with zero wait states.
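The read-cycle behavior can be sketched with a direct-mapped directory in which the low address bits select a directory entry and the high-order bits (the tag) are compared against it. The geometry below (4-byte lines, 1024 entries) is a hypothetical choice for illustration and is not taken from the 82385 or any other device.

```python
LINE_BYTES = 4     # hypothetical line size
NUM_SETS = 1024    # hypothetical number of directory entries
OFFSET_BITS = 2    # log2(LINE_BYTES)
INDEX_BITS = 10    # log2(NUM_SETS)


class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory             # main memory: line address -> data
        self.tags = [None] * NUM_SETS    # cache directory: one tag per entry
        self.data = [None] * NUM_SETS    # cached data

    @staticmethod
    def split(address):
        """Split an address into (tag, index); the tag is the high-order bits."""
        index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
        tag = address >> (OFFSET_BITS + INDEX_BITS)
        return tag, index

    def read(self, address):
        """Return (data, hit) for a memory read cycle."""
        tag, index = self.split(address)
        if self.tags[index] == tag:              # directory compare: hit
            return self.data[index], True
        line = address & ~(LINE_BYTES - 1)       # miss: go to main memory
        value = self.memory[line]
        self.tags[index] = tag                   # update the directory
        self.data[index] = value                 # and fill the cache
        return value, False


if __name__ == "__main__":
    cache = DirectMappedCache({0x1000: "a", 0x5000: "b"})
    for addr in (0x1000, 0x1000, 0x5000, 0x1000):
        print(hex(addr), cache.read(addr))
```

In this geometry 0x1000 and 0x5000 map to the same directory entry with different tags, so alternating between them causes repeated misses.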
With regard to memory write cycles, the cache controller's "posted write" capability allows the majority of processor memory write cycles to run with zero wait states. The primary memory update policy implemented in a posted write is the traditional cache "write through" technique, which implies that the main memory is always updated in any memory write cycle. If the referenced location also happens to reside in the cache (a write hit), the cache is updated as well.
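The write-through policy can be summarized in a few lines: main memory is updated on every write, and the cache is updated only when the written location is already present. The sketch below assumes that a write miss does not allocate a cache entry, and it does not model the posting itself (the buffering that lets the processor continue without waiting for main memory); the class name is hypothetical.

```python
class WriteThroughCache:
    """Toy write-through cache keyed by address."""

    def __init__(self, memory):
        self.memory = memory   # main memory: address -> value
        self.lines = {}        # cached copies: address -> value

    def read(self, address):
        if address not in self.lines:            # read miss: fill from memory
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.memory[address] = value             # main memory is always updated
        if address in self.lines:                # write hit: update the cache too
            self.lines[address] = value
        # write miss: cache left untouched (no write-allocate, an assumption)


if __name__ == "__main__":
    mem = {0x0: 1, 0x4: 2}
    cache = WriteThroughCache(mem)
    cache.read(0x0)
    cache.write(0x0, 9)    # write hit
    cache.write(0x4, 7)    # write miss
    print(mem, cache.lines)
```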
Non-cacheable cycles fall into one of two categories: cycles decoded as non-cacheable and cycles that are by default non-cacheable according to the cache controller's design. All non-cacheable cycles are forwarded to the cache controller local bus. Non-cacheable cycles have no effect on the cache or cache directory. It should be appreciated that cache controllers are frequently designed so that certain areas of main memory are non-cacheable. Certain cycles often defined as non-cacheable include I/O cycles, interrupt acknowledge cycles, and halt/shutdown cycles.
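The two categories of non-cacheable cycle can be expressed as a single predicate: a cycle is non-cacheable if its type is non-cacheable by default or if its address falls in a range decoded as non-cacheable. The cycle names follow the text above; the address range is a hypothetical example, not a range prescribed by any particular controller.

```python
from enum import Enum, auto


class Cycle(Enum):
    MEMORY_READ = auto()
    MEMORY_WRITE = auto()
    IO = auto()
    INTERRUPT_ACK = auto()
    HALT_SHUTDOWN = auto()


# Cycle types treated as non-cacheable by default.
DEFAULT_NON_CACHEABLE = {Cycle.IO, Cycle.INTERRUPT_ACK, Cycle.HALT_SHUTDOWN}

# Hypothetical address range decoded as non-cacheable (inclusive bounds).
NON_CACHEABLE_RANGES = [(0xA0000, 0xFFFFF)]


def is_cacheable(cycle, address):
    """Return False if the cycle must be forwarded to the local bus uncached."""
    if cycle in DEFAULT_NON_CACHEABLE:
        return False
    return not any(lo <= address <= hi for lo, hi in NON_CACHEABLE_RANGES)


if __name__ == "__main__":
    print(is_cacheable(Cycle.MEMORY_READ, 0x1000))    # ordinary memory
    print(is_cacheable(Cycle.MEMORY_READ, 0xB8000))   # decoded non-cacheable
    print(is_cacheable(Cycle.IO, 0x60))               # non-cacheable by default
```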
Microprocessor local bus cycles are accesses to resources on the microprocessor local bus rather than to the cache controller itself. The cache controller simply ignores these accesses: they are neither forwarded to the system nor do they affect the cache.
To fully understand and appreciate the present invention, it is helpful to be aware of several additional aspects of conventional microprocessor/cache controller systems. First, it should be known that microprocessor outputs include various bus cycle definition signals. In the case of the Intel 80386, these bus cycle definition signals are W/R#, D/C#, M/IO# and LOCK#. These four outputs define the type of bus cycle being performed. W/R# distinguishes between write and read cycles. D/C# distinguishes between data and control cycles. M/IO# distinguishes between memory and I/O cycles. LOCK# distinguishes between locked and unlocked bus cycles. The primary bus cycle definition signals are W/R#, D/C# and M/IO#, since these are the signals driven valid as ADS# (the Address Status output) is driven asserted. LOCK# is driven valid when the first locked bus cycle begins, which, due to address pipelining, could be later than the assertion of ADS#. LOCK# is negated when the READY# input terminates the last bus cycle which was locked.
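The decoding performed on the three primary cycle definition signals can be sketched as a lookup table. The table below reflects our understanding of the 80386 bus cycle definition encoding and should be checked against the Intel datasheet; the function name is hypothetical. Pin levels are written as 1 (high) and 0 (low), and LOCK# is active low.

```python
# (M/IO#, D/C#, W/R#) sampled while ADS# is asserted -> bus cycle type.
CYCLE_TABLE = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "does not occur",
    (0, 1, 0): "I/O data read",
    (0, 1, 1): "I/O data write",
    (1, 0, 0): "memory code read",
    (1, 0, 1): "halt or shutdown",
    (1, 1, 0): "memory data read",
    (1, 1, 1): "memory data write",
}


def decode_cycle(m_io, d_c, w_r, lock_n):
    """Return (cycle type, locked) for the sampled pin levels.

    LOCK# is active low, so a low level (0) marks a locked cycle.
    """
    return CYCLE_TABLE[(m_io, d_c, w_r)], lock_n == 0


if __name__ == "__main__":
    print(decode_cycle(1, 1, 0, 0))   # locked memory data read
    print(decode_cycle(0, 1, 1, 1))   # unlocked I/O data write
```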
A second aspect which should be appreciated is the actual microprocessor/cache controller system interface, which can be thought of as three distinct interfaces: the microprocessor/cache controller interface; the cache interface; and the cache controller bus interface. Heretofore, the microprocessor/cache controller interface has been considered a straightforward connection. With particular regard to the cycle definition signals discussed above, the cache controller is directly connected to those microprocessor outputs. The cycle definition signals are decoded by the cache controller to determine the type of cycle being executed by the microprocessor. If the cycle being executed by the microprocessor is a locked cycle, the cache controller always renders the cycle non-cacheable, for the memory integrity reasons discussed above.
Prior personal computer systems employing IBM-compatible architecture and using the Intel 80386 microprocessor and 82385 cache controller have directly connected the "LOCK#" pins on these two devices. This connection was deemed necessary because locked cycles must be non-cacheable in order to maintain the integrity of read-modify-write cycles in multiprocessor systems. However, making all locked cycles non-cacheable requires every locked cycle to access system memory rather than local cache memory. Hence, instructions that use locked cycles perform poorly relative to those that use non-locked cycles. A similar system is shown in U.S. Pat. No. 4,843,542, which discloses disabling communication between a cache and all but one of the processors of a multiprocessor system while a single processor is involved in a read-modify-write process.
The performance penalty exacted from multi-processor systems in order to maintain cache integrity is not required for uniprocessor systems. For single processor systems, locked cycles can be cached because there is no other processor to potentially interfere with the data during the cycle, and system performance can be substantially improved. By making the "LOCK#" connection between the microprocessor and the cache controller jumperable, a system can be configured for maximum performance for uniprocessors while maintaining operational integrity for multiple processors.
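The effect of making the LOCK# connection jumperable reduces to a small truth table for an otherwise cacheable memory cycle. The sketch below is a hypothetical model of that decision: the function and parameter names are illustrative only, and the jumper is modeled as a boolean that is True in the multiprocessor configuration (LOCK# reaches the cache controller) and False in the uniprocessor configuration.

```python
def locked_cycle_cacheable(lock_asserted, jumper_installed):
    """Return True if an otherwise cacheable memory cycle may be cached."""
    # The cache controller only "sees" LOCK# when the jumper connects the pins.
    lock_seen_by_cache = lock_asserted and jumper_installed
    # A cycle the cache controller sees as locked is forced non-cacheable.
    return not lock_seen_by_cache


if __name__ == "__main__":
    for jumper in (True, False):
        for lock in (False, True):
            print("jumper installed:", jumper,
                  "| LOCK# asserted:", lock,
                  "| cacheable:", locked_cycle_cacheable(lock, jumper))
```

Only the combination of an installed jumper and an asserted LOCK# forces a cycle out of the cache, which is the conservative multiprocessor behavior; removing the jumper lets a uniprocessor system cache its locked cycles.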
Stated another way, the known prior art concerning "LOCK#" in a system using an Intel 80386 microprocessor and 82385 cache controller was to directly connect the "LOCK#" output on the microprocessor to the "LOCK#" input on the cache controller. This connection assured that locked cycle references were not cached and would operate correctly in any environment.
However, in situations where caching locked cycles would improve system performance, i.e., in systems having only one processor, the prior art configurations provided no way to take advantage of this.