The use of cache subsystems in microcomputer systems leads to a number of attractive operating advantages. A microcomputer system employing a cache subsystem is in effect a dual bus microcomputer. The CPU and cache subsystem are connected together via what can be referred to as a CPU local bus. Separate from the CPU local bus is a system bus to which other devices (I/O devices, additional memory, etc.) can be connected. The presence of the cache subsystem relieves the system bus of memory read accesses to the extent that the information sought is found in the cache subsystem. Because not all desired information will be found in the cache subsystem, and because write operations are usually directed both to the cache subsystem and to memory, there must of course be some connection between the system bus and the CPU local bus. When that connection is a latched buffer, additional advantages, specifically posted write operations, are possible.
More particularly, any write operation requires access to memory (which is not on the CPU local bus). The information (data and address) for the write access is initially placed on the CPU local bus, where it can be used for writing to the cache subsystem. Since the interface between the CPU local bus and the system bus is a latched buffer, the same information can be latched into the buffer. Once that information is latched into the buffer, it need no longer be driven by the CPU. Thus a posted write cycle relies on the fact that the address and data information for a memory write operation is available from the latched buffer. Completion of that write cycle does not require the attention of the CPU. Thus in a "posted" write, the data and address for the write cycle are latched into the buffer interfacing the CPU local bus and the system bus, whereafter the CPU can go on to initiate a subsequent cycle. The cache control system (including the cache controller) can then monitor completion of the write to memory.
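The posted write sequence described above can be sketched in software. The following is a minimal illustrative model, not the actual hardware: the class and method names are hypothetical, the latched buffer is assumed to hold a single write, and every write is assumed to update the cache.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PostedWriteSystem:
    """Hypothetical model of a CPU local bus, a latched buffer and a system bus."""
    cache: Dict[int, int] = field(default_factory=dict)
    memory: Dict[int, int] = field(default_factory=dict)
    # Assumption: the latched buffer holds one (address, data) pair at a time.
    latched: Optional[Tuple[int, int]] = None

    def cpu_write(self, address: int, data: int) -> bool:
        """CPU side: write the cache, latch the write, return an early ready."""
        if self.latched is not None:
            # Assumption: an earlier posted write completes before a new one is latched.
            self.complete_posted_write()
        self.cache[address] = data
        self.latched = (address, data)
        return True  # ready is returned before memory has been updated

    def complete_posted_write(self) -> bool:
        """System bus side: finish the latched write without involving the CPU."""
        if self.latched is None:
            return False
        address, data = self.latched
        self.memory[address] = data
        self.latched = None
        return True
```

In this sketch, cpu_write returns as soon as the information is latched, and memory is updated only when complete_posted_write runs later, which mirrors the CPU going on to a subsequent cycle while the write to memory is still outstanding.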
Microcomputer systems comprising an 80386 processor and an 82385 cache controller are arranged to take advantage of posted write operations in exactly this fashion.
The 80386, and the signals it generates, are described in "Introduction to the 80386 Including the 80386 Data Sheet" from Intel (April 1986). The 82385 cache controller, and the signals it generates, are described in "82385 High Performance 32-Bit Cache Controller", available from Intel (July 1987).
Another attractive feature of the 80386 is its capability for operating with what is referred to as dynamic bus sizing. The 80386 is nominally a 32-bit machine, i.e. its data bus is 32 bits wide. Dynamic bus sizing refers to the capability of the 80386 processor to cooperate not only with 32-bit devices (memory, I/O, etc.) but also with devices which do not have 32-bit capability, i.e. devices that cannot transfer 32 bits of data in a single cycle. There are for example a wide variety of memory and/or I/O devices which are 16-bit devices, i.e. they can transfer no more than 16 bits of data on a given cycle. Moreover, there are 8-bit I/O, memory and other devices. Such devices can transfer only eight bits of data on a given cycle.
The 80386 includes provision for a BS16 signal. When that signal is asserted, it has the following effect. In the event the 80386 has generated a 32-bit cycle, i.e. it has generated and/or expects to accept 32 bits of data, the assertion of the BS16 signal indicates to the 80386 that it is not operating with a 32-bit device. Assertion of the BS16 signal during the 32-bit cycle will automatically initiate the generation of a second cycle. By convention, any 16-bit device is arranged to transfer a predetermined group of 16 bits from the 32-bit data bus. On the second cycle generated by the presence of the BS16 signal, the 80386 places the group of 16 data bits that was not in the predetermined group during the first cycle onto the predetermined group of data lines associated with the predetermined group of 16 bits. Accordingly, in the first of the two cycles, the 16-bit device will transfer a given set of 16 bits of the 32-bit data space. In the second cycle, the 16-bit device will transfer the other 16 bits of data so that, taken together, the two 16-bit cycles transfer 32 bits.
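The two-cycle transfer just described can be illustrated with a short sketch. It assumes, for illustration only, that the predetermined group is the low-order 16 data lines and that the low-order half of the data is transferred first; the function names are hypothetical.

```python
from typing import List, Tuple


def write_cycles(data32: int, bs16_asserted: bool) -> List[Tuple[int, int]]:
    """Return the bus cycles for one 32-bit write as (width_in_bits, value) pairs."""
    data32 &= 0xFFFFFFFF
    if not bs16_asserted:
        # 32-bit device: all 32 bits move in a single cycle.
        return [(32, data32)]
    low_half = data32 & 0xFFFF           # transferred in the first cycle
    high_half = (data32 >> 16) & 0xFFFF  # moved onto the same 16 lines in the second cycle
    return [(16, low_half), (16, high_half)]


def reassemble(cycles: List[Tuple[int, int]]) -> int:
    """Rebuild the 32-bit value as seen by the device after all cycles."""
    if len(cycles) == 1:
        return cycles[0][1]
    (_, low_half), (_, high_half) = cycles
    return (high_half << 16) | low_half
```

For example, write_cycles(0x12345678, True) yields two 16-bit cycles carrying 0x5678 and then 0x1234, and reassemble recovers 0x12345678 from them.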
There is, however, an incompatibility between posted write cycles and dynamic bus sizing. That incompatibility arises for the following reason. Assume that the 80386 initiates a posted write. The data and address for the posted write cycle are latched into the buffered interface between the CPU local bus and the system bus. Although the write cycle is not yet completed, a ready signal is returned to the 80386. This simulates completion of the cycle so that the 80386 can initiate a following operation. Since the BS16 signal (which is returned to the 80386 to indicate the size of the device with which it is operating) is generated by the device, that signal is not generated until the device has recognized its address. Continuing with the example, and assuming that the device for which the posted write cycle is destined is in fact a 16-bit device, by the time the BS16 signal is returned, the 80386 has already gone beyond the given operation and is engaged in the following operation. The 80386 therefore cannot generate the second, necessary cycle for the 16-bit device.
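The incompatibility can be made concrete with a small sketch. It is a simplified model under stated assumptions, not a description of actual bus timing: the function name is hypothetical, and the low-order 16 bits are assumed to be the half transferred in the first cycle.

```python
from typing import List


def halves_received(data32: int, device_width: int, posted: bool) -> List[int]:
    """Return the 16-bit halves of a 32-bit write that actually reach the device."""
    low_half = data32 & 0xFFFF
    high_half = (data32 >> 16) & 0xFFFF
    if device_width == 32:
        # A 32-bit device takes both halves in a single cycle.
        return [low_half, high_half]
    # A 16-bit device takes one half, then returns BS16 to request the other.
    received = [low_half]
    # If the write was posted, the CPU was given ready early and has moved on,
    # so BS16 arrives too late for it to generate the second cycle.
    cpu_still_in_cycle = not posted
    if cpu_still_in_cycle:
        received.append(high_half)
    return received
```

With a 16-bit device, halves_received(0x12345678, 16, posted=True) returns only the first half, whereas the same write with posted=False delivers both halves.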
Therefore it is an object of the invention to selectively post write cycles. Since a posted write cycle is identified with an early generation of a ready signal to the 80386, the invention provides logic to generate the ready signal to the 80386 only when a posted write is appropriate. In accordance with the invention, all devices with which the 80386 can interact (I/O, memory, etc.) are classified as either cacheable devices or non-cacheable devices. The address assigned to each device has a tag which indicates whether the device is cacheable or non-cacheable. In accordance with the invention, an address decoder is provided on the CPU local bus. The address decoder responds to the address asserted on the CPU local bus by asserting an NCA signal when the access is to a non-cacheable device.
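The address decoder's function can be sketched as a simple range check. The address range below is purely hypothetical and chosen only for illustration; the text does not specify which addresses are cacheable.

```python
# Hypothetical cacheable range; the actual range is a system design choice.
CACHEABLE_START = 0x00000000
CACHEABLE_END = 0x00FFFFFF  # inclusive


def nca(address: int) -> bool:
    """Return True (NCA asserted) when the address is outside the cacheable range."""
    return not (CACHEABLE_START <= address <= CACHEABLE_END)
```

Under these assumed values, nca(0x00001000) is False (a cacheable access) and nca(0x01000000) is True (a non-cacheable access).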
82385 High Performance 32-Bit Cache Controller provides, at Section 1.3.3, that:
"The 82385 allows the system designer to define areas of main memory as non-cacheable. The 80386 address bus is decoded and the decode output is connected to the 82385's non-cacheable access (NCA#) input. This decoding is done in the first 80386 bus state in which the non-cacheable cycle address becomes available. Non-cacheable read cycles resemble cacheable read miss cycles, except that the cache and cache directory are unaffected. Non-cacheable writes, like all writes, are posted."
And under Section 3.3.2, the document indicates:
"Non-cacheable cycles fall into one of two categories: cycles decoded as non-cacheable, and cycles that are by default non-cacheable according to the 82385's design. All non-cacheable cycles are forwarded to the 82385 local bus. Non-cacheable cycles have no effect on the cache or cache directory.
"NCA# allows the designer to set aside a portion of main memory as non-cacheable. Potential applications include memory-mapped I/O and systems where multiple masters access dual ported memory via different buses."
Furthermore, while the 82385 cache controller is arranged to generate the ready signal, that signal is not coupled to the 80386. Rather, that signal is coupled to logic means in accordance with the present invention. That logic means, depending upon a variety of other asserted signals, will generate a CPUREADY signal (to replace the ready signal) only when appropriate. More particularly, the logic means of the present invention generates the CPUREADY signal to allow posted write cycles only when the access is to a cacheable device, i.e. in the absence of the NCA signal. On the other hand, in the presence of the NCA signal, the logic means withholds generation of the CPUREADY signal so that in effect posted write operations do not occur.
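The selection between the two ready sources can be sketched as follows. This is a simplified illustration covering write cycles only; the function and parameter names are hypothetical, and the "variety of other asserted signals" mentioned above is not modeled.

```python
from typing import List, Optional, Tuple


def cpu_ready(nca: bool, cache_ready: bool, bus_ready: bool) -> bool:
    """CPUREADY for a write cycle, chosen from two ready sources."""
    if nca:
        # Non-cacheable access: withhold ready until the system bus reports completion.
        return bus_ready
    # Cacheable access: pass on the cache controller's early ready (posted write).
    return cache_ready


def first_ready_clock(nca: bool, timeline: List[Tuple[bool, bool]]) -> Optional[int]:
    """Return the first clock at which CPUREADY is asserted, or None if never."""
    for clock, (cache_ready, bus_ready) in enumerate(timeline):
        if cpu_ready(nca, cache_ready, bus_ready):
            return clock
    return None


# Hypothetical timeline of (cache_ready, bus_ready) per clock: the cache
# controller signals ready at clock 1, the system bus only at clock 3.
TIMELINE = [(False, False), (True, False), (True, False), (True, True)]
```

With this timeline, first_ready_clock(False, TIMELINE) gives 1 (the write is posted) and first_ready_clock(True, TIMELINE) gives 3 (the CPU waits for the write to complete, so a BS16 response can still be honored).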
Accordingly, in accordance with one feature, the invention provides a microcomputer system having:
a CPU local bus connecting a CPU and a cache subsystem, said CPU having means for posted write operations in response to receipt of a CPU ready signal prior to completion of a write operation,
system bus means connecting a random access memory and a plurality of addressable functional units, said system bus means returning a ready signal at completion of a write operation,
means for bidirectionally coupling said system bus and said CPU local bus, and
logic means for selectively preventing posted write operations, said logic means comprising:
a) address decoder means coupled to an address bus component of said CPU local bus for generating an NCA signal indicating assertion of an address on said CPU local bus outside an address range associated with said cache subsystem, and
b) means responsive to said NCA signal for withholding said CPU ready signal until receipt of a unit ready signal from one of said addressable functional units.