Field of the Invention
The present invention relates to cache coherency control. More particularly, it relates to a writethrough cache coherency control module and method for computer systems having a bus that does not support caches.
A cache memory is a small, very fast memory used to store frequently used instructions and data. Cache memory is a tradeoff between speed and cost. Ideally, a computer would have main memory that is as fast as the CPU. Such memory exists, but it is much more expensive than conventional main memory. Fortunately, a phenomenon known as the locality of execution principle makes caching possible. This principle states that the CPU tends to use the same memory locations repeatedly. In general, the CPU follows the 80/20 rule, which states that 20% of the memory addresses will be used 80% of the time. This being the case, a small, very fast, expensive memory can be used in conjunction with a much larger, slower, inexpensive main memory to gain most of the performance of an all high-speed main memory at a small fraction of the cost.
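The cost/performance tradeoff can be made concrete with the standard average-access-time model below, where h is the fraction of accesses that hit in the cache, t_c is the cache access time and t_m is the main memory access time. This is a simplified sketch that assumes a miss costs exactly one main memory access; the timing values are hypothetical and chosen only for illustration.

```latex
t_{\text{avg}} = h\,t_c + (1 - h)\,t_m
\qquad\text{e.g. } h = 0.9,\; t_c = 10\ \text{ns},\; t_m = 70\ \text{ns}:\quad
t_{\text{avg}} = 0.9(10) + 0.1(70) = 9 + 7 = 16\ \text{ns}
```

With these assumed numbers, the system behaves much closer to the 10 ns cache than to the 70 ns main memory, even though only a small fraction of memory is fast.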
A typical cache memory consists of very fast static RAM and a very fast cache controller. In operation, when the CPU requests data from main memory, the cache controller checks, using very fast logic, whether the data is in the cache. Ideally, the speed of the cache is matched to the speed of the CPU; with on-chip caches this happens automatically. If the requested data is in the cache, it is delivered much more quickly than it could be from main memory. This situation is called a cache hit. If the requested data is not in the cache, the CPU must access the slower main memory for the data. This is called a cache miss. Data retrieved during a cache miss is also written to the cache for future use. Because the cache is small, writing new data into it generally requires that other data be eliminated. This is called line replacement. The cache controller performs this function and attempts to keep the most frequently used data in the cache.
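The hit, miss and line replacement behavior described above can be sketched as a small software model. This is a minimal illustration, not a description of any particular cache controller: it assumes a fully associative cache and uses least-recently-used (LRU) replacement as a stand-in for the controller's policy of keeping frequently used data in the cache. The class and parameter names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy model of a cache holding a fixed number of lines.

    Assumption: LRU replacement approximates "keep the most frequently
    used data in cache"; real controllers may use other policies.
    """

    def __init__(self, num_lines, main_memory):
        self.num_lines = num_lines
        self.memory = main_memory      # dict: address -> value (slow memory)
        self.lines = OrderedDict()     # address -> cached value, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.lines:         # cache hit: serve from fast memory
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        self.misses += 1               # cache miss: go to slow main memory
        value = self.memory[addr]
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # line replacement: evict LRU line
        self.lines[addr] = value       # keep fetched data for future use
        return value


memory = {a: a * 10 for a in range(8)}
cache = LRUCache(num_lines=2, main_memory=memory)
for a in [0, 1, 0, 2, 0, 1]:
    cache.read(a)
print(cache.hits, cache.misses)  # prints: 2 4
```

The second read of address 1 misses because reading address 2 evicted it, which is the line replacement cost the paragraph describes.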
However, there are architectural problems with cache memory. A first problem occurs when the CPU has modified data in cache memory and an external device needs to read that data. The CPU has copied the data from some locations in main memory into its cache memory, and may then be using and modifying that data in the cache. Certain locations in main memory may then hold data that differs from the data in the corresponding cache locations. This does not matter to the CPU, since it always looks in the cache first and thus has the most current data. However, an external device (any device other than the CPU) that needs the same data would read "stale" data from main memory.
A second problem occurs when the CPU has modified data in the cache and an external device writes new data into the corresponding locations in main memory. In this case, the data in the CPU cache becomes stale.
These two situations give rise to the general problem known as cache coherency: how does the system ensure that the CPU and all external devices use the same data when a cache memory is employed?
The solution is of course simple when the CPU directly supervises all accesses to main memory. But, as will be discussed later, modern, high performance architectures allow external devices to access main memory directly.
The solution to the external device read problem depends on the write policy, that is, on how and when the CPU writes data it has modified in its cache back to main memory. There are two approaches, known as writethrough and writeback.
In a writethrough cache scheme, whenever the CPU modifies any data in the cache, it immediately writes the new data to the same location in main memory. The CPU never modifies data in the cache while leaving it unmodified in main memory. A writethrough cache therefore improves CPU performance only by speeding up read cycles: without a cache, the CPU had to both read from and write to main memory, whereas with a writethrough cache it only has to write to main memory. In exchange, this approach ensures that any device that accesses main memory sees exactly the same data as the cache holds.
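A writethrough policy can be sketched as follows. This is a simplified model under stated assumptions: the cache is unbounded (no line replacement), and a write to an address that is not cached updates only main memory (no write allocation). The class name and the `memory_writes` counter are hypothetical and exist only to show that every CPU write costs a main memory cycle.

```python
class WritethroughCache:
    """Toy writethrough cache: every CPU write updates main memory at once.

    Assumptions: unbounded cache, no allocation on write misses.
    """

    def __init__(self, main_memory):
        self.memory = main_memory   # dict: address -> value
        self.lines = {}             # address -> cached value
        self.memory_writes = 0      # bus cycles spent writing main memory

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from main memory once
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]     # hits never touch main memory

    def write(self, addr, value):
        if addr in self.lines:
            self.lines[addr] = value  # keep the cached copy current
        self.memory[addr] = value     # ...and always write through immediately
        self.memory_writes += 1


memory = {0x100: 1}
cache = WritethroughCache(memory)
cache.read(0x100)
cache.write(0x100, 2)
# An external device reading main memory sees the same value as the cache.
assert memory[0x100] == cache.lines[0x100] == 2
```

The price of this simplicity is the one main memory write per CPU write, tracked here by `memory_writes`.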
In a writeback cache scheme, the CPU does not update main memory on every write. Rather, the CPU keeps operating out of the cache for as long as no external device seeks access to the memory locations that have been cached.
To accomplish this, the CPU monitors the address and control busses for requests from external devices for main memory addresses that have been cached. This is known as "snooping". If the CPU detects such a request and the data in the cache has been modified, the CPU issues a command known as an "abort" or a "backoff". A backoff command activates a protocol that is built into both the external device and the CPU. When the external device receives a backoff command, it stops the operation it was performing and gives control of the bus back to the CPU. The CPU then writes the modified data from the cache to the corresponding location in main memory and gives control of the bus back to the external device, which then re-executes the same cycle. With this protocol, the external device gets the updated data. The advantage of a writeback cache is that the CPU does not have to write through to main memory on every write cycle; it needs to write back to main memory only when there is a request from an I/O device. Since the number of I/O requests is tiny compared to the number of CPU cycles, a writeback cache performs very well. In this case, main memory is updated only as actually required.
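The snoop and backoff sequence can be sketched as a software model. This is only an illustration under stated assumptions: the cache is unbounded, "modified" is tracked per address with a dirty set, and the bus-level backoff handshake (device releases the bus, CPU writes back, device retries) is collapsed into a single synchronous method call. The names `WritebackCache`, `snoop_external_read` and `external_read` are hypothetical.

```python
class WritebackCache:
    """Toy writeback cache with snooping and a backoff-style write back.

    Assumptions: unbounded cache, per-address dirty tracking, and the
    backoff protocol modeled as a synchronous call.
    """

    def __init__(self, main_memory):
        self.memory = main_memory  # dict: address -> value
        self.lines = {}            # address -> cached value
        self.dirty = set()         # modified in cache, not yet in memory

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)       # main memory is NOT updated yet

    def snoop_external_read(self, addr):
        """Called when an external device reads `addr` on the bus.

        Returns True if the CPU had to issue a backoff.
        """
        if addr in self.dirty:
            # Backoff: the device gives up the bus, the CPU writes the
            # modified data back, then the device re-executes its cycle.
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
            return True
        return False


def external_read(cache, addr):
    """One external-device read cycle on a bus that supports backoff."""
    backed_off = cache.snoop_external_read(addr)
    return cache.memory[addr], backed_off


memory = {0x200: 5}
cache = WritebackCache(memory)
cache.write(0x200, 6)
assert memory[0x200] == 5          # memory stays stale until someone asks
value, backed_off = external_read(cache, 0x200)
assert (value, backed_off) == (6, True)
```

A second external read of the same address would return the same value with no backoff, since the line is no longer dirty.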
When an external I/O device seeks to write to a main memory address that has been cached and modified there by the CPU, an action known as "invalidation" is invoked. Because the CPU is always monitoring, or "snooping", the addresses it holds in its cache, whenever it detects that a cached main memory address is about to be modified, it knows that new data will be available to it, and it invalidates the portion of the cache containing the data being modified in main memory. On its next access to that data, the CPU misses in the cache and fetches the new data from main memory. Thus an external device writing to memory is handled through snooping and invalidation. This scheme works with either the writethrough or the writeback cache coherency scheme.
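Snooping with invalidation can be sketched as below. This is a minimal model under stated assumptions: the cache is unbounded, and only the invalidation step is modeled, so dirty (modified) state is ignored. That omission is also why the same sketch applies to either write policy. The names used are hypothetical.

```python
class SnoopingCache:
    """Toy cache that invalidates a line when it snoops an external write.

    Assumptions: unbounded cache; modified/dirty state is not modeled.
    """

    def __init__(self, main_memory):
        self.memory = main_memory  # dict: address -> value
        self.lines = {}            # address -> cached value

    def read(self, addr):
        if addr not in self.lines:  # miss, including right after invalidation
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def snoop_external_write(self, addr):
        self.lines.pop(addr, None)  # invalidate; the next read refetches


def external_write(memory, cache, addr, value):
    memory[addr] = value              # device writes main memory directly
    cache.snoop_external_write(addr)  # CPU sees the address and invalidates


memory = {0x300: 1}
cache = SnoopingCache(memory)
cache.read(0x300)
external_write(memory, cache, 0x300, 42)
assert cache.read(0x300) == 42        # fresh data, not the stale 1
```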
Thus, in a conventional implementation of cache memory, the bus is designed to support one of the two coherency schemes: writethrough or writeback. For a writethrough scheme, the bus must make the signals to be snooped available. For a writeback scheme, the bus must not only make the signals to be snooped available but also provide backoff lines, along with additional logic somewhere in the system to support them.
The early 8088 through 386 brand microprocessors manufactured by Intel Corporation did not have cache memory capability, and the busses used in computers incorporating these microprocessors did not support cache memory. That is, they did not contain snooping or backoff hardware.
Newer microprocessors manufactured by Intel Corporation, however, have included cache memory as part of their performance enhancement. For example, the Intel i486 brand microprocessor contains writethrough cache memory capability and the Intel Pentium brand microprocessor contains writeback cache capability. Thus, all new computers using these microprocessors contain some sort of cache memory.
However, there is a very large installed base of 386 brand microprocessor based computers, and a significant number of their users could be interested in upgrading them by retrofitting either an i486 or a Pentium brand microprocessor. This would be a low-cost approach to improved performance. To do this and still take at least some advantage of the caching built into the new microprocessors, however, it is necessary to develop a way of using cache memory on a computer bus that lacks the hardware to support it.
The applications referenced in the Cross-reference to Related Applications section of this application describe two ways to accomplish this task. However, in order to fully understand the referenced applications and how the present invention relates to them, it is necessary to understand something of the basic computer architectures used in the IBM PC and IBM compatible (e.g. the MS DOS compatible) PC marketplace.
A computer architecture defines at the highest level how a computer system is organized for handling information. The basic elements of all computer systems are a CPU, memory, input/output ("I/O") devices and peripheral devices. All of these devices are interconnected by a bus.
At its most basic level, a bus consists of a group of conductors carrying signals that allow the system to communicate among its various devices and includes conductors carrying signals that allow the system to uniquely address memory locations as well as all addressable devices in the system. This part of the bus is called the address bus.
Once a device or memory location has been selected by the address bus, data is passed to and from the device or memory along a bi-directional data bus. The width of the data bus, for example 16 or 32 bits, is one indicator of computer performance.
Finally, a control section is needed to indicate when the data and address signals are valid, since at all times after the computer is powered up there are signals on the bus: either 0 or 5 volts (or 3.3 volts in some technologies). There must therefore be a mechanism to tell the rest of the system when the signals on the data and address buses are valid. The control bus performs this function. It indicates when the signals on the address bus are valid, so that all addressable components decode those signals to determine which device is being addressed. Similarly, when a device responds, it sends a signal on the control bus indicating that the signals on the data bus are valid, so that the component that is to receive the data can do so.
Taken together, these three buses are referred to collectively as the "CPU bus" or just "the bus".
The first architecture to gain prominence in the IBM PC and IBM compatible PC marketplace was the Industry Standard Architecture (the "ISA"). This architecture is used on Intel 8088 through i486 brand microprocessor based computers and has by far the greatest installed base of users.
FIG. 1 illustrates the ISA at its most basic level. Referring to FIG. 1, the computer system consists of a CPU 10 connected to all portions of bus 12. The address and control portions of bus 12 are connected to main memory controller 14. A main memory array 16 is connected to the data portion of bus 12 and to memory controller 14 through interconnect 18. On each address cycle, memory controller 14 decodes the address signals to determine if there is an address in main memory to be accessed. If a main memory access is required, memory controller 14 sends control signals to main memory array 16, which causes either a read or a write of data to the data portion of bus 12.
In addition to main memory, the CPU must communicate over the bus with peripherals such as keyboards, printers, monitors and disk drives, with the direct memory access (the "DMA") controller, and with various add-in cards. This is accomplished by means of addressable registers called I/O devices. Each I/O device is also connected to the data and control portions of the bus. In the case of the 386 and i486 brand microprocessors, addresses are logically organized into two basic areas called "spaces": a separate I/O map, called I/O space, which has 64K possible addresses, and a memory space of up to 4 gigabytes. A control signal indicates, when asserted, that the address to be accessed is in memory space and, when not asserted, that it is in I/O space. The I/O space contains the addresses of all devices that perform I/O functions. Each I/O device has a unique address in I/O space and monitors the address bus for commands. When addressed, the device decodes the command and responds to it with either a read or a write operation.
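The role of the memory/I/O control signal can be sketched as a simple decode function. This is an illustrative model only: `memory_select` is a hypothetical boolean standing in for the control signal described above, and the sizes are the 64K I/O space and 4-gigabyte memory space from the text. The key point it shows is that the same numeric address refers to different locations depending on that signal.

```python
IO_SPACE_SIZE = 1 << 16        # 64K possible I/O addresses
MEMORY_SPACE_SIZE = 1 << 32    # 4 gigabytes of memory space


def decode(address, memory_select):
    """Classify a bus cycle using the memory/I/O control signal.

    `memory_select` (hypothetical name): True when the signal is asserted
    (memory space), False when it is not asserted (I/O space).
    """
    if memory_select:
        if not 0 <= address < MEMORY_SPACE_SIZE:
            raise ValueError("address outside 4 GB memory space")
        return "memory"
    if not 0 <= address < IO_SPACE_SIZE:
        raise ValueError("address outside 64K I/O space")
    return "io"


# The same numeric address selects different spaces.
assert decode(0x3F8, memory_select=False) == "io"
assert decode(0x3F8, memory_select=True) == "memory"
```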
In MS DOS based systems, there is a special section of memory space between addresses 640K and 1 megabyte called upper memory. Portions of this address space are reserved for specific functions such as the BIOS (Basic Input/Output System) and video memory. However, there are "holes" in this space that are not reserved. These holes are called non-standard memory space. Addresses in non-standard memory space are occasionally used by system manufacturers for I/O devices.
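Classifying an address as reserved upper memory or non-standard memory space can be sketched as a table lookup. The 640K and 1-megabyte boundaries come from the text; the reserved regions listed are an assumed example map (a common placement of video memory and the system BIOS), since the actual reserved portions vary from system to system. All names are hypothetical.

```python
UPPER_MEMORY_START = 640 * 1024    # 0xA0000
UPPER_MEMORY_END = 1024 * 1024     # 0x100000 (exclusive)

# Assumed example map; actual reserved regions vary between systems.
RESERVED = [
    (0xA0000, 0xC0000, "video memory"),
    (0xF0000, 0x100000, "system BIOS"),
]


def classify(address):
    """Return what an address falls in under the assumed map above."""
    if not UPPER_MEMORY_START <= address < UPPER_MEMORY_END:
        return "not upper memory"
    for start, end, name in RESERVED:
        if start <= address < end:
            return name
    return "non-standard memory space"  # a hole usable for I/O devices


assert classify(0xB8000) == "video memory"
assert classify(0xD0000) == "non-standard memory space"
assert classify(0x9FFFF) == "not upper memory"
```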
Still referring to FIG. 1, a DMA controller (sometimes referred to as the DMA) 20 is connected to all three portions of bus 12. In ISA, the DMA controller is the device that allows all other devices to take control of the bus to access memory. The DMA is connected to all of the I/O devices in the system and acts like a gate in that an external device that seeks access to memory must first access the DMA controller.
Early computer systems had a single DMA controller with 4 channels, one channel for each I/O device to be controlled. To increase the number of I/O channels, a second DMA controller 22 was added by convention, connected through channel 4 of DMA controller 20. This effectively increased the number of channels to 7. DMA controller 20, which is connected directly to the bus, is referred to as the "master" DMA controller, and DMA controller 22 is referred to as the "slave" DMA controller, since the master controls the slave's access to the bus through its channel 4. In FIG. 1, a first external device 24 and a second external device 26 may be connected to bus 12 and to channels 3 and 1, respectively, of slave DMA controller 22. These devices could be, for example, a floppy or hard disk or a fax card, and will have addresses in I/O space.
In operation, the CPU commands the DMA controller to monitor all of its channels for devices in the system that are trying to transfer data. If a channel is active, the CPU programs the DMA controller with the address in memory at which the data transfer is to start, the device from which the data is to be supplied, the type of operation (read or write), and the number of bytes of data to be transferred. When the controller has completed the data transfer, it sends an interrupt signal to the CPU indicating task completion. Between programming the DMA controller and receiving the completion interrupt, the CPU is free to do other tasks, which greatly increases its productivity.
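The programming sequence and completion interrupt described above can be sketched as a software model. This is illustrative only: the field names, the `DmaController` class and the `on_complete` callback (standing in for the completion interrupt) are hypothetical, main memory is modeled as a `bytearray`, and the operation names follow the convention that a "write" transfer moves data from the device into memory while a "read" moves data from memory to the device.

```python
from dataclasses import dataclass


@dataclass
class DmaRequest:
    """Parameters the CPU programs into one channel (names hypothetical)."""
    channel: int
    start_address: int   # where in memory the transfer starts
    device: str          # device involved in the transfer
    operation: str       # "write": device -> memory, "read": memory -> device
    byte_count: int


class DmaController:
    def __init__(self, memory, on_complete):
        self.memory = memory            # bytearray standing in for main memory
        self.on_complete = on_complete  # plays the role of the CPU interrupt

    def run(self, req, device_data=b""):
        """Move the data without CPU involvement, then signal completion."""
        lo, hi = req.start_address, req.start_address + req.byte_count
        if req.operation == "write":
            self.memory[lo:hi] = device_data[: req.byte_count]
            result = None
        else:
            result = bytes(self.memory[lo:hi])
        self.on_complete(req.channel)   # "interrupt": task completion
        return result


completed = []
memory = bytearray(16)
dma = DmaController(memory, on_complete=completed.append)
dma.run(DmaRequest(channel=1, start_address=4, device="disk",
                   operation="write", byte_count=3), device_data=b"abc")
assert memory[4:7] == b"abc" and completed == [1]
```

Everything the CPU needs to know about the transfer, including the memory range touched, is in the request it programmed itself.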
In order to operate with a slave, the master DMA controller must be able to take control of the bus from the CPU and pass it through to the slave DMA controller. This pass-through mode is called "cascade" mode.
In the same way that it passes control to another DMA controller, the DMA controller can pass control to an external device. This feature makes another class of direct access devices, called "bus master devices", feasible. Historically, bus master devices were desirable because the DMA controller operates at 4 MHz, much slower than the 33, 50 and now 66 MHz CPUs that evolved after the ISA was adopted. The DMA controller was thus slowing down the newer computer systems, but backward compatibility requires that it continue to operate at 4 MHz. In response, a way was devised for devices to go around the DMA controller, which led to the "bus master" device.
A bus master device is one that can acquire the bus and read from and write to main memory independently of either the CPU or the DMA controller. Examples of bus master devices are SCSI hard disk controllers and fax cards. In operation, upon a request from a bus master device, the DMA controller takes control of the bus but allows the bus master device to determine the address in memory that is to be written or read. The CPU need not program the address into the DMA controller. Each bus master device has a unique address in I/O space. The CPU initiates the device by writing to or reading from that address; otherwise the device does nothing. The CPU monitors the status register of the device to determine whether the transaction was successful. In FIG. 1, a bus master device is schematically illustrated as block 28, connected directly to bus 12 and also to channel 6 of DMA controller 20.
ISA based systems are not the only systems using microprocessors manufactured by Intel Corporation. While the ISA bus is by far the most widely used, it has several deficiencies. In particular, it is not defined well enough to allow high-performance add-in devices to be used effectively. To solve this problem, IBM developed an entirely new bus, the Micro Channel Architecture (MCA) bus.
The MCA bus has several enhancements over the ISA bus. One enhancement is a much faster data transfer. A second enhancement is a dedicated bus controller that allows external devices to take control of the bus and access main memory directly.
In ISA, the DMA controller is the unit that allows any other device to get control of the CPU bus. The DMA controller acts like a gate in that an external device must talk to the DMA controller to get access to main memory. Thus everything done by the DMA is visible to the CPU.
With MCA there is no gate concept. The bus controller is not an addressable device, as the DMA controller is. Rather, it is a set of logic that receives requests from any device other than the CPU that can talk to memory, and arbitrates bus access on its own. This architecture is illustrated in FIG. 2. Referring now to FIG. 2, the computer system has an MCA bus 13, which includes address, data and control signals. A bus controller 41 is tied to bus 13 by signal carrier 43. Bus controller 41 manages the transfer of information among an external bus master device 44, an external non-bus master device 46 (through DMA 48), and a central processing unit (CPU) 10. As FIG. 2 shows, all external devices, including bus master devices, non-bus master devices and the DMA controller, gain access to the bus through bus controller 41.
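The difference from the ISA "gate" can be sketched as a central arbiter that decides bus ownership from pending requests alone, with no CPU-programmed registers involved. This is a deliberately simplified model: the priority numbers, the rule that a lower number wins, the tie-break by name, and the CPU being the default owner when nothing is pending are all assumptions made for illustration, not the MCA specification's arbitration encoding. Device names reuse the FIG. 2 reference numerals only as labels.

```python
def arbitrate(requests, default_owner="cpu_10"):
    """Grant the bus to one requester.

    `requests` maps device name -> priority level. Assumptions: a lower
    number wins, ties break by name, and with no pending requests the
    CPU keeps the bus.
    """
    if not requests:
        return default_owner
    return min(requests, key=lambda name: (requests[name], name))


# Requests reach the bus controller directly; the CPU never programs it.
pending = {"bus_master_44": 2, "dma_48": 1}
assert arbitrate(pending) == "dma_48"
assert arbitrate({}) == "cpu_10"
```

Because the grant depends only on requests presented to the bus controller, the CPU has no record of which device owns the bus or which addresses it will touch.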
ISA in cascade mode and MCA share a similar problem: the CPU does not know which device is going to take control of the bus. In non-cascade mode, the CPU knows exactly where in memory the DMA controller will allow access, because the CPU itself programmed that address. This cannot be done for a device in cascade mode or in MCA, because that device can use nearly any of the addresses in the 64K I/O space, and the CPU cannot keep track of this address. Thus a bus master card in both ISA and MCA can read or write to memory addresses unknown to the CPU.
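The consequence for coherency logic can be sketched as a function that reports which cached memory might become stale after a transfer. This is a hypothetical helper, not part of any described system: in non-cascade mode the range follows from the start address and byte count the CPU programmed, while in cascade mode or on MCA no range is known, so the only safe assumption is that any cached location may be affected.

```python
def range_to_invalidate(mode, start_address=None, byte_count=None):
    """Return the memory range that may be stale after a transfer.

    Hypothetical helper. "dma": the CPU programmed the transfer, so the
    range is known. "cascade"/"mca": the bus master picks its own
    addresses, so None is returned and all of the cache is suspect.
    """
    if mode == "dma":
        return range(start_address, start_address + byte_count)
    if mode in ("cascade", "mca"):
        return None  # unknown: caller must flush or disable the cache
    raise ValueError(f"unknown mode: {mode}")


assert range_to_invalidate("dma", 0x1000, 512) == range(0x1000, 0x1200)
assert range_to_invalidate("cascade") is None
```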
With the foregoing background, it is now possible to examine prior art cache coherency schemes. Ghori provides a cache coherency module ("CCM"), added to the microprocessor chip, that snoops the addresses put on the bus. When the CCM detects that the DMA controller has been programmed to write to main memory, the entire cache is flushed, and the page in main memory that is being written to (in MS DOS systems, main memory is divided into pages of 64K bytes) is kept non-cacheable until the data write is complete. Completion of the data write is determined by monitoring, with a software-initiated interrupt, the application software's interrogation of the DMA status register. The cache is turned off completely when cascade mode is used, since the CPU does not know which areas of memory may become incoherent. For the same reason, the Ghori scheme does not work with MCA systems either. There is also a problem with determining the end of a data transfer in the Ghori scheme. Ghori relies on an access by the application software to the DMA status register to signal the end of a data transfer, but not all application software actually checks the status of the DMA controller for completion of a data transfer. Thus, when the application software does not read the DMA status register, either the entire cache remains off or the non-cacheable pages of memory remain uncacheable for an indefinite period of time, and performance suffers. In addition, the Ghori scheme requires considerable additional logic and memory to keep track of the non-cacheable pages. This increases die size, which translates into a more expensive device. Finally, Ghori provides hardware neither to turn the cache memory on after a reset nor to specify the parts of main memory that are cacheable.
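The end-of-transfer weakness described above can be illustrated with a small behavioral model. This is an illustrative reconstruction of the behavior as summarized in the preceding paragraph, not the referenced implementation: all names are hypothetical, transfers are assumed to mark every 64K page they span, and the assumption is made that a status register read both clears the non-cacheable pages and re-enables a cache that was turned off for cascade mode.

```python
PAGE_SIZE = 64 * 1024   # 64K-byte pages, as described above


class GhoriStyleModel:
    """Behavioral model of the prior-art scheme as summarized above.

    Hypothetical names and structure; not the referenced implementation.
    """

    def __init__(self):
        self.cache_enabled = True
        self.noncacheable_pages = set()
        self.flush_count = 0

    def on_dma_programmed(self, operation, start_address, byte_count,
                          cascade=False):
        if cascade:
            self.cache_enabled = False   # target addresses are unknown
            return
        if operation == "write":
            self.flush_count += 1        # flush the entire cache
            first = start_address // PAGE_SIZE
            last = (start_address + byte_count - 1) // PAGE_SIZE
            self.noncacheable_pages.update(range(first, last + 1))

    def on_status_register_read(self):
        # The only signal that the transfer has finished (assumption:
        # it also re-enables a cache turned off for cascade mode).
        self.noncacheable_pages.clear()
        self.cache_enabled = True

    def is_cacheable(self, address):
        return (self.cache_enabled
                and address // PAGE_SIZE not in self.noncacheable_pages)


model = GhoriStyleModel()
model.on_dma_programmed("write", start_address=0x20000, byte_count=4096)
assert not model.is_cacheable(0x20100)
# If the application never reads the DMA status register, page 2 stays
# uncacheable indefinitely; only a status read releases it.
model.on_status_register_read()
assert model.is_cacheable(0x20100)
```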
The Kulkarni application cross-referenced above deals with writeback cache coherency for systems having busses that support either writethrough caching only or no caching at all; however, this application does disclose a method of dealing with cacheable memory size for writeback cache control systems.