1. Field of Invention
This invention relates to computer systems, and more particularly to apparatus for buffering data writes from a CPU to a memory subsystem.
2. Description of Related Art
A typical computer system is divided into several subsystems, including a central processing unit (CPU) for executing instructions, and a memory subsystem for holding instructions and data. The CPU obtains information from the memory by issuing a read request, and writes information to memory by issuing a write request. If the memory is fast enough, either type of memory access request is executed immediately, and the request is completed by the time the CPU is ready to continue its work. In many computers, however, the memory subsystem is not as fast as the CPU. That is, once the CPU issues a memory access request, it must enter a wait state or stall state until the request is completed before proceeding. The problem appears frequently with highly pipelined CPUs, such as those that are used in many reduced instruction set computers (RISCs). In these CPUs, several cycles of a very fast clock may pass between the time an instruction enters the pipe and the time its execution is completed, but a new instruction must be fetched and/or a new memory access request issued on every cycle of that very fast clock.
Many computers reduce the severity of this problem by implementing memory in two levels: a large, relatively slow but inexpensive main memory, and a small, fast cache memory. A cache memory takes advantage of the "principle of temporal locality," i.e., the property inherent in most computer programs wherein a memory location referenced at one point in time is very likely to be referenced again soon thereafter. In a cache-based computer architecture, the CPU first attempts to find needed instructions and data in the cache, which is fast enough to maintain pace with the CPU. Only if the information is not in the cache is a read request issued to main memory. When the requested information arrives, it is both provided to the CPU and written into the cache (overwriting some previous entry) for potential future use. On a data write from the CPU, either the cache or main memory or both may be updated, it being understood that flags may be necessary to indicate to one that a write has occurred in the other. The use of a cache memory improves the overall throughput of the computer because it significantly reduces the number of wait states which the CPU must enter. Wait states are still necessary, however, when an access to main memory is required.
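The read path described above may be illustrated by the following minimal sketch. It assumes a direct-mapped organization, which the foregoing description does not require; the names ReadCache, num_lines and main_memory_reads are hypothetical and are introduced only for illustration.

```python
class ReadCache:
    """Toy direct-mapped cache in front of a slow main memory (illustrative only)."""

    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory    # dict mapping address -> value
        self.num_lines = num_lines        # assumed cache size, in entries
        self.lines = [None] * num_lines   # each entry is (address, value) or None
        self.main_memory_reads = 0        # counts the slow accesses

    def read(self, address):
        index = address % self.num_lines  # assumed direct-mapped placement
        entry = self.lines[index]
        if entry is not None and entry[0] == address:
            return entry[1]               # hit: main memory is not accessed
        # Miss: issue the read to main memory, then keep a copy in the
        # cache, overwriting whatever entry previously occupied this line.
        self.main_memory_reads += 1
        value = self.main_memory[address]
        self.lines[index] = (address, value)
        return value
```

In this sketch, a second read of the same address is served from the cache and leaves the main memory access count unchanged, which corresponds to the wait states avoided by temporal locality.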
The speed of a main memory read request is critical to the throughput of a computer system because the CPU in most designs cannot continue operating until the requested information is received. It has been recognized, however, that the speed of a memory write request is not as critical. The CPU no longer needs the data once it is sent out to main memory, and unless the data is needed by some other device which shares main memory, there is in theory no reason why the data need actually be written until the next time the CPU issues a request for it. This can be used to advantage by inserting a write buffer subsystem in the bus between the CPU and main memory. Such a subsystem passes read requests to the memory immediately, but passes write requests to the memory only when the bus is not already in use. When the bus is in use, write requests are held in an internal buffer until the bus is available. A write buffer subsystem typically includes logic to determine whether any arriving memory read request is requesting data still in the write buffer. If so, these systems temporarily halt the CPU while the write buffer subsystem executes the conflicting write request and all those preceding it in the buffer. A write buffer subsystem typically also generates a buffer full signal to prevent the CPU from issuing a write request when the subsystem cannot accept it.
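The behavior of such a write buffer subsystem may be modeled, purely as a software sketch and not as any particular hardware design, along the following lines. The names WriteBuffer, capacity and drain_one are hypothetical, and the choice to drain through the most recent conflicting write when several are pending is an assumption made so that the read returns the latest value.

```python
from collections import deque


class WriteBuffer:
    """Toy model of a write buffer between a CPU and main memory."""

    def __init__(self, memory, capacity=4):
        self.memory = memory      # dict mapping address -> value
        self.capacity = capacity  # assumed buffer depth
        self.pending = deque()    # FIFO of (address, value) not yet written

    @property
    def full(self):
        # Models the buffer full signal presented to the CPU.
        return len(self.pending) >= self.capacity

    def write(self, address, value):
        """Accept a write; return False if the CPU must hold it off."""
        if self.full:
            return False
        self.pending.append((address, value))
        return True

    def drain_one(self):
        """Execute the oldest buffered write (called when the bus is free)."""
        if self.pending:
            address, value = self.pending.popleft()
            self.memory[address] = value

    def read(self, address):
        """Serve a read, first executing any conflicting buffered writes."""
        conflicts = [i for i, (a, _) in enumerate(self.pending) if a == address]
        if conflicts:
            # Execute the conflicting write and every write ahead of it.
            for _ in range(conflicts[-1] + 1):
                self.drain_one()
        return self.memory.get(address)
```

Writes younger than the conflicting one remain buffered after the read, so only as much of the buffer is emptied as is needed to keep the read coherent.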
In many computers, instructions or data may be fetched or stored in units smaller than a full word. Thus, if a computer system is designed around a 32-bit word, the CPU (or another device sharing access to memory) may be able to issue fetch or write commands for individual 16-bit halfwords or even 8-bit bytes. Computers having this flexibility may be improved by a different method. U.S. Pat. No. 4,347,562 describes apparatus for buffering 16-bit data units arriving from a peripheral device for writing to a 32-bit wide memory. The apparatus comprises means for holding the first 16-bit data unit and destination address received from the peripheral device. Before writing the data into memory, the apparatus waits for the arrival of a second address-data pair and determines whether the two addresses are in a single memory word. If they are, the apparatus writes the first and second data units into the memory at the same time. If the two addresses are not in the same memory word, then the apparatus writes the first data unit to memory and holds the second address-data pair for possible combination with the third address-data pair yet to be received from the peripheral device.
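The combining behavior summarized above may be sketched as follows. This is an illustration of the summary given here, not of the circuitry of the cited patent. It assumes byte addressing with halfword-aligned addresses; the names HalfwordCombiner, receive and memory_writes are hypothetical, and the flush method is an added assumption covering the end of a transfer, which the summary does not address.

```python
class HalfwordCombiner:
    """Toy model: pair 16-bit writes that fall in the same 32-bit word."""

    def __init__(self):
        self.held = None         # (byte_address, data) awaiting a partner
        self.memory_writes = []  # log of (word_address, {halfword_index: data})

    @staticmethod
    def _word(address):
        return address // 4 * 4      # address of the containing 32-bit word

    @staticmethod
    def _half(address):
        return (address % 4) // 2    # 0 = first halfword, 1 = second

    def receive(self, address, data):
        """Accept one address-data pair from the peripheral device."""
        if self.held is None:
            self.held = (address, data)  # hold and wait for the next pair
            return
        held_address, held_data = self.held
        if self._word(held_address) == self._word(address):
            # Same memory word: write both data units in one memory access.
            halves = {self._half(held_address): held_data}
            halves[self._half(address)] = data
            self.memory_writes.append((self._word(address), halves))
            self.held = None
        else:
            # Different words: write the first unit alone, hold the second.
            halves = {self._half(held_address): held_data}
            self.memory_writes.append((self._word(held_address), halves))
            self.held = (address, data)

    def flush(self):
        """Assumed helper: write out a held unit when no more data will arrive."""
        if self.held is not None:
            address, data = self.held
            self.memory_writes.append((self._word(address), {self._half(address): data}))
            self.held = None
```

In this model, two halfwords directed to the same word produce a single entry in memory_writes rather than two, which is the saving in memory accesses that the combining is intended to achieve.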
U.S. Pat. No. 3,449,724 describes another buffering scheme, this one for buffering both reads and writes to an interleaved memory system. The scheme described therein, among other things, recognizes when two buffered memory access requests are directed to the same memory location, and chains them together for execution with a single memory select operation. This scheme should reduce the time needed to access a busy interleaved memory module.