A typical digital data processing system comprises a number of basic units including a central processing unit (CPU), a memory unit, and an input/output (I/O) unit. The memory unit stores information in addressable storage locations. This information includes both data and instructions for processing the data. The information is transferred between the memory unit and the CPU along a bus consisting of control lines, address lines and data lines. Control request signals specify the direction of transfer. The CPU issues a read request signal to transfer information on the bus from the memory unit, interprets the information as either instructions or data, and processes the data in accordance with the instructions. The CPU then issues a write request signal to store the results in addressed locations in the memory unit.
The information transferred between the CPU and memory unit must conform to certain timing relationships between the request signals and the information on the bus. Access time is defined as the interval between the time the memory unit receives a request signal from the CPU and the time when the information is available at the memory unit. The duration of a memory request cycle is a function of the internal clock frequency of the CPU and the access time of the memory unit. If the CPU's logic operates at a very high clock frequency and the access time of the memory unit is long compared to the clock period, the CPU may need more than one clock cycle to access the memory unit. This is especially true for highly pipelined CPUs, such as those used in many reduced instruction set computers (RISCs).
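The relationship between clock period and access time can be made concrete with a small calculation. The following is a minimal sketch, assuming a simplified model in which a memory access occupies a whole number of clock cycles and every cycle beyond the first is a wait state; the function name `wait_states` and the example figures are illustrative and do not come from the system described here.

```python
def wait_states(clock_period_ns: int, access_time_ns: int) -> int:
    """Return the number of extra cycles the CPU stalls for one access.

    Simplified model (an assumption): the access takes
    ceil(access_time / clock_period) cycles, and all but the
    first of those cycles are wait states.
    """
    if clock_period_ns <= 0 or access_time_ns < 0:
        raise ValueError("period must be positive and access time non-negative")
    # Integer ceiling division avoids floating-point rounding.
    cycles_needed = (access_time_ns + clock_period_ns - 1) // clock_period_ns
    return max(cycles_needed - 1, 0)
```

Under this model, a memory with a 100 ns access time paired with a CPU running a 40 ns clock period needs three cycles per access, two of which are wait states, while a memory that responds within one clock period causes no stall.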
A goal of RISC computer designs is to achieve an execution rate of one instruction per clock cycle. In other words, although several clock cycles may elapse between the time an instruction enters the CPU pipeline and the time its execution is completed, the design goal is for the CPU to thereafter remain in a run state and process an instruction on every clock cycle. Accordingly, a new instruction must be fetched and/or a new memory access request must be issued on every clock cycle. Since the memory unit is not fast enough to execute the new memory access request immediately, the CPU must enter a wait state until the request is completed, thereby reducing the processing rate of the CPU. This problem becomes particularly critical when the memory access request is a read request, since the CPU cannot operate, that is, process data in accordance with instructions, without the requested information.
A high speed cache memory is used in these situations to compensate for the time differential between the memory access time and CPU clocking logic. The cache memory's access time is closer to the operational speed of the CPU logic and thus increases the speed of data processing by providing information to the CPU at a rapid rate. The cache memory operates in accordance with the property of "locality of reference", whereby references to memory locations at any given time tend to be confined within a localized area in memory. When the CPU requires information, the cache memory is first examined. If the information is not found in the cache, the memory unit is accessed. A block mode read request is then issued by the CPU to transfer a block of information including the required information from the memory unit to the cache memory.
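The lookup-then-fill sequence described above can be sketched in a few lines. This is an illustrative model only: the direct-mapped organization, the block size of four words, and the names `Cache`, `BLOCK_WORDS` and `num_lines` are assumptions made for the example, since the text does not specify how the cache is organized.

```python
BLOCK_WORDS = 4  # assumed number of words moved by one block mode read


class Cache:
    """Toy direct-mapped cache in front of a word-addressed memory."""

    def __init__(self, memory, num_lines=8):
        self.memory = memory        # list of words modeling the memory unit
        self.num_lines = num_lines  # assumed number of cache lines
        self.lines = {}             # line index -> (tag, list of words)
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        block = addr // BLOCK_WORDS
        index = block % self.num_lines
        tag = block // self.num_lines
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            self.hits += 1          # information found in the cache
        else:
            self.misses += 1        # not found: access the memory unit
            base = block * BLOCK_WORDS
            # Block mode read: fetch the whole block that contains addr.
            line = (tag, self.memory[base:base + BLOCK_WORDS])
            self.lines[index] = line
        return line[1][addr % BLOCK_WORDS]
```

Because of locality of reference, a read of one word tends to be followed by reads of its neighbors, so in this sketch the first access to a block misses and later accesses to the same block are served from the cache.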
The I/O unit also communicates with the memory unit in order to transfer information into the data processing system and to obtain information from it. The I/O unit normally operates in accordance with control information supplied to it by the CPU. The control information defines the operation to be performed by the I/O unit. Typical devices comprising the I/O unit include printers and video display terminals, and may also include secondary storage devices such as disks or tapes.
In such a data processing system, there may be a significant degree of contention between the CPU and the I/O units for access to the memory unit over the system bus. A system bus controller is therefore provided to resolve the contention in accordance with an arbitration method. As a result, the CPU may be unable to retrieve information from, or store information in, the memory as fast as it otherwise could, again causing the CPU to enter a wait state, which adversely affects its performance.
U.S. Pat. No. 4,805,098 describes a prior write buffer subsystem for accepting address-data pairs from a CPU and placing them in a first rank of an internal buffer having a plurality of ranks. It then issues a request to a bus controller, informing it that a data set is available for writing into main memory. When the bus is free, the controller enables the data set onto the bus and causes the write to take place. When the write is completed, the controller acknowledges its use of the information and awaits another request from the write buffer subsystem.
Here, if the write buffer subsystem receives two write commands from the CPU in sequence, both of which reference the same memory word address, the subsystem will gather these commands into a single buffer rank so that they may both be executed in a single access to main memory. In other words, gathering occurs only when the word address of an incoming write request matches the word address of the immediately preceding write request in the buffer. Non-sequential write requests, that is, write requests stored in the internal write buffer other than the request immediately preceding the incoming write request, are not gathered.
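The gathering rule of the prior subsystem can be illustrated with a short model. This is a sketch under stated assumptions, not the circuit of the cited patent: it assumes each rank holds one word address plus the bytes written to that word, and the names `SequentialGatherBuffer`, `num_ranks` and `byte_offset` are hypothetical.

```python
from collections import deque


class SequentialGatherBuffer:
    """Toy write buffer that gathers only into the most recent rank."""

    def __init__(self, num_ranks=4):
        self.num_ranks = num_ranks  # assumed buffer depth
        self.ranks = deque()        # oldest rank on the left

    def write(self, word_addr, byte_offset, value):
        """Accept a write; return False if the buffer is full."""
        # Gather only if the immediately preceding request used this word.
        if self.ranks and self.ranks[-1][0] == word_addr:
            self.ranks[-1][1][byte_offset] = value
            return True
        if len(self.ranks) == self.num_ranks:
            return False            # no free rank: the CPU would have to wait
        # Otherwise a new rank is used, even if an older rank already
        # holds the same word address (no non-sequential gathering).
        self.ranks.append((word_addr, {byte_offset: value}))
        return True
```

In this model, two back-to-back writes to word 0x10 share one rank, whereas a write to 0x10 that arrives after an intervening write to 0x20 takes a rank of its own, so the same word is written to main memory twice.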
The prior write buffer subsystem also provides a signal useful for detecting the issuance of a memory read command to an address for which a write command is pending, i.e. a read conflict. When this signal is asserted, the CPU enters a wait state and the bus controller executes the pending write commands, i.e. flushes the buffer, in the order stored in the buffer ranks until the signal is cleared. The memory read command is then executed. Since multiple clock cycles are required to retire each write command in the buffer to main memory, and additional clock cycles are needed to execute the read request, the CPU remains in a wait state for a long period of time.
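The cost of resolving a read conflict by flushing can be sketched as follows. The model is hypothetical: the cycle counts `WRITE_CYCLES` and `READ_CYCLES`, the function name `read_with_flush`, and the representation of pending writes as (address, value) pairs are assumptions chosen for illustration.

```python
from collections import deque

WRITE_CYCLES = 3  # assumed cycles to retire one buffered write
READ_CYCLES = 3   # assumed cycles to execute the read itself


def read_with_flush(pending, memory, read_addr):
    """Resolve a read conflict by flushing, then perform the read.

    pending: deque of (word_addr, value) pairs, oldest first.
    memory:  dict mapping word address to value.
    Returns (value_read, cycles_the_cpu_waited).
    """
    stall = 0
    # The conflict signal stays asserted while any pending write
    # targets the address being read.
    while any(addr == read_addr for addr, _ in pending):
        addr, value = pending.popleft()  # retire writes in stored order
        memory[addr] = value
        stall += WRITE_CYCLES
    stall += READ_CYCLES                 # the read executes last
    return memory[read_addr], stall
```

If the conflicting write sits behind other pending writes, every older write is retired first, so the wait grows with the depth of the conflicting entry in the buffer. With the assumed figures, a conflict on the second of three pending writes costs two write retirements plus the read, or nine cycles.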
Therefore, in accordance with an aspect of the present invention, a feature is to provide a read-write buffer unit which significantly reduces the time necessary to identify and resolve read conflicts, thereby increasing the CPU performance rate.
Additionally, a feature of the present invention is to provide a read-write buffer unit which allows the identification and resolution of read conflicts during a block mode read request.
In accordance with another aspect of the invention, a feature is to provide a read-write buffer unit which allows the gathering of non-sequential write requests in an internal write buffer, thereby reducing the number of memory accesses and improving system throughput.