1. Field of the Invention
The present invention relates generally to the transfer of data between a memory unit and peripheral components in a computer system. More particularly, this invention relates to a system for expediting transactions between main memory and a component external to a central processing unit ("CPU"). Still more particularly, the present invention relates to a system for optimizing the transfer of data between a peripheral master controller and main memory.
2. Description of the Relevant Art
Data generally is transferred between memory and other components in a computer system in two steps. First, the accessing component generates signals on the address bus representing the address of the desired memory location. During the next clock cycle, or a later one, the component transfers data on the data bus to or from the addressed memory location. In most computer systems, the number of clock cycles required for a memory access depends upon the component accessing the memory and upon the speed of the memory unit.
The speed of memory circuits is determined by two timing parameters. The first is the memory access time, which is the minimum time the memory circuit requires to set up a memory address and either drive data onto, or capture data from, the data bus. The second is the memory cycle time, which is the minimum time required between two consecutive accesses to the memory circuit. For dynamic random access memory ("DRAM") circuits, which typically are used to construct the main working memory of the computer system, the cycle time typically is approximately twice the access time. DRAM circuits generally have access times in the approximate range of 60-100 nanoseconds, with cycle times of 120-200 nanoseconds. The extra time required between consecutive accesses to a DRAM circuit is necessary because the internal memory circuits need additional time to recharge (or "precharge") before they can accurately produce data signals. Thus, a microprocessor running at 10 MHz cannot execute two memory accesses in immediate succession (that is, in adjacent clock pulses) to the same DRAM chip with a 100 nanosecond access time, even though such a microprocessor generates a clock pulse every 100 nanoseconds. The DRAM chip requires time to stabilize before the next address in that chip can be accessed. Consequently, in such a situation the microprocessor must execute one or more wait cycles before it can gain access to data in the DRAM circuit. Typically, a memory controller unit ("MCU") is provided as part of the computer system to regulate accesses to the DRAM main memory.
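The timing argument above can be checked with a short calculation. The following Python sketch assumes a single cycle-time figure and ignores every other DRAM timing parameter; it computes the smallest number of whole clock periods that must separate two accesses to the same chip. The function name is hypothetical.

```python
import math

def min_clocks_between_accesses(cycle_time_ns: float, clock_mhz: float) -> int:
    """Smallest whole number of clock periods that covers the DRAM cycle time.

    Hypothetical helper; models only the cycle-time constraint.
    """
    clock_period_ns = 1000.0 / clock_mhz  # 10 MHz -> 100 ns per clock pulse
    return math.ceil(cycle_time_ns / clock_period_ns)

# 100 ns access time with a cycle time of about twice that (200 ns), 10 MHz clock.
print(min_clocks_between_accesses(200, 10))  # 2
```

With a 200 nanosecond cycle time, a 10 MHz processor must leave at least two clock periods between accesses to the same chip, so two adjacent clock pulses cannot both be used for accesses to that chip.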
In addition to the delays caused by access and cycle times, DRAM circuits also require periodic refresh cycles to protect the integrity of the stored data. A DRAM circuit typically requires 256 refresh cycles every 4 milliseconds, and these refresh cycles consume approximately 5 to 10% of the time available for memory accesses. If the DRAM circuit is not refreshed periodically, the data stored in the DRAM circuit will be lost.
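If the 256 refresh cycles are spread evenly across the 4 millisecond period (a distributed refresh scheme, which is an assumption here; refresh cycles can also be issued in bursts), the memory controller must schedule one refresh roughly every 15.6 microseconds. The hypothetical helper below performs that division.

```python
def refresh_interval_us(refresh_cycles: int, period_ms: float) -> float:
    """Spacing between refresh cycles, assuming they are spread evenly (hypothetical helper)."""
    return period_ms * 1000.0 / refresh_cycles  # convert ms to us, then divide

print(refresh_interval_us(256, 4))  # 15.625
```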
Because of these limitations, memory constructed with DRAM circuits is not always capable of responding to memory accesses within the time interval allotted by the CPU or peripheral master controller. In this event, external circuitry must signal to the CPU (or peripheral master controller) that supplementary processor cycles, or wait states, are necessary before the data is ready on the data bus, or before data from the data bus has been stored by the memory circuits. In addition to slowing the processing of the CPU, wait states generally tie up the CPU local bus, thereby limiting access to the bus by other system circuitry.
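Under a simplified model, in which the bus master allots a fixed number of clocks for the data phase and the memory needs its full access time within that window, the number of wait states can be estimated as below. This Python sketch is illustrative only; the model, the clock rates, and the function name are assumptions rather than part of the described system.

```python
import math

def wait_states(access_time_ns: float, clock_mhz: float, allotted_clocks: int) -> int:
    """Estimate wait states under a simplified model (hypothetical helper).

    Assumes the memory needs its full access time inside the data phase and
    the bus master allots `allotted_clocks` clocks before sampling the data.
    """
    clock_period_ns = 1000.0 / clock_mhz
    needed_clocks = math.ceil(access_time_ns / clock_period_ns)
    return max(0, needed_clocks - allotted_clocks)

# A 60 ns DRAM on a 50 MHz bus (20 ns clock) that allots one clock:
print(wait_states(60, 50, 1))  # 2
# The same DRAM on a 10 MHz bus (100 ns clock) needs no wait states:
print(wait_states(60, 10, 1))  # 0
```

The example shows why faster processors make the problem worse: the memory's access time stays fixed while the clock period shrinks, so more wait states are needed per access.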
As the operating speed of processors increases and as new generations of processors evolve, it is advantageous to minimize wait states to fully exploit the capabilities of these new processors. Obtaining the maximum benefits of these new generations of high speed processors in personal computers, however, is especially difficult because of size and power limitations of other components in the system, such as the DRAM main memory. With memory intensive applications, such as those involving technical or scientific calculations or computer aided design programs, the memory access time can greatly delay system operation.
FIG. 1 is a block diagram of a prior art computer system 10 that comprises a microprocessor or central processing unit ("CPU") 12, a CPU local bus 14 coupled to the CPU 12, and a memory controller 16 and a local bus peripheral device 18 both coupled to the CPU local bus 14. A system memory 17 also is shown coupled to the memory controller 16 through a memory bus 15. In addition, a PCI standard bus 20 couples to the CPU local bus 14 through a PCI bus bridge 22. A PCI peripheral device 28 is shown coupled to the PCI bus 20. The PCI peripheral device 28 may comprise a PCI Master controller that is capable of asserting ownership of the PCI bus during PCI Master cycles.
The microprocessor 12 shown in FIG. 1 may comprise a model 80486 microprocessor, and the CPU local bus 14 could comprise an 80486-style local bus. The CPU local bus 14 includes a set of data lines D[31:0], a set of address lines A[31:0], and a set of control lines (not specifically shown). The various bus cycles and protocols of the 80486 CPU local bus 14 are not discussed in detail herein, as they are well known to those skilled in the art and are described in numerous publications. CPU 12, memory controller 16 and PCI bus bridge 22 have traditionally been fabricated on separate integrated circuit chips. Recently, however, a trend has developed in which the CPU core is combined with a variety of peripheral devices on a single integrated processor chip. An exemplary integrated processor chip includes a bus bridge that provides a high performance interface between an internal CPU local bus and an external PCI bus. By providing a high performance interface to an external PCI bus, relatively high performance characteristics can be achieved with respect to external data transfers.
The PCI bus bridge 22 provides a standard interface between the CPU local bus 14 and the PCI bus 20. As such, the PCI bus bridge 22 orchestrates the transfer of data, address, and control signals between the two buses. PCI bus 20 typically comprises a high performance peripheral bus that includes multiplexed data/address lines and supports burst-mode data transfers.
The burst mode feature allows reads or writes to consecutive memory locations at high speed, via burst cycles on the PCI bus. In the normal procedure for reading from or writing to memory, the CPU generates the address signals on the address bus in a first clock cycle, and data is transferred to or from system memory 17 in the following clock cycle. Since the data bus is 32 bits wide, four 8-bit bytes of data can be read or written by the CPU every two clock cycles. Each set of four 8-bit bytes transferred on the data bus is referred to as a "double word." In burst mode, additional sequential double words may be transferred during subsequent clock cycles without intervening address phases. For example, four double words can be read into the CPU in only five clock cycles, rather than eight: the starting address is sent out on the address bus only once, during the first cycle, after which the first double word of data is read during the second cycle, the next double word during the third cycle, and so on. Burst mode operation thereby accommodates relatively high data transfer rates.
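The clock counts in the burst example can be tabulated with a short Python sketch. It assumes zero wait states and one clock per address phase and per data phase, exactly as in the example above; the function names are hypothetical.

```python
def nonburst_clocks(dwords: int) -> int:
    """Clocks without burst mode: one address clock plus one data clock per double word."""
    return 2 * dwords

def burst_clocks(dwords: int) -> int:
    """Clocks in burst mode: a single address clock, then one data clock per double word."""
    return 1 + dwords

for n in (1, 4, 8):
    # Columns: double words, clocks without burst, clocks with burst.
    print(n, nonburst_clocks(n), burst_clocks(n))
```

For four double words (16 bytes) this gives eight clocks without burst mode versus five with it. The saving approaches one clock per double word as the burst grows longer, because the single address phase is amortized over more data phases.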
As noted, the PCI peripheral device 28 may comprise a PCI Master controller. In accordance with conventional techniques, the PCI Master may request "ownership" of the PCI bus, so that it can control transactions on the PCI bus 20. As one skilled in the art will understand, a plurality of PCI Masters may be included in the computer system, any of which may request ownership of the PCI bus 20. The PCI Master submits its request for ownership of the PCI bus 20 to the PCI bridge 22 on a control line in the PCI bus 20. The PCI bus bridge 22 typically arbitrates ownership requests among the various PCI Masters and the internal masters, such as the CPU 12. Typically, a priority ranking is assigned to each of the various Masters to assist the bus bridge 22 in arbitration.
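Arbitration by priority ranking can be sketched as a fixed-priority grant: among all masters currently requesting the bus, the one ranked highest wins. This Python sketch is illustrative only; the master names and the particular ranking are hypothetical, and fixed priority is just one possible policy (a rotating scheme, for instance, avoids starving low-ranked masters).

```python
from typing import AbstractSet, Optional, Sequence

def grant(requests: AbstractSet[str], priority: Sequence[str]) -> Optional[str]:
    """Return the highest-ranked requesting master, or None if no master is requesting."""
    for master in priority:  # `priority` is ordered from highest to lowest rank
        if master in requests:
            return master
    return None

# Hypothetical ranking; the actual order is a design choice of the bridge.
PRIORITY = ["pci_master_0", "pci_master_1", "cpu", "dma"]

print(grant({"cpu", "pci_master_1"}, PRIORITY))  # pci_master_1
print(grant({"dma", "cpu"}, PRIORITY))  # cpu
```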
The PCI bridge 22 may operate either as a PCI master or as a local bus master. When the CPU 12 accesses PCI "slaves" external to the integrated processor, the PCI bridge 22 operates as a PCI master. Typically, during these PCI master cycles of the PCI bridge 22, either the CPU 12 or another local bus master (such as a DMA Controller) owns the CPU local bus 14 and the PCI bridge 22 owns the PCI bus 20. Conversely, for PCI external master accesses to devices residing on CPU local bus 14, the PCI bridge 22 functions as a slave or target with respect to the external master and functions as a master of CPU local bus 14.
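The role reversal described above can be summarized as a small lookup. The Python sketch below models only the two buses of FIG. 1. It assumes that whenever the bridge forwards a cycle, it responds as a target on the bus where the cycle originated and acts as master on the other bus. The names are hypothetical.

```python
from enum import Enum
from typing import Dict

class Bus(Enum):
    LOCAL = "CPU local bus"
    PCI = "PCI bus"

def bridge_roles(initiator_bus: Bus, target_bus: Bus) -> Dict[Bus, str]:
    """Role the PCI bridge plays on each bus for one transaction (illustrative model)."""
    if initiator_bus == target_bus:
        return {}  # the cycle stays on one bus; the bridge does not take part
    # The bridge answers the initiator as a target and re-runs the cycle as a master.
    return {initiator_bus: "target", target_bus: "master"}

# CPU 12 accesses a PCI slave: the bridge is the PCI master.
print(bridge_roles(Bus.LOCAL, Bus.PCI))
# PCI master 28 accesses memory controller 16: the bridge becomes the local bus master.
print(bridge_roles(Bus.PCI, Bus.LOCAL))
```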
Consequently, when a PCI master (e.g., peripheral device 28) gains ownership of the PCI bus 20 and initiates a cycle directed to a device residing on the CPU local bus, such as memory controller 16, the PCI bridge 22 obtains ownership of the local bus 14. During this period, the CPU 12 and the other internal masters cannot use the local bus 14. This can cause considerable delay in system operation because, as discussed above, accessing data in the system memory 17 takes at least several clock cycles, so other system resources, such as the CPU 12, must wait while data is transferred between system memory 17 and the external master. Compounding this problem is the fact that PCI masters may transfer data at very slow rates, and may transfer a plurality of double words through the execution of a burst data transfer. As a result, when PCI masters access system memory 17 slowly, the available bandwidth of the CPU local bus and the system memory bus is reduced.
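The cost of a slow PCI master can be illustrated with a rough count of local bus clocks. This Python sketch rests on simplifying assumptions that are not part of the description: both buses share one clock, the bridge holds the local bus from the address phase until the last double word has been transferred, and the slow master takes a fixed number of clocks per double word. The names and figures are hypothetical.

```python
def local_bus_clocks_held(dwords: int, clocks_per_dword: int) -> int:
    """Clocks the bridge owns the local bus while relaying one PCI master burst.

    Assumes one address clock, then `clocks_per_dword` clocks for each double
    word, because the PCI master sets the pace of the transfer.
    """
    return 1 + dwords * clocks_per_dword

def clocks_wasted(dwords: int, clocks_per_dword: int) -> int:
    """Extra clocks the local bus is held compared with a one-dword-per-clock burst."""
    return local_bus_clocks_held(dwords, clocks_per_dword) - (1 + dwords)

# Hypothetical slow master: an 8 double word burst at 4 clocks per double word.
print(local_bus_clocks_held(8, 4), clocks_wasted(8, 4))  # 33 24
```

In this model, 24 of the 33 clocks are lost to the CPU 12 and the other internal masters solely because of the PCI master's pace, even though system memory 17 could have supplied the data in 9 clocks.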