1. Field of the Invention
The present invention relates generally to the queuing, control and transfer of data between a host processor and a peripheral processor, and more particularly, to first-in-first-out (FIFO) systems and methods between a host processor and a peripheral processor.
2. Related Art
While input/output historically has been the orphan of computer architecture, it is absolutely critical to the performance of any traditional class of computers: mainframe, minicomputer, workstation, file server, and personal computer. As stated by John L. Hennessy and David A. Patterson in their seminal book Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., Palo Alto, Calif., U.S.A., 1990, p. 499 (which is incorporated herein by reference):
A computer without I/O devices is like a car without wheels--you can't get very far without them. And while CPU time is interesting, response time--the time between when the user types a command and when she gets results--is surely a better measure of performance. The customer who pays for a computer cares about response time, even if the CPU designer doesn't. Finally, as rapid improvements in CPU performance compress traditional classes of computers, it is I/O that serves to distinguish them.
"Smart" peripheral devices have internal processing functionality. Such smart peripheral devices often require streams of instructions and data.
Referring now to FIG. 1, the architecture of conventional smart peripheral devices is shown. The streams of instructions and/or data may be held either in one or more specific data structure(s) 101 in a main memory 102, or in a separate (deep) hardware memory, which in conventional systems is usually a FIFO located in an I/O controller (such as 106, 112, or 116) associated with (or part of) the I/O or peripheral device.
In the example shown in FIG. 1, there are three so-called peripherals that are shown for purposes of illustration. The first peripheral, which is designated generally by a reference numeral 130, is made up of an input/output (I/O) controller 106, a disk drive 104, and a disk drive 108. I/O controller 106 may or may not contain internal processing functionality in the form of a peripheral processor (CPU) or peripheral controller (not shown).
The second peripheral is designated generally by a reference numeral 132. Second peripheral 132 is made up of an I/O controller 112 and a graphics output 114, which typically is a cathode ray tube (CRT) display or a frame buffer, but can be any suitable graphics output device. Like I/O controller 106, I/O controller 112 may or may not have internal processing functionality.
The third peripheral is designated generally by a reference numeral 134. Third peripheral 134 is made up of an I/O controller 116 and a network 118. Like I/O controller 106, I/O controller 116 may or may not have internal processing functionality.
Each of first peripheral 130, second peripheral 132, and third peripheral 134 can operate in one of three ways: it can receive data and instructions from digital processor 126; it can receive instructions from, and send read data to, digital processor 126; or it can receive instructions from, and both send data to and receive data from, digital processor 126.
In the conventional system and method of FIG. 1, the data and instructions are transferred from main memory 102 over a CPU-memory bus 122 and through a bus adapter 120 to an I/O bus 110.
The architecture of FIG. 1 can also be applied to systems with multiple memory devices 102 and/or I/O buses 110.
FIG. 2 shows another conventional architecture and method, where the peripherals are connected directly to cache 124. Specifically, I/O controllers 106, 112 and 116 are connected via the I/O bus 110, bus adapter 120, and a bus 204 to cache 124, which is connected to CPU-memory bus 122, and to CPU 126 via a translation lookaside buffer (TLB) 202. As in the conventional system of FIG. 1, the streams of instructions and data for the peripherals in the system of FIG. 2 may be placed in a deep hardware buffer, such as a FIFO that is part of the I/O controller, or held in a software data structure stored in main memory 102.
A conventional system and method for transferring data between a digital processor and a peripheral processor utilizing virtual direct memory access (DMA) is shown in block architectural form in FIG. 3. The virtual DMA requires a register for each page to be transferred in a DMA controller 302, showing the protection bits and the physical page corresponding to each virtual page. Address-translation registers 304 connect DMA 302 to CPU-memory bus 122.
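For purposes of illustration only, the per-page registers described above might be modeled as in the following sketch. The page size, the two protection bits, and the names TranslationRegister and translate are assumptions introduced for this example and are not taken from FIG. 3.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096      # assumed page size, for illustration only
PROT_READ = 0b01      # hypothetical protection bit: page may be read
PROT_WRITE = 0b10     # hypothetical protection bit: page may be written


@dataclass(frozen=True)
class TranslationRegister:
    """One register per page to be transferred (hypothetical layout)."""
    physical_page: int
    protection: int


def translate(registers, virtual_address, access):
    """Map a virtual address to a physical address, checking protection.

    `registers` is a dict keyed by virtual page number.
    """
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    register = registers.get(virtual_page)
    if register is None:
        raise LookupError(f"no register loaded for virtual page {virtual_page}")
    if register.protection & access != access:
        raise PermissionError(f"access not permitted on virtual page {virtual_page}")
    return register.physical_page * PAGE_SIZE + offset


if __name__ == "__main__":
    regs = {2: TranslationRegister(physical_page=7, protection=PROT_READ)}
    print(hex(translate(regs, 2 * PAGE_SIZE + 5, PROT_READ)))
```

In this sketch a transfer that touches a page with no loaded register, or that requests an access the protection bits do not allow, is rejected before any physical address is produced.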
Note that the architecture in FIG. 3 (and for that matter FIGS. 1 and 2) can also be implemented without an I/O bus 110. The peripheral could be directly connected to DMA 302, and in turn, directly connected to CPU-memory bus 122. The architectures shown in FIGS. 1, 2, and 3 are illustrative only, and are not intended to represent a detailed discussion of computer architectures in general.
Regardless of whether the streams of instructions and data are held in the software FIFO data structure 101 of main memory 102 or are placed in a hardware FIFO buffer, there is a trade-off as to the depth of the FIFO in these conventional systems. The software data structure approach is slow, since the streams of instructions and data must be sent over CPU-memory bus 122 via bus adapter 120 to I/O bus 110.
The hardware FIFO buffer approach is faster. However, a deep hardware FIFO is expensive, especially in a computer system using a high clock rate. As is well-known, higher clock rates are constantly being used for the host processor 126 to improve system performance and increase system functionality.
Regardless of which of the conventional systems of FIGS. 1, 2 and 3 is used for the instruction and data path between the digital processor (CPU) 126 and the peripheral processors found in the I/O controllers 106, 112, and 116, some type of FIFO system must be used, either the specific data structure(s) 101 stored in main memory 102 or a deep hardware FIFO buffer.
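By way of example only, a software FIFO of the kind represented by data structure 101 might be sketched as a fixed-capacity ring buffer. The class name RingFifo and its methods are hypothetical and are introduced solely for illustration; the conventional systems described above are not limited to this layout.

```python
class RingFifo:
    """Fixed-capacity first-in-first-out queue held in ordinary memory."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0    # index of the oldest entry
        self._count = 0   # number of entries currently queued

    def __len__(self):
        return self._count

    def push(self, entry):
        """Append an entry; return False (and store nothing) if the FIFO is full."""
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = entry
        self._count += 1
        return True

    def pop(self):
        """Remove and return the oldest entry; raise IndexError if the FIFO is empty."""
        if self._count == 0:
            raise IndexError("FIFO is empty")
        entry = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return entry


if __name__ == "__main__":
    fifo = RingFifo(capacity=2)
    fifo.push("instruction A")
    fifo.push("instruction B")
    print(fifo.pop(), fifo.pop())
```

The capacity passed to the constructor corresponds to the depth of the FIFO, which is the quantity subject to the trade-off discussed in this section.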
To reduce latency, it is advantageous for as many as possible of the instructions and data that will be needed by the peripheral processor to be present in the FIFO. Thus, a conventional strategy for reducing latency is to create a larger (also called a "deeper") FIFO.
However, the use of a deeper FIFO means that more instructions and/or data must be processed before the stream of instructions and/or data can be switched to another stream of instructions and/or data. Such a switch in the streams of instructions and data is called a "context switch." In other words, all of the instructions and/or data in the FIFO must be processed by the I/O controller 106, 112 or 116 before the stream of instructions and/or data being provided to that I/O controller is context switched. This has the effect of increasing latency because of the extra time that is taken before the context switch can take place between streams of instructions and/or data.
One conventional approach for reducing this context switch latency is to make the FIFO shallow. This means that fewer instructions and/or data must be processed before the stream of instructions and/or data can be context switched. However, the use of a shallow FIFO means that the FIFO must be serviced more often. This increased servicing consumes additional bandwidth and CPU overhead (for checking the state of the FIFO), which is not desirable.
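The trade-off between deep and shallow FIFOs can be illustrated with the following simplified model. The model rests on assumptions that are not part of the conventional systems described above: the FIFO is full when a context switch is requested, each entry takes a fixed time to process, and the host refills the FIFO completely each time it services it. The function names are hypothetical.

```python
import math


def drain_latency(depth, time_per_entry):
    """Worst-case delay before a context switch: every queued entry is processed first."""
    return depth * time_per_entry


def services_needed(total_entries, depth):
    """Number of times the host must service (refill) the FIFO to pass total_entries through it."""
    return math.ceil(total_entries / depth)


def tradeoff(total_entries, time_per_entry, depths):
    """Return (depth, worst-case switch delay, host services) for each candidate depth."""
    return [
        (depth, drain_latency(depth, time_per_entry), services_needed(total_entries, depth))
        for depth in depths
    ]


if __name__ == "__main__":
    # Illustrative numbers only: 4096 entries, one time unit per entry.
    for depth, delay, services in tradeoff(4096, 1.0, [16, 64, 256, 1024]):
        print(f"depth={depth:5d}  switch delay={delay:7.1f}  host services={services}")
```

Under these assumptions, increasing the depth raises the worst-case context switch delay in direct proportion, while decreasing the depth raises the number of times the host must service the FIFO.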
Thus, it would be advantageous to provide a system and method that allows the streams of instructions and/or data between a digital processor and a peripheral processor to be context switched quickly, that is inexpensive to implement, that minimizes the impact on system latency, and that reduces memory bandwidth requirements.