1. Field of the Invention
This invention relates to computer system input/output (I/O) operations, and more particularly to an apparatus and method for improving I/O performance of a multiple I/O channel computer system at minimal cost. This invention is intended for use in computer systems controlled by a multitasking, multithreaded operating system such as Windows NT, OS/2, System 7.5, Novell or UNIX, but could also provide significant cost advantages in computer systems controlled by single-threaded and/or single-tasking operating systems such as Windows or DOS.
2. Description of Related Art
A typical computer system generally includes one or more central processing units (CPUs), main memory and one or more I/O channels. The function of the I/O channels is to transfer data between the CPU's main memory and input or output peripheral devices such as storage units or network interface devices. Storage units store data and programs which the CPU uses in performing specific programming tasks. Typical storage units include hard disk drives, CD-ROMs and tape drives. Network interface devices allow the transfer of data to or from other computer systems on the network. Typical network interfaces include Ethernet, Fibre Channel, ATM and FDDI.
In a simple, low performance architecture, the CPU exchanges data between the main memory and the I/O channels by programmed I/O instructions. Each unit of information (byte, word, etc.) is retrieved from the main memory by the CPU and transferred to the I/O channel by the CPU (or vice versa to transfer data in the opposite direction). As CPU performance rapidly increased, the power of the CPU chip dramatically outpaced the capacity of even the fastest I/O channels and peripheral devices to consume or produce data. Utilizing a high performance CPU for lower speed I/O tasks was found to be wasteful of the valuable CPU processing power. A high performance CPU spent much of its time waiting for an I/O device or processing interrupts to service low speed I/O channels. To reduce this I/O servicing load on a powerful CPU, Direct Memory Access (DMA) techniques evolved in which the I/O channel includes sufficient processing power, independent of the CPU, to directly manipulate data in the main memory of the computing system without CPU intervention for lengthy periods of time. DMA techniques free the CPU for computational functions rather than simply exchanging data with significantly slower I/O channels and peripheral devices.
As CPU processing power has increased, application programs have evolved which require significantly higher I/O bandwidth between peripheral devices and the main memory. For example, the generation of real-time, full motion video may require retrieval of large volumes of data from a mass storage device over an extended period of time. During this extended period of time the I/O channel electronics associated with the mass storage transfer are dedicated to a single transfer function and therefore unusable for initiating other I/O operations. Typically, full motion video windows are limited in size to a fraction of the video monitor size due, in part, to limitations in I/O peripheral device bandwidth. In a single-user, single-threaded operating system environment (such as MS-DOS) it may be acceptable for an I/O channel to be dedicated to a single I/O request for an extended period of time. A single-threaded, single-user system may not require the initiation of other I/O operations during this extended period of time. However, users operating computer systems in multi-window, multi-tasking, graphics-intensive environments expect essentially instantaneous response from the computer system as they enter commands or "click" icons. Users expect that a mouse click on an icon representing a large text file or application should produce immediate visual results, whether or not the selected file resides on a peripheral device currently busy servicing other I/O requests and regardless of the busy status of an I/O channel (for example, one in use for an extended period of time transferring large blocks of full motion video data).
This "instant response" expectation, when coupled with the high-bandwidth transfer requirement, dictates that I/O controllers allow I/O requests to be preempted and rescheduled at a later time. Toward this end, many techniques commonly referred to as "scatter/gather" methods are applied to partition lengthy I/O requests into smaller segments which the I/O channel DMA circuits may manipulate. Among other benefits, the scatter/gather approaches known in the art permit the I/O channel to be more easily preempted at the boundaries between the smaller segments of large I/O requests. The I/O channel is typically provided with a data structure (referred to herein as Physical Region Descriptor Tables or simply PRD tables) in main memory which identifies a list of segments to be exchanged between the I/O peripheral device and main memory. The DMA circuits of the I/O channel (referred to herein as the data management component) read information from the PRD table entries to determine the size and location of each segment to be exchanged between the peripheral device and main memory. When the end of the table entries is encountered, the I/O channel signals completion of the I/O request to the CPU and is ready for initiation of another request.
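The PRD table walk described above can be sketched as follows. This is a minimal illustration only, not the layout of any particular specification: the names PrdEntry, base, count, eot and dma_read_into_memory are hypothetical, main memory is modeled as a byte array, and the peripheral device is modeled as a byte string that has already been read from the media.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class PrdEntry:
    """One hypothetical PRD table entry (field names are assumptions)."""
    base: int          # starting offset of the segment in main memory
    count: int         # number of bytes in the segment
    eot: bool = False  # True on the last entry of the table


def dma_read_into_memory(memory: bytearray, table: list[PrdEntry],
                         device_data: bytes) -> int:
    """Scatter device_data into memory, one PRD segment at a time.

    Returns the number of bytes moved. The boundary between two entries
    is the point at which a real channel could be preempted.
    """
    moved = 0
    for entry in table:
        chunk = device_data[moved:moved + entry.count]
        memory[entry.base:entry.base + len(chunk)] = chunk
        moved += len(chunk)
        if entry.eot:
            break  # end of table: the channel would signal completion here
    return moved


# Example: eight bytes from the device are scattered into two segments.
memory = bytearray(16)
table = [PrdEntry(base=8, count=4), PrdEntry(base=0, count=4, eot=True)]
moved = dma_read_into_memory(memory, table, b"ABCDEFGH")
```

In this sketch the CPU's only involvement is building the table; the loop stands in for the work done by the DMA circuits without CPU intervention.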
Historically, I/O devices, due to the nature of their technology and their mechanical or network delays, are able to sustain data transfers at only 1/16 to 1/2 of the I/O bus bandwidth. Most I/O devices contain buffer memories (typically 32 Kb to 256 Kb in size) to compensate for their low media or network bandwidths. The buffer is used to adapt the interface speed of the slower peripheral I/O device to the faster speed of the main memory DMA interface. When writing information, the buffer is filled from main memory at a high speed and then emptied to the peripheral I/O device at a slower speed. Conversely, when reading information, the buffer is filled from the peripheral device at a slower speed and then emptied at a faster speed when transmitting the read information to the main memory (via DMA). Thus, once the slow media rate has partially filled the device's buffer memory, I/O bus transfers may progress at data rates dictated by the faster bus bandwidth rather than rates dictated by the slower media or network performance. However, once the buffer space has been exhausted due to a transfer which is larger than the buffer size, transfer bandwidth reverts to the media or network bandwidth.
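The effect of buffer exhaustion can be illustrated with a rough arithmetic model. The figures below are assumptions chosen for illustration: a bus rate of 16 megabytes per second, a media rate of one quarter of the bus rate (within the 1/16 to 1/2 range stated above), and a buffer of 256 kilobytes that is taken to be full when the bus transfer begins. The function names are hypothetical.

```python
BUS_RATE = 16_000_000    # bytes per second (assumed bus bandwidth)
MEDIA_RATE = 4_000_000   # bytes per second (assumed: 1/4 of the bus rate)
BUFFER_SIZE = 256 * 1024 # bytes (assumed device buffer size)


def transfer_time(size: int) -> float:
    """Seconds to move `size` bytes under the simplified model.

    The first BUFFER_SIZE bytes are assumed to be buffered already and
    move at the bus rate; any remainder is limited by the media rate.
    """
    buffered = min(size, BUFFER_SIZE)
    remainder = max(0, size - BUFFER_SIZE)
    return buffered / BUS_RATE + remainder / MEDIA_RATE


def effective_rate(size: int) -> float:
    """Average bytes per second achieved over the whole transfer."""
    return size / transfer_time(size)


# A transfer that fits in the buffer runs at the bus rate; much larger
# transfers approach the media rate.
small = effective_rate(BUFFER_SIZE)
medium = effective_rate(4 * BUFFER_SIZE)
large = effective_rate(1000 * BUFFER_SIZE)
```

Under these assumptions the average rate falls from the bus rate toward the media rate as the transfer grows beyond the buffer size, which is the behavior described above.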
One approach to improving overall I/O bandwidth is typified by Enhanced Integrated Drive Electronics (EIDE) disk drives having larger buffers than typical IDE interfaces. With a larger buffer, the EIDE channel is capable of exchanging information at over sixteen megabytes per second. Brad Hosler has proposed an IDE interface specification for improving system performance in multi-tasking environments, called "A Programming Interface for Bus Master IDE Controllers", which assumes the capabilities inherent in the DMA capable IDE disk drives mentioned earlier. Hosler's proposal, which is hereby incorporated by reference as background material, specifies a simple scatter/gather mechanism which allows large or small blocks of I/O transfer data to be scattered to or gathered from main memory. This mechanism reduces the number of processor interrupts required and the total number of I/O device interactions with the CPU when transferring large blocks of data, such as that required to support full motion video. Although the scatter/gather programming interface specifications were originally intended specifically for controlling hard disk drives on an IDE channel, the proposed software standards may be readily adapted to access other types of storage devices on the IDE channel, storage devices on other types of I/O channels or network interface devices using the same homogeneous scatter/gather programming interface.
FIGS. 1 and 2, discussed in detail below, depict the architecture and methods typical in prior I/O channel designs, in which a single data management component (DMA circuits of an I/O channel) is associated with the control of the peripheral I/O device. FIGS. 3A and 3B, discussed in detail below, graphically depict the utilization of the I/O channel circuits and the use of the buffer associated with each I/O device. The buffers associated with "channel 0 device" and with "channel 1 device" are each filled by the device at a slower pace than they are emptied by "channel 0 bus operation" and "channel 1 bus operation", respectively. In FIG. 3B, a single data management component (DMA) is utilized to exchange data with two I/O peripheral devices. As can be seen in FIG. 3B, the total time for exchanging data with both devices is essentially equal to twice the time required for a single device. This is due to the fact that the single DMA circuit must be applied to the two I/O devices sequentially. That is, data relating to the second I/O peripheral device remains pending in the buffer associated with the device until the DMA (data management component) has completed operations involving the first device.
Obviously, this limitation on I/O channel performance may be resolved by the addition of I/O channel circuits which may be associated with different peripheral I/O devices in parallel with the operation of the first I/O channel. As shown in FIG. 3A, the addition of a second I/O channel data management component (DMA) permits the second operation to overlap with the first. However, such a simplistic solution adds complexity and hence cost by duplicating portions of the I/O channel circuitry which are underutilized. It can also be seen in FIG. 3A that there remains a significant portion of "idle" time for the DMA circuits while the buffer associated with each I/O device is refilled. Therefore, the addition of I/O channel DMA components provides a simplistic solution to improve the overall throughput but is wasteful of the costly, complex electronic circuits.
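The trade-off between the two prior arrangements can be illustrated with a simplified timing model. The model and its numbers are assumptions made for illustration and are not taken from the figures: each transfer is treated as a series of identical bursts, each burst consisting of a slow buffer refill by the device followed by a fast bus operation by the DMA circuits, and bus contention between two DMA components is ignored. All function names are hypothetical.

```python
def single_dma_total(fill: float, drain: float, bursts: int,
                     devices: int = 2) -> float:
    """Total time when one DMA component services the devices in turn."""
    return devices * bursts * (fill + drain)


def per_device_dma_total(fill: float, drain: float, bursts: int) -> float:
    """Total time when each device has its own DMA component.

    The transfers overlap completely in this model, so the total equals
    the time for one device.
    """
    return bursts * (fill + drain)


def dma_idle_fraction(fill: float, drain: float) -> float:
    """Fraction of time a dedicated DMA component waits on buffer refill."""
    return fill / (fill + drain)


# Assumed example: refill takes 4 ms, the bus operation 1 ms, 8 bursts.
sequential = single_dma_total(fill=4.0, drain=1.0, bursts=8)
overlapped = per_device_dma_total(fill=4.0, drain=1.0, bursts=8)
idle = dma_idle_fraction(fill=4.0, drain=1.0)
```

With these assumed values the duplicated DMA component halves the total time, yet each DMA component spends most of its time idle waiting for a buffer to refill, which is the underutilization noted above.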
The larger buffers used in the EIDE standards, even in conjunction with the scatter/gather standards proposed by Mr. Hosler, still do not fully utilize the maximum bandwidth capabilities of the I/O channel's DMA control circuits. Over the extended period of time required to transfer very large files, even a large buffer will be filled and emptied several times, thus leaving the I/O channel circuits less than fully utilized.
From the above discussion, it is evident that a need exists for an improved method and apparatus to increase I/O bandwidth while minimizing the additional costs and complexity associated with the increase.