1. Field of the Invention
The present invention relates generally to direct memory access (DMA) read operations, and, more particularly, to DMA read transfers from input/output (I/O) devices attached to a multiprocessor system with shared resources.
2. Background Information
Direct memory access (DMA) between a main processor memory and an I/O device, where the transfers of the data are made exclusive of the processor, has been routinely accomplished in computing systems for many years. Accordingly, only a brief introductory discussion is presented herein.
The use of a DMA channel is rooted in making data transfers more efficient, often where speed is a factor. A programmed or memory-mapped input/output transfer requires executing program instructions for each data transfer, thereby reducing the speed (sometimes referred to as bandwidth) of the data transfers. This may result in the I/O device waiting. In contrast, a DMA controller takes direct control of logic signals on the memory bus itself, and thereby can effect data transfers by logic circuitry directly operating the read, write, status, and other lines of the memory itself. This well-known operation often takes the form of the DMA controller monitoring the “busy” line of the memory and, when the memory is not busy, asserting exclusive control of the memory bus and performing the reads or writes as previously determined.
Although speed is often the main factor in using DMA, other system constraints and/or requirements may convince the designer to interface an I/O device via a DMA controller. For example, having the data transfers, even if slow, occur completely in the background with respect to operating programs may warrant the use of a DMA controlled device.
In general, for any I/O device, including DMA controlled devices, “control” information must be transferred between an I/O controller/device and the processor. For example, some of the types of information that may be directed to a DMA controller might be the memory address(es) to which the data transfers are directed, a count of the number of bytes to be transferred, a signal enabling the start of the data transfers, and an indicator of which interrupt line is to be used by the controller to signal when the transfers are complete. In addition to control information, there will be the actual data transfers between the processor and the controller/device. Examples of data might include the text that appears on a monitor or an application program being uploaded into the processor. “Control” and “data” are the terms used herein to distinguish these types of information.
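By way of illustration only, the control information listed above can be pictured as a small record handed to the controller. The following C sketch is hypothetical: the structure name, the field names and widths, and the helper functions are assumptions made for this example and do not describe any particular controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control block; names and field widths are illustrative only. */
struct dma_control {
    uint64_t mem_addr;   /* memory address to which the transfer is directed */
    uint32_t byte_count; /* count of the number of bytes to be transferred */
    uint8_t  irq_line;   /* interrupt line used to signal completion */
    bool     start;      /* set to enable the start of the transfer */
};

/* Build a control block for one transfer; the transfer is not yet started. */
static struct dma_control dma_control_make(uint64_t addr, uint32_t count,
                                           uint8_t irq)
{
    assert(count > 0); /* a zero-length transfer is treated as a caller error */
    struct dma_control c = {
        .mem_addr = addr,
        .byte_count = count,
        .irq_line = irq,
        .start = false,
    };
    return c;
}

/* Mark the control block as ready to run. */
static void dma_control_start(struct dma_control *c)
{
    c->start = true;
}
```

In a real system such a record might be written by programmed I/O or fetched by the controller itself; the sketch only groups the items named above.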
The control information may be passed in several ways. One way is by programmed I/O; another is by programmable or firmware logic in the DMA controller; a third is to have DMA transfers of control information in addition to DMA transfers of data. Combinations of the above may also be used, as known by practitioners in the art.
The discussion above is not meant to be inclusive of all the functions and implementations used with DMA controllers. The above is only to note that such control and operations of DMA controllers and devices interfaced thereto are well known, and that practitioners in the art know how to design hardware, software and/or firmware to implement such DMA controllers.
DMA controllers become much more complex when interfaced to multiprocessor systems. For example, the DMA controllers may be configured to accommodate a great number of I/O devices operating under a wide variety of I/O protocols. For example, the accelerated graphics port (AGP), peripheral component interconnect (PCI), and peripheral component interconnect extended (PCI-X) are well-known protocols used to interface many I/O devices. Some of these devices, such as fiber optic communication channels, might exhibit high data rates. Each processor in the multiprocessor system may have locally controlled hardware, memory, and I/O, but the multiple processors also share hardware and software resources. The multiprocessor system with the interconnected shared resources is herein referred to as the “mesh.”
The DMA controller is designed to satisfy the mesh on one side and the I/O devices on the other. Designers are often concerned that the shared memory being used by the DMA data transfer may reside at the far end of the mesh from the I/O device, that the mesh may be busy, and that there may be a number of DMA devices with large amounts of data needing attention. These factors affect “latency,” which is the time it takes the mesh to respond with data after a request for the data is received by the mesh. The memory in these multiprocessor systems is designed primarily to accommodate the processors in the system. This might mean word widths of sixty-four bits or more, and transfers might be made with clocks of eight hundred megahertz (MHz) or higher. But the I/O device usually will have different clock speeds and different word lengths. DMA controllers are arranged to “bridge” these two environments and fulfill the requirements of the mesh and the I/O device. For these and other such reasons, it is common to find buffer memory in DMA controllers.
In some known designs, the buffer memory or cache is used to buffer pre-fetched data when a DMA read request is received. Pre-fetch means that the data is received by the device controller before actually being requested. However, the design of the cache system for the pre-fetched data involves tradeoffs and limitations. For example, the cache may buffer a given amount of data that was pre-fetched in response to a read request from a device, but, if the device controller cannot accept that amount, the unused cache data will have been pre-fetched unnecessarily. If the data remains in the cache waiting for a retry by the device, the cache is not useful to any other device. In either case the cache is used inefficiently. On the other hand, if the cache is reduced and the device requires more data than the cache holds, the device will drain the cache and will have to assert another request for the rest of the data. The result is that the device is slowed down.
Another limitation with a fixed cache is that all devices on the same I/O DMA bus would share the same cache and any pre-fetching will not likely match the needs of all the supported I/O devices.
It is an object of the present invention to address the above tradeoff to provide an efficient balance between the size of cache resources in DMA controllers and the speed requirements of I/O devices.
The above limitations are overcome by the present invention that provides an adaptive allocation of cache in a DMA controller. When initialized, the DMA controller allots a selected amount of cache for a device attached to the controller via an I/O bus. The amount may be initially determined as sufficient by the system designer when the particular type of I/O device was contemplated. However, in accordance with the present invention, the allocation of the cache is modified dynamically as a function of past usage by the I/O device.
When an I/O device (via the controller) requests or uses an amount of cache different from the amount previously allotted, the DMA controller stores the difference between the amount of data pre-fetched and the amount of data actually used. When subsequent requests are made by the I/O device, the allotted cache is increased or decreased as a function of an algorithm, with logic preferably in hardware in the DMA controller, so that eventually the amount of cache allotted and the requirements of the I/O device substantially match. In a preferred embodiment, and as discussed below, the cache is organized by blocks or “lines” of sixty-four bytes each. The algorithm uses the number of cache lines requested by past requests and the utilization of the pre-fetched cache lines to determine if and by how much the cache allotted to this device should change.
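The bookkeeping described above can be sketched in C under stated assumptions. The sixty-four byte line size comes from the preferred embodiment; the function names, and the choice to express the stored difference as a signed count of cache lines, are assumptions made for illustration rather than the claimed hardware logic.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u /* line size of the preferred embodiment */

/* Number of cache lines needed to hold `bytes` bytes, rounded up. */
static uint32_t lines_for_bytes(uint32_t bytes)
{
    return bytes / CACHE_LINE_BYTES + (bytes % CACHE_LINE_BYTES != 0u);
}

/* Hypothetical helper: signed difference between lines pre-fetched and
 * lines actually used. Positive means lines were pre-fetched needlessly;
 * negative means the device consumed more than was pre-fetched. */
static int32_t prefetch_surplus(uint32_t lines_prefetched, uint32_t lines_used)
{
    assert(lines_prefetched <= (uint32_t)INT32_MAX);
    assert(lines_used <= (uint32_t)INT32_MAX);
    return (int32_t)lines_prefetched - (int32_t)lines_used;
}
```

For instance, a device that was allotted eight lines but consumed only two hundred bytes used four lines, leaving a surplus of four.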
An advantage of the present invention is that, if there are a number of different I/O devices sharing the same I/O data bus and DMA controller, the adaptive nature of a DMA controller made in accordance with the present invention allows each such I/O device to have a dynamically different allotment of cache lines. This allows an allotment of cache lines to each specific device that most nearly satisfies the needs of each device.
In a preferred embodiment, the algorithm uses the most recent sixteen I/O requests for multiple cache lines. The number of requests for more than a given number of bytes and the number of requests for less than another number of bytes are compared. The number of cache lines allotted to the I/O device is a function of the difference between the two numbers. When more cache lines are to be allotted, the number of cache lines is doubled, and, when fewer cache lines are to be allotted, one cache line is removed.
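One possible reading of this adjustment, offered as a minimal sketch and not as the claimed implementation, is shown below in C. The sixteen-request history, the doubling on increase, and the removal of one line on decrease come from the text above. The two byte thresholds, the margin that the difference must exceed, the minimum and maximum allotments, and all identifiers are hypothetical values introduced only for the example.

```c
#include <assert.h>
#include <stdint.h>

#define HISTORY_LEN 16 /* most recent sixteen multi-line requests */

struct prefetch_state {
    uint32_t history[HISTORY_LEN]; /* byte counts of recent requests */
    unsigned count;                /* valid entries, at most HISTORY_LEN */
    unsigned next;                 /* slot the next request overwrites */
    uint32_t lines;                /* cache lines currently allotted */
};

/* All policy values are assumptions; the text does not fix them. */
struct prefetch_policy {
    uint32_t large_bytes; /* requests above this count as "large" */
    uint32_t small_bytes; /* requests below this count as "small" */
    int      margin;      /* difference needed before the allotment changes */
    uint32_t min_lines;   /* lower bound on the allotment */
    uint32_t max_lines;   /* upper bound on the allotment */
};

/* Record one request; the caller passes only multi-line requests. */
static void prefetch_record(struct prefetch_state *s, uint32_t bytes)
{
    s->history[s->next] = bytes;
    s->next = (s->next + 1u) % HISTORY_LEN;
    if (s->count < HISTORY_LEN)
        s->count++;
}

/* Compare large and small request counts, then double the allotment
 * or remove one line. Returns the new allotment. */
static uint32_t prefetch_adjust(struct prefetch_state *s,
                                const struct prefetch_policy *p)
{
    assert(p->min_lines >= 1u && p->min_lines <= p->max_lines);
    int large = 0, small = 0;
    for (unsigned i = 0; i < s->count; i++) {
        if (s->history[i] > p->large_bytes)
            large++;
        else if (s->history[i] < p->small_bytes)
            small++;
    }
    int diff = large - small;
    if (diff > p->margin) {
        /* Double, saturating at max_lines without overflowing. */
        s->lines = (s->lines > p->max_lines / 2u) ? p->max_lines
                                                  : s->lines * 2u;
    } else if (diff < -p->margin && s->lines > p->min_lines) {
        s->lines -= 1u; /* shrink by a single cache line */
    }
    return s->lines;
}
```

Under these assumptions the allotment grows quickly when a device keeps asking for large transfers and shrinks slowly when requests become small, so a brief run of small requests does not immediately discard an allotment that suited the device.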