1. Field of the Invention
The present invention relates to the high speed transfer of information within a data processing system and, in particular, to Direct Memory Access (DMA) transfers. More specifically, the present invention relates to optimizing DMA transfers between a memory system and a First-In, First-Out (FIFO) Random Access Memory (RAM) buffer.
2. Description of the Related Art
Program controlled data transfers within a data processing system such as a computer require a significant amount of the central processor's time to transfer a relatively small amount of data per unit time, i.e., a low data rate. In addition, the central processor cannot execute any other processing functions during program controlled Input/Output (I/O) operations. Although interrupts increase the attainable data rate, require less software, and allow concurrent processing, applications exist where the required data rate is simply too high to be achieved by using interrupts or where the data rate is such that the time spent in interrupt service routines impacts the concurrent processing to an unacceptable degree.
However, Direct Memory Access (DMA) facilitates maximum I/O data rates and maximum concurrency. Unlike the programmed I/O and interrupt I/O transfer methods that route data through the processor, a system that supports a DMA transfer method directly transfers data between memory and an I/O device. To implement a DMA transfer method, additional logic external to the central processor, called a DMA Controller (DMAC), is required.
DMACs, typically embodied as specialized dedicated I/O processors, include counters to provide the memory and port addresses and to count the number of words transferred. Before a transfer can occur, the central processor must initialize the DMAC to specify the direction and type of transfer, the source and destination addresses, and the number of bytes or words to be transferred. Once this initialization is completed, the central processor releases the system buses, the DMAC takes control of the buses, and the DMAC performs the entire transfer.
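The initialization sequence described above can be sketched as a write to a DMAC's channel register set. The register names, layout, and control bits below are illustrative only and do not correspond to any particular controller:

```c
#include <stdint.h>

/* Illustrative register layout for one channel of a hypothetical DMAC. */
typedef struct {
    uint32_t source_addr;    /* starting address of the source          */
    uint32_t dest_addr;      /* starting address of the destination     */
    uint32_t transfer_count; /* number of bytes (or words) to transfer  */
    uint32_t control;        /* direction, transfer type, enable bit    */
} dmac_channel_regs;

#define DMAC_CTRL_MEM_TO_IO (1u << 0)  /* direction: memory -> device  */
#define DMAC_CTRL_ENABLE    (1u << 31) /* channel armed; a DMA REQUEST
                                          may now start the transfer   */

/* Program one channel; once this completes, the central processor is
 * free to do other work until the DMAC signals completion. */
static void dmac_init_channel(dmac_channel_regs *ch,
                              uint32_t src, uint32_t dst, uint32_t count)
{
    ch->source_addr    = src;
    ch->dest_addr      = dst;
    ch->transfer_count = count;
    ch->control        = DMAC_CTRL_MEM_TO_IO | DMAC_CTRL_ENABLE;
}
```

After this initialization, the transfer itself proceeds without further instruction fetches on the central processor's behalf.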
Unlike a data transfer performed by the central processor, no instructions need to be fetched during the transfer to tell the DMAC how to perform the transfer. Thus, all memory cycles are available for transferring data, and the transfer can be performed at the maximum speed possible, i.e., the memory access speed. The peripheral device or I/O system that is either the destination or source of the transferred data generally operates at a slower rate than this maximum. Thus, the DMAC can allow the central processor to run for a few cycles between transfers while the DMAC waits for the device or I/O system to be ready to transfer the next byte.
After the DMAC has been initialized by the microprocessor, the peripheral (such as a LAN interface or disk controller) can initiate the transfer at any time by asserting the DMA REQUEST input to the DMAC. The DMAC then asserts BUS REQUEST to the central processor (this signal is called HOLD in some systems). The central processor completes the instruction it is currently executing, disables its address, data, and control bus outputs, and asserts the BUS ACKNOWLEDGE signal. The DMAC then takes control of the buses to perform the transfer. The DMAC controls the buses in the same manner as the microprocessor.
Upon taking control of the buses, the DMAC is said to establish a channel between memory and the target or source device. A transmit channel allows the DMAC to transfer data out of memory, while a receive channel allows the transfer of data into memory. DMACs can be designed to support multiple pairs of transmit and receive channels so as to support multiple devices. In addition, channels can be bidirectional. Prior art DMACs are further disclosed in M. Slater, Microprocessor-Based Design, Mayfair Publishing Co. (1987) and K. L. Short, Microprocessors and Programmed Logic, Prentice-Hall, Inc. (2nd ed. 1990), which are both incorporated herein by reference.
Typically, a transmit channel requires the use of a RAM buffer to temporarily store a unit of data, called a frame, as it is transferred from memory to an interface having a bus width different from that of memory. Likewise, receive channels also employ a buffer when transferring data from an interface to memory. This buffer is usually required due to bus latency, a characteristic of any multi-user bus.
Most digital communication protocols that run on LAN or WAN adapters require data to be arranged in data frames, or data packets, having a characteristic maximum size. Although data frames are usually defined as having a maximum length, hardware systems that support different protocols must be able to handle frames of any length. However, to optimize the use of memory, most computer operating systems allocate blocks of memory for temporary data storage in sizes smaller than typical data frames. Thus, a single data frame will usually be comprised of data contained in several different blocks of memory. The blocks of memory each contain a data buffer and can be scattered non-sequentially throughout memory. Memory typically has a width equal to an even number of bytes, and the bytes define addressable boundaries across the width of the memory.
Usually, the system will maintain a list of the data buffers that comprise a data frame. The data buffers themselves are defined by descriptors. Descriptors are small tables stored in memory, each associated with a particular data buffer, that define the size, location and status of the data buffers. As indicated above, a single data frame may be comprised of several data buffers scattered throughout memory; it may be any number of bytes in length, and it may start on any address boundary. In addition, data buffers may be any number of bytes in length and may start and end on any byte boundary within memory.
FIG. 9 depicts the relationship of a typical descriptor 930A to its associated data buffer 902 and the relative position a data buffer 902 may occupy within a block of memory 900. The total shaded region of FIG. 9 represents the bytes that comprise a data frame. Each shaded region within the memory blocks 900, 910, 920 represents a data buffer 902, 912, 922. As depicted, the data frame is comprised of three data buffers 902, 912, 922, and the data buffers 902, 912, 922 do not begin on memory width boundaries.
The descriptors in the list 930 together identify the data frames to the system. Each descriptor 930A, 930B, 930C contains information about its corresponding data buffer 902, 912, 922. Referring to both FIGS. 9 and 10, the descriptors 1000, 1010, 1020 hold the address pointer 1002 and length 1004 of the data buffer 902, along with a status and command field 1006. This information can be read from memory and stored in registers within a DMA controller as each data buffer 902 is processed. Because there may be more than one data buffer 902 per data frame as described above, the system will typically maintain an End of Frame status bit within the status and command field 1026 of the descriptor 1020 to indicate that a particular data buffer is the last data buffer in a given frame.
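A descriptor of the kind described above (address pointer, length, and a status/command field carrying an End of Frame bit) can be sketched in C. The field widths and the bit position chosen for the End of Frame flag are illustrative assumptions, not taken from any particular system:

```c
#include <stdint.h>

/* One buffer descriptor: address pointer, buffer length, and a
 * status/command field. Field widths are illustrative. */
typedef struct {
    uint32_t buffer_addr;  /* address pointer to the data buffer */
    uint16_t buffer_len;   /* length of the data buffer in bytes */
    uint16_t status_cmd;   /* status and command bits            */
} buffer_descriptor;

/* Hypothetical bit position for the End of Frame status bit. */
#define DESC_END_OF_FRAME (1u << 15)

/* Walk a descriptor list, summing buffer lengths until the descriptor
 * marked End of Frame is reached; returns the total frame length. */
static uint32_t frame_length(const buffer_descriptor *desc, int max_descs)
{
    uint32_t total = 0;
    for (int i = 0; i < max_descs; i++) {
        total += desc[i].buffer_len;
        if (desc[i].status_cmd & DESC_END_OF_FRAME)
            break;
    }
    return total;
}
```

This mirrors how a DMA controller would read each descriptor into internal registers and use the End of Frame bit to recognize the last buffer of a frame.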
Different computer systems will typically have memory subsystems with different widths. Four bytes wide is currently a popular dimension for a memory but it is anticipated that future systems will be much wider. The width of a memory system defines natural boundaries in the addressing system used to access the memory. A data bus coupled to a memory that is four bytes wide will have thirty-two signal paths, each able to carry one bit of data. The system can access a memory location and, in one cycle, read or write all of the bytes within the memory width boundaries of the accessed location.
Thus, if a four byte wide system is required to provide four bytes of data starting at a memory location aligned on a memory width boundary, the processor will be able to read all four bytes in just one cycle. If, however, the desired four bytes of data do not begin at a byte location that is also a memory width boundary, the system will require two cycles to access the data.
This concept can be more easily visualized with reference to FIG. 9, which depicts a four byte wide memory system represented by memory blocks 900, 910, and 920. Each block 900, 910, 920 represents a portion of memory space within the four byte wide memory system. The columns drawn with dashed lines on the blocks 900, 910, 920 represent the byte boundaries within the memory system. Each column is labeled with a byte number zero through three from right to left. The solid horizontal lines, shown in block 900, in conjunction with the vertical dashed lines define the individual byte storage locations of the memory system.
The four bytes of any one row of a memory block 900 can be read simultaneously in one cycle. Thus, if four consecutive bytes must be accessed and the first byte is located in the column labeled byte zero, the system will require only one cycle to read or write these four bytes. If, on the other hand, the four bytes that must be accessed start with byte one, two, or three, the system will have to first access the bytes on the row of the starting byte and then, in a second memory access cycle, access the bytes on the next row. Had the data been aligned within memory as above, this four byte access would have taken only half the time.
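The cycle counts discussed above follow from how many memory-width rows an access touches. A minimal sketch, with a hypothetical helper function, of computing the cycle cost for an access of a given starting byte address and length in a memory of a given width:

```c
/* Number of memory cycles needed to access `count` consecutive bytes
 * starting at byte address `addr` in a memory `width` bytes wide:
 * one cycle per memory-width row the access spans. */
static unsigned cycles_for_access(unsigned addr, unsigned count, unsigned width)
{
    unsigned first_row = addr / width;
    unsigned last_row  = (addr + count - 1) / width;
    return last_row - first_row + 1;
}
```

For a four byte wide memory, a four byte access starting at byte zero of a row spans one row and costs one cycle, while the same access starting at byte one, two, or three spans two rows and costs two cycles, as described above.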
Thus, when data is aligned with regard to memory width boundaries, it can be accessed faster. What is needed is a method of storing data in memory, transmit buffers, and receive buffers that quickly and efficiently aligns the data along memory width boundaries as the data is transferred and thus allows optimized accesses of the data. What is further needed is a system that ascertains the data buffer's characteristics via the descriptors in terms of variables that do not change state during the data buffer transfer process. This would allow for faster circuitry, thus facilitating the silicon synthesis process.
Prior art DMA systems necessitate the use of wait states during the transfer while the frame is properly assembled for transfer into the destination memory. Other prior art systems require the use of complex feedback circuitry to realign the bytes during transfers. What is needed is a simple, efficient system that does not require wait states or complex feedback circuitry to align data as it is transferred.