The present invention relates generally to the field of computer systems, and real time processing systems. The present invention is more particularly related to methods and apparatuses for managing multiple direct memory access channels.
Since the advent of the computer system, efforts have been focused on increasing its speed and capabilities. One thrust in the technology has been aimed in the direction of peripheral devices. Peripheral devices are capable of handling specific functions that were once commonly performed by the central processing unit (CPU), the heart of the computer system. Today, peripheral cards and devices handle many types of specific tasks, allowing the CPU to handle the management of the computer system. For example, peripheral cards exist for audio processing, video processing, digital signal processing, modem interface, network interface, 3-D graphics processing and many more.
Indirectly, the proliferation of peripheral devices used in a computer system has increased the CPU's management burden. Peripheral systems typically need access to memory located on the computer system, or main memory. Conventionally, the CPU has been burdened with the task of managing the transfer of data during a peripheral device's accesses to the main memory.
Direct memory access systems have helped to relieve the burden on the CPU. FIG. 1 is a block diagram of a computer system 2 utilizing a direct memory access system. Computer system 2 includes a CPU 10, a main memory 12, a bus controller 13, a direct memory access (DMA) controller 15, a bus 20 and peripheral devices 30, 32 and 33. In the illustrated computer system, the peripherals are an audio processing card 30, a video processing card 32 and a digital signal processing card 33.
In the computer system, DMA controller 15 manages accesses to main memory 12 by peripherals 30, 32 and 33. Initially, a software application that controls the operation of a particular peripheral device, commonly referred to as a driver, is normally implemented on CPU 10. Typically, the driver would initially set up a peripheral device, for example audio processing card 30.
Another application implemented on CPU 10 might then request a sound to be played. The driver may then retrieve a piece of audio data from an external memory source, such as a disk drive, and place it in main memory 12. The driver, through CPU 10, would typically inform DMA controller 15 of the size of the data transfer. DMA controller 15 normally establishes a single direct memory access connection between main memory 12 and audio processing card 30 via bus 20 and bus controller 13.
Throughout the transfer of the audio data between main memory 12 and audio processing card 30, DMA controller 15 manages the DMA channel while CPU 10 manages the transfer of data from the memory source to main memory 12, as will be discussed further below. The constant interaction between CPU 10, DMA controller 15 and main memory 12 adds to the duties of the CPU. Essentially, the peripheral devices are slaves and CPU 10 and DMA controller 15 are the masters, which means that the CPU and the DMA controller have the responsibilities of establishing and maintaining the data transfer through the DMA channels between main memory 12 and the peripheral devices 30, 32, and 33.
Additionally, computer systems are typically capable of only establishing a limited number of DMA channels between main memory 12 and peripherals 30, 32 and 33. Typically, only one DMA channel is established per peripheral device. The limited number of DMA channels limits the number of peripheral devices that may be connected to bus 20, and the amount of information that may be transferred from main memory 12 to the peripheral devices.
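The channel limitation described above can be illustrated with a minimal sketch. The class name, method names and the channel count used here are hypothetical and are not taken from any particular DMA controller; the sketch only assumes a fixed pool of channels and at most one channel per peripheral device.

```python
class DmaController:
    """Toy model of a DMA controller with a fixed pool of channels (assumption)."""

    def __init__(self, num_channels):
        self._free = list(range(num_channels))   # channel numbers still available
        self._by_device = {}                     # device id -> (channel, transfer size)

    def open_channel(self, device_id, transfer_size):
        # Only one DMA channel is established per peripheral device.
        if device_id in self._by_device:
            raise RuntimeError("device already has a DMA channel")
        # The number of channels is limited; further requests cannot be served.
        if not self._free:
            raise RuntimeError("no free DMA channels")
        channel = self._free.pop(0)
        self._by_device[device_id] = (channel, transfer_size)
        return channel

    def close_channel(self, device_id):
        # Return the channel to the pool once the transfer is finished.
        channel, _ = self._by_device.pop(device_id)
        self._free.append(channel)
```

With a hypothetical pool of two channels, peripherals 30 and 32 can each open a channel, while a request from peripheral 33 fails until one of the others closes its channel.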
More recent computer systems have incorporated a peripheral component interconnect (PCI) bus and an associated controller to increase the bandwidth between main memory and peripheral devices, as shown in FIG. 2. FIG. 2 depicts a prior art computer system 3 utilizing a peripheral component interconnect bus 21. Computer system 3 typically includes CPU 10, main memory 12, a PCI bus controller 16, PCI bus 21, and PCI compatible peripheral devices 40, 42 and 43.
An advantage of PCI bus 21 is that it is capable of handling more data than older prior art buses, as well as allowing peripherals to act as bus masters. For example, current PCI buses are normally capable of handling 132-266 megabytes per second, and as high as 572 megabytes per second. Older prior art buses have typically been limited to about 33 megabytes per second (e.g., ISA buses are limited to about 8.3 megabytes per second and EISA buses are limited to 33 megabytes per second).
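Bandwidth figures of this kind follow from bus width multiplied by clock rate. The following is a minimal sketch of that arithmetic. The bus parameters are commonly cited nominal values and are assumptions, not values stated in the text: 32-bit and 64-bit PCI at 33 MHz, 16-bit ISA at 8.33 MHz needing two clocks per transfer, and 32-bit EISA at 8.33 MHz. The function name is illustrative.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, clocks_per_transfer=1):
    """Theoretical peak bandwidth in megabytes per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz / clocks_per_transfer

# Assumed nominal parameters (not taken from the text).
pci_32_bit = peak_bandwidth_mb_s(32, 33)                     # 132.0
pci_64_bit = peak_bandwidth_mb_s(64, 33)                     # 264.0
isa = peak_bandwidth_mb_s(16, 8.33, clocks_per_transfer=2)   # about 8.3
eisa = peak_bandwidth_mb_s(32, 8.33)                         # about 33.3

print(pci_32_bit, pci_64_bit, isa, eisa)
```

These are peak rates only; arbitration and other bus overhead reduce the rate actually sustained by a DMA channel.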
Another difference between PCI bus 21 and older buses is the capability of establishing a greater number of DMA channels between main memory 12 and peripheral devices, partly because peripherals can act as bus masters. However, even the PCI computer system 3 typically establishes only one DMA channel between main memory 12 and each peripheral device 40, 42 and 43. Thus, while the bandwidth of a DMA channel established between main memory 12 and a peripheral device may be increased, up to the bandwidth of PCI bus 21, the DMA channel is still limited to the bandwidth of the particular bus 21.
Therefore, the introduction of a PCI bus into prior art computer systems has only somewhat alleviated the bandwidth limitations of older prior art computer systems. Additionally, the PCI bus has only partially solved the management problems associated with DMA data transfers. In computer system 3, the peripherals handle the transfer of data after a DMA channel has been initiated. Therefore, some of the responsibilities of establishing and maintaining a DMA channel have been relegated to peripheral devices 40, 42 and 43.
In a typical operation, a driver is implemented on CPU 10, for example an audio processing peripheral driver. A request by another application for the playback of audio data normally triggers the driver, via CPU 10, to move the requested audio data from an external source to main memory 12. The driver informs audio peripheral 40 that the requested audio data is ready for retrieval. Audio processing peripheral 40 then acts as the master and directs the actual transfer, rather than PCI bus controller 16 and CPU 10. Audio processing peripheral 40 typically sends a request to PCI bus controller 16 for access to PCI bus 21. The other peripherals 42 and 43 may also be requesting control of PCI bus 21, and PCI bus controller 16 arbitrates the requests in order to allow the orderly transfer of information from main memory 12 to the various peripherals 40, 42 and 43.
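The arbitration step can be sketched as follows. The text does not state which arbitration policy PCI bus controller 16 uses, so the round-robin policy here is an assumption chosen only for illustration, and the class and method names are hypothetical.

```python
from collections import deque


class RoundRobinArbiter:
    """Toy bus arbiter; the round-robin policy is an assumption."""

    def __init__(self, device_ids):
        self._order = deque(device_ids)   # rotation order of the bus masters
        self._pending = set()             # devices currently requesting the bus

    def request(self, device_id):
        if device_id not in self._order:
            raise ValueError("unknown device")
        self._pending.add(device_id)

    def grant(self):
        """Grant the bus to the next requesting device, or None if the bus is idle."""
        for _ in range(len(self._order)):
            candidate = self._order[0]
            self._order.rotate(-1)        # the device just considered moves to the back
            if candidate in self._pending:
                self._pending.discard(candidate)
                return candidate
        return None


arbiter = RoundRobinArbiter([40, 42, 43])
arbiter.request(43)
arbiter.request(40)
print(arbiter.grant(), arbiter.grant(), arbiter.grant())   # 40 43 None
```

Because a peripheral must wait for its grant, each additional bus request adds a delay that depends on what the other bus masters are doing.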
In that respect, PCI bus controller 16 and CPU 10 are relieved of the duty of keeping track of and establishing DMA channels with the various peripherals. However, even in PCI based computer systems CPU 10 may be overly burdened with the management of the actual transfer of the data from main memory 12 to peripherals 40, 42 and 43.
Typical methods of managing data in main memory and the transfer of data to a peripheral vary, referring to FIGS. 3-5. FIG. 3 depicts a memory map of main memory 12 of either FIG. 1 or FIG. 2. A driver implemented on CPU 10 may retrieve a requested block of data 50 from another memory medium and store it in main memory 12. As illustrated, data 50 may be segmented into multiple data segments 50(0)-50(n). Data segments 50(0)-50(n) also may be segmented in a non-sequential order.
Assuming the above situation of transferring data to audio processing peripheral 40, in order to pass along data segments 50(0)-50(n) to audio processing peripheral 40, one method has been to utilize ping and pong buffers 55 and 56, respectively. Generally, data segments 50(0)-50(n) are copied from their respective locations in main memory 12 to ping and pong buffers 55 and 56 by CPU 10, for transfer to peripheral 40, as discussed further in reference to FIGS. 4-5.
FIGS. 4-5 are diagrammatic flow charts describing a DMA transfer utilizing ping and pong buffers 55 and 56. FIG. 4 describes the function of CPU 10, as directed by the audio driver, during a DMA transfer. FIG. 5 describes the function of audio processing peripheral 40 during the DMA transfer.
Initially, in FIG. 4, the audio driver is initiated in block 71. The driver loads a requested block of data 50 into main memory 12 in block 73. In block 76, the driver provides the addresses and the sizes of ping and pong buffers 55 and 56 to peripheral 40. Once audio processing peripheral 40 is made aware of the locations of ping and pong buffers 55 and 56, the driver loads ping and pong buffers 55 and 56 with the first segments of the block of data 50. Depending on the sizes of the data segments 50(0)-50(n), and the sizes of ping and pong buffers 55 and 56, an entire data segment may be loaded into either ping or pong buffers 55 or 56, or only pieces of a data segment. However, initially, the first data segment 50(0) is loaded in ping and pong buffers 55 and 56, sequentially, with ping buffer 55 containing the first piece of data segment 50(0).
The driver then initiates the operation of peripheral 40 to begin the download of data 50 from main memory 12 in block 78. Referring now to FIG. 5, audio processing peripheral 40 is started in block 91. In block 92, peripheral 40 may send out a DMA request to PCI bus controller 16. The DMA request typically asks PCI bus controller 16 to hand over control of PCI bus 21 to peripheral 40 until it has finished retrieving data, or until another peripheral 42 or 43 or CPU 10 requires the use of PCI bus 21. Peripheral 40 waits in block 92 until it is given control over PCI bus 21. Once control is obtained, peripheral 40 reads the contents of ping buffer 55 in block 93. Peripheral 40 may or may not have a chance to read the entire ping buffer 55 before control over PCI bus 21 is returned to PCI bus controller 16. Thus, in block 94, peripheral 40 determines if the entire ping buffer 55 was read. If not, peripheral 40 requests another DMA access, returning to block 92.
If peripheral 40 was successful in reading the entire contents of ping buffer 55, in block 95, peripheral 40 generates an interrupt to CPU 10 to interrupt the operations of the audio driver. Referring back to FIG. 4, after the audio driver has initiated peripheral 40 in block 78, the audio driver goes into a wait state in block 79, waiting for an interrupt from peripheral 40. Once an interrupt is received in block 79, the driver determines, in block 82, whether the interrupt was generated by peripheral 40 because all of data 50 has been transferred. If the end of data 50 has not been transferred to peripheral 40, CPU 10 loads either the ping or the pong buffer with the next data segment 50(0)-50(n) or any portion thereof from main memory 12, in block 84, depending on which buffer 55 or 56 had just been read out to peripheral 40. For example, if ping buffer 55 had just been read out, it would be loaded with the next piece of data, and similarly if pong buffer 56 had just been read out. The driver then returns to block 79 and waits for the next interrupt.
Referring back to FIG. 5, at the same time CPU 10 updates ping buffer 55, peripheral 40 typically obtains control over PCI bus 21 in order to read pong buffer 56 in block 96. In block 97, once control is obtained, pong buffer 56 is read by peripheral 40. Again, if the entire contents of pong buffer 56 have not been read out, peripheral 40 returns to block 96 to attempt again to gain control over PCI bus 21. Once all the contents of pong buffer 56 have been transferred to peripheral 40, CPU 10 is again interrupted in block 99.
Referring back to FIG. 4, once an interrupt (generated by either block 95 or block 99 of FIG. 5) is detected in block 79, CPU 10 proceeds to block 82, under the direction of the audio driver, and determines if the end of data 50 has been transferred. If not, the appropriate ping or pong buffer is loaded with the next data segment 50(0)-50(n). If the end of data 50 has been transferred, process flow proceeds to block 86, which stops the peripheral operations initiated in block 78, and then ends the operations of the audio driver in block 87. The above is a typical establishment and conduct of operations of a DMA channel using ping and pong buffers 55 and 56 in main memory.
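The ping and pong flow of FIGS. 4-5 can be approximated by the following simulation. It is a simplified sketch under stated assumptions: the CPU refill and the peripheral read run one after the other rather than concurrently, each buffer is read in a single bus grant (the retry loops of blocks 92-94 and 96-98 are omitted), and the function and variable names are hypothetical.

```python
def ping_pong_transfer(segments, buffer_size):
    """Simulate a ping/pong DMA transfer.

    Returns (data received by the peripheral, interrupts raised,
    bytes copied by the CPU into the ping and pong buffers).
    """
    stream = b"".join(segments)   # the driver walks the data segments in order
    chunks = [stream[i:i + buffer_size] for i in range(0, len(stream), buffer_size)]
    buffers = [b"", b""]          # index 0 is the ping buffer, index 1 the pong buffer
    next_chunk = 0
    cpu_bytes_copied = 0
    interrupts = 0

    def cpu_load(which):
        # CPU copies the next piece of data into the given buffer (block 84).
        nonlocal next_chunk, cpu_bytes_copied
        if next_chunk < len(chunks):
            buffers[which] = chunks[next_chunk]
            cpu_bytes_copied += len(chunks[next_chunk])
            next_chunk += 1
        else:
            buffers[which] = b""  # nothing left to load

    cpu_load(0)                   # initial fill of the ping buffer
    cpu_load(1)                   # initial fill of the pong buffer

    received = bytearray()
    current = 0
    while buffers[current]:
        received += buffers[current]   # peripheral reads the buffer over the bus
        interrupts += 1                # peripheral interrupts the CPU (block 95 or 99)
        cpu_load(current)              # CPU refills the buffer that was just drained
        current ^= 1                   # peripheral moves on to the other buffer
    return bytes(received), interrupts, cpu_bytes_copied


data, irqs, copied = ping_pong_transfer([b"abcd", b"ef", b"ghijk"], buffer_size=4)
print(data, irqs, copied)   # b'abcdefghijk' 3 11
```

In this model the CPU copies every byte of the data once more into a buffer and is interrupted once per buffer read, which is the cost the description attributes to this method.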
As can be appreciated, the use of ping and pong buffers 55 and 56 requires extensive operations by CPU 10, under the direction of the driver, in maintaining the DMA transfer. Each DMA channel established by peripherals 40, 42 and 43 normally requires that CPU 10 continually update a set of ping and pong buffers 55 and 56. While audio processing peripheral 40 may be able to retrieve data faster than in non-PCI prior art computer systems, primarily due to the increased bandwidth of PCI bus 21, the burden on CPU 10 may not be significantly alleviated. This method is especially burdensome since CPU 10 must read and write every single byte of data from data segments 50(0)-50(n) to ping and pong buffers 55 and 56.
Another method of transferring data, referring to FIGS. 6-8, involves the use of a scatter-gather table. FIG. 6 depicts an alternate memory map of main memory 12 of FIG. 2. As with the ping and pong buffer memory map of FIG. 3, requested data 50 is segmented into data segments 50(0)-50(n) in main memory 12. However, instead of ping and pong buffers, a scatter-gather table 60 is mapped within main memory 12. Typically, scatter-gather table 60 contains some of the addresses and sizes of data segments 50(0)-50(n), but normally not all the addresses and sizes of all the data segments.
Again using the example of an audio driver, scatter-gather table 60 is used by the driver, via CPU 10, to keep track of all the data segments 50(0)-50(n), referring to FIGS. 7-8. FIGS. 7-8 are diagrammatic flow charts describing a DMA transfer utilizing scatter-gather table 60. FIG. 7 depicts a flow chart 100 of the operations of CPU 10, as directed by the audio driver, during a DMA transfer. FIG. 8 depicts a flow chart 108 of the operations of audio processing peripheral 40 during the DMA transfer.
Flowchart 100 begins in block 102 with the implementation of the audio driver on CPU 10. The driver will typically receive a signal from another application instructing the driver to play a sound. In block 104 the driver retrieves the requested block of data 50 from another memory source and places it into main memory 12. After data 50 has been segmented and placed in main memory as data segments 50(0)-50(n), in block 105 the driver gathers up the addresses and sizes of a number of the first data segments. The number of addresses and sizes gathered may vary, but typically does not include the addresses and sizes of all data segments 50(0)-50(n) unless data 50 is small.
For purposes of illustration, in block 106, the audio driver places the addresses and sizes of data segments 50(0)-50(x), where "x" is less than "n", into scatter-gather table 60 in main memory. The driver then initiates audio processing peripheral 40 in block 108. Block 108 is further described in FIG. 8.
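A scatter-gather table of this kind can be modeled as a list of address and size pairs. The following is a minimal sketch; the entry layout, the names and the example addresses are assumptions for illustration, and main memory is modeled as a Python bytearray.

```python
from typing import NamedTuple


class SGEntry(NamedTuple):
    address: int   # start of a data segment in main memory
    size: int      # length of the segment in bytes


def build_table(segment_locations, first, count):
    """Place the addresses and sizes of `count` segments, starting at index `first`, into a table."""
    return [SGEntry(addr, size) for addr, size in segment_locations[first:first + count]]


def gather(memory, table):
    """Read the segments named by the table, in table order."""
    return b"".join(bytes(memory[e.address:e.address + e.size]) for e in table)


# Three segments stored at non-sequential addresses (hypothetical layout).
memory = bytearray(64)
memory[40:44] = b"AAAA"   # segment 0
memory[8:10] = b"BB"      # segment 1
memory[20:23] = b"CCC"    # segment 2
locations = [(40, 4), (8, 2), (20, 3)]

table = build_table(locations, first=0, count=2)   # only the first entries, as in block 106
print(gather(memory, table))                       # b'AAAABB'
```

The table holds only descriptors, so the CPU never copies the segment data itself; it only writes small address and size entries.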
In reference to FIG. 8, audio processing peripheral 40 is started in block 120. Typically, in block 122, audio processing peripheral 40 initiates a PCI DMA burst request in order to obtain control over PCI bus 21. After PCI bus controller 16 gives control over PCI bus 21 to audio processing peripheral 40, in block 123, audio processing peripheral 40 first obtains the address and size of the first data segment, or 50(0).
With the address and size of data segment 50(0), audio processing peripheral 40 then attempts to obtain control over PCI bus 21 again in block 125. After again obtaining control of PCI bus 21, in block 127 audio processing peripheral 40 begins to read the contents of data segment 50(0).
After control is turned back to PCI bus controller 16, peripheral 40 determines, in block 128, whether the entire data segment 50(0) was successfully read from main memory 12. If not, peripheral 40 returns to block 125 to finish reading data segment 50(0). Once all of data segment 50(0) has been read from main memory 12, in block 129, peripheral 40 checks to see if that is the end of data 50. Peripheral 40 proceeds to block 130 if data 50 has not been completely read from main memory 12.
In block 130, audio processing peripheral 40 checks how many data segments have been successfully read from main memory 12. Typically, peripheral 40 will read only a subset of data segments 50(0)-50(x) before requesting CPU 10 to update scatter-gather table 60. For purposes of illustration, audio processing peripheral 40 may check to see if half of the scatter-gather table entries have been read, i.e., 50(0)-50(x/2), rounding down. If fewer than x/2 data segments have been read, then peripheral 40 returns to block 122. In blocks 122-128, peripheral 40 retrieves the next data segment 50(1), and repeats until data segment 50(x/2) has been read.
Returning back to block 130, with x/2 data segments read, peripheral 40 proceeds to block 132. Peripheral 40 generates an interrupt to CPU 10 in block 132.
Referring back to FIG. 7, CPU 10 proceeded to block 109 after it initiated peripheral 40 in block 108. In block 109, CPU 10 waits for an interrupt from peripheral 40. Once an interrupt is received from peripheral 40 by CPU 10 in block 109, CPU 10 continues to block 110. In block 110 CPU 10 determines if the interrupt was a request to update scatter-gather table 60 or a signal that the end of data 50 has been reached. If the interrupt is a request to update scatter-gather table 60, CPU 10 proceeds to block 111 where CPU 10 updates the scatter-gather table.
CPU 10 typically updates the entries in scatter-gather table 60 that have been read by peripheral 40. When audio processing peripheral 40 reaches the end of scatter-gather table 60 the peripheral may loop back to the beginning of scatter-gather table 60 and read the newly edited entries. In the illustrated example, CPU 10 replaces the entries of data segments 50(0)-50(x/2) with the next grouping of entries for data segments 50(x+1)-50(x+1+(x/2)), or if (x+1+(x/2)) is greater than "n", 50(x+1)-50(n). Once scatter-gather table 60 has been updated, CPU 10 returns to block 109.
Returning back to FIG. 8, audio processing peripheral 40 proceeds from block 132 back to block 122 to retrieve further entries in scatter-gather table 60 and the corresponding data segments. Peripheral 40 will continue to retrieve data segments and interrupt CPU 10 until scatter-gather table 60 has been updated with the relevant information of the last data segment 50(n). After peripheral 40 has read the contents of data segment 50(n), peripheral 40 will proceed from block 129 to block 133 and generate a final interrupt.
Returning to block 109 of FIG. 7, CPU 10 receives the last interrupt and proceeds to block 110. CPU 10 then determines that the interrupt was generated due to the last segment condition. CPU 10 proceeds to block 114 where the DMA channel is terminated.
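The scatter-gather flow of FIGS. 7-8 can be approximated by the following simulation. It is a sketch under several assumptions that the text does not specify: the table is treated as a ring of `table_size` slots, the peripheral requests an update after every `table_size // 2` segments, the last entry carries an end-of-data flag, and each table entry and each data segment is fetched in exactly one bus grant. All names are hypothetical.

```python
def scatter_gather_transfer(memory, locations, table_size):
    """Simulate a scatter-gather DMA transfer.

    Returns (data received, table-update interrupts, bus requests made).
    """
    total = len(locations)
    if total == 0:
        return b"", 0, 0

    table = [None] * table_size
    next_desc = 0                          # next segment descriptor the CPU will place

    def cpu_fill(slots):
        # CPU writes descriptors for the next segments into the given slots (block 111).
        nonlocal next_desc
        for slot in slots:
            if next_desc < total:
                addr, size = locations[next_desc]
                is_last = next_desc == total - 1   # assumed end-of-data flag
                table[slot] = (addr, size, is_last)
                next_desc += 1

    cpu_fill(range(table_size))            # initial table contents (block 106)

    received = bytearray()
    bus_requests = 0
    update_interrupts = 0
    consumed = []                          # slots read since the last table update
    slot = 0
    refresh_after = max(1, table_size // 2)

    while True:
        bus_requests += 1                  # burst to fetch the table entry (blocks 122-123)
        addr, size, is_last = table[slot]
        bus_requests += 1                  # burst to fetch the data segment (blocks 125-127)
        received += memory[addr:addr + size]
        consumed.append(slot)
        slot = (slot + 1) % table_size     # wrap around at the end of the table
        if is_last:
            break                          # final interrupt (block 133)
        if len(consumed) >= refresh_after:
            update_interrupts += 1         # request a table update (block 132)
            cpu_fill(consumed)
            consumed = []
    return bytes(received), update_interrupts, bus_requests


memory = bytearray(100)
pieces = [(50, b"aa"), (10, b"bbb"), (90, b"c"), (30, b"dddd"), (0, b"ee")]
for addr, payload in pieces:
    memory[addr:addr + len(payload)] = payload
locations = [(addr, len(payload)) for addr, payload in pieces]

print(scatter_gather_transfer(memory, locations, table_size=4))
# (b'aabbbcddddee', 2, 10)
```

Even in this idealized model the peripheral makes two bus requests per data segment, one for the table entry and one for the data.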
While CPU 10 is not constantly shuffling data from one memory location to another location, as in the ping and pong method, the burden of keeping track of data segments 50(0)-50(n) is delegated to peripheral 40. Additionally, peripheral 40 is required to execute more PCI DMA burst requests to PCI bus controller 16. It may be appreciated by those skilled in the art that DMA requests are typically not immediately satisfied and spurious delays in obtaining control of PCI bus 21 may occur. Further, the simple fact of having to read scatter-gather table 60 and then proceed to obtain the corresponding data segment introduces further delays. The delays involved may sometimes cause real-time operations to glitch. Another drawback is the additional memory and resources needed by peripheral 40 in order to maintain the information about data segments 50(0)-50(n).
Thus, typical prior art systems encounter problems that limit their abilities to handle more than one PCI DMA channel per peripheral. One such problem is the arbitration latency caused by requiring the peripheral to constantly consult a scatter-gather table in order to maintain a single PCI DMA channel.
System interrupt delays also pose a significant limitation on the abilities of prior art systems. As explained, typical prior art systems require numerous interrupts to the CPU to continue operations. The CPU is required to respond to every interrupt and constantly update the operations of the peripheral. Interrupts to the CPU are often delayed because of the CPU's many duties and may not be the most efficient method of managing PCI DMA channels.
While the two illustrated prior art methods of establishing and maintaining a DMA channel are only examples, the drawbacks of other prior art methods are similar. That is, prior art methods either unduly burden the CPU with the tasks of managing the established DMA channels, or place the burden on the peripheral, requiring additional memory and resources. Also, the problem of limited bandwidth between the main memory and a peripheral still exists. While the PCI bus and future buses may extend the bandwidth of a DMA channel, current and future real-time operations may require even greater bandwidths.
Perhaps an even greater burden on the CPU occurs before any of the management of a DMA request. In prior art systems the CPU must not only provide data to a peripheral device; it must also perform operations on the data before it is handed over to the peripheral. Audio data, for example, in prior art systems is often processed by the CPU before being sent to an audio peripheral device for output. The processing may entail mixing, amplification, frequency modulation or other types of processing. Thus, the CPU must perform these duties in addition to the operation of the primary application.
Thus, what is needed is a method of transferring data from main memory to peripheral devices at greater bandwidths. With the increasing proliferation of complex audio and video incorporated into computer systems, bandwidth between main memory and peripheral devices may become a critical bottleneck. With the ability to increase bandwidth while at the same time not overburdening the CPU, complex operations by peripherals may be performed without degrading the performance of the CPU. At the same time, methods for transferring data to peripheral devices in their raw form, such that the processing of the data may be performed by the peripheral devices rather than the CPU, are further desired. Thus, increasing bandwidth, decreasing management of data transfers and transferring the burden of data processing to a peripheral device are the goals to be achieved.