1. Field of the Invention
The disclosure relates generally to DMA (Direct Memory Access) management and, more particularly, to PRD (Physical Region Descriptor) pre-fetch methods for DMA units.
2. Description of the Related Art
In computer architecture, DMA allows specific hardware to access the system memory independently, without involving the CPU (Central Processing Unit). DMA transactions copy memory regions between devices. Meanwhile, the CPU can be scheduled for other tasks, which improves system performance.
PRD entries are stored in a PRD table in the system memory. A PRD entry defines information such as the starting address and the size of a specific memory block in the memory. Before performing a DMA transaction, the DMA unit usually reads a PRD entry from the PRD table to obtain the starting address and the size of the memory block to be accessed. The DMA unit then accesses the memory block corresponding to the PRD entry, that is, it writes data to the memory block or reads data from the memory block. FIG. 1 is a schematic diagram illustrating a conventional DMA unit. As shown in FIG. 1, the DMA unit 300 comprises an interface A 310, an interface B 320, and a cache memory 350. The interface A 310 and the interface B 320 are used to access the buses A and B, respectively. The cache memory 350 stores the PRD entries pre-fetched by the interface A 310 via the bus A. The DMA unit 300 further comprises a queue A 331, such as a FIFO (First In First Out) queue, and a queue B 332. For DMA out transactions, the interface A 310 reads data from the memory 340 via the bus A, and stores the data to the queue A 331. The interface B 320 reads the data from the queue A 331, and writes the data to the bus B, thereby transferring the data to a corresponding device, such as a SATA or USB device. For DMA in transactions, the interface B 320 reads data from the bus B, and writes the data to the queue B 332. The interface A 310 reads the data from the queue B 332, and writes the data to the bus A, thereby writing the data to the memory 340. In some cases, few masters are on the bus B, while more masters are on the bus A. To provide the best data throughput, the interface B 320 should not be idle, that is, data transactions should be performed continuously. In other words, the queue A 331 should not become empty during DMA out transactions, and the queue B 332 should not become full during DMA in transactions.
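The DMA out path described above can be illustrated with a minimal Python sketch. It is not the hardware implementation; the names `PrdEntry` and `dma_out` are hypothetical, the memory 340 is modeled as a byte string, and the interface A 310 and the interface B 320 are modeled as two steps that share a FIFO standing in for the queue A 331.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class PrdEntry:
    """Hypothetical model of a PRD entry: one memory block."""
    start: int  # starting address of the memory block
    size: int   # size of the memory block in bytes


def dma_out(memory: bytes, prd_table: List[PrdEntry]) -> bytes:
    """Copy every block named by the PRD table to the device side."""
    queue_a = deque()     # FIFO between interface A and interface B
    device = bytearray()  # stands in for the device on bus B
    for prd in prd_table:
        # Interface A: read the block from memory (bus A) into queue A.
        queue_a.extend(memory[prd.start:prd.start + prd.size])
        # Interface B: drain queue A and write the data to bus B.
        while queue_a:
            device.append(queue_a.popleft())
    return bytes(device)
```

In this sketch, a table with two entries gathers two separate memory blocks into one contiguous stream on the device side. A DMA in transaction would run the same steps in the opposite direction through a second FIFO standing in for the queue B 332.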
Usually, the DMA unit 300 uses a “Scatter-Gather” mechanism to reduce the number of data copy steps. The “Scatter-Gather” mechanism allows the DMA unit 300 to transfer data to several memory blocks, each defined by a corresponding PRD entry, in one data transaction. In other words, the DMA unit 300 can first collect several DMA requests, and then perform the corresponding DMA transactions. In the “Scatter-Gather” mechanism, a PRD pre-fetch mechanism can improve the performance and throughput of the DMA unit. FIG. 2 is a flowchart of a PRD pre-fetch method for a conventional DMA unit. First, in step S210, the interface A 310 reads a PRD entry from a PRD table, and stores the PRD entry to the cache memory 350 of the DMA unit. In step S220, it is determined whether the end of the PRD table has been reached, that is, whether the current PRD entry is the last entry in the PRD table. If not, in step S230, it is determined whether the cache memory 350 is full. If the cache memory 350 is not full (No in step S230), the procedure returns to step S210 to read another PRD entry from the PRD table and store it to the cache memory 350. If the current PRD entry is the last entry in the PRD table (Yes in step S220) or the cache memory 350 is full (Yes in step S230), in step S240, a PRD entry is read from the cache memory 350, and in step S250, a data transaction is performed according to the PRD entry. In step S260, it is determined whether the cache memory 350 is empty. If the cache memory 350 is not empty (No in step S260), the procedure returns to step S240 to read another PRD entry from the cache memory 350, and in step S250, a data transaction is performed accordingly. If the cache memory 350 is empty (Yes in step S260), in step S270, it is determined whether the end of the PRD table has been reached. If not, the procedure returns to step S210 to read a PRD entry from the PRD table. If so, the procedure is complete.
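The flow of FIG. 2 can be sketched in Python as follows. This is an illustrative model only: the function name `conventional_prefetch` and the trace format are assumptions, PRD entries are treated as opaque values, and the data transaction of step S250 is recorded rather than performed.

```python
from collections import deque


def conventional_prefetch(prd_table, cache_size):
    """Return the ordered list of steps taken by the flow of FIG. 2."""
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    trace = []
    cache = deque()  # models the cache memory 350
    next_index = 0   # index of the next PRD entry in the PRD table
    if not prd_table:
        return trace
    while True:
        # S210: read a PRD entry from the table and store it in the cache.
        entry = prd_table[next_index]
        next_index += 1
        cache.append(entry)
        trace.append(("fetch", entry))
        # S220: was that the last entry in the PRD table?
        at_end = next_index == len(prd_table)
        # S230: if not at the end and the cache is not full, keep fetching.
        if not at_end and len(cache) < cache_size:
            continue
        # S240 to S260: perform transactions until the cache is empty.
        while cache:
            entry = cache.popleft()            # S240
            trace.append(("transfer", entry))  # S250
        # S270: finish once the whole PRD table has been processed.
        if at_end:
            return trace
```

With five PRD entries and a cache that holds two, the trace alternates between bursts of fetches and bursts of transfers. No transfer is issued while a burst of fetches is in progress.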
In the conventional mechanism described above, when the DMA unit 300 is triggered to start, the DMA unit 300 pre-fetches PRD entries from the PRD table and stores the pre-fetched PRD entries to the cache memory 350 of the DMA unit 300 until the cache memory 350 is full. Once the cache memory 350 is full, the DMA unit 300 performs DMA transactions according to the PRD entries within the cache memory 350 until the last PRD entry in the cache memory 350 is finished. Then, the DMA unit 300 loops back to pre-fetch further PRD entries, and repeats this cycle until the last PRD entry in the PRD table has been pre-fetched.
For a DMA out transaction in FIG. 2, it is likely that the queue A 331 will become empty while the DMA unit 300 pre-fetches the PRD entries. For a DMA in transaction, it is likely that the queue B 332 will become full while the DMA unit 300 pre-fetches the PRD entries. Specifically, in the DMA out transaction, the DMA transaction corresponding to the last PRD entry is finished on the bus A, while the next DMA transaction will not begin until the fetching of the PRD entries is finished. If the fetching of the PRD entries takes too much time and the interface B 320 continues to read data from the queue A 331, the queue A 331 may underflow. In the DMA in transaction, the DMA transaction corresponding to the last PRD entry is finished on the bus A, while the next DMA transaction will not begin until the fetching of the PRD entries is finished. If the fetching of the PRD entries takes too much time and the interface B 320 continues to write data to the queue B 332, the queue B 332 may overflow. In both cases, the interface B 320 must enter an idle state to prevent the queue A 331 from underflowing or the queue B 332 from overflowing, which ultimately degrades the data throughput of the DMA unit 300.
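The DMA out stall can be illustrated with a small cycle-level model in Python. Everything numeric here is an assumption made for illustration, not a figure from the disclosure: the function name `count_idle_cycles`, a fixed cost of `fetch_cycles` per PRD entry on the bus A, an interface A 310 that can push up to `a_rate` data units per cycle into the queue A 331, and an interface B 320 that sends one unit per cycle when the queue A 331 is not empty.

```python
def count_idle_cycles(num_prds, units_per_prd, cache_size,
                      fetch_cycles, queue_depth, a_rate=2):
    """Count cycles in which interface B finds queue A empty mid-transfer."""
    # Work list for interface A under the conventional flow: fetch a
    # batch of PRD entries, then move the data for that batch.
    work = []
    remaining = num_prds
    while remaining > 0:
        batch = min(cache_size, remaining)
        work.append(["fetch", batch * fetch_cycles])
        work.append(["data", batch * units_per_prd])
        remaining -= batch

    total_units = num_prds * units_per_prd
    queue_a = 0      # units currently buffered in queue A
    sent = 0         # units interface B has written to bus B
    idle = 0         # cycles interface B spent idle after starting
    started = False  # True once interface B has sent its first unit
    step = 0         # index of interface A's current work item
    while sent < total_units:
        # Interface B: send one unit if queue A holds data, else idle.
        if queue_a > 0:
            queue_a -= 1
            sent += 1
            started = True
        elif started:
            idle += 1
        # Interface A: spend the cycle on a PRD fetch or on data.
        if step < len(work):
            kind, amount = work[step]
            if kind == "fetch":
                work[step][1] -= 1
            else:
                pushed = min(a_rate, amount, queue_depth - queue_a)
                queue_a += pushed
                work[step][1] -= pushed
            if work[step][1] <= 0:
                step += 1
    return idle
```

Under these assumptions, `count_idle_cycles(4, 4, 2, 3, 4)` returns 3: the second burst of PRD fetches takes six cycles, the queue A 331 holds only four units, and the interface B 320 runs dry before new data arrives. Reducing `fetch_cycles` to 1 returns 0, because the buffered data then outlasts the fetch burst.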