The present invention relates generally to a direct memory access controller (DMAC) and more particularly to an intelligent DMAC.
Direct Memory Access (DMA) is a method for direct movement of data between two components, for example in a computer system. Specifically, the data is moved between the components via a bus without program intervention. A DMA Controller (DMAC) is typically a memory-mapped peripheral device that performs memory-to-memory, memory-to-peripheral, peripheral-to-memory, and peripheral-to-peripheral data transfers. The specialized hardware of the DMAC maximizes utilization of the system bus so that transfers are performed quickly and efficiently. In this manner, DMA operations typically outperform data movement operations performed by a CPU. Additionally, DMA operations free up the CPU to do other operations.
FIG. 1 illustrates a typical prior art DMAC 100 instantiated into a conventional computing system including a CPU 110, a memory 120, and a peripheral device 130. CPU 110 executes the software instructions of the computing system, whereas memory 120 stores data and instructions for the computing system. Peripheral device 130 generally expresses output signals of or provides input signals to the computing system. Examples of peripheral device 130 include graphics cards, keyboard interfaces, and disk I/Os. The computing system further includes at least one bus 140 which facilitates communication between the various elements. For example, CPU 110 utilizes bus 140 to communicate data to or from peripheral device 130. Most prior art DMACs rely on bus 140 to conduct the DMA operation.
DMAC 100 typically includes a set of registers which hold information necessary to the DMA operation. For example, DMAC 100 includes a source register (Src) 101 for storing the contents of the source address of the DMA bus cycles, a destination register (Dest) 102 for storing the contents of the destination address of the DMA bus cycles, and a length register (Len) 103 for storing the number of pieces of data to transfer. In this embodiment, DMAC 100 also includes a next register (Next) 104 for storing the address of the next place in memory where the DMAC's parameters are stored (explained in detail below). Note that herein the term "registers" may include counters, registers, or a combination thereof.
A single channel DMAC contains one set of registers 101-104. Many prior art DMACs support multiple channels which are represented in FIG. 1 as the dashed line boxes under DMAC 100. In a typical multiple channel DMAC, registers 101-104 are simply instantiated once per channel. Thus, a four channel DMAC would include four sets of registers 101-104.
FIG. 2A is a typical example of the hierarchy of software and hardware in a conventional computing system. At the bottom of the hierarchy is hardware 200, typically the actual hardware in the computing system. A register interface 250 facilitates communication between hardware 200 and the software of the system.
Continuing up the hierarchy, driver software 210 is considered the software that communicates with, i.e. reads and writes to, hardware 200. Typically, driver software 210 is highly specialized software that is specific to the actual hardware of the computing system. For example, the driver software in an Apple Macintosh model 9500/120 computer cannot generally be used in place of the driver software in a Toshiba model Tecra 730 computer. However, hardware and software manufacturers have gone to great lengths to standardize register interface 250 so that they can reuse driver software 210 in a variety of different computing systems. Examples of driver software 210 include hard disk drivers, floppy disk drivers, serial port drivers, parallel port drivers, graphics port drivers, and mouse drivers.
An Application Programming Interface (API) 240 is provided between driver software 210 and operating system software 220 as well as between operating system software 220 and application software 230. API 240 is a means of communicating between various layers of software. Specifically, API 240 refers to a standardized means of passing data between two different pieces of software. Operating system software (also referenced herein as OS software) 220 is the layer of software which generally handles the tasks of the computing system. These tasks would include items such as opening a file for input, prioritizing interrupts to the system, and scheduling events for later processing. Examples of current operating systems include Apple Computer, Inc. MacOS Version 8.1; Sun, Inc. SunOS Version 5.5.1; and Microsoft, Inc. Windows 95.
OS software 220 communicates with driver software 210 and with application software 230 using different APIs 240. Each API 240 is different because of different data communication needs. For example, OS software 220 generally communicates only data to driver software 210, with a small overhead of control information. In contrast, OS software 220 often communicates task information to and/or from application software 230, even though a high percentage of that task information is data.
Application software 230 is generally the highest level of software in a computing system. Typically, the user of the computing system communicates with application software 230 using a graphical user interface. Illustrative application software, such as Claris, Inc. ClarisWorks 4.0 (a word processing program), allows the user to open a text file, read the file, view the file, make changes to the file, save the file, and print the file. Based upon the specific requests of the user, application software 230 makes calls to OS software 220 via API 240B to accomplish one or more of the above tasks. However, typically application software 230 is still responsible for the actual processing of the data. In some computing systems, OS software 220 is used to help draw graphics and text on the screen. In this manner, application software 230 is not burdened with extra software that could be standardized for other applications. Note that although FIG. 2A shows a single application software 230, in general, a set of application software 230 actually communicates via a set of APIs 240B to OS software 220.
FIG. 2B illustrates API 240A between OS software 220 and driver software 210 in more detail. In API 240A, four basic functions have been defined as a means of communication between the two software layers. These four functions are: Open, Close, Read, and Write. The Open function is used by OS software 220 to initialize driver software 210 for its first usage. Similarly, the Close function is used by OS software 220 to halt the operation of driver software 210. These functions are typically used by OS software 220 to dynamically start and stop software drivers so that the computing system resources can be shared by various higher level software. In addition to the Open and Close functions, this API 240A includes Read and Write functions which are generally used to either write data to or read data from hardware 200 via driver software 210.
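The four-function interface described above can be pictured as a table of entry points that the driver exposes to the OS. The following is a minimal sketch only: the names driver_api, toy_device, and toy_driver, the function signatures, and the 64-byte buffer standing in for hardware 200 are all assumptions made for illustration and do not appear in the figures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of the four entry points named in the text. */
typedef struct driver_api {
    int    (*open)(void *dev);
    int    (*close)(void *dev);
    size_t (*read)(void *dev, void *buf, size_t len);
    size_t (*write)(void *dev, const void *buf, size_t len);
} driver_api;

/* Toy device: a small buffer standing in for hardware 200 (assumption). */
typedef struct toy_device {
    int           is_open;
    unsigned char store[64];
    size_t        used;
} toy_device;

/* Open: initialize the driver for its first usage. */
static int toy_open(void *dev) {
    toy_device *d = dev;
    assert(d != NULL);
    d->is_open = 1;
    d->used = 0;
    return 0;
}

/* Close: halt the operation of the driver. */
static int toy_close(void *dev) {
    toy_device *d = dev;
    assert(d != NULL);
    d->is_open = 0;
    return 0;
}

/* Write: move data toward the hardware; returns the bytes accepted. */
static size_t toy_write(void *dev, const void *buf, size_t len) {
    toy_device *d = dev;
    assert(d != NULL && buf != NULL);
    if (!d->is_open) return 0;
    if (len > sizeof d->store) len = sizeof d->store;
    memcpy(d->store, buf, len);
    d->used = len;
    return len;
}

/* Read: move data from the hardware; returns the bytes delivered. */
static size_t toy_read(void *dev, void *buf, size_t len) {
    toy_device *d = dev;
    assert(d != NULL && buf != NULL);
    if (!d->is_open) return 0;
    if (len > d->used) len = d->used;
    memcpy(buf, d->store, len);
    return len;
}

/* The OS layer would hold only this table, not the functions themselves. */
static const driver_api toy_driver = { toy_open, toy_close, toy_read, toy_write };
```

In this sketch the OS layer calls through toy_driver only, so a different driver could be substituted by supplying another table with the same four entries.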
FIG. 2B illustrates a number of parameter blocks (PBs), wherein PBs are generally designated locations in memory for specific parameters. For example, in a DMA transfer, the source address, destination address, and length of the transfer need to be designated. These values are placed in memory locations (i.e. PBs). These memory locations are predefined so that the memory can efficiently communicate the data to software (note that the software has had knowledge imparted to it that describes which locations of the PB contain which important data).
A DMAC process typically takes place in three stages: initialization, data transfer, and termination. Referring back to FIG. 1, during the initialization stage, CPU 110 sets up the DMA process by loading source register 101 with a starting source data address, destination register 102 with a starting destination data address, and length register 103 with a length count. After such loading, CPU 110 directs DMAC 100 to start the data transfer operation.
At this point, DMAC 100 initiates data transfers from the data source to the data destination. For example, if data is to be moved from memory 120 to peripheral device 130, then DMAC 100 controls the data transfer between those two components. As data is transferred, source and destination address and length count registers 101-103 are updated. When the length count is decremented to zero, DMAC 100 enters the termination stage. During termination, DMAC 100 updates its status register (not shown) and, in some designs, generates an interrupt request (also not shown) to CPU 110.
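The three stages described above can be modeled in software as follows. This is a hedged sketch rather than a description of any actual DMAC: the structure dmac_regs, the functions dmac_init and dmac_run, and the done flag standing in for the status register are hypothetical names, and one loop iteration is assumed to represent one transferred piece of data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of registers 101-103 plus a status flag. */
typedef struct dmac_regs {
    const uint8_t *src;   /* Src 101: current source address           */
    uint8_t       *dest;  /* Dest 102: current destination address     */
    size_t         len;   /* Len 103: pieces of data left to transfer  */
    int            done;  /* stands in for the status register         */
} dmac_regs;

/* Initialization stage: the CPU loads the registers. */
static void dmac_init(dmac_regs *r, const uint8_t *src, uint8_t *dest, size_t len) {
    assert(r != NULL);
    r->src  = src;
    r->dest = dest;
    r->len  = len;
    r->done = 0;
}

/* Data transfer and termination stages.
 * Returns the number of pieces of data moved. */
static size_t dmac_run(dmac_regs *r) {
    size_t moved = 0;
    assert(r != NULL);
    while (r->len > 0) {
        *r->dest++ = *r->src++;  /* source and destination addresses advance */
        r->len--;                /* length count is decremented              */
        moved++;
    }
    r->done = 1;                 /* termination: status is updated           */
    return moved;
}
```

A real controller would typically also raise an interrupt request at termination; that step is omitted here.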
In some prior art systems, DMAC 100 supports data transfers of non-contiguous blocks of memory. These transfers of DMAC 100 need to allow for the continuous transferring of data without the assistance of CPU 110. These so-called "chained" operations are accomplished by adding more control logic (not shown) and next register 104 in DMAC 100. Additionally, CPU 110 must have set up a plurality of contiguous parameters in memory readable by DMAC 100. These contiguous parameters are often referred to as a request block, and typically include a source start address, a destination start address, a length count for this transfer, and a pointer to the memory location of the next request block.
Each set of contiguous data transfers requires its own request block. For example, assume the system must transfer two blocks of 1000 pieces of data, but each block of data resides in different memory locations. In this case, CPU 110 simply builds two request blocks and loads the parameters from the first request block into registers 101-104 of DMAC 100. Then, DMAC 100 transfers the DMA data, as indicated by the first request block. Once length count register 103 is zero, DMAC 100 uses next register 104 to reload registers 101-104 from the parameters contained in the second request block (e.g., next register 104 is a pointer to a location in memory 120 where an address of the next request block is stored). DMAC 100 then transfers the data block referenced by the second request block until length count register 103 is zero again. DMAC 100 in turn looks at next register 104, wherein a special flag value in next register 104 (e.g., a specialized "stop" token, such as a "0" value) indicates to DMAC 100 that all DMA data has been transferred. Note that both blocks of DMA data have been transferred without CPU 110 intervention. Once both blocks have been transferred, DMAC 100 enters the termination stage which updates internal registers and otherwise completes the DMA process.
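The chained operation described above amounts to walking a linked list of request blocks. The sketch below illustrates this under stated assumptions: the type request_block and the function dmac_run_chain are hypothetical, a NULL pointer plays the role of the "stop" token, and the next field is assumed to point directly at the next request block rather than at a location holding its address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request block: the contiguous parameters set up by the CPU. */
typedef struct request_block {
    const uint8_t              *src;   /* source start address            */
    uint8_t                    *dest;  /* destination start address       */
    size_t                      len;   /* length count for this transfer  */
    const struct request_block *next;  /* next request block; NULL = stop */
} request_block;

/* Walks the chain with no further CPU involvement.
 * Returns the total number of pieces of data moved. */
static size_t dmac_run_chain(const request_block *first) {
    size_t total = 0;
    const request_block *rb = first;

    while (rb != NULL) {
        /* Reload the model of registers 101-104 from the request block. */
        const uint8_t       *src  = rb->src;
        uint8_t             *dest = rb->dest;
        size_t               len  = rb->len;
        const request_block *next = rb->next;

        assert(len == 0 || (src != NULL && dest != NULL));

        /* Transfer until the length count reaches zero. */
        while (len > 0) {
            *dest++ = *src++;
            len--;
            total++;
        }
        rb = next;  /* a NULL next ends the chain */
    }
    return total;
}
```

For the two-block example in the text, the CPU would build two such request blocks, link the first to the second, and set the second block's next field to the stop value.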
Although many types of DMACs exist, none compensate for their register interface to software or bus overhead. Thus, prior art DMACs typically have high software and hardware latency requirements. "Latency" is the time required before a given operation is actually begun after the command to begin the operation has been given. For example, the latency from Operating System software 220 initiating a DMA process to the actual starting of the process includes the time delays associated with Operating System software 220, driver software 210, and hardware 200. This time period is inherently long because the procedure consumes additional CPU bus cycles, memory bandwidth, and CPU calculation time. Latency periods are wasteful of time, and can cause significant negative impact to system functionality. For example, in video games, latency problems can show up as "mushy" controls. Therefore, a need arises for a method and apparatus to significantly reduce the latency due to the combinations of software and hardware operations in the DMA process.
To describe the Intelligent DMA Controller (IDMAC) of the present invention, the following hierarchical terminology is used. A "DMA bus cycle" is an individual transfer of data between two points (either a fly-by which has only one bus cycle, or a non-fly-by which has two bus cycles). A "DMA transfer" is one or more bus cycles required to transfer data between a source and a destination. A "DMA transaction" is a continuous set of DMA transfers. Finally, a "DMA process", which describes a whole process from start to finish, includes one or more DMA transactions.
The IDMAC of the present invention addresses the problems of prior art DMACs by using two types of intelligence. First, the IDMAC uses control-wise intelligence to minimize the time spent transferring data between the various control and data processes of the system, thereby reducing DMA process (software and hardware) latency as well as CPU calculation time. Second, the IDMAC uses data-wise intelligence to effect manipulation of data on-the-fly according to dynamically read opcodes during the DMA process.
To get these two types of intelligence, the IDMAC replaces one or more layers of software with intelligent hardware. Additionally, the IDMAC is given specific knowledge of the structure of certain pieces of memory or hardware registers (e.g. PBs) used for Inter Process Communication. This specific knowledge can be imparted during the design phase of the IDMAC, or dynamically provided during its operation as system requirements dictate.
The IDMAC achieves its control-wise intelligence by understanding PBs. A PB can be as simple as a collection of IDMAC parameters in contiguous memory, or as complicated as multiple levels of memory indirection or indexing. The IDMAC gets all of its PB parameters directly from memory by utilizing its knowledge of the PB to obtain the parameters, dereferencing as required, and then begins transferring data between the source and destination as controlled by the PB(s). Examples of PB parameters are source address, destination address, transfer length, and data intelligence opcode. Note that the IDMAC allows for bidirectional nesting of PBs, thereby allowing for complete error recovery.
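The dereferencing of PBs described above can be illustrated with a simple model in which a PB either holds the transfer parameters directly or points at another PB. This is only a sketch under stated assumptions: the types transfer_params and pb_node, the functions pb_resolve and idmac_start, and the depth limit are hypothetical, and the actual PB layouts understood by the IDMAC may be considerably more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of PB parameters needed to start a transfer. */
typedef struct transfer_params {
    const uint8_t *src;
    uint8_t       *dest;
    size_t         len;
} transfer_params;

/* A PB either holds the parameters or refers to another PB (indirection). */
typedef struct pb_node {
    int is_indirect;
    union {
        transfer_params       params;
        const struct pb_node *link;
    } u;
} pb_node;

/* Follows at most max_depth levels of indirection.
 * Returns the parameters, or NULL if they cannot be reached. */
static const transfer_params *pb_resolve(const pb_node *pb, int max_depth) {
    while (pb != NULL && pb->is_indirect) {
        if (max_depth-- <= 0) return NULL;  /* too deep: give up */
        pb = pb->u.link;
    }
    return pb != NULL ? &pb->u.params : NULL;
}

/* Resolves the PB, then moves the data with no software layer in between.
 * Returns the number of pieces of data moved (0 if unresolved). */
static size_t idmac_start(const pb_node *pb, int max_depth) {
    const transfer_params *p = pb_resolve(pb, max_depth);
    size_t i;
    if (p == NULL) return 0;
    assert(p->len == 0 || (p->src != NULL && p->dest != NULL));
    for (i = 0; i < p->len; i++) {
        p->dest[i] = p->src[i];
    }
    return p->len;
}
```

In this model, an application-level PB that points to an OS-level PB that points to a driver-level PB is resolved entirely by pb_resolve, which corresponds to the hardware interpreting the nested structures itself.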
Because the IDMAC can also interpret complex PB structures, it can remove many layers of software compared to prior art DMACs. For example, the data movement operations associated with a word processing application that writes to a hard disk drive can be almost totally contained inside the hardware of the IDMAC. In this example, the PBs may have many levels of indirection caused by the application software to driver software flow. Because the IDMAC can directly interpret this set of complex structures, the software overhead can be decreased substantially. Additionally, because of the elimination of CPU cycles to effect the same procedure, the IDMAC of the present invention reduces the latency of the DMA transaction and increases performance.
The IDMAC also can add data-wise intelligence to the DMA process and therefore is capable of performing various types of manipulations to the data on-the-fly. In other words, the data flowing through the IDMAC is modified in real time, and does not consume additional DMA bus cycles. The IDMAC utilizes a Data Intelligence Unit, along with additional specific knowledge of memory or registers to achieve its data operations. These additional structures allow the Data Intelligence Unit to obtain parameters from the data stream before and/or during the DMA transactions. It then utilizes these parameters to alter the data of the DMA bus cycle. This altering can be done either on-the-fly, or by consuming additional bus cycles, depending upon system requirements.
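The on-the-fly case can be pictured as an operation applied to each piece of data as it passes from source to destination. The sketch below assumes the case in which no additional cycles are consumed. The opcode names OP_NONE, OP_INVERT, and OP_XOR_KEY, the key parameter, and the functions apply_op and idmac_transfer are hypothetical and chosen only for illustration; the text does not specify which manipulations the Data Intelligence Unit performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data intelligence opcodes (not taken from the text). */
typedef enum data_op {
    OP_NONE,     /* pass the data through unchanged */
    OP_INVERT,   /* invert every bit                */
    OP_XOR_KEY   /* exclusive-or with a key         */
} data_op;

/* Models the Data Intelligence Unit acting on one piece of data. */
static uint8_t apply_op(data_op op, uint8_t key, uint8_t value) {
    switch (op) {
    case OP_INVERT:  return (uint8_t)~value;
    case OP_XOR_KEY: return (uint8_t)(value ^ key);
    default:         return value;
    }
}

/* Moves len pieces of data, altering each one in flight.
 * Returns the number of modeled transfer cycles, which equals len
 * whichever opcode is selected. */
static size_t idmac_transfer(const uint8_t *src, uint8_t *dest, size_t len,
                             data_op op, uint8_t key) {
    size_t cycles = 0;
    size_t i;
    assert(len == 0 || (src != NULL && dest != NULL));
    for (i = 0; i < len; i++) {
        dest[i] = apply_op(op, key, src[i]);  /* altered during the move */
        cycles++;
    }
    return cycles;
}
```

Because the operation is applied inside the same loop iteration that moves the data, the modeled cycle count does not depend on the opcode.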