In many computing systems, large blocks of data are transferred on a fairly regular basis from a data source, such as a graphics card or a game cartridge, to a data memory. This transfer is preferably carried out as quickly as possible to optimize the speed and efficiency of the system. Traditionally, two mechanisms have been employed to effect data transfer. The first mechanism uses a software routine executed by a central processing unit (CPU) to transfer the data from the source to the memory. According to this mechanism, the CPU executes a data transfer loop that moves the data one word at a time from the data source, through the CPU, to the memory until all of the data is transferred. A drawback of this method is that the CPU must generate a source address and a destination address for each data word transferred, and must test and branch after each transfer. These operations consume several extra clock cycles per data word. For a large number of data words, the address generation and the testing and branching impose a considerable burden on the system, which, in turn, slows the system down significantly. A software routine is especially inefficient when the data must be stored in the memory in a non-sequential fashion. In that case, the CPU cannot use its sequential addressing mode to generate the addresses and must instead calculate each address. Address calculation requires still more clock cycles, which slows the data transfer process even further. Thus, even though a software routine properly effects the transfer of data, it does so undesirably slowly.
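As a purely illustrative model of the software loop described above, the following Python sketch treats memory as a list of words and shows where the per-word work arises: one function increments addresses sequentially, while the other must calculate each destination address because the block is placed into rows spaced `pitch` words apart. The function names, the list-based memory, and the row/pitch layout are hypothetical assumptions chosen for illustration; the sketch does not model clock cycles.

```python
def copy_sequential(source, memory, dst_base):
    """Hypothetical software transfer loop: move one word per iteration.

    Each iteration uses a source address and a destination address,
    performs the move, then tests the count and branches back.
    """
    count = len(source)
    src_addr = 0
    dst_addr = dst_base
    while src_addr < count:                  # test and branch on every word
        memory[dst_addr] = source[src_addr]  # move one word "through the CPU"
        src_addr += 1                        # sequential address generation
        dst_addr += 1


def copy_rectangle(source, memory, dst_base, width, pitch):
    """Hypothetical non-sequential transfer: store a block `width` words wide
    into memory whose rows start `pitch` words apart (assumed layout).

    The destination address can no longer simply be incremented; it is
    calculated from the row and column of each word.
    """
    for i, word in enumerate(source):
        row, col = divmod(i, width)          # extra arithmetic per word
        dst_addr = dst_base + row * pitch + col
        memory[dst_addr] = word


if __name__ == "__main__":
    memory = [0] * 16
    copy_rectangle([1, 2, 3, 4, 5, 6], memory, dst_base=2, width=3, pitch=8)
    print(memory)  # [0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0]
```

In the second function the divide-and-multiply step runs once for every word moved, which mirrors the extra address-calculation cost the text attributes to non-sequential storage.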
As an alternative, the direct memory access (DMA) mechanism has been used. In the DMA scheme, when the data source has a large block of data to transfer, the DMA hardware signals the CPU, and in response, the CPU relinquishes control of the system bus to the DMA hardware. The DMA hardware then transfers the block of data directly from the data source to the memory without the data passing through the CPU. Although DMA is faster than the software loop, it has disadvantages. One major disadvantage is that, in order to implement DMA, special hardware must be provided in each computing system in which a particular data source may be installed, and this hardware adds cost to the system. Where the computing system has already been sold, adding DMA hardware to it may be practically infeasible. A second disadvantage of the DMA scheme is that the CPU must relinquish control of the system bus while the transfer is taking place. During the transfer, therefore, the CPU cannot service interrupts or perform any substantive processing; in effect, it is idle. If data transfer occupies a significant percentage of the system run time, the CPU is rendered ineffective for an inordinate amount of time, which is impracticable for many systems. For the reasons discussed above, neither prior art data transfer mechanism provides satisfactory results.