This invention relates to subsystems in computers for transferring data from one memory location to another, or between a memory location and an input/output (I/O) device. In particular, the present invention relates to direct memory access (DMA) subsystems having data word widths of two or more 8-bit bytes, in which the data may be manipulated and acted upon during the transfer.
In the past, computer systems have been designed to transfer data to the central processing unit (CPU) as quickly as possible so that the CPU could perform all of the operations necessary to achieve the desired results, including large scale iterative scientific computations, on-line business transaction processing, graphics interfaces, accounting programs, and a myriad of other data manipulation intensive operations. DMA became one of the leading methods for providing faster data transfers, particularly in bus-oriented computer system architectures. Even so, DMA subsystems take finite time periods, usually referred to as "bus cycles", or simply "cycles", to complete a data transfer operation. While cycle times have become shorter as semiconductor device technology has become faster, the fact remains that no operations on the data can be performed until the transfers are complete.
In a related consideration, early computers were configured to operate on data word widths of 8 bits. As more complex functions were computerized, 8-bit bytes quickly limited the speed of computations and therefore, the throughput of data operations. Thus, wider word widths of 16, 32, 64 bits, and more, arranged into 8-bit wide bytes, were introduced.
Wider word widths are generally encountered in large scale, main frame systems, but may also be found in process control systems and the like which are driven by intermediate size computers also known as mini-computers. In addition to the general advancement of the level of competence of the individual user, many present-day commercial and industrial business transactions are being implemented on personal computers (P/C) or P/C-based systems. Since it has become desirable to perform complex functions on P/C-based systems, such systems are being configured to operate on word widths wider than 8 bits to perform such complex operations efficiently. While the vast majority of installed P/C's are limited to 8-bit data word widths, microprocessors becoming commonly available for use in designing the next generation of P/C's have data widths of 16 and 32 bits. Thus, since peripheral devices and memory subsystems to which such microprocessors must interface can have data widths of 8, 16 or 32 bits, DMA subsystems must be compatible with such data widths to anticipate advances in software products and growth in the technology generally.
In the prior art, memory in an 8-bit data word system is directly addressed from the address bus. The memory is organized into 8-bit words, and each address signal decoded from the address bus points to a different and unique word (in this case, also equal to a byte) in the memory. In a system designed for 16-bit data words, the memory is organized into 16-bit words. As long as the DMA subsystem is only required to transfer data between 16-bit memories and 16-bit I/O devices, each transfer comprises a word and the data is written to or read from even-numbered addresses. In the prior art, there are systems that provide both 8-bit and 16-bit memory-to-memory and memory-to-I/O device accesses.
If compatibility with 8-bit word I/O devices is desired, some provision for directing flow of data to and from the memory word locations is required. Therefore, typically, 16-bit memory words are further organized into two 8-bit bytes of data. Bits 0-7 and 8-15 are designated the low byte and high byte, respectively. Thus, in a 16-bit data word system, each word comprises two 8-bit bytes.
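The byte organization described above can be illustrated with a minimal sketch. The function names `split_word16` and `join_word16` are hypothetical and are introduced here only for illustration; they are not part of any prior art system.

```python
def split_word16(word):
    """Split a 16-bit word into (low_byte, high_byte).

    Bits 0-7 form the low byte and bits 8-15 form the high byte,
    following the designation given in the text.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    low_byte = word & 0xFF           # bits 0-7
    high_byte = (word >> 8) & 0xFF   # bits 8-15
    return low_byte, high_byte


def join_word16(low_byte, high_byte):
    """Reassemble a 16-bit word from its low and high bytes."""
    return ((high_byte & 0xFF) << 8) | (low_byte & 0xFF)
```

For example, under this sketch the word 0xB1A2 has low byte 0xA2 and high byte 0xB1, and joining those two bytes restores the original word.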
The method for directing the data to and from the individual byte locations requires additional address signals called byte enables. In a 16-bit word system, the first address line which addresses the first low byte, i.e. the A0 address line, is replaced with two byte enable lines called "BE0N" and "BE1N". When the BE0N signal line is active, the lower byte of the data word is transferred; when the BE1N signal is active, the upper byte of the data word is transferred. When both BE0N and BE1N lines are active at the same time, both bytes, i.e. the complete word, are transferred at the same time.
Similarly, in a 32-bit word system having a 32-bit DMA subsystem, each word is organized into four 8-bit bytes. In such a system, both address lines A0 and A1 are replaced with four byte enable lines, BE0N through BE3N. Likewise, in a 64-bit system with a 64-bit DMA subsystem, each data word is organized into eight 8-bit bytes and address lines A0, A1 and A2 are replaced with eight byte enables, BE0N through BE7N. In all cases, the byte enable lines are said to "point" to the bytes of the word that are to be transferred.
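The way byte enables stand in for the low-order address lines can be sketched as follows. This is an illustrative model only: the function name `byte_enables` is hypothetical, and the sketch assumes that the "N" suffix denotes an active-low signal, so that a level of 0 means the enable is active. The text itself does not state the signal polarity.

```python
def byte_enables(byte_address, num_bytes, word_bytes):
    """Return (word_address, enables) for a transfer inside one word.

    enables[i] is the level on line BEiN. Assumption: the lines are
    active-low, so 0 marks a byte that is transferred and 1 marks a
    byte that is not.
    """
    if word_bytes not in (2, 4, 8):
        raise ValueError("word_bytes must be 2, 4 or 8")
    # The low-order address bits (A0, A0-A1 or A0-A2) select the byte.
    offset = byte_address % word_bytes
    if num_bytes < 1 or offset + num_bytes > word_bytes:
        raise ValueError("transfer must lie within a single word")
    # The remaining address bits select the word itself.
    word_address = byte_address - offset
    enables = tuple(
        0 if offset <= i < offset + num_bytes else 1
        for i in range(word_bytes)
    )
    return word_address, enables
```

Under these assumptions, `byte_enables(0x101, 1, 2)` yields word address 0x100 with only BE1N active, while `byte_enables(0x100, 2, 2)` activates both BE0N and BE1N so that the complete word is transferred.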
In prior art DMA subsystems, three bus cycles are required for a 16-bit data word transfer to or from an odd address in a P/C system compatible with 8-bit data word bytes. In such systems, the memory is organized into 16-bit words (i.e. two 8-bit bytes) so that the word boundaries are on even addresses. Referring now to FIG. 1A, during cycle 1 for a 16-bit odd address location memory read cycle, the address points to an even memory location and an 8-bit read of the upper byte is performed in response to byte enable BE1N. The data read (byte A) is stored in a register associated with bits 7-0 comprising byte A. After the first memory read cycle, the memory address is incremented by two to the next even address, and the byte pointer is activated to point to the lower byte of data to be read (byte B). The lower byte of data is read and steered into another register during the second cycle. During the third cycle, the entire word, bits 15-8 and 7-0, i.e. bytes B and A, is assembled serially and driven onto the bus for transmission to the I/O device. Thus, three cycles are required to transfer each word using this technique. If this technique is extrapolated for transferring words comprising more than two bytes, the number of cycles required would be (a+1)N, where a is the number of bytes per word and N is the number of words to be transferred. Thus, for a 4-byte word, 5 cycles/word would be required.
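The three-cycle odd address read and the (a+1)N cycle count can be sketched as follows. This is a simplified model, not a description of any particular prior art circuit: the names `read_word_odd` and `cycles_required` are hypothetical, and memory is assumed to be a byte-addressable buffer in which the low byte of each 16-bit word sits at the even address and the high byte at the following odd address.

```python
def read_word_odd(memory, odd_address):
    """Model a prior art 16-bit read from an odd address.

    Returns (word, cycles). Assumption: memory is a bytes-like buffer
    with the low byte of each word at the even address.
    """
    if odd_address % 2 == 0:
        raise ValueError("address must be odd")
    even_address = odd_address - 1

    # Cycle 1: 8-bit read of the upper byte of the first word (BE1N).
    register_a = memory[even_address + 1]   # byte A, destined for bits 7-0

    # Cycle 2: address incremented by two, lower byte read (BE0N).
    even_address += 2
    register_b = memory[even_address]       # byte B, destined for bits 15-8

    # Cycle 3: the assembled word is driven to the I/O device.
    word = (register_b << 8) | register_a
    return word, 3


def cycles_required(bytes_per_word, num_words):
    """Cycle count (a+1)N for this technique, as extrapolated in the text."""
    return (bytes_per_word + 1) * num_words
```

With a buffer holding the bytes 0x00, 0xA1, 0xB2, 0x00, a read from odd address 1 returns the word 0xB2A1 after three cycles, and `cycles_required(4, 1)` gives the 5 cycles/word stated above for a 4-byte word.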
Referring to FIG. 1B, a 16-bit odd memory write operation is similar to the 16-bit odd memory read operation. Again, three bus cycles per transfer are required. In the first cycle, one word, i.e. two 8-bit bytes B and A, is read from the I/O device and stored in a latch. During cycle two, lower byte A in the latch is written to the upper byte memory location in response to byte enable BE1N. During cycle three, the memory address is incremented, BE1N is driven inactive, and the upper byte in the latch is written to the lower byte memory location in response to byte enable signal BE0N.
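The odd address write sequence can be sketched in the same manner. As before, this is an illustrative model under stated assumptions: the name `write_word_odd` is hypothetical, and memory is assumed to be a mutable byte-addressable buffer with the low byte of each 16-bit word at the even address.

```python
def write_word_odd(memory, odd_address, io_word):
    """Model a prior art 16-bit write to an odd address.

    Returns the number of bus cycles used. Assumption: memory is a
    bytearray with the low byte of each word at the even address.
    """
    if odd_address % 2 == 0:
        raise ValueError("address must be odd")
    if odd_address + 1 >= len(memory):
        raise ValueError("transfer runs past the end of memory")
    even_address = odd_address - 1

    # Cycle 1: the word (bytes B and A) is read from the I/O device
    # and held in a latch.
    latch = io_word & 0xFFFF

    # Cycle 2: lower byte A goes to the upper byte location (BE1N).
    memory[even_address + 1] = latch & 0xFF

    # Cycle 3: address incremented, upper byte B goes to the lower
    # byte location of the next word (BE0N).
    even_address += 2
    memory[even_address] = (latch >> 8) & 0xFF
    return 3
```

Writing the word 0xB2A1 to odd address 1 of a four-byte zeroed buffer leaves the bytes 0x00, 0xA1, 0xB2, 0x00 in memory, which mirrors the read case.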
It should be noted that, in the prior art, the memory address may be decremented in all 8-bit accesses and 16-bit accesses from an even address. However, for a 16-bit access from an odd address, the memory address may be incremented only.
As already noted, most computer systems, whether large scale main frame computers or modern day P/C's, transfer data to registers in or near the CPU or microprocessor, respectively, before manipulations, including simple arithmetic operations, exclusive-OR and barrel shifting, are performed on it. However, the advent of very large scale integration (VLSI) semiconductor technology has provided the opportunity to implement previously impractical computing system architectures. See for example "VLSI: The Challenge to Innovate", VLSI Systems Design, November, 1988 at p. 6. Thus, it is now practical to design systems which do many things faster or many more things in the same time, or both, than was previously possible. In particular, it is now possible to implement a practical DMA subsystem which transfers data faster, and which can manipulate data during the transfer, i.e. on-the-fly.