Direct memory access (DMA) is typically used for data transfers from a single source to a single destination. In known computer systems, a DMA controller takes control of the system bus from the central processor, and transfers a block of data between a source and a destination, typically between memory and a peripheral device, using much less bandwidth and in a shorter amount of time than if the central processor executed the transfer.
However, in some systems, logical operations must be performed on the data to be transferred, which operations may be required for data integrity. Such operations may include exclusive OR (XOR) operations, parity generation, checksum generation, and the like. For example, an XOR operation is required in data transfers using Redundant Array of Independent Disks (RAID) systems, and in particular, in RAID-5 systems.
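By way of illustration only, the XOR operation used for parity in a RAID-5 style stripe can be sketched as follows. The function name `xor_blocks` and the block contents are hypothetical and chosen only for this example; the sketch assumes equally sized blocks held in memory and does not model any particular controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical stripe of three data blocks (illustrative values only).
data_blocks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"]
parity = xor_blocks(data_blocks)

# If one data block is lost, XOR-ing the surviving blocks with the parity
# block restores the missing block.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
```

Because XOR is its own inverse, the same routine serves both to generate the parity block and to rebuild a missing data block from the remaining blocks.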
Some known systems utilize two or more DMA channels to handle data transfer and the associated logical operation. For example, in such systems a first DMA controller may transfer data from the source to a first destination, such as the host, while a second DMA controller may handle data transfer from the same source to a second destination, where a logical operation is performed on the data.
However, the second DMA channel and its associated logical processing place a burden on the system, and may significantly impact the transfer rate because in known systems it is difficult or impossible to perform the second DMA operation on-the-fly without adding significant delay to the data transfer. In a worst-case scenario, use of two DMA channels may double the time required to transfer the data. Even if the transfer speed is not impacted by a factor of two, known systems nonetheless experience significant reductions in transfer speed when a second DMA channel competes for data from a common source.
In addition, when DMA is used to transfer data in systems using a PCI Express (“PCIe” or Peripheral Component Interconnect Express) protocol, which is a high-speed serial data transfer protocol commonly used in personal computers, two main constraints are introduced that render known systems disadvantageous. First, the DMA transfer should be aware of the maximum data payload size constraint for each data transfer, as set forth by the PCIe standard. Second, once the DMA data transfer has started in a PCIe compliant system, there is no data flow control mechanism inherent in the PCIe protocol that provides adequate arbitration. In such systems, once begun, the DMA transfer must run to completion, which typically adversely impacts transfer speed.
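The maximum data payload size constraint described above can be illustrated with a minimal sketch that divides one transfer into pieces no larger than the permitted payload. The function name `split_into_payloads`, the start address, and the 128-byte limit are assumptions made for this example; the actual limit is negotiated per device, and other rules of the PCIe standard are not modeled here.

```python
def split_into_payloads(start_address, length, max_payload_size=128):
    """Split a transfer into (address, size) pieces of at most max_payload_size bytes.

    max_payload_size=128 is an assumed value for illustration only.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if max_payload_size <= 0:
        raise ValueError("max_payload_size must be positive")
    pieces = []
    offset = 0
    while offset < length:
        # The last piece may be shorter than the payload limit.
        size = min(max_payload_size, length - offset)
        pieces.append((start_address + offset, size))
        offset += size
    return pieces


# Hypothetical 300-byte transfer starting at address 0x1000.
pieces = split_into_payloads(0x1000, 300, max_payload_size=128)
```

In this sketch the 300-byte transfer becomes two full 128-byte pieces followed by one 44-byte piece.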
Some known systems use a pipeline approach to perform DMA and logical operations, such as parity checking and the like, where first, the logical operation is performed on a portion of the data, and when such a logical operation is completed, the data can then be transferred by the other DMA device to the host, for example. However, this pipeline approach increases the transfer latency when smaller data transfers are involved. Further, this approach is inefficient.
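The latency penalty for smaller transfers can be illustrated with a simplified timing model of a two-stage pipeline. The model, the function name `pipelined_latency`, and the time values are assumptions for illustration only; they assume a constant time per portion for each stage and are not measurements of any system.

```python
def pipelined_latency(portions, op_time, transfer_time):
    """Total time for a two-stage pipeline: logical operation, then transfer.

    Assumes every portion takes op_time in the first stage and
    transfer_time in the second stage (arbitrary time units).
    """
    if portions < 1:
        raise ValueError("portions must be at least 1")
    # The first portion passes through both stages in sequence; each later
    # portion adds only the time of the slower stage.
    return op_time + transfer_time + (portions - 1) * max(op_time, transfer_time)


# Hypothetical stage times in arbitrary units.
small = pipelined_latency(portions=1, op_time=2, transfer_time=3)
large = pipelined_latency(portions=100, op_time=2, transfer_time=3)
```

Under these assumed values, a single-portion transfer takes 5 units instead of the 3 units needed for the transfer alone, while a 100-portion transfer takes 302 units instead of 300. The fixed start-up cost of the first stage is therefore most noticeable when the transfer is small.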
Memory devices, such as, for example, the flash memory devices and other memory devices mentioned above, have been widely adopted for use in consumer products, and in particular, computers using the PCIe protocol. Most computer systems use some form of DMA to transfer data to and from the memory.
Flash memory may be found in different forms, for example in the form of a portable memory card that can be carried between host devices or as a solid state drive (SSD) embedded in a host device. Two general memory cell architectures found in flash memory include NOR and NAND. In a typical NOR architecture, memory cells are connected between adjacent bit line source and drain diffusions that extend in a column direction with control gates connected to word lines extending along rows of cells. A memory cell includes at least one storage element positioned over at least a portion of the cell channel region between the source and drain. A programmed level of charge on the storage elements thus controls an operating characteristic of the cells, which can then be read by applying appropriate voltages to the addressed memory cells.
A typical NAND architecture utilizes strings of more than two series-connected memory cells, such as 16 or 32, connected along with one or more select transistors between individual bit lines and a reference potential to form columns of cells. Word lines extend across cells within many of these columns. An individual cell within a column is read and verified during programming by causing the remaining cells in the string to be turned on so that the current flowing through a string is dependent upon the level of charge stored in the addressed cell.
NAND flash memory can be fabricated in the form of single-level cell flash memory, also known as SLC or binary flash, where each cell stores one bit of binary information. NAND flash memory can also be fabricated to store multiple states per cell so that two or more bits of binary information may be stored. This higher storage density flash memory is known as multi-level cell or MLC flash. MLC flash memory can provide higher density storage and reduce the costs associated with the memory. The higher density storage potential of MLC flash tends to have the drawback of less durability than SLC flash in terms of the number of write/erase cycles a cell can handle before it wears out. MLC can also have slower read and write rates than the more expensive and typically more durable SLC flash memory. Memory devices, such as SSDs, may include both types of memory.
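The relationship between bits per cell and the number of states a cell must distinguish can be shown with a short sketch. The function names and the 8192-bit capacity are hypothetical and used only for illustration.

```python
def states_per_cell(bits_per_cell):
    """Number of distinct states a cell must hold to store bits_per_cell bits."""
    if bits_per_cell < 1:
        raise ValueError("bits_per_cell must be at least 1")
    return 2 ** bits_per_cell


def cells_needed(total_bits, bits_per_cell):
    """Number of cells needed to store total_bits, rounded up."""
    if bits_per_cell < 1:
        raise ValueError("bits_per_cell must be at least 1")
    return -(-total_bits // bits_per_cell)  # ceiling division


slc_states = states_per_cell(1)   # one bit per cell (SLC)
mlc_states = states_per_cell(2)   # two bits per cell (MLC)

# Hypothetical capacity of 8192 bits.
slc_cells = cells_needed(8192, 1)
mlc_cells = cells_needed(8192, 2)
```

Storing two bits per cell halves the number of cells needed for the same capacity, but each cell must then distinguish four states rather than two.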