For byte or half-word transfers, a conventional direct memory access (DMA) unit requires a separate bus access for each transfer. Furthermore, certain implementations of bus protocols, such as the on-chip peripheral bus (OPB), and certain memory slaves do not allow burst transfers for non-word transfer types. This can degrade performance with dynamic random access memory (DRAM) type slaves, which have a high latency for the first transfer but then provide single-cycle access for subsequent sequential transfers. With such an implementation, a conventional DMA architecture incurs the full latency of the DRAM slave on every access.
The present invention groups sequential byte or half-word transfers together and accesses up to four bytes or two half-words in one access by performing a full-word transfer. This reduces the total number of accesses on the bus and, by issuing burst requests, also obtains single-cycle access for a majority of the transfers.
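The grouping described above can be illustrated with a minimal sketch. It assumes a 32-bit (4-byte) word and a byte-enable mask in which bit i selects the byte at word address + i; the function name `pack_accesses` and that lane numbering are hypothetical choices made for illustration, not part of the disclosed architecture.

```python
WORD_BYTES = 4  # 32-bit bus, as stated in the text


def pack_accesses(start, nbytes):
    """Split a sequential byte range into word-aligned accesses.

    Returns a list of (word_address, byte_enable) tuples. Bit i of
    byte_enable marks the byte at word_address + i (assumed lane order).
    """
    accesses = []
    addr = start
    end = start + nbytes
    while addr < end:
        word = addr & ~(WORD_BYTES - 1)      # align down to a word boundary
        last = min(end, word + WORD_BYTES)   # stop at word end or range end
        mask = 0
        for a in range(addr, last):
            mask |= 1 << (a - word)          # enable each byte actually needed
        accesses.append((word, mask))
        addr = last
    return accesses
```

Under these assumptions, a 13-byte range starting at 0x107 yields four accesses: one with a single byte enabled in the word at 0x104, followed by three full words at 0x108, 0x10C, and 0x110.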
This invention discloses a novel method of handling byte and half-word data in a DMA architecture designed for a 32-bit bus. The proposed method and architecture compact data and fetch four bytes or two half-words in one transaction by performing a full-word transfer instead of a partial transfer, thus eliminating the input-output (IO) bottleneck that plagues many DMA designs. Though the byte-enable support architecture described in this document is specific to a certain DMA architecture, the novel method of this invention is applicable to any DMA architecture. Furthermore, throughout the design of the method and system disclosed herein, special emphasis has been placed on achieving single-cycle throughput while still maintaining a high operating frequency.
FIG. 1 is an example illustrating a memory access 101 of thirteen bytes starting from the address 0x107. For unpacked access 103, such as in a conventional DMA, the total number of accesses required to complete this set of transactions is thirteen, with each access requiring ten cycles. This requirement of ten cycles is specific to this example. The total number of cycles thus required is 130. In contrast, under this invention, for packed accesses 102 only four transfers are required on the bus. Furthermore, the last two transfers utilize the burst mode facility and each of them is completed in a single cycle. Thus, the total number of cycles required under this invention is only 22 (ten for each of the first two transfers and one for each of the last two), roughly one-sixth of the 130 required by the conventional method.
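The cycle counts in this example can be reproduced with a simple model. The sketch below assumes that a partial leading or trailing word is issued as one standalone non-burst access at full latency, and that the full words in between form a single burst whose first beat pays the full latency and whose remaining beats take one cycle each. The function names and this cost model are illustrative assumptions; only the ten-cycle and single-cycle figures come from the example.

```python
FIRST_ACCESS_CYCLES = 10  # first-access latency used in the FIG. 1 example
BURST_BEAT_CYCLES = 1     # single-cycle access for subsequent burst beats
WORD_BYTES = 4            # 32-bit bus


def unpacked_cycles(nbytes):
    """Conventional DMA: one full-latency access per byte."""
    return nbytes * FIRST_ACCESS_CYCLES


def packed_cycles(start, nbytes):
    """Packed DMA under the assumed cost model described above."""
    if nbytes <= 0:
        return 0
    head = min((-start) % WORD_BYTES, nbytes)  # bytes before first word boundary
    remaining = nbytes - head
    full_words = remaining // WORD_BYTES
    tail = remaining % WORD_BYTES              # bytes after the last full word
    cycles = 0
    if head:
        cycles += FIRST_ACCESS_CYCLES          # standalone partial access
    if full_words:
        # one burst: full latency once, then one cycle per extra word
        cycles += FIRST_ACCESS_CYCLES + (full_words - 1) * BURST_BEAT_CYCLES
    if tail:
        cycles += FIRST_ACCESS_CYCLES          # standalone partial access
    return cycles
```

For the thirteen-byte access starting at 0x107, `unpacked_cycles(13)` gives 130 and `packed_cycles(0x107, 13)` gives 22, matching the figures above.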