Flexible data scatter-gather is a common data transfer technique. Scatter-gather is widely used, for example, in modern systems-on-chip (SOC) for processes such as direct memory access (DMA), system data management, and the like.
The term “gather” refers to the process of gathering data from multiple buffers. The gather process is conventionally performed by a device referred to as a “packer,” and includes “packing” (that is, aligning and concatenating) the data into a single continuous buffer. The term “scatter” refers to the process of scattering data into multiple buffers. The scatter process is conventionally performed by a device referred to as an “unpacker,” and includes “unpacking” (that is, separating a data block into multiple blocks for transfer to multiple buffers).
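The two operations defined above can be illustrated in software. The following is a minimal sketch, not a description of any packer or unpacker hardware: memory is modeled as a Python bytearray, each buffer is described by a (start, length) pair, and the function names gather and scatter are chosen here for illustration only.

```python
def gather(memory, blocks):
    """Concatenate the (start, length) blocks of memory into one contiguous result."""
    out = bytearray()
    for start, length in blocks:
        out += memory[start:start + length]
    return bytes(out)


def scatter(memory, data, blocks):
    """Split data, in order, across the (start, length) blocks of memory."""
    if sum(length for _, length in blocks) != len(data):
        raise ValueError("block lengths must add up to the data length")
    offset = 0
    for start, length in blocks:
        memory[start:start + length] = data[offset:offset + length]
        offset += length
```

Gathering reads several separate regions into one continuous buffer, and scattering writes one continuous buffer back out to several separate regions, so the two data flows are mirror images of each other.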
FIGS. 1 through 4 illustrate a conventional scatter-gather DMA operation for a storage system. In FIG. 1, a conventional packer 102 gathers a single file stored in three input buffers 104A,B,C into a single temporary buffer 104D. Then two conventional unpackers 106A,B transfer the file to two different locations by scattering the data from temporary buffer 104D to five output buffers 104E,F,G,H,I. In particular, unpacker 106A scatters the data from temporary buffer 104D to output buffers 104E,F,G and unpacker 106B scatters the data from temporary buffer 104D to output buffers 104H,I.
FIGS. 2 through 4 show the results of the conventional scatter-gather operation of FIG. 1. In FIGS. 2 through 4, each byte of data is represented by a box. Bytes from different input buffers 104A,B,C are represented by different cross-hatching patterns. Empty boxes represent “don't-care” bytes (that is, bytes that are not relevant to the illustrated operation).
FIG. 2 shows the results of the packing operation of FIG. 1 for buffers 104A,B,C,D. In this example, the data bus is eight bytes wide. The source file is 76 bytes long, and is physically stored as three different source blocks in three different physical locations (input buffers 104A,B,C) with different lengths. For source block 0 (represented by vertical cross-hatching), the start address is 0x0002, and the block size is 33 bytes. For source block 1 (represented by horizontal cross-hatching), the start address is 0x0203, and the block size is 3 bytes. For source block 2 (represented by diagonal cross-hatching), the start address is 0x2005, and the block size is 40 bytes. FIG. 2 shows how the blocks have been concatenated and aligned in temporary buffer 104D by packer 102.
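The alignment arithmetic behind this example can be checked with a short sketch. The block addresses and sizes are taken from the text above; the helper names are hypothetical, and the word count for the packed data assumes that temporary buffer 104D begins on an eight-byte boundary, which the text does not state.

```python
BUS_BYTES = 8  # width of the data bus in this example

# (start address, size in bytes) for source blocks 0, 1, and 2.
SOURCE_BLOCKS = [(0x0002, 33), (0x0203, 3), (0x2005, 40)]


def byte_offset(start, bus_bytes=BUS_BYTES):
    """Position of the first valid byte within its bus word."""
    return start % bus_bytes


def bus_words(start, length, bus_bytes=BUS_BYTES):
    """Number of bus-wide words touched by a block of the given start and length."""
    first = start // bus_bytes
    last = (start + length - 1) // bus_bytes
    return last - first + 1


total_bytes = sum(length for _, length in SOURCE_BLOCKS)
offsets = [byte_offset(start) for start, _ in SOURCE_BLOCKS]
words_in = [bus_words(start, length) for start, length in SOURCE_BLOCKS]
# Assumes the packed data starts at byte 0 of an aligned word.
words_out = bus_words(0, total_bytes)
```

Under these assumptions the three source blocks begin at byte offsets 2, 3, and 5 within their bus words and occupy 5, 1, and 6 bus words respectively, while the packed 76-byte file occupies 10 bus words, the last of which holds only four valid bytes.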
FIG. 3 shows the results of the unpacking operation of FIG. 1 for buffers 104D,E,F,G. Unpacker 106A has transferred the file from temporary buffer 104D to output buffers 104E,F,G (referred to herein as destination 0) as three blocks according to specified block lengths and start addresses. In particular, unpacker 106A has transferred destination 0 block 0 to output buffer 104E with a start address of 0x4004 and a block size of 20 bytes, has transferred destination 0 block 1 to output buffer 104F with a start address of 0x3007 and a block size of 37 bytes, and has transferred destination 0 block 2 to output buffer 104G with a start address of 0x3203 and a block size of 19 bytes.
FIG. 4 shows the results of the unpacking operation of FIG. 1 for buffers 104D,H,I. Unpacker 106B has transferred the file from temporary buffer 104D to output buffers 104H,I (referred to herein as destination 1) as two blocks according to specified block lengths and start addresses. In particular, unpacker 106B has transferred destination 1 block 0 to output buffer 104H with a start address of 0x8003 and a block size of 55 bytes, and has transferred destination 1 block 1 to output buffer 104I with a start address of 0x9002 and a block size of 21 bytes.
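The complete operation of FIGS. 1 through 4 can be simulated end to end as a consistency check on the block sizes and addresses given above. This is a behavioral sketch only. The memory size, the address TEMP_BASE chosen for temporary buffer 104D, the test pattern used as the file, and all function names are assumptions made for illustration and do not appear in the figures.

```python
MEM_SIZE = 0x10000   # assumed size of the simulated address space
TEMP_BASE = 0x1000   # hypothetical location of temporary buffer 104D

SOURCE = [(0x0002, 33), (0x0203, 3), (0x2005, 40)]   # input buffers 104A,B,C
DEST0 = [(0x4004, 20), (0x3007, 37), (0x3203, 19)]   # output buffers 104E,F,G
DEST1 = [(0x8003, 55), (0x9002, 21)]                 # output buffers 104H,I


def read_blocks(memory, blocks):
    """Gather: concatenate the listed (start, length) blocks."""
    return b"".join(bytes(memory[s:s + n]) for s, n in blocks)


def write_blocks(memory, data, blocks):
    """Scatter: split data across the listed (start, length) blocks."""
    if sum(n for _, n in blocks) != len(data):
        raise ValueError("block lengths must add up to the data length")
    offset = 0
    for s, n in blocks:
        memory[s:s + n] = data[offset:offset + n]
        offset += n


def run_transfer():
    memory = bytearray(MEM_SIZE)
    file_bytes = bytes(range(1, 77))          # arbitrary 76-byte test file
    write_blocks(memory, file_bytes, SOURCE)  # place the file in the input buffers

    packed = read_blocks(memory, SOURCE)      # role of packer 102
    memory[TEMP_BASE:TEMP_BASE + len(packed)] = packed

    temp = bytes(memory[TEMP_BASE:TEMP_BASE + len(packed)])
    write_blocks(memory, temp, DEST0)         # role of unpacker 106A
    write_blocks(memory, temp, DEST1)         # role of unpacker 106B
    return memory, file_bytes
```

Reading the blocks of destination 0 or destination 1 back in order reproduces the original 76-byte file, which confirms that the block sizes listed for each destination (20 + 37 + 19 and 55 + 21) account for the whole file.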
FIG. 5 shows a block diagram of a conventional packer 500 for a 64-bit bus. Packer 500 includes a controller 502, a byte shifter 504, a byte mapper 506, two eight-byte buffers 508A,B, and a multiplexer (Mux) 510. Controller 502 operates according to external input control signals Din_valid, Din_loc, Din_len, and Dout_ready, which are generated by a DMA controller or the like, and generates external output control signals Din_ready and Dout_valid, which are provided to a DMA controller or the like. Byte shifter 504 receives input data Din, and shifts that data according to control signal Byte_shift_ctrl provided by controller 502. Byte mapper 506 maps the bytes of the shifted data to buffers 508 according to control signal Byte_map_ctrl provided by controller 502. Multiplexer 510 passes selected bytes of the data from buffers 508 as output data Dout according to control signal Dout_sel provided by controller 502.
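The data path of a packer such as packer 500 can be approximated by a simple behavioral model. The sketch below is not the circuit of FIG. 5. It assumes that Din_loc gives the position of the first valid byte within the input word and that Din_len gives the number of valid bytes, which the text above does not define; it collapses the byte shifter, byte mapper, and multiplexer into slice operations, and it does not model the Din_valid, Din_ready, Dout_valid, and Dout_ready handshake. The class and function names are hypothetical.

```python
BUS_BYTES = 8  # 64-bit bus


class PackerModel:
    """Behavioral model: accepts partially valid bus words, emits full ones."""

    def __init__(self):
        # Stands in for the two eight-byte buffers: it never holds more
        # than 7 leftover bytes plus 8 newly arrived bytes.
        self.staging = bytearray()

    def push(self, din, din_loc, din_len):
        """Accept one input word; return a packed output word or None."""
        if len(din) != BUS_BYTES:
            raise ValueError("input must be one bus word")
        if din_loc < 0 or din_len < 1 or din_loc + din_len > BUS_BYTES:
            raise ValueError("valid bytes must lie within the bus word")
        # Shift the valid bytes so they follow the bytes already staged.
        self.staging += din[din_loc:din_loc + din_len]
        if len(self.staging) >= BUS_BYTES:
            word = bytes(self.staging[:BUS_BYTES])
            del self.staging[:BUS_BYTES]
            return word
        return None

    def flush(self):
        """Emit any leftover bytes, zero-padded to a full word, or None."""
        if not self.staging:
            return None
        word = bytes(self.staging) + bytes(BUS_BYTES - len(self.staging))
        self.staging.clear()
        return word


def beats(memory, start, length):
    """Yield (word, loc, len) for each aligned bus word covering a block."""
    addr, end = start, start + length
    while addr < end:
        base = addr - addr % BUS_BYTES
        loc = addr - base
        n = min(BUS_BYTES - loc, end - addr)
        yield bytes(memory[base:base + BUS_BYTES]), loc, n
        addr += n


def pack_blocks(memory, blocks):
    """Run every block through the model and collect the output words."""
    packer = PackerModel()
    words = []
    for start, length in blocks:
        for din, loc, n in beats(memory, start, length):
            word = packer.push(din, loc, n)
            if word is not None:
                words.append(word)
    tail = packer.flush()
    if tail is not None:
        words.append(tail)
    return words
```

In this model the staging area can hold up to 15 bytes at once (7 left over from earlier inputs plus 8 newly arrived), which is one way to see why a design of this kind needs internal storage of twice the bus width.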
Conventional scatter-gather techniques have several disadvantages. Conventional packers and unpackers have different designs with opposite data flows. Therefore, conventional scatter-gather systems must employ both, and must employ a temporary buffer 104D between the packers and unpackers. Conventional packers and unpackers also employ a byte mapper 506, which is generally implemented as a large, slow, multi-level multiplexer. The use of a byte mapper requires an internal buffer 508 that is twice the width of the data bus. And because conventional packers and unpackers operate using a push model, they cannot exert back pressure upon the input, and so require a fixed pipeline implementation.