1. Field of the Invention
The present invention generally relates to computer hardware. More specifically, the present invention relates to data packers for packing and aligning write data before the write data is written to memory.
2. Description of the Related Art
A modern computer system may be implemented with a parallel processing unit (PPU), which is a processor that executes many operations in parallel. PPUs generally include one or more engines (or clients) that perform operations such as memory management, graphics display, instruction fetching, encryption, and other operations.
Clients often write data to and read data from parallel processor (PP) local memory and system memory. This data may include texture maps, 3-D models, or other types of data. Different clients may write data to PP memory differently depending on the configuration of each client. For example, one client may write data in 4-byte bursts, while another client may write data in 32-byte bursts. Different clients may also write data with different address alignments. For example, one client may write data that is 16-byte address aligned, while another client may write data that is 8-byte address aligned. Naïve clients may also write to memory that has a data type into which the data to be written must first be transformed (e.g., block linear vs. pitch linear).
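By way of illustration only, the notion of address alignment described above may be sketched as follows. The function name is_aligned and the example address are hypothetical and are chosen solely for this sketch; they do not describe any particular client.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True if `address` is a multiple of `alignment` (both in bytes).

    Hypothetical helper for illustration; `alignment` is assumed positive.
    """
    if alignment <= 0:
        raise ValueError("alignment must be a positive number of bytes")
    return address % alignment == 0


# An address that is 8-byte aligned is not necessarily 16-byte aligned.
example_address = 0x1008  # assumed example value
print(is_aligned(example_address, 8))   # True
print(is_aligned(example_address, 16))  # False
```

In this sketch, a write from an 8-byte-aligned client may begin at an address that a 16-byte-aligned client would never use, which is one way in which the write data of different clients may differ.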
Inefficiencies result when a client writes data to PP memory using a conventional data packer, which may output data with a burst size that is smaller than the maximum burst size that PP memory can receive. In such a case, an entire write cycle of PP memory is consumed even though only part of the available bandwidth is used. For example, if PP memory could receive a maximum burst size of 32 bytes and a client used a data packer with a write burst size of 4 bytes to write 64 bytes of data, 16 clock cycles would be required, whereas 2 clock cycles would suffice if each burst were 32 bytes. As a result, a client with a small write burst size may waste clock cycles by not taking full advantage of the maximum burst size that PP memory may receive.
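The arithmetic of the foregoing example may be sketched as follows, under the simplifying assumption that exactly one burst is written per clock cycle. The function names write_cycles and bandwidth_utilization are hypothetical and are introduced only for this illustration.

```python
def write_cycles(total_bytes: int, burst_bytes: int) -> int:
    """Clock cycles needed to write `total_bytes`, assuming one burst per cycle."""
    if total_bytes < 0 or burst_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and burst_bytes must be > 0")
    # Ceiling division: a partially filled final burst still costs a full cycle.
    return -(-total_bytes // burst_bytes)


def bandwidth_utilization(burst_bytes: int, max_burst_bytes: int) -> float:
    """Fraction of the maximum burst size that each write cycle actually uses."""
    if not 0 < burst_bytes <= max_burst_bytes:
        raise ValueError("burst_bytes must be in the range (0, max_burst_bytes]")
    return burst_bytes / max_burst_bytes


print(write_cycles(64, 4))           # 16 cycles with 4-byte bursts
print(write_cycles(64, 32))          # 2 cycles with 32-byte bursts
print(bandwidth_utilization(4, 32))  # 0.125
```

Under these assumptions, the 4-byte client uses one eighth of the burst capacity available in each cycle and requires eight times as many cycles as a client that writes full 32-byte bursts.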
Accordingly, there remains a need in the art for a technique to more efficiently write data to PP memory.