The present invention relates to graphics processing units (GPUs), and in particular to a method of accelerating processing by such units through compression of data transferred between GPUs and their off-chip memory.
GPUs provide a computer architecture specialized for graphics processing, but they may also work with a conventional central processing unit (CPU) to accelerate highly parallel general-purpose applications. In this regard, GPUs provide a large number of computational elements that may operate in parallel on data held in special GPU memory.
In normal operation, the CPU prepares data to be operated on by the GPU and loads that data into the GPU memory together with information about the desired GPU functions to be performed. The multiple computational elements then execute the desired functions and the data is returned to the CPU memory from the GPU memory.
The normal problems of long latency related to access by the computational elements to off-chip memory can be readily accommodated in a GPU by flexibly switching the computational elements to a different thread (context switching) when a given thread is facing a memory access delay. Such switching is “lightweight” in a GPU because the GPU has a large register file, so that multiple “in-flight” threads do not need to move their data from the register file to a secondary memory. For this reason, such switches between threads can be accomplished rapidly with little delay, making this an effective strategy for dealing with memory latency.
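The latency-hiding effect described above can be illustrated with a toy model. The following Python sketch is not taken from any GPU implementation; it assumes a single computational element, a zero-cost context switch, and threads that each repeatedly issue one memory load and then compute on the result. The function name simulate and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate(num_threads, compute_cycles, memory_latency, work_items):
    """Toy model of one computational element shared by several threads.

    Each thread performs `work_items` iterations of: issue a memory load
    (taking `memory_latency` cycles), then compute for `compute_cycles`.
    Context switches are assumed to cost zero cycles (hypothetical
    simplification standing in for a large register file).
    Returns (total_cycles, busy_cycles).
    """
    # Every thread issues its first load at cycle 0.
    ready_at = [memory_latency] * num_threads
    remaining = [work_items] * num_threads
    cycle = 0
    busy = 0

    while any(remaining):
        # Threads that still have work and whose load has returned.
        runnable = [t for t in range(num_threads)
                    if remaining[t] and ready_at[t] <= cycle]
        if runnable:
            t = runnable[0]
            cycle += compute_cycles
            busy += compute_cycles
            remaining[t] -= 1
            # The thread immediately issues its next load.
            ready_at[t] = cycle + memory_latency
        else:
            # No thread can run: the element stalls until a load returns.
            cycle = min(ready_at[t] for t in range(num_threads)
                        if remaining[t])
    return cycle, busy


if __name__ == "__main__":
    for n in (1, 2, 6):
        total, busy = simulate(n, compute_cycles=4,
                               memory_latency=20, work_items=10)
        print(f"{n} thread(s): busy {busy} of {total} cycles")
```

Under these assumed parameters, a single thread keeps the computational element busy for 40 of 240 cycles, whereas six resident threads keep it busy for 240 of 260 cycles, because another thread is always ready to run while the others wait on memory.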