As computer systems have advanced, graphics processing units (GPUs) have become increasingly sophisticated. For example, GPUs include multiple execution units, and keeping those execution units busy has become increasingly important to ensuring high overall GPU performance.
GPUs often include texture units for performing texturing operations. To perform these operations, the texture units need to access texture data stored in memory. The memory access usually takes a long time relative to the speed of the execution units of the GPU. Accordingly, several texture requests are issued at a given time. In some conventional GPUs, each texture request designates a register from which the parameters are obtained, and that same register is typically used to hold the result of the texture operation.
Unfortunately, register file storage for all the texture operations of a texture pipeline that can be in flight amounts to a large amount of memory. The limit on storage thereby limits the number of requests that can be issued and, therefore, the number of pixels that can be processed at a time. One solution has been to write the results of the texture operation back to the storage location that contained the texture request. However, this solution still leaves the result storage space allocated for the texture request, meaning that the number of pixels that can be processed in the pipeline at any given time is limited and the execution units may not be kept busy, thereby reducing the overall performance of the GPU.
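The storage limit described above can be illustrated with a short sketch. All figures used here (register file size, registers held per request, memory latency, and issue rate) are hypothetical values chosen for illustration and do not describe any particular GPU. The sketch assumes that each in-flight texture request holds its registers for its entire lifetime.

```python
def max_in_flight_requests(register_file_entries, registers_per_request):
    """Upper bound on texture requests in flight when every request
    keeps its registers allocated until its result is written back."""
    if register_file_entries < 0 or registers_per_request <= 0:
        raise ValueError("sizes must be positive")
    return register_file_entries // registers_per_request


def requests_to_cover_latency(latency_cycles, cycles_between_issues):
    """Number of requests that must be in flight so that a new result
    arrives as often as a new request is issued (ceiling division)."""
    if latency_cycles < 0 or cycles_between_issues <= 0:
        raise ValueError("cycle counts must be positive")
    return -(-latency_cycles // cycles_between_issues)


# Hypothetical figures, for illustration only.
capacity = max_in_flight_requests(register_file_entries=1024,
                                  registers_per_request=4)
needed = requests_to_cover_latency(latency_cycles=400,
                                   cycles_between_issues=1)

# When capacity < needed, the register file (not the memory system)
# bounds how many pixels can be in the pipeline at once.
shortfall = max(0, needed - capacity)
```

With these assumed figures, the register file holds 256 requests while 400 would be needed to cover the latency, so the execution units would idle for part of the time.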
Another solution is for each thread to have multiple outstanding texture requests, so that fewer threads are required to cover the texture latency. Unfortunately, each thread would then need more registers to accommodate simultaneous storage of intermediate results, which increases the size of the register file required. It would be advantageous to be able to issue multiple texture instructions per thread without requiring a correspondingly large register file. In this fashion, more threads could be issued using the register file space that is saved, e.g., space not allocated to the texture operations.
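The trade-off in the preceding paragraph can be sketched in the same way. The model below is an assumption made for illustration: each thread needs a fixed number of base registers plus a fixed number of registers per outstanding texture request, and the register file size and per-thread figures are hypothetical.

```python
def resident_threads(register_file_entries, base_regs_per_thread,
                     regs_per_request, outstanding_per_thread):
    """Threads that fit in the register file when each thread reserves
    registers for every one of its outstanding texture requests."""
    regs_per_thread = (base_regs_per_thread
                       + regs_per_request * outstanding_per_thread)
    if regs_per_thread <= 0:
        raise ValueError("a thread must use at least one register")
    return register_file_entries // regs_per_thread


REGISTER_FILE = 1024  # hypothetical size, in register entries
BASE_REGS = 8         # hypothetical per-thread registers for other work

# One outstanding request per thread, 4 registers reserved per request.
one_each = resident_threads(REGISTER_FILE, BASE_REGS, 4, 1)

# Four outstanding requests per thread: fewer threads fit.
four_each = resident_threads(REGISTER_FILE, BASE_REGS, 4, 4)

# Four outstanding requests per thread with no register file space
# reserved for the texture operations: the saved space admits more threads.
four_each_no_reserve = resident_threads(REGISTER_FILE, BASE_REGS, 0, 4)

# Total texture requests in flight for each configuration.
in_flight = (one_each * 1, four_each * 4, four_each_no_reserve * 4)
```

Under these assumptions, raising the number of outstanding requests per thread from one to four reduces the number of resident threads from 85 to 42, whereas not reserving register file space for the texture operations allows 128 threads to be resident.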