The present invention relates generally to graphics processors and more particularly to executing particular types of computational algorithms using graphics processors.
The demand for increased realism in computer graphics for games and other applications has been steady for some time now and shows no signs of abating. This has placed stringent performance requirements on computer system components, particularly graphics processors. For example, to generate improved images, an ever-increasing amount of data needs to be processed by a graphics processing unit. In fact, so much graphics data now needs to be processed that conventional techniques are not up to the task and need to be replaced.
Fortunately, the engineers at NVIDIA Corporation in Santa Clara, Calif. have developed a new type of processing circuit that is capable of meeting these incredible demands. This amazing new circuit is based on the concept of multiple single-instruction, multiple-data processors. These new processors are capable of simultaneously executing hundreds of processes.
These new processors are so powerful that they are being put to use for other functions beyond their traditional realm of graphics processing. These functions include tasks that are normally left for a central processing unit to execute. When the graphics processor takes over these functions, the workload on the central processing unit is reduced, improving system performance. Alternatively, this allows a slower, less-expensive central processing unit to be used.
Computations are one type of function now being performed by these new graphics processors. These computations can become particularly intensive when they involve lattices or matrices of data, since such workloads require the storage of large amounts of data. Unfortunately, memory is very expensive to include on a graphics processor, partly because the processing steps used to manufacture efficient, low-cost memories are not compatible with the processes used to manufacture graphics processors. Accordingly, most data used by a graphics processor is stored externally. But access to an off-chip memory is slow; the latency involved in reading data may be hundreds of clock cycles. This latency reduces the computational efficiency of the graphics processor.
Thus, what is needed are circuits, methods, and apparatus that allow a graphics or other processor to perform computations involving large amounts of data while reducing the amount of data read from an external memory.
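By way of illustration only, the cost of repeated external reads in a lattice computation can be modeled in a few lines of C++. The following is a minimal sketch, not a description of any particular circuit or method: the `ExternalMemory` type, the functions `smooth_naive` and `smooth_tiled`, the three-point averaging operation, and the tile size are all hypothetical and chosen only to make the read counts easy to compare. The local buffer stands in for a small on-chip memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an off-chip memory that counts every read.
struct ExternalMemory {
    std::vector<float> data;
    std::size_t reads = 0;
    float read(std::size_t i) { ++reads; return data[i]; }
};

// Naive approach: each interior lattice point fetches all three of its
// inputs directly from external memory.
std::vector<float> smooth_naive(ExternalMemory& mem) {
    const std::size_t n = mem.data.size();
    std::vector<float> out(n, 0.0f);
    for (std::size_t i = 1; i + 1 < n; ++i)
        out[i] = (mem.read(i - 1) + mem.read(i) + mem.read(i + 1)) / 3.0f;
    return out;
}

// Tiled approach: each tile, plus one neighbor on either side, is read
// from external memory once into a local buffer and then reused.
std::vector<float> smooth_tiled(ExternalMemory& mem, std::size_t tile) {
    assert(tile > 0);  // assumption of this sketch: tiles are non-empty
    const std::size_t n = mem.data.size();
    std::vector<float> out(n, 0.0f);
    std::vector<float> local;  // stands in for a small on-chip memory
    for (std::size_t start = 1; start + 1 < n; start += tile) {
        const std::size_t end = std::min(start + tile, n - 1);  // exclusive
        local.clear();
        for (std::size_t i = start - 1; i <= end; ++i)
            local.push_back(mem.read(i));
        for (std::size_t i = start; i < end; ++i) {
            const std::size_t k = i - start + 1;  // index into local buffer
            out[i] = (local[k - 1] + local[k] + local[k + 1]) / 3.0f;
        }
    }
    return out;
}
```

In this model, a lattice of 1026 points processed with a tile size of 64 requires 3072 external reads using the naive approach and 1056 using the tiled approach, while producing identical results. Since each external read may cost hundreds of clock cycles, a reduction of this kind bears directly on computational efficiency.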