The present invention is in the field of computer graphics processing. More particularly, the present invention relates to techniques for increasing the efficiency of image-processing-related computations. The invention described herein is particularly applicable to use in systems having a central processing unit (CPU) operating together with a dedicated graphics processing unit (GPU). Various implementations of such an architecture are described in assignee's patent applications: “System for Reducing the Number of Programs Necessary to Render an Image,” by John Harper, Ser. No. 10/826,773; “System for Optimizing Graphics for Operations,” by John Harper, Ralph Brunner, Peter Graffagnino, and Mark Zimmer, Ser. No. 10/825,694; “System for Emulating Graphics Operations,” by John Harper, Ser. No. 10/826,744; and “High Level Program Interface for Graphics Operations,” by John Harper, Ralph Brunner, Peter Graffagnino, and Mark Zimmer, Ser. No. 10/826,762, each filed 16 Apr. 2004 and incorporated herein by reference in its entirety. Although the methods and techniques described herein are particularly applicable to systems having a single CPU/single GPU architecture, there is no intent to restrict the invention to such systems. It is believed that the methods and techniques described herein may be advantageously applied in a variety of architectures.
In the object-oriented programming context of most modern graphics processing systems, there are generally four types of objects available to a programmer: images, filters, contexts, and vectors. An image is generally either the two-dimensional result of rendering (a pixel image) or a representation of the same. A filter is generally a high-level function that is used to affect images. A context is a space, such as a defined place in memory, where the result of a filtering operation resides. A vector is a collection of floating-point numbers, for example, the four-dimensional vector used to describe the appearance of a pixel (red, blue, green and transparency levels). Each of these definitions is somewhat exemplary in nature, and the foregoing definitions should not be considered exclusive or otherwise overly restrictive.
Most relevant to the purposes of the present invention are images and filters. In an embodiment of the present invention, filter-based image manipulation may be used in which the manipulation occurs on a programmable GPU. A relatively common filter applied to images is a blur. Various blurs exist and are used for shadows, the depiction of cinematic motion, defocusing, sharpening, rendering clean line art, detecting edges, and many professional photographic effects. A special blur is the Gaussian blur, which is a radially symmetric blur. Other, more complicated blurs and other convolution operations can often be separated into linear combinations of Gaussian blurs. Because the Gaussian blur is the cornerstone of many image processing algorithms, it is essential to have a fast way of computing it. It is even more desirable to have a way of computing a Gaussian blur that does not tie up the CPU in the calculation.
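By way of illustration only, the radial symmetry noted above may be checked numerically. The following minimal sketch evaluates the standard two-dimensional Gaussian weight and confirms that it depends only on the distance from the center and that it factors into a product of two one-dimensional weights. The names `gaussian_1d` and `gaussian_2d`, the parameter `sigma`, and the sample offsets are assumptions chosen for this example and do not appear in the referenced applications.

```python
import math


def gaussian_1d(x, sigma):
    """Unnormalized one-dimensional Gaussian weight at offset x (hypothetical helper)."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def gaussian_2d(x, y, sigma):
    """Unnormalized two-dimensional Gaussian weight at offset (x, y) (hypothetical helper)."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))


sigma = 2.0  # illustrative standard deviation, in pixels (assumed value)

# Radial symmetry: offsets (3, 4) and (5, 0) both lie at distance 5 from the
# center, so they receive the same weight.
w_diagonal = gaussian_2d(3, 4, sigma)
w_axis = gaussian_2d(5, 0, sigma)

# Separability: the two-dimensional weight equals the product of the
# one-dimensional weights along each axis.
w_product = gaussian_1d(3, sigma) * gaussian_1d(4, sigma)

print(math.isclose(w_diagonal, w_axis))
print(math.isclose(w_diagonal, w_product))
```

The separability property shown in the last step is one reason a Gaussian blur is commonly computed as a horizontal pass followed by a vertical pass rather than as a single two-dimensional convolution.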
Modern programmable graphics processing units (GPUs) have reached a high level of programmability. GPU programs, called fragment programs, allow the programmer to directly compute an image by specifying the program that computes a single pixel of that image. This program is run in parallel by the GPU to produce the result image. To exactly compute a single pixel of Gaussian blur with any given radius, it is technically necessary to apply a convolution over the entire source image. This is far too computationally intensive to implement. In practice, only approximations are calculated. To compute the approximation, it is important to use a minimum number of source image lookups (texture lookups). GPU fragment programs typically allow only a small maximum number of textures. Thus, a scheme that minimizes the number of passes and maximizes the blurring work done with each pass is sought.
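To make the cost of source image lookups concrete, the following minimal sketch shows one conventional approximation: a Gaussian kernel truncated at a finite radius and applied as two one-dimensional passes. This is the textbook separable approach, offered under stated assumptions, and is not necessarily the scheme of the present invention. The function names, the edge-clamping behavior, and the choice of `sigma = 2.0` with a radius of 6 pixels are all assumptions made for illustration.

```python
import math


def truncated_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius (hypothetical helper)."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def retained_fraction(sigma, radius):
    """Fraction of the sampled Gaussian's weight kept when truncating at radius.

    The untruncated sum is approximated by summing out to 10 * sigma (an assumption).
    """
    def weight_sum(r):
        return sum(math.exp(-(i * i) / (2.0 * sigma * sigma))
                   for i in range(-r, r + 1))
    return weight_sum(radius) / weight_sum(int(math.ceil(10 * sigma)))


def blur_1d(row, kernel):
    """Convolve one row with the kernel, clamping reads at the row edges."""
    radius = len(kernel) // 2
    last = len(row) - 1
    out = []
    for x in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            src = min(max(x + k - radius, 0), last)  # one source lookup per tap
            acc += w * row[src]
        out.append(acc)
    return out


def blur_separable(image, kernel):
    """Horizontal pass followed by a vertical pass over a list-of-rows image."""
    rows = [blur_1d(r, kernel) for r in image]
    cols = [blur_1d(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def lookups_per_pixel(radius):
    """Source lookups per output pixel for one 2-D pass versus two 1-D passes."""
    side = 2 * radius + 1
    return {"single_pass_2d": side * side, "two_pass_separable": 2 * side}


kernel = truncated_kernel(2.0, 6)
flat = [[1.0] * 5 for _ in range(5)]
blurred = blur_separable(flat, kernel)
print(lookups_per_pixel(6))
print(retained_fraction(2.0, 6) > 0.99)
```

Under these assumptions, truncating at three standard deviations discards well under one percent of the kernel weight, while the two-pass form needs 26 lookups per output pixel instead of 169. Even so, the lookup count grows with the blur radius, which is why a scheme that does more blurring work per pass remains desirable.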