1. Field of the Invention
The present invention generally relates to computer hardware and more specifically to distributed calculation of plane equations.
2. Description of the Related Art
The processing power of a modern central processing unit (CPU) may be supplemented using a co-processor, such as a graphics processing unit (GPU). Oftentimes, the GPU is used as a co-processor configured to process graphics data to generate pixels that are displayed on a screen. Graphics data may include graphics primitives, such as points, lines, or triangles. The components of the GPU that generate pixels are collectively known as a “graphics processing pipeline.”
One of the steps implemented in the graphics processing pipeline involves determining which pixels on the screen fall within a triangle defined by three vertices. This step may be accomplished by first interpolating between the three vertices to define the three edges of the triangle. Each edge can be described by a linear equation of the form Ax+By+C=0, where A, B, and C are coefficients in an x, y coordinate grid. The three edges can then be used to generate a plane equation of the form Fx+Gy+Hz+J=0, where F, G, H, and J are coefficients in an x, y, z coordinate grid. Once these coefficients are known for each plane, a particular pixel with coordinates (x, y) may be determined to be inside or outside of a particular triangle.
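The edge and plane equations described above can be illustrated with a minimal sketch. The function names (edge_coefficients, inside_triangle, plane_coefficients) are hypothetical and chosen only for illustration. The plane coefficients are derived here from three (x, y, z) points with a cross product, which is a common textbook construction and is not necessarily how any particular pipeline computes them. Pixels that lie exactly on an edge are treated as inside, which is also an assumption.

```python
def edge_coefficients(p, q):
    """Return (A, B, C) such that A*x + B*y + C = 0 on the line through p and q."""
    (x0, y0), (x1, y1) = p, q
    a = y0 - y1
    b = x1 - x0
    c = x0 * y1 - x1 * y0
    return a, b, c


def inside_triangle(v0, v1, v2, x, y):
    """Test whether pixel (x, y) falls within the triangle v0, v1, v2.

    The pixel is inside when all three edge equations evaluate with the
    same sign (or zero), regardless of the winding order of the vertices.
    A degenerate triangle whose vertices coincide is not handled specially.
    """
    edges = (
        edge_coefficients(v0, v1),
        edge_coefficients(v1, v2),
        edge_coefficients(v2, v0),
    )
    values = [a * x + b * y + c for a, b, c in edges]
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


def plane_coefficients(p0, p1, p2):
    """Return (F, G, H, J) such that F*x + G*y + H*z + J = 0 through three points.

    Assumption: the plane normal is the cross product of two triangle sides.
    """
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    f = uy * vz - uz * vy
    g = uz * vx - ux * vz
    h = ux * vy - uy * vx
    j = -(f * p0[0] + g * p0[1] + h * p0[2])
    return f, g, h, j


if __name__ == "__main__":
    tri = ((0, 0), (4, 0), (0, 4))
    print(inside_triangle(*tri, 1, 1))
    print(inside_triangle(*tri, 5, 5))
    print(plane_coefficients((0, 0, 1), (4, 0, 1), (0, 4, 1)))
```

In this sketch, the third coordinate passed to plane_coefficients may stand for depth or for any other attribute that varies linearly across the triangle.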
The coefficients are used by one or more pixel shaders to colorize each pixel on the screen according to the attributes of the triangle, or triangles, that include the pixel. Prior art systems perform pixel shading with a group of processors known as “shading multiprocessors,” or SMs.
A portion of the prior art graphics processing pipeline 300 is illustrated in FIG. 3. As shown, preprocessed graphics primitives are passed to a primitive evaluation engine (PEE) 302 that calculates the plane equation coefficients for each triangle described by the vertex data and sends these coefficients via a data bus 304 to SMs 306A-306N, where the coefficients are stored in a local triangle RAM (tRAM) 308. The entire set of plane equation data A′-N′ is sent along data bus 304 to each SM 306A-306N, and each SM performs further processing (e.g., pixel shading) on only a portion of that data. For example, SM 306A performs further processing on data A′, SM 306B performs further processing on data B′, and SM 306N performs further processing on data N′.
One disadvantage of this configuration is that because all of the plane equation coefficients are transmitted together across data bus 304, the width of data bus 304 must be at least equal to the total amount of data processed by the SMs 306A-306N at each clock cycle. For example, if ten SMs 306 each process 10 bytes of plane equation coefficients during each clock cycle, then data bus 304 would need to be 100 bytes wide to provide input data to each SM 306. Each SM 306 would copy the relevant 10 bytes from data bus 304 at each clock cycle. However, this configuration is not scalable because adding SMs 306 to improve graphics processing capabilities would require the bit width of the data bus to be increased to an impractical size.
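The scaling problem can be made concrete with a short sketch of the arithmetic in the example above. The function names are hypothetical, and the layout in which each SM's portion occupies a contiguous slice of the bus word is an assumption made only for illustration.

```python
def required_bus_width_bytes(num_sms, bytes_per_sm_per_clock):
    """Bus width needed when the full set of coefficients is broadcast each clock."""
    return num_sms * bytes_per_sm_per_clock


def copy_relevant_bytes(bus_word, sm_index, bytes_per_sm_per_clock):
    """Return the portion of the broadcast word that one SM keeps.

    Assumption: SM i owns the i-th contiguous slice of the bus word.
    """
    start = sm_index * bytes_per_sm_per_clock
    return bus_word[start:start + bytes_per_sm_per_clock]


if __name__ == "__main__":
    # The required width grows linearly with the number of SMs.
    for num_sms in (10, 20, 40):
        width = required_bus_width_bytes(num_sms, 10)
        print(f"{num_sms} SMs -> {width} bytes ({width * 8} bits) per clock")
```

With ten SMs at 10 bytes each, the sketch reproduces the 100-byte bus of the example; doubling the number of SMs doubles the required width, even though each SM still uses only its own 10 bytes.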
Accordingly, there remains a need in the art for a more efficient and scalable way to calculate plane equations in a graphics processing pipeline.