The present invention relates generally to graphics data processing, and in particular to methods and systems for efficiently managing a graphics processing unit containing graphics modules configured to process data in different formats.
Graphics processing includes the manipulation, processing and displaying of images. Images are displayed on video display screens. The smallest element of a video display screen is a pixel (picture element). A screen can be broken up into many tiny dots and a pixel is one or more of those tiny dots that is treated as a unit. A pixel includes the four quantities red, green, blue, and alpha, which are retrieved by the texture module using texture coordinates (S,T,R,Q).
Graphics processing units are divided into graphics modules, which each handle different operations of the graphics processing. For example, the texture module is a module that handles textures of images. Textures are collections of color data stored in memory. The texture module reads this color data, applies a filter to the data read and returns the filtered data to a process controller. The raster operation module (ROP) handles the conversion of vector graphics images, vector fonts, or outline fonts into bitmaps for display. Graphics modules typically process data in quads. A quad is defined as a unit of 4 pixels that are arranged on a display as 2×2 pixels with 2 pixels on the top and 2 pixels on the bottom. Since one quad includes four pixels, and each pixel includes S, T, R, and Q values, one quad includes 16 scalars which are 4 S values, 4 T values, 4 R values, and 4 Q values. Quads are also data in quad form and these terms are used interchangeably. The quad is the fundamental unit at work and all of the components in the prior graphics processing unit are configured to process quads. For example, the texture module is designed to process quads because it accepts as inputs four texture coordinates (S,T,R,Q) and outputs four pixel colors each with red, green, blue and alpha values. Graphics modules are configured to process quads because they sometimes do calculations across adjacent pixels and a 2×2 arrangement of pixels is well suited for such calculations. Therefore, in order to optimize the performance of graphics modules configured to process quads, it is advantageous to process at least one quad per clock cycle so that the graphics modules can perform at least one task per clock cycle. Moreover, since prior graphics processing units include only graphics modules configured to process quads, the entire graphics processing unit can be optimized because all its modules can perform tasks within one clock cycle.
FIG. 1 is a block diagram illustrating the transfer of quads within a graphics processing unit where all of the graphics modules are configured to receive, transmit and process quads. FIG. 1 includes a core 105, a texture module 110 and a ROP module 115 exchanging quads through communication channels 120. Core 105, texture module 115, and ROP module 115 are all configured to process data in quads. Since all graphics modules within the graphics processing unit are configured to process quads, one quad can be transferred through the communication channel 120 in one clock cycle. For example core 105 transfers, in one clock cycle, to texture module 110 one quad, which contains the coordinates of 4 pixels arranged in a 2×2 format that would include (S0,T0,R0,Q0), (S1,T1,R1,Q1), (S2,T2,R2,Q2), and (S3,T3,R3,Q3). The format of this quad might be (S0, . . . S3, T0, . . . T3, R0, . . . R3, Q0, . . . Q3,). The texture module 110 receives this quad in one clock cycle and, therefore, it knows the coordinates of all four pixels in one clock cycle. The texture module then reads color data, filters the color data and sends the filtered color data to core 105. If the data format were different, such as where the address of each pixel was sent in different clock cycles, then the texture module would have to wait 4 clock cycles to start processing. The filtered data produced by the texture module 110 is transmitted back to the core 105 in quads that contain color data for all 4 pixels). Since each pixel has a red, green, blue and alpha value, one quad having 4 pixels has 16 values. Since the core receives all 16 color values of one quad in one clock cycle, the core can process the quad after one clock cycle. As with the texture module 110, if the data format was different then the core 105 would have to wait 4 clock cycles to start processing.
However, in some newer systems all of the graphics modules within the graphics processing unit are not designed to handle quads. Performance problem arise when one graphics module is designed to handle quads but another graphics module is designed to handle data in a different format. This inconsistency between graphics modules within the graphics processing unit creates discontinuity in the data that is transferred. An example of this inconsistency is when in one clock cycle a first graphics module transfers to a second graphics module a set of data but the second graphics module needs different data than was transferred to begin processing. The result of this inconsistency is that the second graphics module will be slowed down because it will have to wait additional clock cycle to acquire all of the data required to perform its operation. Since slowing down one of the graphics modules can slow down the entire graphics processing unit, this inconsistency in data formats can impact the performance of the entire graphics processing unit.
Therefore what is needed a system and method for integrating into a graphics processing unit different graphics modules configured for different data formats that produce inconsistent data outputs in one clock cycle without impacting the performance of the graphics processing unit.