In computers and other devices, it is known for a first processor (such as a host processor that includes one or more processing cores) to execute one or more applications (for example, graphics applications, word processing applications, drafting applications, presentation applications, spreadsheet applications, video game applications, etc.) that may require specialized or intensive processing. In those instances, the host processor will sometimes call upon a co-processor to execute the specialized or processing-intensive function. For example, if the host processor requires a drawing operation to be performed, it can instruct a video graphics co-processor, via a data element (such as a command, an instruction, a pointer to another command, a group of commands or instructions, an address, and any data associated with the command), to perform the drawing function.
Regardless of which processing unit performs the function, the processing unit accesses memory to perform the tasks assigned to it. Some tasks require memory that is as fast as possible; other tasks can tolerate slower memory. Generally speaking, faster memory is more expensive. In addition to speed, the power consumption of the memory is also a concern. Accordingly, efficiency concerns encourage that processes for which slower memory is sufficient utilize the slower memory, and that only as much fast memory be provided as is truly needed. Memory speed is most important for data that the processor will need to access directly. Slower memory can be used to store data between the times when the processor is expected to access it directly. Accordingly, memory systems often contain a smaller amount of faster memory and a larger amount of slower memory. Similarly, use of power-efficient memory, when possible, provides advantages and cost savings. In some examples, the fast and/or energy-efficient memory is provided "on-chip," whereas the slower and/or less energy-efficient memory is external to the processing chip.
In drawing operations, via rasterization or otherwise, each pixel has a plurality of properties (such as color, depth, stencil, and others). These properties are typically imparted to pixels through command streams that are composed of one or more framebuffer operations (such as clear, swap, or texture binding), state-setting operations, and drawing operations. Each drawing operation imparts properties, through shaders, to the fragments it generates. Shaders may read input streams (vertex buffers, vertex streams, or the results of previous stages of shading), may read memory (uniform or constant buffers), may read textures (texels, where a texel is a single pixel from a texture map), may read and write memory (image load/store or unordered access views), may write memory (transform feedback or stream output), and may perform additional operations to determine the properties of the fragments. The properties of the pixels may be determined by the properties of the fragments and, optionally, by the existing properties of the pixel through pixel operations (blending or output merging).
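As one illustration of such a pixel operation, the standard source-over alpha blend combines a fragment's color with the pixel's existing color. The following is a minimal sketch, not taken from any particular implementation; the function name and color representation (premultiplication-free `(r, g, b, a)` tuples) are assumptions for illustration only.

```python
def blend_source_over(src, dst):
    """Source-over alpha blend: combine a fragment's color (src) with the
    pixel's existing color (dst). Colors are (r, g, b, a) tuples with
    components in [0.0, 1.0]. Illustrative sketch only."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    # Resulting coverage: source alpha plus destination alpha not covered by it.
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)
    # Each color channel is a coverage-weighted mix of source over destination.
    out = tuple((s * sa + d * da * (1.0 - sa)) / out_a
                for s, d in ((sr, dr), (sg, dg), (sb, db)))
    return out + (out_a,)
```

An opaque fragment fully replaces the pixel's color, while a fully transparent fragment leaves it unchanged, consistent with the role of pixel operations described above.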
Processing of the command stream dictates that the entirety of the command stream of a pass (delineated by one or more framebuffer operations) must be completed before proceeding to the next pass. Stated differently, every pixel of a given screen must be given its color value for a pass before the processor returns to begin imparting color values to each pixel for the next pass. Drawing operations are ordered this way because information in "dependent passes" may depend upon the previous-pass values of any texel assigned to one or more textures. Accordingly, for each pass that depends upon modified texture color values for its texture reads, the color values of the previous textures, in general, must have been previously determined.
The processor imparting the characteristics to the pixels utilizes the smaller, faster memory, which, as stated previously, may be located "on chip." The smaller, faster memory is typically not large enough to store an entire screen's worth of pixels and all the various dependent attributes therein. Accordingly, the screen as a whole is stored in the larger, slower memory. The screen is divided into sub-sections referred to as "tiles." A tile may include one or more pixels. This process is referred to as tiled rendering.
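The division of a screen into tiles can be sketched as follows, assuming a fixed nominal tile size; the function name and the `(x, y, width, height)` representation are hypothetical conveniences, not drawn from the text.

```python
def screen_tiles(screen_w, screen_h, tile_w, tile_h):
    """Divide a screen of screen_w x screen_h pixels into tile rectangles
    of at most tile_w x tile_h pixels. Edge tiles may be smaller when the
    screen dimensions are not exact multiples of the tile dimensions.
    Returns a list of (x, y, width, height) tuples."""
    tiles = []
    for y in range(0, screen_h, tile_h):
        for x in range(0, screen_w, tile_w):
            tiles.append((x, y,
                          min(tile_w, screen_w - x),
                          min(tile_h, screen_h - y)))
    return tiles
```

For example, a 70x32 screen with 32x32 tiles yields two full tiles and one narrow 6-pixel-wide edge tile.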
Tiled rendering processes image information to map primitives onto tiles. In a first pass, information about a first tile is pulled from the large memory into the smaller memory. Maps are used to determine a color to be ascribed to the pixels at issue. While it is understood that ascribing features to tiles involves ascribing features to pixels, the discussion below describes ascribing features to tiles in that pixels within a tile are processed together with the other pixels within that tile. Once color is assigned, the tile with its associated color is written back to the large memory, via a swap-buffer operation or otherwise. The processor then moves on to the next tile by calling for it to be pulled from the large memory into the small memory. Color is then assigned to the second tile, and the tile is then sent back to the large memory. This is repeated for all needed tiles. (All needed tiles may be those for an entire screen if a full screen is to be presented, or for portions of the screen if only part of the screen is being rendered, such as in interlacing or in embodiments where multiple processors are each responsible for portions of the screen.) Subsequently, each tile is again called from the large memory to the small memory to have subsequent, dependent passes applied to the pixels in each of the tiles (texture, shading, etc.). Once all attributes are applied to all the tiles, the large memory contains a completed screen (or screen portion). This screen can then be sent to be rendered on a physical display (CRT, LCD, etc.).
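The per-tile traffic described above can be sketched as follows. Here `large_mem` and `small_mem` stand in for the external and on-chip memories, and the pass functions are hypothetical placeholders; the sketch only models the ordering and counts the transfers, under the simplifying assumption that one tile occupies the small memory at a time.

```python
def render_tiled(large_mem, tiles, passes):
    """Conventional tiled rendering: for each pass, every tile is copied
    from the large (external) memory into the small (on-chip) memory,
    shaded, and written back before the next pass begins. large_mem maps
    tile id -> pixel data; passes is a list of functions applied to a
    tile's pixel data. Returns the number of tile transfers performed."""
    transfers = 0
    for shade_pass in passes:           # a pass must finish for ALL tiles
        for tile_id in tiles:           # before the next pass may begin
            small_mem = large_mem[tile_id]     # pull tile on-chip
            transfers += 1
            small_mem = shade_pass(small_mem)  # assign attributes
            large_mem[tile_id] = small_mem     # write back off-chip
            transfers += 1
    return transfers
```

Note that the transfer count grows as 2 x (number of tiles) x (number of passes), which is the traffic the following paragraph identifies as the problem.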
Accordingly, a large amount of tile data is transferred between the two memory sites. These transfers take time and power. Thus, what is needed is a tiled rendering operation that reduces the transfers of data while also not requiring a significant increase in the amount of "faster memory" used to assign attributes to tiles.