Field of the Invention
The present invention relates generally to graphics rendering and, in particular, to compressing and decompressing framebuffer fragment data in a vector, 2D and/or 3D graphics rendering system in order to reduce memory usage and bandwidth requirements.
Background Art
A typical graphics rendering subsystem 100 is depicted schematically in FIG. 1 as a high-level diagram of a specific implementation of a tile based graphics processing system 201 in accordance with the present invention. However, the technology described in this application is generally applicable to tile based rendering systems of other arrangements.
The specific arrangement of the graphics processing system includes a graphics processing unit (GPU) 200, which contains, inter alia, the tile based rendering logic 201. As is known in the art, the tile based hardware generates tiles of an output frame that is to be displayed on a display device 106, such as an LCD screen.
Tile based rendering is a technique in which the two-dimensional output array (frame or screen) of the rendering process is subdivided, or partitioned, into a number of smaller two-dimensional regions usually referred to as tiles. Each tile is rendered separately, either one after another or in parallel with the other tiles. The rendered tiles are then assembled to provide the complete output array or frame. The tiles generated by the tile rendering logic are typically saved in the framebuffer on a tile-by-tile basis. In mobile systems, which are typically characterized by limited bandwidth, the tile based approach reduces off-chip memory accesses: a fragment may be read and written several times during the rendering process, but while its tile is being rendered these accesses can be served from local on-chip tile memory rather than from external memory.
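As an illustration of the partition, render and assemble steps described above, the following minimal Python sketch models a frame as a list of rows of integer color values. The function names and the `render_tile` callback are hypothetical stand-ins for the tile rendering logic, `position_shader` is a made-up example renderer, and tiles at the right and bottom edges are simply clipped when the frame size is not a multiple of the tile size.

```python
from typing import Callable, Iterator, List, Tuple

Frame = List[List[int]]  # rows of integer color values

def split_into_tiles(width: int, height: int,
                     tile: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x0, y0, w, h) for each tile in row-major tile order.

    Tiles on the right and bottom edges are clipped to the frame size.
    """
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            yield x0, y0, min(tile, width - x0), min(tile, height - y0)

def render_tiled(width: int, height: int, tile: int,
                 render_tile: Callable[[int, int, int, int], Frame]) -> Frame:
    """Render every tile separately and assemble the tiles into one frame."""
    frame: Frame = [[0] * width for _ in range(height)]
    for x0, y0, w, h in split_into_tiles(width, height, tile):
        block = render_tile(x0, y0, w, h)  # h rows of w fragments each
        for dy in range(h):
            frame[y0 + dy][x0:x0 + w] = block[dy]
    return frame

def position_shader(x0: int, y0: int, w: int, h: int) -> Frame:
    """Hypothetical renderer: each fragment's color encodes its screen position."""
    return [[(y0 + dy) * 100 + (x0 + dx) for dx in range(w)] for dy in range(h)]

if __name__ == "__main__":
    frame = render_tiled(40, 20, 16, position_shader)
    print(len(list(split_into_tiles(40, 20, 16))))  # 6 tiles (3 x 2)
```

Because each tile is rendered independently, the loop over tiles could equally be distributed across parallel units; the assembled result is the same frame a single full-screen pass would produce.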
Once a tile has been produced by the graphics hardware, it is normally written to a framebuffer 600 in the memory 104 (typically a DDR-SDRAM) through an interconnection network 102 (write path 103). In this specific arrangement, the framebuffer is hosted in the system main memory; however, other arrangements are possible, e.g., the framebuffer can be a separate off-chip memory or a separate on-chip memory (either SRAM or embedded DRAM) residing in the display controller 107.
At regular intervals, the display controller 107 reads the framebuffer 600 (read path 105) in order to output the frame generated by the tile rendering logic 201 to the display. For the sake of completeness, the arrangement shown in FIG. 1 also contains a host CPU 101, although this is not necessary. Because the display controller re-reads the frame at every refresh, the bandwidth required to display even a static image on such devices is relatively high, and data compression can be used to reduce both the memory and the bandwidth requirements.
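The scan-out cost can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that the whole frame is re-read at every refresh; the resolution, color depth and refresh rate used in the example, as well as the hypothetical `compression_ratio` parameter, are example values rather than figures from this application.

```python
def display_read_bandwidth(width: int, height: int, bytes_per_pixel: int,
                           refresh_hz: float,
                           compression_ratio: float = 1.0) -> float:
    """Bytes per second read by the display controller for scan-out.

    Assumes the full frame is fetched on every refresh; compression_ratio is a
    hypothetical average ratio (uncompressed size / compressed size).
    """
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * refresh_hz / compression_ratio

if __name__ == "__main__":
    # Example: 1920x1080 at 32 bits per pixel, refreshed 60 times per second.
    print(display_read_bandwidth(1920, 1080, 4, 60) / 1e6)  # 497.664 (MB/s)
    # Same display with an assumed 2:1 average compression ratio.
    print(display_read_bandwidth(1920, 1080, 4, 60, compression_ratio=2.0) / 1e6)
```

Even for an unchanging image, this read traffic recurs every frame, which is why reducing the stored size of each frame translates directly into sustained bandwidth savings.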
As mentioned, and without loss of generality, we assume that the framebuffer is located in an off-chip random access memory (e.g., DDR-SDRAM), which is typically the system main memory. At any given time, the framebuffer contains a complete frame of data. The information in the framebuffer typically consists of a color value for every fragment on the screen. Color values are commonly stored in 16-bit or 24/32-bit color formats. An additional alpha channel is sometimes used to retain information about pixel transparency. The total amount of memory required for the framebuffer depends on the resolution of the output display and on the color depth.
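The dependence of framebuffer size on resolution and color depth can be made concrete as follows. This is a minimal sketch: the format names and helper functions are illustrative assumptions, and `pack_rgb565` shows one common way a 24-bit color is reduced to a 16-bit format by keeping only the most significant bits of each channel.

```python
# Bits per pixel of a few common framebuffer color formats (illustrative).
BITS_PER_PIXEL = {"RGB565": 16, "RGB888": 24, "RGBA8888": 32}

def framebuffer_bytes(width: int, height: int, fmt: str) -> int:
    """Memory needed to hold one full frame in the given color format."""
    return width * height * BITS_PER_PIXEL[fmt] // 8

def pack_rgb565(r: int, g: int, b: int) -> int:
    """Reduce 8-bit R, G, B channels to a 16-bit RGB565 value.

    Keeps the 5, 6 and 5 most significant bits of red, green and blue.
    """
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

if __name__ == "__main__":
    for fmt in BITS_PER_PIXEL:
        print(fmt, framebuffer_bytes(1920, 1080, fmt))
    print(hex(pack_rgb565(255, 0, 0)))  # 0xf800
```

Halving the color depth halves the framebuffer, but at the cost of color precision; compression aims to reduce size without necessarily giving up that precision.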
The rendering tiles into which the output is divided for rendering purposes can be of any desired and suitable size or shape. The rendered tiles are preferably all the same size and shape, as is known in the art, although this is not necessary. Without loss of generality, and unless stated otherwise, we assume that each rendered tile is rectangular, with a preferred size of 16×16 fragments.
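With the assumed 16×16 tile, the number of tiles per frame and the location of a fragment in a tile-by-tile framebuffer layout follow directly. The sketch below assumes, hypothetically, that tiles are stored contiguously in row-major tile order, that fragments are row-major within each tile, and that partial edge tiles are padded to a full 16×16 slot; actual memory layouts may differ.

```python
TILE = 16  # tile edge length in fragments, as assumed in the text

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)

def tile_count(width: int, height: int) -> int:
    """Number of 16x16 tiles needed to cover a width x height frame."""
    return ceil_div(width, TILE) * ceil_div(height, TILE)

def tiled_offset(x: int, y: int, width: int, bytes_per_fragment: int = 4) -> int:
    """Byte offset of fragment (x, y) in a hypothetical tile-by-tile layout.

    Tiles are stored contiguously in row-major tile order, fragments are
    row-major inside each tile, and partial edge tiles use a full 16x16 slot.
    """
    tile_index = (y // TILE) * ceil_div(width, TILE) + (x // TILE)
    within_tile = (y % TILE) * TILE + (x % TILE)
    return (tile_index * TILE * TILE + within_tile) * bytes_per_fragment

if __name__ == "__main__":
    print(tile_count(1920, 1080))     # 8160 (120 x 68 tiles)
    print(tiled_offset(16, 0, 1920))  # 1024: first fragment of the second tile
```

For a 1920×1080 frame this gives 120 × 68 = 8160 tiles, where the bottom row of tiles covers only 8 fragment rows because 1080 is not a multiple of 16. At 32 bits per fragment, each tile occupies 1024 bytes, which is the unit a per-tile compressor would operate on.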
The applicants, like others skilled in the art, have recognized that the memory bandwidth and memory accesses required to write every new tile generated by the tile rendering logic to the framebuffer, and those required for the display or the display controller to read the generated tiles from the framebuffer, can be significantly reduced if a suitable lossy or lossless compression technique is applied to every newly generated tile. The method described in this application provides an adaptive compression technique tailored to the operation of a tile based rendering system.
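To make the bandwidth argument concrete, the following is a deliberately simple, generic example of per-tile lossless compression. It is not the adaptive technique of this application: it uses plain run-length encoding with a fallback to raw storage, chosen separately for each tile, and the byte costs (32-bit fragments, 6 bytes per run) and function names are assumptions made for illustration.

```python
from typing import List, Tuple

RAW_BYTES_PER_FRAGMENT = 4  # assumed 32-bit (e.g. RGBA8888) fragments
RLE_BYTES_PER_RUN = 6       # assumed: 4-byte color + 2-byte run length

def rle_encode(frags: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive identical colors into (color, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for color in frags:
        if runs and runs[-1][0] == color:
            runs[-1] = (color, runs[-1][1] + 1)
        else:
            runs.append((color, 1))
    return runs

def compress_tile(frags: List[int]) -> Tuple[str, list, int]:
    """Losslessly encode one tile, choosing RLE or raw, whichever is smaller.

    Returns (mode, payload, size_in_bytes) under the assumed cost model.
    """
    runs = rle_encode(frags)
    rle_size = len(runs) * RLE_BYTES_PER_RUN
    raw_size = len(frags) * RAW_BYTES_PER_FRAGMENT
    if rle_size < raw_size:
        return "rle", runs, rle_size
    return "raw", list(frags), raw_size

def decompress_tile(mode: str, payload: list) -> List[int]:
    """Invert compress_tile and recover the original fragment colors."""
    if mode == "rle":
        return [color for color, n in payload for _ in range(n)]
    return list(payload)

if __name__ == "__main__":
    flat = [0xFF336699] * 256   # uniformly colored 16x16 tile
    noisy = list(range(256))    # no two neighboring fragments share a color
    for tile in (flat, noisy):
        mode, payload, size = compress_tile(tile)
        assert decompress_tile(mode, payload) == tile  # lossless round trip
        print(mode, size)       # "rle 6", then "raw 1024"
```

A uniformly colored 16×16 tile then costs 6 bytes instead of 1024, while a tile with no repeated neighbors stays raw, so under this cost model the stored size never exceeds the uncompressed size.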
Several methods for reducing framebuffer bandwidth have been disclosed, e.g., US2011/0074800 and US2011/0102446, which attempt to eliminate redundant framebuffer accesses; US 2010/0060629, which uses errors introduced by various pipeline stages to decide how to handle tile fragments; and U.S. Pat. No. 6,411,295, which applies compression principles to reduce z-buffer memory accesses.