1. Field of the Invention
The present invention relates generally to a method and apparatus for enhancing performance of graphics operations. More particularly, the invention relates to two-dimensional graphics operations.
2. Description of the Related Art
For many graphics operations such as line drawing, polygon rendering, etc., one of the main bottlenecks for speed performance is the bandwidth of the frame buffer. The bandwidth of the frame buffer is even more critical for system on a chip (SOC) devices that use unified memory architecture (UMA). In UMA systems, the same memory is shared between many agents, such as a central processing unit (CPU), a graphics accelerator, an MPEG decoder, and a CRT controller. Because there are several agents sharing the same memory, a large access latency for the memory is created. The typical access latency for the memory is fifteen clock cycles or more.
Memory systems implemented with synchronous dynamic random access memory (SDRAM) have the advantage that if an agent requests a large chunk of sequential pixel data, then the first pixel data latency may be the same as for normal DRAM memory, but subsequent pixel data can be transferred every clock cycle (provided the pixel data is located in the same page of memory). Unfortunately, graphics accelerators cannot use this feature directly for most drawing operations, including line drawing, where the access of pixel data is not very sequential.
FIG. 1 shows a conventional UMA system 100 where an SDRAM controller 110 is electrically coupled between an SDRAM memory 120A and several agents, such as a CPU 130, an MPEG decoder 140, a graphics engine 150, and a CRT controller 160. A frame buffer 125 comprises the combination of the SDRAM controller 110 and part of the SDRAM memory 120A. As suggested by the UMA system 100, each agent operating in a graphics system communicates with the SDRAM 120A via the SDRAM controller 110. Because each agent communicates directly with the SDRAM controller 110, each agent must operate with the memory configuration defined by the frame buffer 125, whether or not the agent is optimized for that memory configuration.
It should be understood that the frame buffer 125 is a memory for storing pixel data for a visual display. To manipulate an image on the visual display, pixel data may be written to the frame buffer 125 by, for instance, the CPU 130, and read by the CRT controller 160 to display the pixel data on the visual display. For instance, to make the visual display completely white, each memory element within the frame buffer 125 may be written with pixel data corresponding to the color white. Also, if graphical elements, such as a triangle or rectangle, are to be drawn, the graphics engine 150 may write pixel data to the frame buffer 125 to create or manipulate different graphical elements.
Most operations in windows type programs involve rectangles. Because rectangles are highly sequential elements (e.g., pixels 10, 11, . . . , 20) for each scan line, standard scan line configured memories are suitable for standard windows operations. However, in systems that utilize non-horizontal or non-vertical lines, such as computer aided design (CAD) systems and map display systems, access to pixel data is not sequential in the frame buffer 125, thereby degrading system performance.
The system performance issues are related to the frame buffer 125 generally being implemented using SDRAM memory. SDRAM memory typically has a certain latency or access time to access a first memory location (e.g., pixel 10), but not much time to access the next memory location (e.g., pixel 11). While this is adequate for forming straight lines (i.e., horizontal or vertical), non-straight lines (i.e., non-horizontal or non-vertical) are much slower to create or manipulate due to the memory configuration within the frame buffer 125 being arranged in a scan line format.
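The latency behavior described above can be illustrated with a rough cost model. The following is a minimal sketch, not a description of any particular SDRAM device: the fifteen-cycle first-access figure is taken from the text above, while the page size (one 1024-pixel scan line per page), the one-cycle cost of a further access within an open page, and all names are assumptions made for illustration only.

```python
# Illustrative cost model. FIRST_ACCESS_CYCLES follows the text above;
# BURST_CYCLES and PAGE_PIXELS are assumptions, not figures from the source.
FIRST_ACCESS_CYCLES = 15   # latency to reach the first pixel of a newly opened page
BURST_CYCLES = 1           # assumed cost of each further access within the open page
PAGE_PIXELS = 1024         # assumed: one DRAM page holds one 1024-pixel scan line
WIDTH = 1024               # pixels per scan line


def access_cycles(addresses):
    """Estimate clock cycles needed to touch the given pixel addresses in order."""
    cycles = 0
    open_page = None
    for addr in addresses:
        page = addr // PAGE_PIXELS
        if page == open_page:
            cycles += BURST_CYCLES
        else:
            cycles += FIRST_ACCESS_CYCLES
            open_page = page
    return cycles


# Pixels 10..20 on a single scan line: sequential addresses in one page.
horizontal = [0 * WIDTH + col for col in range(10, 21)]
# An 11-pixel diagonal: every pixel lies on a different scan line.
diagonal = [i * WIDTH + (10 + i) for i in range(11)]

horizontal_cycles = access_cycles(horizontal)
diagonal_cycles = access_cycles(diagonal)
```

Under these assumptions, the horizontal run pays the first-access latency once, whereas the diagonal pays it for every pixel because each access opens a new page.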
FIG. 2 represents a memory organization for an SDRAM 120B having a scan line configuration for the pixel data displayed on a visual display, as is conventionally known in the art. For instance, SDRAM 120B represents a visual display having a 768 row by 1024 column pixel image, whereby a first scan line has pixel P1,1 200 in the upper left corner and pixel P1,1024 210 in the upper right corner, and a last scan line has pixel P768,1 220 in the lower left corner and pixel P768,1024 230 in the lower right corner. FIG. 2 thus represents a conventional scan line configured frame buffer 125, for which agents such as the CRT controller 160 are optimized when displaying pixel data from the frame buffer 125 on a visual display. However, the performance of the graphics engine 150 is significantly reduced (i.e., clock cycles increased) by the conventional scan line configuration in the frame buffer 125.
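The scan line organization can be expressed as a simple address calculation. The sketch below assumes one memory element per pixel, a frame buffer starting at offset zero, and the 1-based row and column indices used in the pixel labels; the function name is hypothetical.

```python
ROWS = 768    # scan lines in the image
COLS = 1024   # pixels per scan line


def scanline_offset(row, col):
    """Pixel offset of P(row, col), 1-based, in a scan line configured memory."""
    if not (1 <= row <= ROWS and 1 <= col <= COLS):
        raise ValueError("pixel lies outside the 768 x 1024 image")
    # Each full scan line above this one contributes COLS pixels.
    return (row - 1) * COLS + (col - 1)


upper_left = scanline_offset(1, 1)
upper_right = scanline_offset(1, 1024)
lower_left = scanline_offset(768, 1)       # row 768, column 1
lower_right = scanline_offset(768, 1024)
```

Horizontally adjacent pixels differ by one in this mapping, while vertically adjacent pixels are a full scan line (1024 pixels) apart.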
Access patterns of many graphics operations that include line drawing show a high degree of two-dimensional locality of reference. One approach to improve memory bandwidth for such access patterns has been to use a tiled memory configuration within the frame buffer 125. In this tiled memory configuration, pixel data in the frame buffer 125 is stored tile by tile rather than in the conventional configuration (i.e., sequentially one scan line after another). Because of the two-dimensional locality of reference, the chances of consecutive accesses falling inside a single tile are very high. Assuming one tile of pixels fits into one DRAM page, as long as the accesses from the graphics operation fall within a single tile, there will be no page miss for the tiled memory configuration within the frame buffer 125, in contrast to the conventional scan line or non-tiled memory configuration, where a tile is scattered over multiple pages of DRAM.
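The effect of tiling on page misses can be sketched by counting page changes along a short diagonal line. The tile shape (64 by 32 pixels), the assumption that one tile exactly fills one DRAM page, the zero-based (x, y) coordinates, and all function names are illustrative assumptions rather than values given in the text.

```python
WIDTH, HEIGHT = 1024, 768
TILE_W, TILE_H = 64, 32             # assumed tile shape
PAGE_PIXELS = TILE_W * TILE_H       # assumed: one tile fits exactly one DRAM page
TILES_PER_ROW = WIDTH // TILE_W


def scanline_page(x, y):
    """DRAM page holding pixel (x, y) when memory is laid out scan line by scan line."""
    return (y * WIDTH + x) // PAGE_PIXELS


def tiled_page(x, y):
    """DRAM page holding pixel (x, y) when each tile occupies its own page."""
    return (y // TILE_H) * TILES_PER_ROW + (x // TILE_W)


def page_misses(pixels, page_of):
    """Count how often consecutive accesses land in a different page."""
    misses = 0
    open_page = None
    for x, y in pixels:
        page = page_of(x, y)
        if page != open_page:
            misses += 1
            open_page = page
    return misses


# A 32-pixel diagonal that stays inside the first tile.
diagonal = [(i, i) for i in range(32)]
scanline_misses = page_misses(diagonal, scanline_page)
tiled_misses = page_misses(diagonal, tiled_page)
```

With these assumed sizes, the diagonal crosses sixteen pages in the scan line layout but remains within a single page in the tiled layout.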
FIG. 3 represents a frame buffer 120C configured into tiles of fixed size. A first tile T0 300 is indicated to be in the upper left corner and a last tile T383 310 is indicated to be in the lower right corner of the frame buffer 120C. While a frame buffer 125 having fixed tile sizes greatly improves frame buffer access time for graphics operations of the graphics engine 150, fixed tiling of the frame buffer 125 is not suitable for UMA systems, where a number of other agents access the frame buffer 125 expecting it to be in the conventional scan line configuration. It is important to understand that the performance of certain agents, such as the CRT controller 160, is critical to a real-time system because a user sees visual effects on the visual display if the performance of the CRT controller 160 is degraded. Configuring the frame buffer 125 in a tiled fashion degrades the performance of the CRT controller 160 because the CRT controller 160 is optimized for the conventional scan line configuration.
Additionally, there are many existing software products, including many popular video games, which access the frame buffer 125 directly. Since these software programs are not aware of tiled configurations within the frame buffer 125, their accesses to the frame buffer 125 will not work properly and, consequently, these software programs cannot be run on a system with a tiled frame buffer 125 unless a separate address translator interface is added to translate scan line addresses to tiled addresses. Tiling the frame buffer 125 also imposes additional overhead on the CPU 130 because of the need to convert many of the windows pixel data structures, such as BitMap images, into a tiled format so that the BitMap images can be used for graphics operations with the frame buffer 125 having a tiled configuration.
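The kind of translation such an address translator interface would perform can be sketched as follows. This is an illustration only: the 64 by 32 pixel tile shape, the row-major ordering of tiles and of pixels within a tile, and the function name are assumptions, chosen so that a 1024 by 768 image divides into 384 tiles.

```python
WIDTH, HEIGHT = 1024, 768
TILE_W, TILE_H = 64, 32              # assumed tile shape
TILES_PER_ROW = WIDTH // TILE_W      # 16 tiles across
TILE_PIXELS = TILE_W * TILE_H        # 2048 pixels per tile


def scanline_to_tiled(addr):
    """Translate a scan line pixel offset into the offset used by a tiled layout."""
    if not 0 <= addr < WIDTH * HEIGHT:
        raise ValueError("address lies outside the frame buffer")
    y, x = divmod(addr, WIDTH)                             # recover pixel coordinates
    tile = (y // TILE_H) * TILES_PER_ROW + (x // TILE_W)   # which tile holds the pixel
    offset = (y % TILE_H) * TILE_W + (x % TILE_W)          # position inside that tile
    return tile * TILE_PIXELS + offset


first_pixel = scanline_to_tiled(0)
second_tile_start = scanline_to_tiled(64)      # row 0, column 64 starts the next tile
second_row_start = scanline_to_tiled(1024)     # row 1, column 0 stays in the first tile
last_pixel = scanline_to_tiled(WIDTH * HEIGHT - 1)
```

Every direct access by legacy software would pay for this translation, which is part of the overhead attributed above to a fixed tiled frame buffer.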
The present invention relates to a system and method for accelerating graphics operations. The system includes a memory device for accelerating graphics operations within an electronic device. A memory controller is used for controlling pixel data transmitted to and from the memory device. A cache memory is electrically coupled to the memory controller and is dynamically configurable to a selected usable size to exchange an amount of pixel data having the selected usable size with the memory controller. The memory device is preferably an SDRAM. A graphics engine is electrically coupled to the cache memory, which stores pixel data, generally forming a two-dimensional image in a tiled configuration. The cache memory may also comprise a plurality of usable memory areas or tiles.
The present invention also includes a method for accelerating graphics operations within an electronic device. The method includes receiving a request for accessing data relating to a pixel. A determination is made as to the pseudo tile in which the pixel is located. The pseudo tile is selectively retrieved from a memory device and stored in a cache memory in a tile configuration. The requested pixel data is provided from the cache memory, which contains at least one tile.
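The steps of the method can be sketched in software as follows. This is a minimal illustration and not the claimed implementation: it assumes the memory device holds pixels in scan line order, that the cache holds a single tile, and that the image dimensions are multiples of the tile dimensions. The class name, its parameters, and the fetch counter are hypothetical.

```python
class PseudoTileCache:
    """Hypothetical one-tile cache in front of a scan line ordered memory."""

    def __init__(self, memory, width, tile_w, tile_h):
        self.memory = memory      # flat list of pixels in scan line order (assumed)
        self.width = width        # pixels per scan line
        self.tile_w = tile_w      # selected tile width
        self.tile_h = tile_h      # selected tile height
        self.tag = None           # (tile column, tile row) currently cached
        self.tile = []            # cached pixels, stored contiguously as one tile
        self.fetches = 0          # number of pseudo tiles retrieved so far

    def _load(self, tag):
        """Retrieve one pseudo tile as tile_h short sequential runs."""
        tile_col, tile_row = tag
        x0, y0 = tile_col * self.tile_w, tile_row * self.tile_h
        self.tile = []
        for dy in range(self.tile_h):
            start = (y0 + dy) * self.width + x0
            self.tile.extend(self.memory[start:start + self.tile_w])
        self.tag = tag
        self.fetches += 1

    def read(self, x, y):
        """Serve a pixel request, fetching the enclosing pseudo tile on a miss."""
        tag = (x // self.tile_w, y // self.tile_h)   # determine the pseudo tile
        if tag != self.tag:
            self._load(tag)
        return self.tile[(y % self.tile_h) * self.tile_w + (x % self.tile_w)]


# Usage: a 16 x 8 image whose pixel values equal their scan line offsets.
memory = list(range(16 * 8))
cache = PseudoTileCache(memory, width=16, tile_w=4, tile_h=4)
diagonal_values = [cache.read(i, i) for i in range(4)]
fetches_after_diagonal = cache.fetches
next_tile_value = cache.read(4, 4)
```

In this sketch, each row of a pseudo tile is read as a sequential run, so a fetch can take advantage of sequential transfers while the memory itself remains in scan line order for other agents.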