1. Field of the Invention
The invention relates generally to a highly integrated multimedia processor having a shared cache and tightly coupled central processing and graphical units, and more specifically to employing a portion of the shared cache as a secondary level in a hierarchical texture cache architecture.
2. Description of Related Art
The following background information is provided to aid in the understanding of the application of the present invention and is not meant to be limiting to the specific examples set forth herein. Displaying 3D graphics is typically characterized by a pipelined process having tessellation, geometry and rendering stages. The tessellation stage is responsible for decomposing an object into geometric primitives (e.g. polygons) for simplified processing while the geometry stage is responsible for transforming (e.g. translating, rotating and projecting) the tessellated object. The rendering stage rasterizes the polygons into pixels and applies visual effects such as, but not limited to, texture mapping, MIP mapping, Z buffering, depth cueing, anti-aliasing and fogging.
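By way of illustration only, the transforming work of the geometry stage (rotating, translating and projecting) can be sketched in a few lines of Python. The function names, the choice of the Y axis for rotation and the simple pinhole projection with a focal_length parameter are assumptions made for this sketch and are not drawn from any particular prior art system.

```python
import math


def rotate_y(vertex, angle):
    """Rotate a 3D vertex about the Y axis by `angle` radians."""
    x, y, z = vertex
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def translate(vertex, offset):
    """Move a 3D vertex by the given (dx, dy, dz) offset."""
    return tuple(a + b for a, b in zip(vertex, offset))


def project(vertex, focal_length=1.0):
    """Perspective-project a 3D vertex onto a 2D image plane.

    Assumes a pinhole camera at the origin looking down +Z (hypothetical).
    """
    x, y, z = vertex
    if z <= 0:
        raise ValueError("vertex must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def geometry_stage(vertices, angle, offset, focal_length=1.0):
    """Rotate, translate and then project every vertex of a tessellated object."""
    return [
        project(translate(rotate_y(v, angle), offset), focal_length)
        for v in vertices
    ]
```

For example, geometry_stage([(1.0, 1.0, 0.0)], 0.0, (0.0, 0.0, 2.0)) leaves the vertex unrotated, moves it two units away from the camera and projects it to (0.5, 0.5). The rendering stage would then rasterize the projected polygons into pixels.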
The entire 3D graphics pipeline can be embodied in software running on a general purpose CPU core (i.e. integer and floating point units), albeit at unacceptably slow speeds. To accelerate performance, the stages of the graphics pipeline are typically shared between the CPU and a dedicated hardware graphics controller (a.k.a. graphics accelerator). The floating-point unit of the CPU typically handles the vector and matrix processing of the tessellation and geometry stages while the graphics controller generally handles the pixel processing of the rendering stage.
Reference is now made to FIG. 1 that depicts a first prior art system of handling 3D graphics display in a computer. Vertex information stored on disk drive 100 is read over a local bus (e.g. the PCI bus) under control of chipset 102 into system memory 104. The vertex information is then read from system memory 104 under control of chipset 102 into the L2 cache 108 and L1 cache 105 of CPU 106. The CPU 106 performs geometry/lighting operations on the vertex information before caching the results along with texture coordinates back into the L1 cache 105, the L2 cache 108 and ultimately back to system memory 104. A direct memory access (DMA) is performed to transfer the geometry/lighting results, texture coordinates and texture maps stored in system memory 104 over the PCI bus into local graphics memory 112 of the graphics controller 110 for use in rendering a frame on the display 114. In addition to storing textures for use with the graphics controller 110, local graphics memory 112 also holds the frame buffer, the z-buffer and commands for the graphics controller 110.
A drawback with this approach is inefficient use of memory resources, since redundant copies of texture maps are maintained in both system memory 104 and the local graphics memory 112. Another drawback is that the local graphics memory 112 is dedicated to the graphics controller 110, is more expensive than generalized system memory and is not available for general-purpose use by the CPU 106. Yet another drawback is the attendant bus contention and relatively low bandwidth associated with the shared PCI bus. Efforts have been made to ameliorate these limitations by designating a “swap area” in local graphics memory 112 (sometimes misdescriptively referred to as an off chip L2 cache) so that textures can be prefetched into local graphics memory 112 from system memory 104 before they are needed by the graphics controller 110 and swapped with less recently used textures residing in the texture cache of the graphics controller 110. The local graphics memory swap area merely holds textures local to the graphics card (to avoid bus transfers) and does not truly back the texture cache as would a second level in a multi-level texture cache. This approach leads to the problem, among others, of deciding how to divide the local graphics memory 112 into texture storage and swap area. Still another drawback is that the single level texture cache in prior art graphics controllers consumes large amounts of die area, since the texture cache must be multi-ported and of sufficient size to avoid performance issues.
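The swap area scheme described above can be sketched as a small software model. This is an illustrative sketch only and does not represent the behavior of any particular graphics controller: the class name SwapAreaModel, its methods, the slot-based capacity and the strict least-recently-used policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class SwapAreaModel:
    """Hypothetical model of a texture cache backed by a swap area.

    `texture_cache` stands in for the texture cache of the graphics
    controller and `swap_area` for the region set aside in local
    graphics memory. Both names are assumptions of this sketch.
    """

    def __init__(self, cache_slots):
        self.cache_slots = cache_slots
        self.texture_cache = OrderedDict()  # oldest (least recently used) first
        self.swap_area = {}
        self.bus_transfers = 0  # transfers from system memory over the bus

    def prefetch(self, tex_id, system_memory):
        """Copy a texture from system memory into the swap area ahead of use."""
        if tex_id not in self.swap_area and tex_id not in self.texture_cache:
            self.swap_area[tex_id] = system_memory[tex_id]
            self.bus_transfers += 1

    def use(self, tex_id, system_memory):
        """Return a texture for rendering, swapping it into the cache if needed."""
        if tex_id in self.texture_cache:
            self.texture_cache.move_to_end(tex_id)  # mark as most recently used
            return self.texture_cache[tex_id]
        if tex_id not in self.swap_area:
            self.prefetch(tex_id, system_memory)  # late fetch over the bus
        texels = self.swap_area.pop(tex_id)
        if len(self.texture_cache) >= self.cache_slots:
            # Swap the least recently used texture back out to the swap area.
            old_id, old_texels = self.texture_cache.popitem(last=False)
            self.swap_area[old_id] = old_texels
        self.texture_cache[tex_id] = texels
        return texels
```

In this model, a texture that has already been prefetched moves between the swap area and the texture cache without a further bus transfer, while the question of how many entries the swap area may hold relative to ordinary texture storage is left open, mirroring the partitioning problem noted above.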
Reference is now made to FIG. 2 that depicts an improved but not entirely satisfactory prior art system of handling 3D graphics display in a computer. The processor 120, such as the Pentium II™ processor from Intel Corporation of Santa Clara, Calif., comprises a CPU 106 coupled to an integrated L2 cache 108 over a so-called “backside” bus 126 that operates independently from the host or so-called “front-side” bus 128. The system depicted in FIG. 2 additionally differs from that in FIG. 1 in that the graphics controller 110 is coupled over a dedicated and faster AGP bus 130 through chipset 102 to system memory 104. The dedicated and faster AGP bus 130 permits the graphics controller 110 to directly use texture maps in system memory 104 during the rendering stage rather than first pre-fetching the textures to local graphics memory 112.
Although sourcing texture maps directly out of system memory 104 mitigates local graphics memory constraints, some amount of local graphics memory 112 is still required for screen refresh, Z-buffering and front and back buffering since the AGP bus 130 cannot support such bandwidth requirements. Consequently, the system of FIG. 2 suffers from the same drawbacks as the system of FIG. 1, albeit to a lesser degree. Moreover, there is no way for the graphics controller 110 to directly access the L2 cache 108 that is encapsulated within the processor 120 and connected to the CPU 106 over the backside bus 126.
From the foregoing it can be seen that memory components, bus protocols and die size are the ultimate bottlenecks for presenting 3D graphics. Accordingly, there is a need for a highly integrated multimedia processor having tightly coupled central processing and graphical functional units that share a relatively large cache to avoid slow system memory access and the requirement to maintain separate and redundant local graphics memory, and to leverage the relatively large shared cache in a hierarchical texture cache architecture.
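The general notion of a second level that truly backs a first level texture cache can be illustrated with a minimal software model. The sketch below is not a description of the invention's implementation: the class names LRUCache and HierarchicalTextureCache, the line counts, the least-recently-used replacement policy and the dictionary standing in for system memory are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache (replacement policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first

    def lookup(self, key):
        """Return (hit, value) and refresh the entry on a hit."""
        if key not in self.lines:
            return False, None
        self.lines.move_to_end(key)
        return True, self.lines[key]

    def fill(self, key, value):
        """Insert an entry, evicting the least recently used one if full."""
        self.lines[key] = value
        self.lines.move_to_end(key)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


class HierarchicalTextureCache:
    """Hypothetical two-level texture lookup: L1, then L2, then system memory."""

    def __init__(self, l1_lines, l2_lines, system_memory):
        self.l1 = LRUCache(l1_lines)  # small cache local to the graphics unit
        self.l2 = LRUCache(l2_lines)  # stands in for a portion of the shared cache
        self.system_memory = system_memory
        self.l1_hits = self.l2_hits = self.memory_reads = 0

    def fetch(self, address):
        hit, texel = self.l1.lookup(address)
        if hit:
            self.l1_hits += 1
            return texel
        hit, texel = self.l2.lookup(address)
        if hit:
            self.l2_hits += 1
        else:
            texel = self.system_memory[address]  # slowest path
            self.memory_reads += 1
            self.l2.fill(address, texel)
        self.l1.fill(address, texel)
        return texel
```

In this model, a texel evicted from the small first level can still be served from the larger second level without a system memory read, which is the property a swap area in local graphics memory does not provide.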