1. Field of the Invention
The present invention relates to computer systems, and more particularly, to an apparatus for mapping virtual addresses to physical addresses in graphics applications.
2. Description of the Related Technology
As shown in FIG. 1, a conventional computer system architecture 100 includes a processor 102, system logic 104, main memory 106, a system bus 108, a graphics accelerator 110 communicating with a local frame buffer 112, and a plurality of peripherals 114. The processor 102 communicates with main memory 106 through a memory management unit (MMU) in the system logic 104. Peripherals 114 and the graphics accelerator 110 communicate with main memory 106 and system logic 104 through the system bus 108. Currently, the standard system bus 108 is the Peripheral Component Interconnect (PCI) bus. The original personal computer bus, the Industry Standard Architecture (ISA) bus, is capable of a peak data transfer rate of 8 megabytes/sec and is still used for low-bandwidth peripherals, such as audio. PCI, by contrast, supports multiple peripheral components and add-in cards at a peak bandwidth of 132 megabytes/sec. Thus, PCI is capable of supporting full-motion video playback at 30 frames/sec, true-color high-resolution graphics and 100 megabits/sec Ethernet local area networks. However, the emergence of high-bandwidth applications, such as three-dimensional (3D) graphics applications, threatens to overload the PCI bus.
For example, a 3D graphics image is formed by taking a two-dimensional image and applying, or mapping, it as a surface onto a 3D object. The major kinds of maps include texture maps, which deal with colors and textures; bump maps, which deal with physical surfaces; reflection maps; refraction maps; and chrome maps. Moreover, to add realism to a scene, 3D graphics accelerators often employ a z-buffer for hidden surface removal and for depth cueing, wherein an intensity value is used to modify the brightness of a pixel as a function of distance. A z-buffer memory can be as large as, or larger than, the memory needed to store two-dimensional images. The graphics accelerator 110 retrieves and manipulates image data from the local frame buffer 112, which is an expensive, high-performance memory. For example, to transfer an average 3D scene (polygon overlap of three) in 16-bit color at 30 frames/sec with a 75 Hz screen refresh, estimated bandwidths of 370 megabytes/sec to 840 megabytes/sec are needed for screen resolutions from 640×480 (VGA) to 1024×768 (XGA). Thus, rendering 3D graphics on a display requires a large amount of bandwidth between the graphics accelerator 110 and the local frame buffer 112, where 3D texture maps and z-buffer data typically reside.
In addition, many computer systems use virtual memory systems to permit the processor 102 to address more memory than is physically present in the main memory 106. A virtual memory system allows very large amounts of memory to be addressed as though all of that memory were part of the main memory of the computer system, even though the actual main memory may be substantially smaller than the addressable space. For example, main memory may include sixteen megabytes (16,777,216 bytes) of random access memory while a virtual memory addressing system permits the addressing of four gigabytes (4,294,967,296 bytes) of memory.
Virtual memory systems provide this capability using a memory management unit (MMU) to translate virtual memory addresses into their corresponding physical memory addresses, where the desired information actually resides. A particular physical address holding desired information may reside in main memory or in mass storage, such as a tape drive or hard disk. If the physical address of the information is in main memory, the information is readily accessed and utilized. Otherwise, the information referenced by the physical address is in mass storage and the system transfers this information (usually in a block referred to as a page) to main memory for subsequent use. This transfer may require the swapping of other information out of main memory into mass storage in order to make room for the new information. If so, the MMU controls the swapping of information to mass storage.
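The page-in and swap-out behavior described above can be modeled in a few lines. The following Python sketch is illustrative only: the class name `DemandPagedMemory`, the fixed number of physical frames, and the first-in, first-out (FIFO) choice of which page to swap out are all assumptions, since the text does not specify a replacement policy.

```python
from collections import OrderedDict


class DemandPagedMemory:
    """Toy model of demand paging (hypothetical sketch).

    Assumes a fixed number of physical frames and FIFO replacement
    when a page must be swapped out to mass storage.
    """

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()   # virtual page -> frame, in load order
        self.swapped_out = []           # pages written back to mass storage
        self.faults = 0

    def access(self, vpage):
        """Return the frame holding vpage, paging it in if necessary."""
        if vpage in self.resident:
            return self.resident[vpage]           # already in main memory
        self.faults += 1                          # must fetch from mass storage
        if len(self.resident) < self.num_frames:
            frame = len(self.resident)            # a free frame is available
        else:
            victim, frame = self.resident.popitem(last=False)  # oldest page (FIFO)
            self.swapped_out.append(victim)       # make room: swap victim out
        self.resident[vpage] = frame              # page now resides in main memory
        return frame
```

With two frames, the access sequence 10, 11, 10, 12 causes three faults, and loading page 12 swaps out page 10 because it was loaded first, even though it was used more recently than page 11.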
Pages are the usual mechanism used for addressing information in a virtual memory system. Pages are numbered, and both physical and virtual addresses often include a page number and an offset into the page. Moreover, the physical offset and the virtual offset are typically the same. In order to translate between the virtual and physical addresses, a basic virtual memory system creates a series of lookup tables, called page tables, stored in main memory. These page tables store the virtual address page numbers used by the computer. Stored with each virtual address page number is the corresponding physical address page number which must be accessed to obtain the information. Often, the page tables are so large that they are paged themselves. The page number of any virtual address presented to the memory management unit is compared to the values stored in these tables in order to find a matching virtual address page number for use in retrieving the corresponding physical address page number.
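As a minimal sketch of this page-number/offset split, assume 4K-byte pages (12 offset bits) and 32-bit addresses, consistent with the four-gigabyte example above. The hypothetical helpers below separate a virtual address into a page number and an offset and translate it through a single-level table, modeled here as a Python dictionary rather than as an array in main memory.

```python
PAGE_SHIFT = 12                       # assumed 4K-byte pages
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1


def split(vaddr):
    """Split an address into (page number, offset within the page)."""
    return vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK


def translate(vaddr, page_table):
    """Translate through a flat table mapping virtual -> physical page numbers.

    The offset is carried over unchanged, because the physical and
    virtual offsets are the same.
    """
    vpn, offset = split(vaddr)
    if vpn not in page_table:
        raise KeyError(f"no mapping for virtual page {vpn:#x}")
    ppn = page_table[vpn]
    return (ppn << PAGE_SHIFT) | offset
```

For example, `translate(0x12345ABC, {0x12345: 0x42})` returns `0x42ABC`. Under these assumptions a 32-bit virtual address space holds 2^20 virtual pages, which is why such tables can grow large enough to be paged themselves.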
There are often several levels of tables, and the comparison consumes a substantial number of clock cycles. For example, to retrieve a physical page address using lookup tables stored in main memory, the typical MMU first looks to a register for the address of a base table which stores pointers to other levels of tables. The MMU retrieves the appropriate pointer from the base table and places it in another register. The MMU then uses this pointer to go to the next level of table. This process continues until the physical page address of the information sought is recovered. When the physical address is recovered, it is combined with the offset furnished as a part of the virtual address and the processor uses the result to access the particular information desired. Completion of a typical lookup in the page tables may take from ten to fifteen clock cycles at each level of the search.
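The multi-level walk can be sketched as follows, assuming a two-level layout with 10 index bits per level and 12 offset bits; this layout is chosen for illustration, as the text does not fix one. Nested dictionaries stand in for tables in main memory, the hypothetical `base_table` argument stands in for the table whose address the MMU reads from a register, and the returned count of table reads reflects the per-level cost described above.

```python
PAGE_SHIFT = 12                        # assumed 4K-byte pages
LEVEL_BITS = 10                        # assumed index bits per table level
LEVEL_MASK = (1 << LEVEL_BITS) - 1


def walk(vaddr, base_table, levels=2):
    """Walk a multi-level page table; return (physical address, tables read).

    Each table maps an index to the next-level table or, at the last
    level, to a physical page number.
    """
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpn = vaddr >> PAGE_SHIFT
    node = base_table
    reads = 0
    for level in range(levels - 1, -1, -1):
        index = (vpn >> (level * LEVEL_BITS)) & LEVEL_MASK
        reads += 1                     # one memory access per level
        if index not in node:
            raise KeyError(f"page fault: no entry at level {levels - level}")
        node = node[index]             # pointer to next table, or the page number
    return (node << PAGE_SHIFT) | offset, reads  # combine with the offset
```

For example, `walk(0x00403ABC, {1: {3: 0x777}})` returns `(0x777ABC, 2)`: two table reads, each of which would cost the per-level cycles noted above.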
To overcome this delay, virtual memory systems often include cache memories called translation lookaside buffers (TLBs). A TLB is essentially a buffer for caching recently translated virtual page addresses along with their corresponding physical page addresses. Such an address cache works on the same principle as caches holding data and instructions: the most recently used addresses are more likely to be used again than other addresses. Thus, if a subsequent virtual address refers to the same page as the last one, the page table lookup process is skipped to save time. A TLB entry resembles a cache entry: its tag portion includes portions of the virtual address, and its data portion includes a physical page frame number, protection fields, use bits and status bits. When provided with a virtual page address stored in the TLB (a translation hit), the TLB furnishes a physical page address for the information without consulting any page lookup tables. When the processor requests a virtual page address not stored in the TLB (a translation miss), the MMU must consult the page lookup tables. The physical page address recovered is then stored along with the virtual page address in the TLB so that it is immediately available for subsequent use, which saves a substantial amount of time on the next use of the information. For example, accessing the information using a TLB may require only one or two clock cycles, compared to the hundreds of clock cycles required for a page table lookup.
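A TLB placed in front of such a lookup can be sketched as a small, fully associative cache. This hypothetical `TLB` class assumes least-recently-used (LRU) replacement and stores only the physical page number as each entry's data, omitting the protection, use and status bits mentioned above; the `page_table` dictionary stands in for the slower table walk performed on a miss.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                       # assumed 4K-byte pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TLB:
    """Fully associative TLB with LRU replacement (hypothetical sketch)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table      # vpn -> ppn, consulted only on a miss
        self.entries = OrderedDict()      # tag (vpn) -> data (ppn), LRU first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        if vpn in self.entries:                    # translation hit
            self.hits += 1
            self.entries.move_to_end(vpn)          # now most recently used
            ppn = self.entries[vpn]
        else:                                      # translation miss
            self.misses += 1
            ppn = self.page_table[vpn]             # stands in for the slow table walk
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
            self.entries[vpn] = ppn                # keep for subsequent use
        return (ppn << PAGE_SHIFT) | offset        # offset passes through unchanged
```

Translating `0x0010` and then `0x0020` produces a miss followed by a hit, because both addresses fall on virtual page 0; only the first access pays for the table lookup.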
Virtual memory systems are common in the art. For example, in U.S. Pat. No. 5,446,854, Khalidi et al. disclose a method and apparatus for virtual to physical address translation using hashing. Similarly, Crawford et al. disclose a microprocessor architecture having segmentation mechanisms for translating virtual addresses to physical addresses in U.S. Pat. No. 5,321,836. Lastly, in U.S. Pat. Nos. 5,491,806 and 5,546,555, Horstmann et al. disclose an optimized translation lookaside buffer for use in a virtual memory system.
As shown in FIG. 1, moving 3D graphics data to the main memory 106 in current computer systems would require the graphics accelerator 110 to access the 3D graphics data through the PCI system bus 108. Although Bechtolsheim discloses a data bus enabling virtual memory data transfers in U.S. Pat. Nos. 4,937,734 and 5,121,487, 3D rendering from main memory 106 would exceed the peak PCI bandwidth of 132 megabytes/sec, because data transfer from main memory 106 requires a bandwidth of at least 370 megabytes/sec. Moreover, the graphics accelerator 110 often requires graphics data to be stored in large contiguous blocks of memory. For example, a 16-bit 256×256 pixel texture map for 3D graphics applications requires a memory block of 128K bytes. However, operating system software, such as Microsoft® Windows®, Windows® 95 and Windows NT®, and the system logic 104 often allocate main memory in page frames of smaller sizes, such as 4K bytes. In U.S. Pat. No. 5,465,337, Kong discloses a memory management unit capable of handling virtual address translations for multiple page sizes. However, Kong does not address the bandwidth limitations of the PCI bus discussed above. In order to move 3D graphics data from the local frame buffer 112 to main memory 106, computer systems require an improved method for storing and addressing graphics data in main memory.
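The block-size and bandwidth figures above can be checked directly. In this sketch the helper names are hypothetical: `frames_needed` shows that a 128K-byte texture spans 32 page frames of 4K bytes, and `is_contiguous` tests whether a list of frame numbers forms one physically contiguous block, which page-based allocation does not guarantee.

```python
PAGE_SIZE = 4 * 1024            # 4K-byte page frames, as cited above
PCI_PEAK_MB_PER_S = 132         # peak PCI bandwidth cited above
RENDER_MIN_MB_PER_S = 370       # minimum 3D rendering bandwidth cited above


def texture_bytes(width, height, bits_per_pixel):
    """Size in bytes of an uncompressed texture map."""
    return width * height * bits_per_pixel // 8


def frames_needed(nbytes, page_size=PAGE_SIZE):
    """Number of page frames needed to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


def is_contiguous(frames):
    """True if consecutive frame numbers form one physically contiguous block."""
    return all(b == a + 1 for a, b in zip(frames, frames[1:]))


size = texture_bytes(256, 256, 16)              # 131,072 bytes = 128K bytes
print(size, frames_needed(size))                # 131072 32
print(RENDER_MIN_MB_PER_S > PCI_PEAK_MB_PER_S)  # True: PCI cannot keep up
```

If an allocator happened to return the invented frame numbers `[7, 8, 42, 3]`, `is_contiguous` would report `False`, so the graphics accelerator 110 could not treat those frames as a single linear block without some form of address translation.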
In U.S. Pat. No. 5,313,577, Meinerth et al. disclose a graphics processor capable of reading from, and writing to, virtual memory. This graphics processor can be described by reference to FIG. 2, which illustrates a graphics/memory control unit 120 including a graphics processor unit 122 that communicates with a memory control unit 124. The graphics/memory control unit 120 in turn communicates with the main memory 106 and the frame buffer 112 through a dedicated memory bus 126. The graphics processor unit 122 includes an address generator and a virtual translation unit to provide for translation of virtual addresses to physical addresses when accessing the main memory 106 and the frame buffer 112. In addition, the memory control unit 124 communicates with a processor 102 through a dedicated system bus 128, with an I/O device 114 through a dedicated I/O bus 130 and with computer networks through a dedicated network bus 132. In contrast to the structure of FIG. 1, the use of dedicated buses for communication with the main memory 106, I/O devices 114 and computer networks substantially increases system cost and decreases the flexibility with which a computer system can be upgraded. For example, to upgrade the graphics capability of a computer system having the structure illustrated in FIG. 1, one simply connects a more powerful graphics adapter to the PCI bus 108 (FIG. 1). However, upgrading the graphics capability of a computer system having the structure of FIG. 2 requires replacement of the memory control unit 124 as well as the graphics processor unit 122. Similarly, the structure of FIG. 2 is not compatible with the vast majority of available PCI enhancement devices. Moreover, the structure of FIG. 2 requires the graphics processor unit 122 to access 3D graphics data through the memory bus 126.
In view of the limitations discussed above, computer manufacturers require a modular architecture that reduces the cost of system upgrades, such as enhanced 3D graphics adapters, to improve display performance. Similarly, to reduce system memory costs, computer manufacturers require improved methods for storing, addressing and retrieving graphics data from main memory instead of expensive local frame buffer memory. Moreover, to address the needs of high bandwidth graphics applications without substantial increases in system cost, computer manufacturers require improved technology to overcome current system bus bandwidth limitations.