1. Field of the Invention
The invention relates generally to address translation, and more specifically to a computer system employing multiple bus masters which share a common address translation unit and paging mechanism.
2. Description of Related Art
The following background information and definitions are provided as a baseline for understanding the principles underlying the present invention and are not meant to limit the present invention to any of the specific examples set forth herein.
An "operating system" is an underlying supervisory program which runs on a computer system, typically transparent to the user, for handling routine tasks such as memory management.
A "page" is a minimum size block of information in a virtual memory system which is swapped between primary memory (e.g. RAM) and a secondary memory (e.g. disk drive), typically under the control of the operating system.
An "exception" is an abnormal or uncommon event occurring in the computer system intended to signal it to perform a task, typically to service the exception.
A "page fault" is a particular type of exception which is generated when a requested virtual address is not in primary memory.
A "translation lookaside buffer" (TLB) is a particular type of cache that stores physical addresses of the most recently used pages.
"Thrashing" is an undesirable event that occurs when a page is frequently swapped between primary and secondary memories.
So-called virtual memory addressing is a well known technique whereby large address spaces are emulated with relatively small, but fast memory (such as RAM) and relatively large, but relatively slow secondary memory (such as a disk drive). Historically, virtual memory addressing was performed because fast memory was far too cost prohibitive to support large address spaces. The most common way of implementing virtual memory addressing in a computer system is by swapping minimum sized blocks of information known as pages into and out of RAM from a disk drive under the control of the operating system when a requested address is not located within the primary or physical memory.
The requested address (referred to as the "virtual", "linear", or "logical" address) is either translated to a corresponding physical address in RAM or, if it falls in a memory location defined by a page not currently in RAM, causes an exception known as a page fault to be generated. When the operating system services the page fault, it typically uses some form of "least recently used" (LRU) technique to select a page to expunge from RAM, loads the required page, and restarts the address request. In the x86 architecture, the page size is typically, although not necessarily, fixed at four kilobytes and aligned on four kilobyte boundaries.
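By way of illustration only, the page fault servicing described above can be sketched as a toy model. The class name PagedMemory, the frame count, and the use of an ordered dictionary to track recency are assumptions made for this sketch and do not describe any particular operating system.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model (hypothetical): a fixed number of RAM frames with LRU replacement."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        # Resident pages, ordered from least to most recently used.
        self.resident = OrderedDict()
        self.faults = 0

    def access(self, page):
        if page in self.resident:
            # Page is in primary memory: record it as most recently used.
            self.resident.move_to_end(page)
            return "hit"
        # Page fault: the page is not in primary memory.
        self.faults += 1
        if len(self.resident) >= self.num_frames:
            # Expunge the least recently used page to free a frame.
            self.resident.popitem(last=False)
        # Stand-in for loading the page from secondary memory.
        self.resident[page] = "contents of page %d" % page
        return "fault"


mem = PagedMemory(num_frames=2)
results = [mem.access(p) for p in (0, 1, 0, 2, 1)]
```

With two frames and the access sequence 0, 1, 0, 2, 1, page 1 is expunged when page 2 is loaded and is then faulted back in, a small-scale instance of the swapping that, when frequent, is referred to as thrashing.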
Referring now to FIG. 1, a two-level page table addressing technique is described in the context of the x86 architecture. Bits 31-22 of a thirty-two bit linear address are used to locate an entry in a so-called directory table 10. The directory table 10 is a master index for up to one thousand twenty four individual second-level page tables. The selected entry in the directory table 10, referred to as the directory table entry (DTE), identifies the starting or "base" address 11 of a second-level page table 12. The directory table 10 is typically four kilobytes in size, holding one thousand twenty four--four-byte DTEs, and is itself a page and therefore aligned to a four kilobyte boundary. Each DTE has twenty bits which define the page table (base) address and twelve bits which define attributes, some of which are unused, described in more detail hereinbelow. The base address 13 of the directory table 10 is stored in a page directory base register (PDBR) 14.
Bits 21-12 of the thirty-two bit linear address offset the base address 11, to locate a thirty-two bit entry, referred to as the page table entry (PTE), in the second-level page table 12. The page table 12 addresses up to one thousand twenty four individual page frames and is four kilobytes in size, holding one thousand twenty four--four-byte PTEs, and is itself a page aligned to a four kilobyte boundary. Each PTE has twenty bits which define a desired page frame within physical memory (RAM) 16 and twelve bits which define attributes, some of which are unused, described in more detail hereinbelow.
Bits 11-0 of the thirty-two bit linear address, referred to as the page offset, locate the desired physical memory address within the page frame 15 pointed to by the PTE. Since the directory table 10 can point to one thousand twenty four page tables, and each page table can point to one thousand twenty four page frames, a total of 1,048,576 page frames are realized. Since each page frame 15 contains four kilobytes of physical memory addresses (a page offset of twelve bits), up to four gigabytes of virtual memory can be addressed. The directory table 10 and page table 12 may reside, wholly or in part, in cache or external memory.
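The two-level lookup of FIG. 1 can be sketched as follows. This is a minimal illustration, not an implementation: memory is modeled as a hypothetical dictionary mapping physical addresses to thirty-two bit entries, the function names are invented for the sketch, and the attribute bits are simply masked off.

```python
PAGE_SIZE = 4096          # four kilobytes per page frame
ENTRIES_PER_TABLE = 1024  # entries in a directory table or page table


def split_linear_address(linear):
    """Split a 32-bit linear address into its three fields."""
    directory_index = (linear >> 22) & 0x3FF  # bits 31-22
    table_index = (linear >> 12) & 0x3FF      # bits 21-12
    page_offset = linear & 0xFFF              # bits 11-0
    return directory_index, table_index, page_offset


def translate(linear, memory, pdbr):
    """Walk both table levels; attribute bits (11-0) are ignored here."""
    directory_index, table_index, page_offset = split_linear_address(linear)
    dte = memory[pdbr + 4 * directory_index]         # four-byte DTE
    page_table_base = dte & 0xFFFFF000               # bits 31-12 of the DTE
    pte = memory[page_table_base + 4 * table_index]  # four-byte PTE
    page_frame_base = pte & 0xFFFFF000               # bits 31-12 of the PTE
    return page_frame_base | page_offset


# Hypothetical contents: directory table at 0x1000, one page table at 0x2000.
pdbr = 0x1000
memory = {
    pdbr + 4 * 1: 0x00002000,    # DTE 1 -> page table at 0x2000
    0x2000 + 4 * 3: 0x00055000,  # PTE 3 -> page frame at 0x55000
}
physical = translate(0x00403ABC, memory, pdbr)

total_page_frames = ENTRIES_PER_TABLE * ENTRIES_PER_TABLE
addressable_bytes = total_page_frames * PAGE_SIZE
```

Here the linear address 0x00403ABC selects directory entry 1, page table entry 3, and page offset 0xABC, yielding the physical address 0x55ABC. The last two lines reproduce the arithmetic in the text: 1,048,576 page frames of four kilobytes each give four gigabytes.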
Reference is now made to FIG. 2, which illustrates in more detail the DTE and PTE of FIG. 1. Each DTE and PTE contains a twenty bit base address (bits 31-12) of either the page table 12 or the page frame 15 respectively, as well as twelve other attribute bits (bits 11-0). A present bit (P) (bit 0) is set in the DTE to indicate that the requested page table 12 is present and therefore the appropriate PTE can be read. The P bit is also set in the corresponding PTE to indicate that the page is in physical memory 16. Accessed (A) and dirty (D) bits, bits 5 and 6 respectively, are updated upon a hit, if necessary, and the information is fetched from physical memory 16. Accessed (A) bits in both the DTE and the PTE are set, if necessary, to indicate that the directory table 10 and page table 12 have been used to translate a linear address. The dirty (D) bits in the DTE and PTE are set before the first write is made to a page. Both present bits are set to validate the remaining bits in the DTE and PTE. If either of the present bits is not set, a page fault is generated when the corresponding DTE or PTE is accessed. If the P bit is not set, the remaining DTE/PTE bits are available for use by the operating system, for example, to record the address on the hard disk where the page is located. A page fault is also generated if the memory reference violates the page protection attributes set in bits 1-4. The details of these attribute bits are not necessary for the understanding of the present invention but are mentioned for completeness. A more complete explanation can be found in the CX486DX/DX2 data book, order number 94113-01, from Cyrix Corporation, Richardson, Tex., herein incorporated by reference.
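The entry layout of FIG. 2 can be illustrated with a short decoding sketch. The names decode_entry, check_and_mark, and PageFault are hypothetical and introduced only for this example; the protection attributes in bits 1-4 are deliberately not modeled.

```python
PRESENT = 1 << 0         # P bit (bit 0)
ACCESSED = 1 << 5        # A bit (bit 5)
DIRTY = 1 << 6           # D bit (bit 6)
BASE_MASK = 0xFFFFF000   # twenty bit base address (bits 31-12)


class PageFault(Exception):
    """Raised when an entry with a clear P bit is used for translation."""


def decode_entry(entry):
    """Return the fields of a DTE or PTE discussed in the text."""
    return {
        "base": entry & BASE_MASK,
        "present": bool(entry & PRESENT),
        "accessed": bool(entry & ACCESSED),
        "dirty": bool(entry & DIRTY),
    }


def check_and_mark(entry, is_write):
    """Validate the P bit, then set A (and D for a write) in the entry."""
    if not entry & PRESENT:
        # Remaining bits are not valid for translation; the operating
        # system may be using them for its own purposes.
        raise PageFault("entry not present")
    entry |= ACCESSED
    if is_write:
        entry |= DIRTY
    return entry


updated = check_and_mark(0x00055001, is_write=True)
fields = decode_entry(updated)
```

In this sketch an entry of 0x00055001 (base 0x55000, P set) becomes 0x00055061 after a write, that is, with both the A and D bits set, while an entry with the P bit clear raises the hypothetical PageFault exception.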
The two-level table access described above is sometimes referred to as "tablewalking". Tablewalking is time intensive because, for a two-level page table, it requires at least three memory cycles, namely: one for fetching the DTE, one for fetching the PTE, and one for reading or writing the requested address in physical memory 16. Frequently, this access latency can be avoided with the use of a translation lookaside buffer (TLB) 18. The TLB 18 contains "tags" (i.e. copies of the most recently accessed linear addresses) along with their corresponding physical addresses. The TLB 18 replaces tablewalking, and thus reduces memory cycles, when a desired linear address matches (or "hits") one of the tags stored within it. Accordingly, the TLB 18 can immediately map the linear address to the physical address without doing a tablewalk.
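The effect of the TLB on tablewalking can be sketched as a simple cache placed in front of the walk. This sketch makes several assumptions not stated in the text: the tag is taken to be the upper twenty bits of the linear address, the tablewalk is supplied as a hypothetical callable, and the buffer has unlimited capacity so no replacement policy is shown.

```python
class TLB:
    """Toy translation lookaside buffer (hypothetical, unbounded capacity)."""

    def __init__(self, tablewalk):
        # tablewalk: callable mapping a linear page number (tag) to the
        # physical base address of its page frame.
        self.tablewalk = tablewalk
        self.entries = {}  # tag -> physical page frame base
        self.walks = 0     # number of tablewalks actually performed

    def translate(self, linear):
        tag = linear >> 12       # assumed tag: bits 31-12
        offset = linear & 0xFFF  # page offset: bits 11-0
        if tag not in self.entries:
            # Miss: fall back to the tablewalk and remember the result.
            self.walks += 1
            self.entries[tag] = self.tablewalk(tag)
        # Hit (or freshly filled entry): map without another tablewalk.
        return self.entries[tag] | offset


# Hypothetical mapping standing in for the directory and page tables.
page_frames = {0x00403: 0x00055000}
tlb = TLB(tablewalk=lambda tag: page_frames[tag])

first = tlb.translate(0x00403ABC)   # miss, one tablewalk
second = tlb.translate(0x00403010)  # hit in the same page, no tablewalk
```

Both accesses fall in the same page, so only the first incurs a tablewalk; the second is mapped directly from the stored tag.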
By way of further background, in a multiple bus master computer system, physical memory 16 may be accessed by devices other than the processor, including DMA devices, micro-controllers, as well as by other processors (e.g. symmetric multiprocessing). In the x86 architecture, signals BOFF, HOLD, and HLDA, described in the CX486DX/DX2 data book, order number 94113-01, from Cyrix Corporation, Richardson, Tex., which was herein incorporated by reference, provide an adequate protocol for bus arbitration for current x86 architectures but do not address virtual to physical memory translation.
Specifically, BOFF (back-off) is asserted by system (chipset) logic to force the processor to abort a current bus cycle, and relinquish control of the local processor bus in the next clock cycle. Once BOFF is de-asserted, the processor restarts any aborted bus cycle in its entirety. HOLD (bus hold request) is slightly different from BOFF and is asserted by chipset logic to indicate that a DMA device requests control of the local processor bus to run a DMA access to memory. Unlike BOFF, the processor completes the current bus cycle and then acknowledges the request and relinquishes control of the local processor bus. HLDA (hold acknowledge) is asserted by the processor in response to HOLD (after the current bus cycle is completed) indicating that it has relinquished control of the local bus for a DMA access. When chipset logic de-asserts HOLD, the processor de-asserts HLDA.
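The difference between BOFF and HOLD can be illustrated with a behavioral sketch. This is an event-level model only: clock-accurate timing is not represented, and the class and method names are hypothetical rather than taken from the referenced data book.

```python
class ProcessorBusModel:
    """Event-level sketch of BOFF and HOLD/HLDA behavior (no clock timing)."""

    def __init__(self):
        self.current_cycle = None  # name of the bus cycle in flight, if any
        self.owns_bus = True
        self.hlda = False
        self.log = []

    def start_cycle(self, name):
        self.current_cycle = name

    def boff(self, asserted):
        if asserted:
            # The current bus cycle is aborted and the bus is relinquished.
            if self.current_cycle is not None:
                self.log.append("abort " + self.current_cycle)
            self.owns_bus = False
        else:
            # Any aborted bus cycle is restarted in its entirety.
            self.owns_bus = True
            if self.current_cycle is not None:
                self.log.append("restart " + self.current_cycle)

    def hold(self, asserted):
        if asserted:
            # Unlike BOFF, the current bus cycle completes first.
            if self.current_cycle is not None:
                self.log.append("complete " + self.current_cycle)
                self.current_cycle = None
            self.owns_bus = False
            self.hlda = True   # acknowledge: bus relinquished
        else:
            self.hlda = False  # HLDA follows HOLD being de-asserted
            self.owns_bus = True


cpu = ProcessorBusModel()
cpu.start_cycle("read A")
cpu.boff(True)
cpu.boff(False)
cpu.hold(True)
hlda_during_hold = cpu.hlda
cpu.hold(False)
```

In this sequence the read is aborted and restarted under BOFF, and then allowed to complete before the bus is released under HOLD.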
In addition to the above-mentioned bus arbitration signals, current x86 architecture computer systems use cache coherency signals AHOLD and EADS to support caching on the processor. AHOLD (address hold request) is asserted by chipset logic to cause the processor to tri-state the address lines of the local bus one clock after AHOLD is asserted, while still completing the current bus cycle. A DMA device performs a cache inquiry cycle by driving an address into the processor at the same time it is presented to memory. The processor does not initiate another bus cycle except for a snoop write-back cycle resulting from the cache inquiry.
It is contemplated that sophisticated bus master devices will be added onto the local processor bus which will require address translation such as, but not limited to, virtual to physical address translation. A non-exhaustive example is a rendering processor that performs draws, fills, and bitblt operations to main memory (as well as to its local video memory) or shared memory, and thus requires virtual to physical address translation. Until now, address translation has been accomplished locally for each bus master, requiring a separate address translation unit or software routine for each bus master.
By way of further background, high speed, dedicated graphics ports are emerging that support memory bandwidth intensive applications such as 3D rendering. An exemplary, but not exclusive, scheme for this can be found in the Accelerated Graphics Port Interface Specification, Revision 1.0, dated Jul. 31, 1996, from Intel Corporation of Santa Clara, Calif., said specification herein incorporated by reference. In this scheme, a graphics accelerator is coupled through a dedicated accelerated graphics port (a.k.a. AGP) to chipset logic which, among other things, arbitrates accesses to system memory. Additionally, the AGP scheme requires the chipset logic to include a so-called Graphics Address Re-mapping Table (a.k.a. GART) mechanism separate and distinct from the virtual to physical address translation mechanism, so that the graphics accelerator perceives a contiguous view (memory address-wise) of graphics data structures in dynamically allocated system memory.
Ostensibly, the disadvantage of using multiple address translation units (including a GART in the chipset logic) is redundancy of hardware, with its attendant space and power consumption, as well as the additional burden on the operating system to initialize and maintain coherency among multiple address translation units.
Accordingly, it can be seen from the foregoing, that there is a need to provide a shared address translation unit for use in a multiple bus master computer system, thus reducing size, cost, and complexity.