1. Field of the Invention
This invention relates to processors and computer systems, and more particularly to address translation mechanisms used within computer systems and processors.
2. Description of the Related Art
A typical computer system includes a processor which reads and executes instructions of software programs stored within a memory system. In order to maximize the performance of the processor, the memory system must supply the instructions to the processor such that the processor never waits for needed instructions. There are many different types of memory from which the memory system may be formed, and the cost associated with each type of memory is typically directly proportional to the speed of the memory. Most modern computer systems employ multiple types of memory. Smaller amounts of faster (and more expensive) memory are positioned closer to the processor, and larger amounts of slower (and less expensive) memory are positioned farther from the processor. By keeping the smaller amounts of faster memory filled with instructions (and data) needed by the processor, the speed of the memory system approaches that of the faster memory, while the cost of the memory system approaches that of the less expensive memory.
Most modern computer systems also employ a memory management technique called "virtual" memory which allocates memory to software programs upon request. This automatic memory allocation effectively hides the memory hierarchy described above, making the many different types of memory within a typical memory system (e.g., random access memory, magnetic hard disk storage, etc.) appear as one large memory. Virtual memory also provides for isolation between different programs by allocating different physical memory locations to different programs running concurrently.
Early x86 (e.g., 8086/88) processors used a segmented addressing scheme in which a 16-bit segment value is combined with a 16-bit offset value to form a 20-bit physical address. In a shift-and-add operation, the 16-bit segment portion of the address is first shifted left four bit positions to form a segment base address. The 16-bit offset portion is then added to the segment base address, producing the 20-bit physical address. In the early x86 processors, when the shift-and-add operation resulted in a physical address having a value greater than FFFFFh, the physical address value "wrapped around" and started at 00000h. Programmers developing software for the early x86 processors began to rely upon this address wrap-around "feature". In order to facilitate software compatibility, later x86 processors included an address bit 20 "masking" feature controlled by an "A20M" input pin. By asserting an A20M signal coupled to the A20M pin, address bit 20 is produced having a logic value of "0". As a result, address values greater than FFFFFh appear to wrap around and start at 00000h, emulating the behavior of the early x86 processors.
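The shift-and-add operation and the effect of address bit 20 masking described above may be sketched as follows. This is an illustrative model only, not an implementation from the specification; the function name and arguments are hypothetical.

```python
def real_mode_address(segment, offset, a20m_asserted=False):
    """Form a 20-bit-plus-carry physical address from a 16-bit segment
    and a 16-bit offset using the shift-and-add operation.

    With the (hypothetical) a20m_asserted flag set, address bit 20 of
    the result is forced to logic "0", emulating the wrap-around
    behavior of the earliest x86 processors.
    """
    physical = (segment << 4) + offset   # shift-and-add; may carry into bit 20
    if a20m_asserted:
        physical &= ~(1 << 20)           # mask address bit 20 to "0"
    return physical

# FFFFh:0010h produces 100000h when bit 20 is passed, but appears to
# wrap around to 00000h when the A20M signal is asserted.
```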
Many modern processors, including x86 processors, support a form of virtual memory called "paging". Paging divides a physical address space, defined by the number of address signals generated by the processor, into fixed-sized blocks of contiguous memory called "pages". If paging is enabled, a "virtual" address is translated or "mapped" to a physical address. For example, in an x86 processor with paging enabled, a paging unit within the processor translates a "linear" address produced by a segmentation unit to a physical address. If an accessed page is not located within the main memory unit, paging support constructs (e.g., operating system software) load the accessed page from secondary memory (e.g., magnetic disk) into main memory. In x86 processors, two different tables stored within the main memory unit, namely a page directory and a page table, are used to store information needed by the paging unit to perform the linear-to-physical address translations.
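The two-level page directory/page table lookup described above may be modeled as a sketch, assuming 4 KB pages and a 32-bit linear address split into a 10-bit directory index, a 10-bit table index, and a 12-bit page offset (the classic x86 split). The data structures here are hypothetical stand-ins for the in-memory tables.

```python
def translate(linear, page_directory):
    """Walk an x86-style two-level paging structure.

    page_directory: dict mapping a 10-bit directory index to a page
    table, itself a dict mapping a 10-bit table index to a 4 KB-aligned
    page frame base address. Both dicts model tables in main memory.
    """
    dir_index   = (linear >> 22) & 0x3FF   # bits 31..22 select the page table
    table_index = (linear >> 12) & 0x3FF   # bits 21..12 select the page frame
    offset      =  linear        & 0xFFF   # bits 11..0 select the byte in page

    page_table = page_directory[dir_index]   # first main-memory access
    frame_base = page_table[table_index]     # second main-memory access
    return frame_base | offset
```

Each translation costs two main-memory table accesses, which is precisely the overhead the translation lookaside buffer discussed below is intended to avoid.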
Accesses to the main memory unit require relatively large amounts of time. In order to reduce the number of required main memory unit accesses to retrieve information from the page directory and page table, a small cache memory system called a translation lookaside buffer (TLB) is typically used to store the most recently used address translations. As the amount of time required to access an address translation in the TLB is relatively small, overall processor performance is increased as needed address translations are often found in the readily accessible TLB.
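A TLB of the kind described above may be sketched as a small cache of recent translations. The class below is an illustrative model, not the claimed structure; real TLBs are set-associative hardware arrays, and the LRU replacement policy here is an assumption for the sketch.

```python
from collections import OrderedDict

class TLB:
    """A tiny translation lookaside buffer modeled as an LRU cache
    mapping virtual page numbers to physical page frames."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()          # virtual page -> physical frame

    def lookup(self, vpage):
        if vpage in self.entries:
            self.entries.move_to_end(vpage)   # refresh LRU position
            return self.entries[vpage]        # TLB hit
        return None                           # TLB miss: walk the page tables

    def insert(self, vpage, pframe):
        self.entries[vpage] = pframe
        self.entries.move_to_end(vpage)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def flush(self):
        self.entries.clear()                  # discard all stored translations
```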
A typical modern processor includes a cache memory unit coupled between an execution unit and a bus interface unit. The execution unit executes software instructions. The cache memory unit includes a relatively small amount of memory which can be accessed very quickly. The cache memory unit is used to store instructions and data (i.e., data items) recently used by the execution unit, along with data items which have a high probability of being needed by the execution unit in the near future. Because it is searched first, the cache memory unit makes needed information readily available to the execution unit. When needed information is not found in the cache memory unit, the bus interface unit is used to fetch the needed information from a main memory unit located external to the processor. The overall performance of the processor is improved when needed information is often found within the cache memory unit, eliminating the need for time-consuming accesses to the main memory unit.
FIG. 1 is a block diagram illustrating an address translation mechanism of an exemplary modern x86 computer system. A cache unit 10 within an x86 processor may be used to store instructions and/or data (i.e., data items) recently used or likely to be needed by an execution unit coupled to cache unit 10. Cache unit 10 includes a TLB 12 used to store the most recently used address translations, a multiplexer 14, and gating logic 16.
TLB 12 receives a linear address provided to cache unit 10 and produces a stored physical address corresponding to the linear address. Multiplexer 14 receives the linear address provided to cache unit 10 and the physical address produced by TLB 12. Multiplexer 14 produces either the physical address or the linear address dependent upon a PAGING signal. When paging is disabled, the linear address provided to cache unit 10 is a physical address, and address translation by TLB 12 is unnecessary. In this case, the PAGING signal is deasserted, and multiplexer 14 produces the linear address. When paging is enabled, the linear address provided to cache unit 10 is a virtual address, and translation of the virtual address to a physical address is necessary. In this case, the PAGING signal is asserted, and multiplexer 14 produces the physical address produced by TLB 12. If a stored physical address corresponding to the linear address is found within TLB 12, TLB 12 asserts a TLB HIT signal. Otherwise, the TLB HIT signal is deasserted.
Gating logic 16 receives address bit 20 (i.e., signal A20) of the physical address produced by multiplexer 14, and the A20M signal. Gating logic 16 produces a new signal A20 dependent upon the A20M signal. When the A20M signal is deasserted, gating logic produces the new signal A20 such that the new signal A20 has the same value as the signal A20 of the physical address produced by multiplexer 14. In other words, when signal A20M is deasserted, gating logic "passes" the signal A20 of the physical address produced by multiplexer 14. On the other hand, when the A20M signal is asserted, gating logic produces the new signal A20 with a logic value of "0". In other words, when signal A20M is asserted, gating logic "masks" or "clears" the signal A20 of the physical address produced by multiplexer 14.
In addition to TLB 12, cache unit 10 includes a cache memory 18 for storing the data items recently used or likely to be needed by the execution unit coupled to cache unit 10. Cache memory 18 includes a tag array 20 for storing physical address "tags", and a data array 22 for storing the data items. Each data item stored in data array 22 has a corresponding physical address "tag" stored in tag array 20.
When the linear address is provided to TLB 12, a least-significant or lower-ordered "index" portion of the linear address is simultaneously provided to tag array 20 and data array 22 of cache memory 18. In the embodiment of FIG. 1, cache memory 18 is a two-way set associative cache structure. The index portion of the linear address is used as an index into tag array 20. As a result, tag array 20 produces two physical address "tags". One of the two physical address "tags" is provided to a comparator (CO) 24a, and the other physical address "tag" is provided to a comparator 24b. The index portion of the linear address is also used as an index into data array 22. As a result, data array 22 produces two data items. The two data items are provided to different inputs of a multiplexor (MUX) 26.
After passing through multiplexer 14 and gating logic 16, the physical address is provided to comparators 24a-b. If the physical address matches one of the physical address "tags" provided by tag array 20, the corresponding comparator 24 asserts an output signal. The output signals produced by comparators 24a-b are provided to a control unit 28 which controls the operations of cache unit 10. The output signal produced by comparator 24b is also provided to a control input of multiplexor 26. Multiplexor 26 produces an output DATA signal in response to the output signal produced by comparator 24b. The output DATA signal may include the data item from data array 22 corresponding to the physical address "tag" which matches the physical address provided to comparators 24a-b. Control unit 28 uses the TLB HIT signal and the output signals produced by comparators 24a-b to determine when the DATA signal produced by multiplexor 26 is "valid". When the DATA signal produced by multiplexor 26 is valid, control unit 28 asserts an output DATA VALID signal. Control unit 28 also produces an output CACHE HIT signal which is asserted when the data item corresponding to the provided linear address was found in cache memory 18.
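The two-way set-associative lookup performed by tag array 20, comparators 24a-b, and multiplexor 26 may be sketched in software as follows. The index and tag field widths chosen here are illustrative assumptions, not dimensions taken from the specification.

```python
def cache_lookup(physical_address, tag_array, data_array,
                 index_bits=7, offset_bits=4):
    """Look up a physical address in a two-way set-associative cache.

    tag_array[way][index] holds the stored physical address tags and
    data_array[way][index] holds the corresponding data items. Both
    tags in the indexed set are compared against the tag field of the
    incoming address, mirroring the two comparators of FIG. 1.
    """
    index = (physical_address >> offset_bits) & ((1 << index_bits) - 1)
    tag   =  physical_address >> (offset_bits + index_bits)
    for way in range(2):
        if tag_array[way][index] == tag:        # a comparator matches
            return True, data_array[way][index] # cache hit: produce data item
    return False, None                          # cache miss: fetch via the BIU
```

On a miss, as the specification describes, the control unit would submit a read request to the bus interface unit and fill the indexed set with the returned tag and data item.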
Cache unit 10 is coupled to a bus interface unit (BIU) 30 within the x86 processor, and BIU 30 is coupled to a main memory 32 located external to the x86 processor. When the PAGING signal is asserted and TLB 12 does not contain the physical address corresponding to the linear address (i.e., the TLB HIT signal is deasserted), control unit 28 provides the linear address (i.e., virtual address) to BIU 30. BIU 30 may include address translation circuitry to perform the virtual-to-physical address translation. The address translation circuitry within BIU 30 may access virtual memory system information (e.g., the page directory and the page table) stored within main memory 32 in order to perform the virtual-to-physical address translation. BIU 30 may provide the resulting physical address to control unit 28, and control unit 28 may provide the physical address to TLB 12. TLB 12 may store the linear address (i.e., virtual address) and corresponding physical address, assert the TLB HIT signal, and provide the physical address to comparators 24a-b. 
If the physical address does not match one of the physical address "tags" provided by tag array 20, control unit 28 may submit a read request to BIU 30, providing the physical address to BIU 30. BIU 30 may then read the data item from main memory 32, and forward the data item directly to cache memory 18 as indicated in FIG. 1. Cache memory 18 may store the physical address within tag array 20, and store the corresponding data item retrieved from main memory 32 within data array 22. Cache memory 18 may also forward the stored physical address to either comparator 24a or 24b, and forward the stored data item to an input of multiplexor 26. As a result, the comparator to which the stored physical address is provided asserts the output signal, multiplexor 26 produces the DATA signal including the stored data item, and control unit 28 asserts the CACHE HIT signal.
Multiplexer 14 and gating logic 16 exist along a critical speed path within cache unit 10, and thus limit the maximum speed at which cache unit 10 may operate. It would thus be desirable to have a processor including a cache unit which does not include multiplexer 14 and gating logic 16 coupled as shown in FIG. 1 such that the operational speed of the cache unit may be increased.
The problems outlined above are in large part solved by a computer system implementing a novel address translation mechanism. The computer system includes a processor which executes instructions. The present processor includes a cache unit coupled to a bus interface unit (BIU). Address signal selection and masking functions are performed by circuitry within the BIU rather than within the cache unit, and physical addresses produced by the BIU are stored within the TLB. As a result, address signal selection and masking circuitry (e.g., a multiplexer and gating logic) are eliminated from a critical speed path within the cache unit, allowing the operational speed of the cache unit of the present processor to be increased.
The cache unit stores data items, and produces a data item corresponding to a received linear address. The cache unit includes a translation lookaside buffer (TLB) for storing multiple linear addresses and corresponding physical addresses. When a physical address corresponding to the received linear address is not found within the TLB, the cache unit passes the linear address to the BIU. The BIU returns the physical address corresponding to the linear address to the cache unit. The linear address includes multiple linear address signals, and the physical address includes multiple physical address signals.
The BIU includes address translation circuitry, a multiplexer, and gating logic. The address translation circuitry receives the multiple linear address signals and produces multiple physical address signals from the multiple linear address signals. The multiplexer receives the multiple linear and physical address signals and a paging signal, wherein the paging signal may be asserted when a paged addressing mode is enabled. When the paging signal is deasserted, the multiplexer may produce the linear address signals as physical address signals at an output. On the other hand, the multiplexer may produce the multiple physical address signals at the output when the paging signal is asserted.
The gating logic receives one or more of the physical address signals produced by the multiplexer. The gating logic either passes the one or more physical address signals or masks the one or more physical address signals dependent upon a first masking signal. When the first masking signal is deasserted, the gating logic may produce the one or more physical address signals unchanged at an output. On the other hand, the gating logic may produce constant logic value signals (e.g., logic "0" signals) in place of the one or more physical address signals at the output when the first masking signal is asserted, thus masking the one or more physical address signals when the first masking signal is asserted. The BIU may provide the physical address signals acted upon by the gating logic to the cache unit as the physical address corresponding to the linear address. The cache unit may store the physical address and the linear address within the TLB.
The present processor may also include a microexecution unit and a programmable control register. The control register may include a masking bit and a paging bit. The first masking signal may be a value of the masking bit, and the paging signal may be a value of the paging bit. The microexecution unit may receive a second masking signal generated external to the processor. Upon detecting a change in state of the second masking signal from an old state to a new state (e.g., a transition from a logic low or "0" voltage level to a logic high or "1" voltage level), the microexecution unit may: (i) flush the contents of the TLB, and (ii) modify the value of the masking bit within the control register to reflect the new state of the second masking signal. Such actions may be delayed after detecting the change in state of the second masking signal to allow a certain number of instructions (e.g., 2) to be executed in the context of the old state of the second masking signal before the masking bit within the control register is changed.
The BIU may receive the paging signal (i.e., the value of the paging bit) from the control register. As described above, the paging signal may be asserted when the paged addressing mode is enabled. When the paging signal is asserted, the multiple linear address signals may form a virtual address. The address translation circuitry within the BIU may produce the multiple physical address signals from the multiple linear address signals when the paging signal is asserted. In other words, the address translation circuitry may perform a virtual-to-physical address translation when the paging signal is asserted.
The BIU may be coupled to a main memory located external to the processor. The main memory may be used to store virtual memory system information (e.g., a page directory and a page table). The address translation circuitry may use the virtual memory system information stored within the main memory to produce the multiple physical address signals.
The present processor implements a novel address translation method. This method may include providing a translation lookaside buffer (TLB) for storing multiple linear addresses and corresponding physical addresses. Upon detecting a change in state of a masking signal (e.g., the externally generated second masking signal described above) from the old state to the new state, the TLB may be flushed, and the new state of the masking signal may be saved. When a linear address is not found within the TLB, a physical address including multiple physical address signals may be produced from the linear address. One or more of the physical address signals may be masked dependent upon the saved state of the second masking signal. The linear address and the physical address may then be saved within the TLB.
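The miss-handling portion of the method above may be sketched as follows: on a TLB miss, the physical address is produced, masked according to the saved state of the masking signal, and only then stored in the TLB, so that addresses read out of the TLB on later hits are already masked. This is an illustrative model; the function and argument names are hypothetical, and the TLB is modeled as a simple dictionary.

```python
def lookup_with_masking(tlb, linear, translate, masking_bit):
    """tlb: dict mapping linear addresses to stored physical addresses.
    translate: a page-table-walk function producing a physical address.
    masking_bit: the saved state of the masking signal (control register).
    """
    if linear in tlb:
        return tlb[linear]          # TLB hit: stored address is already masked
    phys = translate(linear)        # TLB miss: produce the physical address
    if masking_bit:
        phys &= ~(1 << 20)          # mask physical address bit 20 (A20M-style)
    tlb[linear] = phys              # save the translation within the TLB
    return phys
```

Because masked addresses are what the TLB stores, a change in the masking signal's state invalidates every stored translation, which is why the method flushes the TLB upon detecting such a change.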
A computer system is described which includes the present processor. The computer system may also include a bus coupled to the processor, and a peripheral device coupled to the bus. The bus may be a peripheral component interconnect (PCI) bus, and the peripheral device may be, for example, a network interface card, a video accelerator, an audio card, a hard disk drive, or a floppy disk drive. Alternately, the bus may be an extended industry standard architecture (EISA)/industry standard architecture (ISA) bus, and the peripheral device may be, for example, a modem, a sound card, or a data acquisition card.