1. Technical Field
The present invention relates to address translation in modern computer processors. In particular, the present invention relates to an improved method and system for quickly accessing table entries, such as the entries of a translation lookaside buffer.
2. Prior Art
The present invention has a very general scope, as its basic idea can be used in any situation in which a table-structured memory is to be accessed quickly and the access requires an addition of a plurality of n bit sequences, n being greater than 1, and in particular 2 or 3.
A particular field of application, however, is system address translation.
Thus, the inventive aspects are set in relation to the prior art in this particular field in order to show their advantages clearly.
Virtual memory techniques, including the provision of virtual code addresses, are among the basic concepts that ease the job of application programmers: they need not worry about the physical locations at which code is placed in memory when the program is loaded to be run.
A nearly unlimited virtual address space is thereby provided for the programmer's activities. In a process called 'address translation', such virtual addresses are transformed into physical addresses which uniquely define physical locations in the main memory at run-time.
In virtual memory, the address is broken into a virtual page address part, which is the upper portion of the address, and a page offset, also called byte index, which is the lower portion. During translation, the virtual page address part is translated into a physical page address, which constitutes the physical start address of the page. The page offset is not changed during translation. The number of bits in the page offset determines the page size.
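The split described above can be sketched as follows. This is only an illustration: the 12-bit page offset (4 KiB pages) and the function names are assumptions made for the example, not values fixed by the text at this point.

```python
# Assumption for illustration only: a 12-bit page offset (4 KiB pages).
PAGE_OFFSET_BITS = 12


def split_virtual_address(virtual_address, offset_bits=PAGE_OFFSET_BITS):
    """Split an address into (virtual page address part, page offset)."""
    offset_mask = (1 << offset_bits) - 1
    page_offset = virtual_address & offset_mask    # lower portion, unchanged by translation
    virtual_page = virtual_address >> offset_bits  # upper portion, subject to translation
    return virtual_page, page_offset


def join_physical_address(physical_page, page_offset, offset_bits=PAGE_OFFSET_BITS):
    """Combine a translated physical page address with the untouched page offset."""
    return (physical_page << offset_bits) | page_offset
```

For instance, `split_virtual_address(0x12345678)` separates the address into the page part `0x12345` and the offset `0x678`; only the page part would be looked up in the translation tables.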
The virtual page address part is broken in several parts, e.g., page index and segment index. Every part is translated through tables, e.g., page tables for the page index and segment tables for the segment index. These tables are so large that they must be stored in main memory. This means that every memory access takes at least twice as long: one memory access for every table access for the address translation and one memory access more to get the data.
The key for improving access performance is to rely on locality of reference to the page table: When a translation for a virtual page number is used, it will probably be needed again in the near future of a program run, because the references to the words on that page have both temporal and spatial locality. Accordingly, modern machines include a special cache that keeps track of recently used translations. This special address translation cache is further referred to as a translation-lookaside buffer (TLB).
Because said translation process generally takes several cycles, a cache array, called a Translation Lookaside Buffer (TLB) or Translation Buffer (TB), is used, in which the absolute address corresponding to a virtual address is saved once the translation process has been performed.
A TLB entry is addressed by a part of the virtual address. In this example, 64-bit wide virtual addresses are used for reference. The most significant bit is defined here as bit 0 and the least significant bit as bit 63 (big-endian bit numbering). A prior art TLB has 128 entries, which is why 7 bits are needed to address the entries. For example, bits 45 to 51 are used to address the TLB. This is shown in FIG. 1.
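As a minimal sketch of this bit numbering, the following helper extracts a bit field from a 64-bit address in which bit 0 is the most significant bit. The function names are hypothetical and serve only to illustrate how bits 45 to 51 map to a 7-bit TLB entry address.

```python
ADDRESS_BITS = 64  # width of the virtual address in the example


def extract_bits(value, first, last, width=ADDRESS_BITS):
    """Return bits first..last (inclusive), where bit 0 is the most significant bit."""
    shift = width - 1 - last   # distance of bit `last` from the least significant end
    length = last - first + 1  # number of bits in the field
    return (value >> shift) & ((1 << length) - 1)


def tlb_index(virtual_address):
    """7-bit TLB entry address taken from bits 45 to 51 of the virtual address."""
    return extract_bits(virtual_address, 45, 51)
```

With this numbering, bit 51 sits 12 positions above the least significant bit, so the field is equivalent to `(virtual_address >> 12) & 0x7F`.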
Depending on the individual computer processor architecture, the virtual address has to be generated by adding multiple parts. In architectures used in IBM S/390 systems there are three parts called basis, index and displacement. The basis and the index are 64 bits wide, the displacement contains 12 bits. To determine the address of a TLB entry, often all 3 parts have to be added. Bits 45 to 51 of the sum are the TLB address, as shown in FIG. 1. To get the address of the TLB entry according to prior art, the 19 least significant bits (bits 45 to 63) have to be added, because the sum of these bits includes bits 45 to 51, the TLB entry address.
In prior art such as the IBM S/390 processor architecture, a 3-port adder is used to add the basis, the index and the displacement. Because m=3 address parts are added and the bits needed to access the TLB are bits 45 to 51, this has to be a 19-bit wide 3-port adder. The 7 resulting bits are decoded by an address decoder to activate the corresponding word line and read the TLB entry. This is shown in FIG. 2.
The overall access time is the sum of the time needed by the 19-bit wide 3-port adder, the address decoder and the TLB access.
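The prior art path described above can be modelled in a few lines. This is a behavioural sketch only, not a description of the actual circuit; the function name is hypothetical, and the model simply adds the 19 least significant bits of the three address parts and takes the upper 7 bits of that sum.

```python
LOW_BITS = 19                    # bits 45 to 63 of a 64-bit address
LOW_MASK = (1 << LOW_BITS) - 1
DISPLACEMENT_MASK = 0xFFF        # the displacement contains 12 bits


def prior_art_tlb_index(basis, index, displacement):
    """Model of the 19-bit 3-port addition followed by selection of bits 45 to 51."""
    low_sum = (basis & LOW_MASK) + (index & LOW_MASK) + (displacement & DISPLACEMENT_MASK)
    low_sum &= LOW_MASK          # carries out of bit 45 do not affect the TLB address
    return low_sum >> 12         # upper 7 of the 19 bits, i.e. bits 45 to 51
```

The full 19-bit addition must complete before the address decoder can start, which is why the adder sits in the time-critical path.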
Said access time is quite long. Thus, it would be desirable to shorten it and thereby increase system performance.
Therefore, an object of the present invention is to provide an improved method and system for quicker access to tables, i.e., table entries, e.g., of a system table like a TLB, in which the entries are addressed after adding a plurality of address parts, where said plurality is most commonly, but not necessarily, 2 or 3.
The present invention is based on a first consideration, namely to balance system usage during said address translation processes and to avoid time periods in which there is no activity or only little activity.
The second basic consideration is to avoid elements which are naturally slow in performance because of their particular way to operate. In this case, a relatively wide adder like a 19-bit 3-port adder operates quite slowly.
The key idea is to use a smaller and/or faster adder, having e.g. only n=2 ports, in the time-critical path. This adder yields an ambiguous result. The exact address calculation, which takes more time, is carried out during the array accesses. After the TLB arrays have been accessed for a kind of preselection, a multiplexor decides which of a plurality of, e.g., three possible entries has to be taken.
Thus, the prior art approach, in which the TLB is accessed only after the lowest 19 bits of the virtual address have been added completely and the addition result is present, is abandoned. According to a preferred aspect of the present invention, when m is less than n, only a short 2-port 7-bit addition is performed in order to access the TLB. As the 7 bits do not include the least significant bits, the first TLB access is merely a preselective access. However, the 2-port 7-bit addition takes little time compared to the prior art 3-port 19-bit addition. Thus the time-critical path of the total TLB access is shortened, and the remainder of the precise address selection is moved into the phases of decode and TLB access itself.
According to the present invention, the aspects mentioned above are combined and a synergistic effect is achieved.
According to the present invention, the prior art drawback of needing a 19-bit adder although only a 7-bit address is used for the TLB access is avoided.
According to the present invention, a faster access can be achieved if only a middle-level bit portion is added, e.g., bits B45 to B51 of the basis with bits X45 to X51 of the index. A further performance increase is achieved because in this case only a 2-port adder is needed instead of a 3-port adder.
Adding only bits B45 to B51 of the basis and bits X45 to X51 of the index might not yield the right address: the result is ambiguous because there may be a carry from the sum of bits 52 to 63. As three numbers, i.e., basis, index and displacement, have to be added in this case, the carry can be 0, 1 or 2. If, e.g., 7 numbers had to be added, the possible carry values would be 0, 1, 2, 3, 4, 5 and 6.
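The range of possible carry values can be checked with a short calculation. The symbols $k$ (number of low-order bits, here 12 for bits 52 to 63), $a_i$ (the low-order parts of the addends), $S$ and $c$ are introduced only for this illustration; $m$ denotes the number of addends.

```latex
\[
  S = \sum_{i=1}^{m} a_i \;\le\; m\,(2^{k}-1),
  \qquad
  c = \left\lfloor \frac{S}{2^{k}} \right\rfloor
    \;\le\; \left\lfloor m - \frac{m}{2^{k}} \right\rfloor = m-1
  \quad \text{for } 1 \le m \le 2^{k}.
\]
```

For $m=3$ this gives a carry of at most 2, and for $m=7$ a carry of at most 6, in agreement with the values listed above.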
That means, the correct result can be the 7 bit address of the first addition if the carry is 0, or it can be the 7 bit address plus 1 or 2 in the case that the carry is 1 or 2, respectively. Some examples are shown in Table 1 depicted in FIG. 4.
If the sum of the 7-bit adder of bits B45 to B51 of the basis and bits X45 to X51 of the index is equal to 0, the possible result can be 0 (carry 0), 1 (carry 1) or 2 (carry 2). If the sum is equal to 1, the possible result can be 1 (carry 0), 2 (carry 1) or 3 (carry 2). If the sum is equal to 126, the possible complete result can be 126 (carry 0), 127 (carry 1) or 0 (carry 2), because the result wraps around the end of the number chain 0, . . . , 127.
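The ambiguity described above can be reproduced with a small behavioural model. The function names are hypothetical and the model is only a sketch: it computes the 7-bit partial sum of basis and index, the carry out of bits 52 to 63, and the three candidate TLB addresses.

```python
FIELD_MASK = 0x7F     # 7 bits, i.e. bits 45 to 51
LOW_MASK = 0xFFF      # 12 bits, i.e. bits 52 to 63
TLB_ENTRIES = 128


def partial_sum(basis, index):
    """2-port 7-bit addition of bits 45 to 51 of basis and index."""
    b = (basis >> 12) & FIELD_MASK
    x = (index >> 12) & FIELD_MASK
    return (b + x) & FIELD_MASK


def low_order_carry(basis, index, displacement):
    """Carry out of the sum of bits 52 to 63 of all three parts: 0, 1 or 2."""
    low = (basis & LOW_MASK) + (index & LOW_MASK) + (displacement & LOW_MASK)
    return low >> 12


def candidate_addresses(seven_bit_sum):
    """The three possible TLB addresses for carry 0, 1 and 2, with wrap-around."""
    return [(seven_bit_sum + carry) % TLB_ENTRIES for carry in range(3)]
```

The exact TLB address is always `candidate_addresses(partial_sum(basis, index))[low_order_carry(basis, index, displacement)]`, so the slower carry calculation only has to pick one of three entries that are already being read.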
Because of the ambiguous result of the 7-bit addition, the three entries corresponding to the three possible addresses have to be selected. This is done advantageously by splitting the TLB array into 4 smaller arrays, the first containing the entries 0, 4, 8, . . . , 124, the second containing the entries 1, 5, 9, . . . , 125, the third containing the entries 2, 6, 10, . . . , 126 and the fourth containing the entries 3, 7, 11, . . . , 127. This is shown in FIGS. 5 to 8. Then, a multiplexor selects the right entry from among the preselected ones.
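The interleaved split and the final multiplexing can be sketched as follows. This is a behavioural model under stated assumptions, not the circuit of FIGS. 5 to 8: the function names are hypothetical, and the mapping of entry a to sub-array a mod 4 and row a div 4 is inferred from the entry lists given above.

```python
NUM_ENTRIES = 128
NUM_BANKS = 4        # p = 4 sub-arrays
NUM_CANDIDATES = 3   # carry 0, 1 or 2


def split_into_banks(entries):
    """Bank k holds entries k, k+4, k+8, ... of the original TLB."""
    return [entries[bank::NUM_BANKS] for bank in range(NUM_BANKS)]


def preselect(banks, seven_bit_sum):
    """Read the three candidate entries; each one lies in a different bank."""
    selected = []
    for carry in range(NUM_CANDIDATES):
        address = (seven_bit_sum + carry) % NUM_ENTRIES
        bank, row = address % NUM_BANKS, address // NUM_BANKS
        selected.append(banks[bank][row])
    return selected


def multiplex(preselected, carry):
    """Pick the correct entry once the exact carry is known."""
    return preselected[carry]
```

Because three consecutive addresses always fall into three different sub-arrays, the candidate entries can be read in parallel while the exact carry is still being computed.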
The reason for splitting the TLB into p=4 parts instead of 3 parts is the binary number system: 128 is divisible by 4 but not by 3, so the TLB with 128 entries can be split into 4 parts with an equal number of entries.
In conjunction with that configuration of three potential carry values and four TLB sub-blocks, it should be noted that only two simple word line decoders are required instead of the four that would normally be needed. One is for sub-blocks 1 and 2, the other one for sub-blocks 3 and 4.
Another main advantage is that by using only two word lines per line for four blocks the area required on the chip can be reduced. This is because every word line may have to cross all the blocks, and the array cells are not big enough for four word lines crossing each cell, but they are big enough for two word lines.
According to a preferred embodiment of a circuit according to the present invention, the 2-port adder is integrated in the macro device implementing the method described above. This yields a further increase in performance.
According to a further aspect of the present invention, the inventive concept can be extended to cases where m=n. In that case only one of the two advantages can be exploited, namely that fewer bits have to be added before accessing the TLB, because a 3-port adder would be necessary instead of a 2-port adder.