The central subsystem of a computer generally comprises three types of units: processors, memory modules forming the main memory, and input-output controllers. Usually the processors communicate with the memory modules through a bus that allows addressing and transfer of data between the processors and the main memory. To execute a program instruction, its operands must be located in the main memory. The same is true for successive instructions of the program to be executed. In the case of a multiprocessing system, the memory must be partitioned to allow multiplexing between programs. For this purpose, virtual addressing is generally used in conjunction with a pagination mechanism which includes dividing the addressable space, or "virtual space," into zones of a fixed size called "pages." In a system of this kind, a program being executed addresses a virtual space that corresponds to a real portion of the main memory. Thus, a logical or virtual address is associated with a physical or real address of the main memory.
An instruction that requires addressing contains information that enables the processor executing it to obtain a virtual address. In general, this virtual address is segmented, i.e., it is composed of a segment number, a page number, and a shift (i.e., a displacement) within the referenced page. The segment number can in turn be subdivided into a segment table number and a shift within this table.
In order to access an item of information in memory associated with this segmented address, several memory accesses are necessary. It is first necessary to access an address space table allocated to the process (program being executed). From this table, using the segment table number, the real address of the corresponding segment table is obtained. Next, as a function of the shift in the segment table, a segment descriptor is accessed which makes it possible to calculate the real address of a page table. Finally, the page number, which defines the shift within this page table, is used to obtain a real page address, thereby making it possible to address the memory. The real address of a particular word or byte is obtained by concatenating the real page address with the shift within this page defined by the least significant bits of the virtual address.
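The successive accesses described above can be sketched as follows. This is a minimal illustration, not the actual mechanism of any particular machine: the field widths, the names `split` and `translate`, and the modeling of main memory as a dictionary are all assumptions, and the segment descriptor is simplified to hold the page table address directly.

```python
from dataclasses import dataclass

# Hypothetical field widths, chosen only for illustration.
STN_BITS, STE_BITS, PAGE_BITS, OFFSET_BITS = 4, 6, 10, 12


@dataclass(frozen=True)
class VirtualAddress:
    segment_table_number: int  # selects a segment table
    segment_table_shift: int   # shift within that segment table
    page_number: int           # shift within the page table
    offset: int                # shift within the page


def split(va: int) -> VirtualAddress:
    """Cut a virtual address into its fields, least significant first."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    va >>= OFFSET_BITS
    page = va & ((1 << PAGE_BITS) - 1)
    va >>= PAGE_BITS
    shift = va & ((1 << STE_BITS) - 1)
    va >>= STE_BITS
    stn = va & ((1 << STN_BITS) - 1)
    return VirtualAddress(stn, shift, page, offset)


def translate(va: int, address_space_table: int, memory: dict) -> tuple:
    """Walk the tables; return (real address, number of memory accesses)."""
    f = split(va)
    accesses = 0
    # 1. Address space table -> real address of the segment table.
    segment_table = memory[address_space_table + f.segment_table_number]
    accesses += 1
    # 2. Segment descriptor -> real address of the page table (simplified).
    page_table = memory[segment_table + f.segment_table_shift]
    accesses += 1
    # 3. Page table entry -> real page address.
    real_page = memory[page_table + f.page_number]
    accesses += 1
    # Concatenate the real page address with the shift within the page.
    return (real_page << OFFSET_BITS) | f.offset, accesses
```

Under these assumptions, every translation costs three accesses to main memory before the operand itself can be fetched, which is the overhead the following paragraphs seek to avoid.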
A memory access is relatively time consuming, due primarily to the use of a bus that is common to both the processors and the memory module. To improve system performance, the successive memory accesses typically required to obtain a real address are avoided as much as possible. Most processes have a locality property according to which, during a given phase of its execution, the number of pages used by a process is very small relative to the total number of pages allocated to that process.
The locality property can be used to facilitate the translation of a virtual address into a real address by the use of "extracts". Each extract includes a virtual address and an associated real address, and is used by the program during a single execution phase. A plurality of extracts are stored in high-speed memory or in registers. To perform a translation of a virtual address into a real address, a high-speed associative memory is accessed to determine whether the virtual address to be translated is already present in the high-speed memory. If it is, the real address is obtained directly without accessing the main memory.
The locality property also motivates the use of cache memories composed of small, high-speed memories in which the most recently referenced pages are stored. The probability that a new reference will relate to an item of information already present in cache memory is high, so the effective access time is reduced. In a manner analogous to the translation of the virtual address into a real address, a cache memory comprises a table containing the real addresses of the pages present in the cache memory. This table, called a directory, can be consulted in an associative fashion to determine whether the information associated with a given real address is contained in the cache memory. If it is, a word or byte is obtained by addressing the cache memory by means of the least significant bits of the virtual address of the word or byte.
Issues related to the translation of addresses will now be discussed, it being understood that the same considerations may apply to cache memories. In both cases, the issue is that of rapidly obtaining information associated with a page address. In the case of translation of an address, the page address is a virtual address and the associated information is the corresponding real address, while in the case of cache memory, the page address is a real address and the associated information includes all the data contained in the page.
As previously discussed, the high-speed translation memory is an associative memory. The memory comprises a given number of registers, or more generally, locations, each capable of storing one extract. Each extract can be accompanied by additional information such as right-of-access indicators or indicators reporting that a write access has been effected in the page associated with the extract. Moreover, each extract is associated with a presence indicator which, for a given logic state, indicates that the associated extract is valid. These presence indicators are, for example, set to zero at initialization, i.e., each time a process is activated in a particular processor. Thus, as the process uses new pages, the associated extracts are loaded into associative memory and the respective presence indicators are simultaneously set to 1. When a memory access must be executed, the virtual address is compared with the virtual address of each extract stored in associative memory. If there is a match between the virtual address being sought and one of the virtual addresses of an extract stored in the memory while its associated presence indicator is set to 1, the corresponding real address can be obtained directly by simply reading the register that contains the real address.
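The lookup just described can be sketched in software as follows. The class and method names (`AssociativeMemory`, `lookup`, `load`, `reset`) are hypothetical, and the sequential loop only stands in for the comparison that a real associative memory performs on all locations at once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extract:
    virtual_page: int = 0
    real_page: int = 0
    present: bool = False  # presence indicator: True means the extract is valid
    written: bool = False  # example of additional information (write access)


class AssociativeMemory:
    def __init__(self, size: int) -> None:
        self.locations: List[Extract] = [Extract() for _ in range(size)]

    def reset(self) -> None:
        """Clear every presence indicator, as when a process is activated."""
        for extract in self.locations:
            extract.present = False

    def lookup(self, virtual_page: int) -> Optional[int]:
        """Return the real page on a match with a valid extract, else None."""
        for extract in self.locations:
            if extract.present and extract.virtual_page == virtual_page:
                return extract.real_page
        return None

    def load(self, virtual_page: int, real_page: int) -> bool:
        """Store a new extract in the first free location.

        Returns False when the memory is full, in which case an old
        extract would have to be replaced.
        """
        for extract in self.locations:
            if not extract.present:
                extract.virtual_page = virtual_page
                extract.real_page = real_page
                extract.present = True
                return True
        return False
```

A `None` result from `lookup` corresponds to the case where the tables in main memory must be consulted and the resulting extract loaded.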
In order for this translation mechanism to be practical, the associative memory must be of limited size. Consequently, for processes with many pages, the associative memory cannot store all the extracts corresponding to these pages. When associative memory is full, the only way to store a new extract therein is to erase an old extract. It is therefore necessary to provide a method for eradicating an old extract and storing a new extract in its place. To accomplish this, a replacement algorithm is used that decides which old extract is to be replaced by a new extract. Many algorithms have already been proposed, for example:
the FIFO ("first in, first out") algorithm, in which the oldest extract is replaced;
the RAND ("random choice") algorithm, in which the extract to be replaced is chosen at random;
the LFU ("least frequently used") algorithm, in which the least frequently used extract is replaced; and
the LRU ("least recently used") algorithm, in which the least recently used extract is replaced.
The LRU algorithm theoretically gives good results, but in practice it is preferable to use a simplified version, called the "pseudo-LRU." To manage n extracts, a true LRU requires storing and updating log2(n) bits per extract to maintain an ordered history of the uses of the extracts. On the other hand, a pseudo-LRU requires only a single bit per extract, called a reference bit or indicator.
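To illustrate why a true LRU needs log2(n) bits per extract, the sketch below keeps one rank per location, from 0 (most recently used) to n - 1 (least recently used). It is one possible way of maintaining the ordered history, not a description of any specific hardware; the class name `TrueLRU` and its methods are assumptions made for the example.

```python
class TrueLRU:
    """Ordered use history kept as one rank per location."""

    def __init__(self, n: int) -> None:
        self.n = n
        # rank[i] == 0: location i is the most recently used;
        # rank[i] == n - 1: location i is the least recently used.
        self.rank = list(range(n))

    def bits_per_extract(self) -> int:
        """Bits needed to store a rank in the range 0 .. n - 1."""
        return (self.n - 1).bit_length()

    def touch(self, i: int) -> None:
        """Record a use of location i and update every affected rank."""
        old = self.rank[i]
        for j in range(self.n):
            if self.rank[j] < old:
                self.rank[j] += 1
        self.rank[i] = 0

    def victim(self) -> int:
        """Location holding the least recently used extract."""
        return self.rank.index(self.n - 1)
```

With 32 locations, each rank takes 5 bits, and every use may require updating up to 31 other ranks, whereas the pseudo-LRU keeps a single bit per location.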
According to the pseudo-LRU algorithm, the reference bit is set to a first logic state (1, for example) when its associated extract is used. When the associative memory is full, i.e., when all the presence indicators are set to 1, and a new extract must be loaded, the extract to be replaced is the first extract encountered, in the chronological order of filling, whose reference bit is set to 0. When saturation is reached, i.e., when all but one of the reference bits are set to 1, the extract whose reference bit is at 0 is replaced by the new extract, and all the reference bits are then reset to 0. Resetting all the reference bits obliterates the history of the use of the extracts.
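The pseudo-LRU behavior described above can be sketched as follows. The names are hypothetical, and two points are assumptions not stated in the text: a newly loaded extract is treated as used (its reference bit is set), and a hit that sets the last remaining 0 bit triggers the same reset as a replacement at saturation.

```python
class PseudoLRU:
    """Associative memory with one reference bit per location."""

    def __init__(self, size: int) -> None:
        self.virtual = [None] * size
        self.real = [None] * size
        self.present = [False] * size     # presence indicators
        self.referenced = [False] * size  # reference bits

    def _mark_used(self, i: int) -> None:
        self.referenced[i] = True
        if all(self.referenced):
            # Saturation: every bit is reset, so the use history is lost.
            self.referenced = [False] * len(self.referenced)

    def lookup(self, virtual_page):
        """Return the real page on a hit (and mark it used), else None."""
        for i, page in enumerate(self.virtual):
            if self.present[i] and page == virtual_page:
                self._mark_used(i)
                return self.real[i]
        return None

    def load(self, virtual_page, real_page) -> int:
        """Store a new extract and return the index of the location used."""
        if not all(self.present):
            i = self.present.index(False)     # a free location remains
        else:
            i = self.referenced.index(False)  # first extract with bit at 0
        self.virtual[i] = virtual_page
        self.real[i] = real_page
        self.present[i] = True
        self._mark_used(i)
        return i
```

In a four-location instance, for example, an extract loaded into location 0 shortly before saturation becomes the first replacement candidate immediately after the reset, even though it was among the most recently used.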
The loss of the history upon saturation results in a decrease in the efficiency of the algorithm. Moreover, a freshly loaded extract may very likely be reused immediately after being loaded, again resulting in saturation upon the loading of the fresh extract.
To assess the consequences of this loss of history, it is useful to perform simulations of populations consisting of specific processes relating to applications normally handled by the system. It is found, for example, that for a certain population of processes, no more than 32 extracts are required in 65% of the cases. For an associative memory holding 32 extracts, this means that 65% of the processes do not require use of the replacement algorithm. Conversely, the algorithm will be used at least once by 35% of the processes, so a loss of history will occur at least once in 35% of the cases.
During a simulation, it has also been found that, in 90% of the cases, a program at a given moment will access one of the last seven extracts called. This result confirms the phenomenon of locality mentioned above. However, once the reference bits have been reset to 0 after saturation, the only criterion for selecting the extract to be replaced is the position of the extract in the associative memory, and the location of an extract provides no indication of the time of its last use; it only provides an indication of its first use. Thus, it is quite possible that pages used at the start of the process will be used again a short time before saturation. In this case, after resetting the reference bits to 0, these pages will be eliminated first, even though they have a high probability of being reused very soon thereafter.