FIG. 1 illustrates a conceptual block diagram of a PRIOR ART processor with a typical memory architecture. In particular, FIG. 1 shows an MMU (Memory Management Unit) 12 which provides the interface between memory and a processor 10. The memory is shown below the horizontal dotted line XX as, for example, comprising a cache 14, a main memory 16 and secondary memory 18. Also, on the diagram there are two vertical arrows 20, 22. The vertical arrow 22 points upwards and indicates that as one moves up the memory hierarchy, the access speeds of the respective memories increase. That is, the processor 10 is able to access the cache 14 the quickest (i.e. with the least delay). This is normally the case because each of these memories is typically constructed from different electronic components. For example, cache memory 14 is typically SRAM (Static Random Access Memory), main memory 16 is typically DRAM (Dynamic Random Access Memory) and secondary memory 18 is usually provided by disc storage.
The purpose of such a memory hierarchy is to optimise the execution of a program to be run on the processor 10. The system will try to organise itself, using for example well-known LRU (Least Recently Used) replacement algorithms, so that the code and data to be accessed most frequently are stored in the memory areas having faster access speeds. The code or data used most often will be stored in the cache 14, while the code or data to be accessed least often (or perhaps not at all) is stored in secondary memory 18.
At least one purpose of the MMU 12 is to translate virtual addresses issued by the processor 10 into physical addresses which correspond to actual memory locations of data or code stored in one of the physical memory areas. Virtual addressing advantageously allows a logically contiguous address space to be created for the processor, but each logical or virtual memory allocation is actually represented physically by particular physical memory addresses, which need not be contiguous.
FIG. 2 illustrates a conceptual block diagram of a PRIOR ART mapping of contiguous virtual addresses to non-contiguous physical addresses within a general system employing virtual addressing. FIG. 2 shows a virtual memory 2 having a number of logical contiguous memory addresses running from address 0 to the address 2^n−1 where n is the number of bits used. For example in a 32-bit virtual addressing system, the size of virtual memory can be 4 GB (i.e. 2^32 bytes). The virtual memory locations need to point to actual physical memory locations where physical data or code, to be accessed, is stored. The dotted vertical line YY shows the virtual addressing system on the left hand side in which each virtual address corresponds to an actual physical memory address shown on the right hand side of the dotted line YY.
FIG. 2 shows for example, that the physical address space comprises main memory 16 and secondary memory 18. FIG. 2 also shows that virtual addresses can be referred to in a contiguous manner, whereas the actual physical locations to be accessed are non-contiguous and can also be found in different physical devices of the physical memory. Therefore, in a virtual addressing system, the MMU 12 is able to map virtual addresses into physical addresses and to maintain a database for the storage of such mappings.
A technique often used in computing systems of this type is “paging”, in which the physical memory is divided into fixed-size blocks known as pages. Although the virtual memory is also divided into pages which are mapped to corresponding physical page addresses, it is the selection of the page size of the physical memory which is important to the designer of the system, as will be explained later.
FIG. 3 illustrates a PRIOR ART example of mapping a virtual page address to a physical page address. FIG. 3 should be referred to in combination with FIG. 1 in that the processor 10 issues a virtual address 30 which is translated by the MMU 12 into a physical address 32. For example, a 32-bit virtual address 30 is used which comprises a VPN (Virtual Page Number) portion 34 and a PO (Page Offset) portion 36. The PO 36 is merely an index into the identified page for selecting a particular memory location and in the example is shown as taking bits 0 to 11 of the virtual address. The VPN takes up the remainder of the 32-bit address by taking bits 12 to 31. The MMU 12 has circuitry for translating the VPN portion 34 of the virtual address into a PPN (Physical Page Number) portion 39 of the physical address. Typically, the PO portion 38 of the physical address 32 is not translated and retains the same value as the PO portion 36 of the virtual address 30.
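The address split described for FIG. 3 can be illustrated with a minimal sketch. It assumes the 12-bit page offset of the example; the function names and the dictionary standing in for the VPN-to-PPN mapping are hypothetical and are not taken from the figures.

```python
PAGE_OFFSET_BITS = 12  # bits 0 to 11 hold the PO, as in the example above


def split_virtual_address(virtual_address):
    """Return (VPN, PO) for a 32-bit virtual address."""
    vpn = virtual_address >> PAGE_OFFSET_BITS              # bits 12 to 31
    po = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)   # bits 0 to 11
    return vpn, po


def translate(virtual_address, vpn_to_ppn):
    """Translate the VPN via a mapping and carry the PO over unchanged."""
    vpn, po = split_virtual_address(virtual_address)
    ppn = vpn_to_ppn[vpn]  # raises KeyError if the page is not mapped
    return (ppn << PAGE_OFFSET_BITS) | po


# Hypothetical mapping of one virtual page to one physical page.
example_mapping = {0x12345: 0x00ABC}
print(hex(translate(0x12345678, example_mapping)))  # 0xabc678
```

Only the upper bits change during translation; the low 12 bits of the result equal the low 12 bits of the input.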
The physical address 32 is shown as having bits 0 to k where k≤31. The number of PPN's (and hence the size of k) will depend on the size of the selected page. For a small page size of 4 KB there are potentially 1,048,576 PPN's (or physical page addresses to choose from) in a 4 GB system (i.e. where n=32). In this case, k=31 since 20 bits are needed to represent 1,048,576 PPN's (i.e. 2^20=1,048,576). However, if a large page size of 256 MB is chosen, then there are potentially only 16 PPN's in such a system. In this case, k=15 since only 4 bits are needed to represent 16 PPN's.
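The relationship between page size and the number of PPN's can be checked with a short calculation. The sketch below assumes a 4 GB address space and page sizes that are powers of two; the function name is hypothetical.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def page_count_and_ppn_bits(address_space_bytes, page_size_bytes):
    """Return (number of pages, bits needed to number those pages)."""
    pages = address_space_bytes // page_size_bytes
    ppn_bits = (pages - 1).bit_length()
    return pages, ppn_bits


print(page_count_and_ppn_bits(4 * GB, 4 * KB))    # (1048576, 20)
print(page_count_and_ppn_bits(4 * GB, 256 * MB))  # (16, 4)
```

A 4 KB page size therefore yields 2^20 page numbers needing 20 bits, while a 256 MB page size yields 16 page numbers needing 4 bits.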
FIG. 4 illustrates a PRIOR ART page mapping structure of a TLB (Translation Look-aside Buffer) 40. The basic structure of TLB 40 is used by the MMU 12 for mapping the VPN 34 into the PPN 39. It should be appreciated that if the page size of the physical memory is selected to be small, there will be a larger number of PPN's than if the page size were chosen to be large. Therefore, the selection of a page size is a trade-off depending on what is required from the system and the designer's requirements.
The translation look-aside buffer can be implemented using a small area of associative memory within a processor. A data structure for achieving this is a page-table as shown in FIG. 4.
The translations can be too large to store efficiently on the processor. Instead, it is preferable to store only a few elements of the translation in the TLB so that the processor can access the TLB quickly. It is desirable that the TLB is not too big and does not have too many entries so that precious processing overhead is not spent searching through a large number of entries.
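The idea of holding only a few translations in a small, fast structure can be sketched as follows. This is an illustrative model only: the class name, the use of a dictionary as the full page table, and the choice of LRU replacement for the TLB itself are assumptions made for the example, not details taken from FIG. 4.

```python
from collections import OrderedDict


class TinyTLB:
    """A small cache of VPN-to-PPN translations backed by a full page table."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity      # number of entries the TLB can hold
        self.page_table = page_table  # full mapping, assumed slower to access
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        # TLB miss: fetch the translation from the page table.
        self.misses += 1
        ppn = self.page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        self.entries[vpn] = ppn
        return ppn


tlb = TinyTLB(capacity=2, page_table={1: 10, 2: 20, 3: 30})
for vpn in (1, 2, 1, 3):
    tlb.lookup(vpn)
print(tlb.misses)  # 3
```

With only two entries, the fourth lookup evicts the translation for VPN 2, so a later access to that page would miss again.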
Speculative load operations behave like normal load operations, with two differences: speculative loads can be executed out of sequence, and where a normal load would cause an exception to be raised, a speculative load instead returns an invalid number. In all other cases a speculative load returns the same data as a normal load.
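The difference in behaviour between the two kinds of load can be modelled in a few lines. The sketch is purely illustrative: memory is represented by a dictionary, and the exception class and the `INVALID` sentinel are hypothetical stand-ins for the hardware exception and the invalid number.

```python
class LoadFault(Exception):
    """Hypothetical exception raised by a normal load from an unmapped address."""


INVALID = object()  # hypothetical marker standing in for the invalid number


def normal_load(memory, address):
    """Return the data at address, or raise if nothing is mapped there."""
    if address not in memory:
        raise LoadFault(hex(address))
    return memory[address]


def speculative_load(memory, address):
    """Return the same data as a normal load, or INVALID instead of raising."""
    try:
        return normal_load(memory, address)
    except LoadFault:
        return INVALID


memory = {0x1000: 42}
print(speculative_load(memory, 0x1000))             # 42
print(speculative_load(memory, 0x2000) is INVALID)  # True
```

The model captures only the visible result of the load; it does not model the side effects of the read on the bus or on devices.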
The unpleasant property of a speculative load is that it can effectively be considered an instruction that can generate a read from anywhere in the address space. This is problematic because either i) there is no device or physical memory mapped at that address, resulting in a bus error, or ii) a device or memory area that is mapped is read-sensitive, so that the speculative load will destroy the device state. Either of these scenarios is potentially disastrous.
One solution uses a valid bit 33 to overcome this problem: when this bit is set, speculative loads from this page address always return a zero. If a small page size (for example 8 KB) is chosen then for a 64-entry TLB only 512 KB of physical memory will be mapped. Thus, potentially in a 4 GB virtual address space there will be many areas of the physical memory which are not mapped and therefore many TLB misses. A TLB miss is termed a “page fault” and will be serviced in the normal way.
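The amount of memory a TLB can map at once follows directly from the number of entries and the page size. The sketch below reproduces the 64-entry, 8 KB example; the function name is hypothetical.

```python
KB = 1024
GB = 1024 ** 3


def tlb_reach_bytes(entries, page_size_bytes):
    """Total memory covered when every TLB entry maps one page."""
    return entries * page_size_bytes


reach = tlb_reach_bytes(64, 8 * KB)
print(reach // KB)        # 512
print((4 * GB) // reach)  # 8192
```

The mapped 512 KB is one part in 8192 of a 4 GB address space, which is why many accesses can be expected to miss.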
Therefore, many modern systems prefer large page sizes in that there are only a limited number of PPN's to be mapped and therefore the number of TLB entries is reduced along with the processing overhead needed to access the corresponding entry. However, big page sizes have serious disadvantages both in a small real-time operating system and in a large multitasking operating system, for example Linux.
In the case of a small real-time operating system, it is highly desirable to avoid the overhead of having to service page faults, which means one wants to use big pages. However, consider a system with 16 MB of RAM. If the system uses a page size larger than 16 MB, it loses its fine grain control and everything above 16 MB will be a hole, where speculative loads could cause a problem because there is no RAM there to read from. This disadvantage becomes even more apparent for a RAM size which is not a power of 2, for example 112 MB.
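The size of the hole left by a single large page can be worked through numerically. The sketch assumes that page sizes are powers of two and that one page is used to cover all of RAM; the 64 MB page in the first case is an arbitrary example of a page larger than 16 MB, and the function names are hypothetical.

```python
MB = 1024 * 1024


def smallest_covering_page(ram_bytes):
    """Smallest power-of-two page size that covers all of RAM."""
    page = 1
    while page < ram_bytes:
        page *= 2
    return page


def hole_bytes(ram_bytes, page_bytes):
    """Bytes inside the page that have no RAM behind them."""
    return max(page_bytes - ram_bytes, 0)


# 16 MB of RAM mapped by a hypothetical 64 MB page.
print(hole_bytes(16 * MB, 64 * MB) // MB)  # 48

# 112 MB of RAM is not a power of two, so the covering page is 128 MB.
page = smallest_covering_page(112 * MB)
print(page // MB, hole_bytes(112 * MB, page) // MB)  # 128 16
```

In the 112 MB case even the best-fitting single page leaves a 16 MB region in which a speculative load would find no RAM to read from.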
In the case of a large multitasking operating system such as Linux, the problem is essentially the same. That is, the kernel of the operating system in this case likes to see all of the RAM without having to take page faults. So ideally all of RAM is mapped at a high virtual address page with a single TLB entry which is fixed (i.e. it will never be replaced). However, the same disadvantage arises: with a large page size, the system is not able to achieve the finer grain control needed to identify whether a speculative load is valid.