1. Field of the Invention
The present invention relates to the field of operating systems, and in particular to a method for managing free physical memory pages that reduces thrashing caused by unfavorable mapping of virtual addresses to physical addresses, thereby increasing the effectiveness of cache memory.
2. Description of the Related Art
It is common in the art for an operating system to divide memory into pages of a specified length. For example, an operating system may divide memory into 4K pages. FIG. 1 illustrates how both physical memory and virtual memory are divided into 4K pages. Physical memory 100 is divided into a number of 4K pages. Physical memory is usually implemented with a number of RAM or DRAM chips. Physical memory 100 is addressed by physical memory addresses typically beginning with zero (0000) and extending to some memory address that corresponds to the size of the physical memory.
Virtual memory 120 includes a number of applications 124, 126, and 128. At any one time, the computer processor may be running any number of applications, such as a word processing program, a spreadsheet program, and a window management program. It appears to each of these applications as if that application has access to a number of virtual memory pages, typically beginning at memory address zero and ending at some memory address that corresponds to the size of the application. Each application may include a number of virtual memory pages. For example, application 124 includes three virtual memory pages 125, 127, and 129.
The operating system maps each of the virtual memory pages onto a physical memory page. For example, the operating system translates the virtual memory pages 125, 127, and 129 of application 124 to their respective physical memory pages 105, 106, and 107.
Although it appears to each application that its memory pages begin at memory address zero and are consecutive, the operating system actually maps these virtual memory pages onto physical memory pages that are not necessarily contiguous or consecutive. For example, a virtual page 130 that appears to application 126 to begin at address zero may be mapped by the operating system onto a physical memory page that begins at some non-zero address.
At any one time, the physical memory may include a number of free pages 109. These free pages 109 are pages in physical memory that are not being used by any of the applications running on the computer.
In the prior art, these free pages are organized as a linked list, with a variable that points to the beginning of the list. Whenever an application requires an additional page of physical memory, the operating system maps the virtual memory page onto a free physical memory page. The choice of which free page to allocate to the virtual page is "random" in the sense that the first free page on the list is always allocated. The word "random" denotes the lack of any purposeful correlation between the virtual and physical memory addresses during the mapping process; it is not used to comment on or guarantee the actual distribution of the free pages. Similarly, when an application is finished with a physical page of memory, that page is appended to the end of the linked list, so freed pages return to the list in a random and non-deterministic order.
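The prior-art policy can be sketched in a few lines of Python. This is a minimal model, not the actual kernel code, and it rests on two assumptions:

- A `deque` stands in for the linked list.
- The class name `FreePageList` and its methods are hypothetical.

The point the sketch shows is that the virtual page number plays no part in choosing the physical page.

```python
from collections import deque


class FreePageList:
    """Hypothetical model of the prior-art free-page list.

    A deque stands in for the linked list: allocation always takes the
    page at the head, and freed pages are appended at the tail.
    """

    def __init__(self, free_pages):
        self._pages = deque(free_pages)

    def allocate(self, virtual_page):
        # The virtual page number is deliberately ignored: there is no
        # correlation between the virtual page and the physical page chosen.
        if not self._pages:
            raise MemoryError("no free physical pages")
        return self._pages.popleft()

    def release(self, physical_page):
        # Freed pages rejoin the list in whatever order applications
        # happen to release them.
        self._pages.append(physical_page)

    def __len__(self):
        return len(self._pages)


free_list = FreePageList([105, 106, 107])
page_table = {vpn: free_list.allocate(vpn) for vpn in (0, 1, 2)}
```

Here virtual pages 0, 1, and 2 receive physical pages 105, 106, and 107 purely because of the order of the list.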
Referring to FIG. 1, an operating system divides memory into 4K pages. Physical memory 100 is partitioned into 4K pages (e.g., 105, 103, and 108). Similarly, virtual memory is partitioned into 4K pages. Various applications 124, 126, and 128 reside in virtual memory. As can be seen in FIG. 1, the operating system kernel maps the pages in the virtual memory of the different applications (124, 126, and 128) to pages in physical memory. Although it appears to application 124 that it resides in three consecutive and contiguous pages beginning with page 125 at address zero, the three virtual pages of application 124 may in fact map anywhere in physical memory.
In general, an operating system defines a virtual address as having two fields: a plurality of bits that identify the virtual page number, and a plurality of bits that identify the byte offset from the beginning of that page. Assuming 32-bit memory addresses and 4K pages, an application running under the operating system uses virtual addresses of the form 0xVVVVVSSS, where VVVVV (bits 31:12) identifies the virtual page number and SSS (bits 11:0) is the byte offset of the address within the page.
Referring to FIG. 2A, the page number is represented by bits 31:12, and the page offset by bits 11:0. Each V, S, and Q represents one hexadecimal digit (four bits) of the address. The kernel of the operating system, in conjunction with the virtual memory facilities, maps the virtual address 0xVVVVVSSS into a physical address of the form 0xQQQQQSSS, in which QQQQQ (bits 31:12) identifies the physical page number and SSS (bits 11:0) identifies the offset of the address within the page.
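The translation described above can be illustrated with a short Python sketch. It assumes 32-bit addresses and 4K pages, as the text does. The `page_table` dictionary is a hypothetical stand-in for the kernel's real page tables. Only the page number is replaced; the offset bits pass through unchanged.

```python
PAGE_SHIFT = 12                           # 4K pages: 2**12 bytes per page
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1  # selects bits 11:0


def split_address(addr):
    """Split a 32-bit address into (page number, byte offset)."""
    return addr >> PAGE_SHIFT, addr & PAGE_OFFSET_MASK


def translate(vaddr, page_table):
    """Map 0xVVVVVSSS to 0xQQQQQSSS.

    `page_table` is a hypothetical dict from virtual page number to
    physical page number; a missing entry raises KeyError, standing in
    for a page fault.
    """
    vpn, offset = split_address(vaddr)
    ppn = page_table[vpn]
    # Only the page number is replaced; the offset SSS is copied unchanged.
    return (ppn << PAGE_SHIFT) | offset
```

For example, `translate(0x00000ABC, {0x00000: 0x1A2B3})` yields 0x1A2B3ABC: the page number changes and the offset 0xABC is preserved.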
A computer system often includes a cache. An operating system supporting a cache specifies and defines several fields within the physical memory address used to access the cache (FIG. 2B). Customarily, each memory address used to access the cache includes a tag field, whose bits are compared with the tag bits stored in the cache directory at the same set address. When the cache controller finds matching tags, the data in the cache at that set address represents the contents of the main memory location being accessed. Another common field in the physical address is the set address field, also commonly known as the index field, which selects the set (i.e., the row within the cache). A block-offset field (also known as the subline field), which selects the desired data within the block, is also frequently used.
In one embodiment of the present invention, the physical address includes a tag field, a set address field, and a block-offset (i.e., subline) field. Referring to FIG. 2B, bits 31:17 represent the address tag field, bits 16:5 represent the cache set address field, and bits 4:0 represent an align offset field. When the kernel maps the virtual address into the physical address, the lower twelve bits represented by SSS remain unchanged. In other words, the page offset in the virtual address maps directly to bits 11:0 of the physical address.
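The FIG. 2B field layout can be expressed as a few shifts and masks. In the Python sketch below, the bit positions come from the text; the function name is hypothetical. If the cache is assumed to be direct-mapped (the text does not say), 12 set-address bits and 5 offset bits describe 4096 sets of 32-byte lines, i.e., a 128 KB cache.

```python
TAG_SHIFT = 17                     # tag: bits 31:17
SET_SHIFT = 5                      # set address: bits 16:5
SET_BITS = TAG_SHIFT - SET_SHIFT   # 12 bits -> 4096 sets
OFFSET_BITS = SET_SHIFT            # align (block) offset: bits 4:0 -> 32-byte lines


def cache_fields(paddr):
    """Return (tag, set address, block offset) for a 32-bit physical address."""
    tag = paddr >> TAG_SHIFT
    set_address = (paddr >> SET_SHIFT) & ((1 << SET_BITS) - 1)
    block_offset = paddr & ((1 << OFFSET_BITS) - 1)
    return tag, set_address, block_offset
```

The three fields partition the address exactly, so `(tag << 17) | (set_address << 5) | block_offset` reconstructs the original physical address.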
When the kernel assigns physical addresses to virtual addresses, cache set address bits 11:5 are guaranteed to be unchanged, since bits 11:0 are identical in the physical and virtual addresses. However, an important observation is that bits 16:12 of the set address can change arbitrarily in the mapping from a virtual to a physical address, depending on the kernel's memory page management scheme.
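The split between the bits the program controls and the bits the kernel controls can be made concrete. The Python sketch below assumes the FIG. 2B layout; its helper names are hypothetical. It computes the cache set for a given byte offset within a given physical page:

- Set-address bits 11:5 come only from the page offset.
- Set-address bits 16:12 are the low five bits of whichever physical page number the kernel happens to choose.

```python
PAGE_SHIFT = 12   # page offset: bits 11:0
SET_SHIFT = 5     # block offset: bits 4:0
SET_BITS = 12     # set address: bits 16:5


def set_address(ppn, page_offset):
    """Cache set selected by byte `page_offset` within physical page `ppn`."""
    paddr = (ppn << PAGE_SHIFT) | page_offset
    return (paddr >> SET_SHIFT) & ((1 << SET_BITS) - 1)


def offset_controlled_bits(page_offset):
    """Set-address bits 11:5: copied unchanged from the virtual address."""
    return (page_offset >> SET_SHIFT) & 0x7F


def kernel_controlled_bits(ppn):
    """Set-address bits 16:12: the low five bits of the physical page number."""
    return ppn & 0x1F
```

For the same offset 0xABC, physical page 0x00040 selects set 0x055, while physical page 0x00053 selects set 0x9D5. Only the five kernel-controlled bits differ, giving 32 possible placements of a page within the cache.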
The arbitrary mapping of address bits 16:12 makes it impossible for user-level tools, such as an optimizing compiler, to predictably reduce cache conflicts for arrays and other objects larger than 4K. In other words, an optimizing compiler cannot modify the program (i.e., the layout of the data in the program) to reduce the cache conflicts that occur at run time. In an architecture with 4K pages, any data structure smaller than 4K can be placed in a single page, so a data object smaller than a page never conflicts with itself in the cache. Any such conflicts therefore occur between pages.
In a computer system that uses a small cache, all of the rows could easily be addressed by bits 11:0. In other words, suppose the set address field, which indexes the rows of the cache, is small because the cache is small, and the set address field and block offset field together occupy twelve bits (11:0) or fewer. In that case, the problem of arbitrary mapping of address bits from virtual to physical addresses does not arise. A one-to-one correspondence is automatically enforced, because the page offset bits 11:0 pass through unchanged from the virtual to the physical address.
However, as cache size increases, there comes a point where the number of bits in the set address field and block offset field together exceeds the number of bits in the page offset of the memory address. An arbitrary (random) mapping of these additional address bits beyond the page offset increases thrashing and reduces the effective size of the cache.
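The threshold described in the two preceding paragraphs can be computed directly from the cache geometry. The Python sketch below uses hypothetical function names. It assumes power-of-two sizes and the usual relation sets = cache size / (line size × associativity). It returns how many set-address bits lie above the 12-bit page offset; zero means the page offset alone fixes the cache set.

```python
def log2_exact(n):
    """Base-2 logarithm of a positive power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def index_bits_above_page_offset(cache_bytes, line_bytes, ways=1, page_bytes=4096):
    """Number of set-address bits that lie above the page offset.

    Zero means the page offset fully determines the cache set, so the
    virtual-to-physical mapping cannot affect cache placement.
    """
    sets = cache_bytes // (line_bytes * ways)
    used_bits = log2_exact(sets) + log2_exact(line_bytes)
    return max(0, used_bits - log2_exact(page_bytes))
```

For the FIG. 2B layout read as a direct-mapped 128 KB cache with 32-byte lines, the result is 5 (bits 16:12). A direct-mapped 4 KB cache gives 0, so it is immune to the mapping problem.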
The effect of this random allocation of free physical memory pages becomes pronounced in systems that use cache memories to decrease the average latency of data accesses. When a cache is added to a computer system to improve its performance, this random allocation of free physical memory pages decreases the effective size of the cache.
As a general rule, the larger the cache memory in a computer system, the greater the increase in performance and throughput. However, when the cache can hold more than one physical memory page, "random" page management (i.e., a non-deterministic virtual-to-physical address mapping) causes under-utilization of the cache memory, thereby reducing the realized benefit of the cache.
For example, a physical memory page that is heavily accessed (read and/or written) is referred to as a "hot" page. A "hot" page is one that is accessed much more often in some period of time than the majority of other pages being accessed. A "hot" physical memory page will naturally be cached piece by piece (i.e., one cache line at a time), which eventually results in a copy of the entire page existing in the cache. Likewise, if there is a second "hot" physical memory page, it will also be copied into the cache. It is important to keep in mind that caches typically hold units much smaller than a page (e.g., a cache line may be 32 bytes in size, while a page may be 4096 bytes). When there are no conflicts in allocating cache lines, two "hot" pages can co-exist in the cache simultaneously, each having been copied into the cache one line at a time.
When a cache memory can hold no more than one physical memory page, the problems of under-utilization and conflict over blocks in the cache do not arise, because all of the cache's set address and block offset bits fall within the page offset. However, when the cache is large enough to hold many physical memory pages, both problems can occur.
First, under-utilization of the cache memory is a persistent problem. Since the free physical memory pages are managed in a "random" way, there will be spaces in the cache which are used very frequently, whereas other areas in the cache are used less frequently or not at all. Moreover, the utilization of cache memory will vary each time the application is run because the free physical pages are randomly allocated.
Not only are run times inconsistent because of the non-determinism introduced by this random virtual-to-physical address mapping, but the effective size of the cache is also reduced by some random, non-deterministic amount. For example, it is reasonably certain that on any particular run, certain spaces within the cache are used very frequently while others are accessed infrequently or not at all.
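The under-utilization effect can be estimated under a simplifying assumption that is ours, not the text's: each active page lands in one of N page-sized cache regions uniformly and independently at random. The expected number of regions that no page uses is then N(1 - 1/N)^M for M pages. The Python sketch below, with a hypothetical function name, computes this exactly using `fractions.Fraction`.

```python
from fractions import Fraction


def expected_unused_regions(n_regions, n_pages):
    """Expected number of page-sized cache regions that no page maps to.

    Assumes (hypothetically) that each of `n_pages` pages lands in one of
    `n_regions` regions uniformly and independently at random.
    """
    return n_regions * Fraction(n_regions - 1, n_regions) ** n_pages
```

With 32 regions and 32 active pages, the expectation is roughly 11.6 unused regions. So even when there are exactly as many pages as regions, about a third of the cache would on average sit idle while other regions are shared.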
A second problem is best illustrated with an example. Returning to the previous example where there are two or more "hot" memory pages, the cache ideally would hold a copy of each "hot" page in the cache memory. This results from the fact that a "hot" physical memory page will naturally be cached piece by piece (i.e., a cache line at a time) into the cache. This cache activity results in a copy of the entire page existing in the cache, assuming that no conflicts occur while allocating all the cache lines within the cache for that page.
For the first "hot" page, there is a 100% probability that space in the cache memory will be found, because none of the cache memory is yet in use. Thus, over time, the first hot page will reside completely in a contiguous area within the cache memory.
Assuming a direct-mapped cache, the second hot page has a probability of ((N-1)/N) × 100% of finding an area inside the cache memory that does not use the same locations as the first "hot" page. Here N is the number of distinct page-sized regions (rows of page size) into which a page can be mapped in the cache. Set-associative caches have a corresponding problem, although the decrease in probability is much more difficult to describe mathematically. In either case, direct-mapped or set-associative, as the number of hot pages increases, the probability that the next hot page finds space within the cache memory not used by any of the other hot pages decreases dramatically.
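The direct-mapped case generalizes the (N-1)/N figure. The Python sketch below assumes that each hot page is placed uniformly at random in one of N page-sized regions of the cache; for the FIG. 2B geometry read as direct-mapped, N would be 32. The function names are hypothetical.

- If the first k-1 hot pages already occupy distinct regions, hot page k avoids all of them with probability (N-k+1)/N.
- The chance that all k hot pages are conflict-free is the product of these terms, the same product that appears in the classic birthday problem.

```python
from fractions import Fraction


def p_next_page_conflict_free(n_regions, k):
    """Probability that hot page number k (1-based) avoids the regions
    already occupied by hot pages 1..k-1, given that those k-1 pages
    do not overlap one another."""
    return Fraction(max(0, n_regions - (k - 1)), n_regions)


def p_all_pages_conflict_free(n_regions, k):
    """Probability that k hot pages all land in distinct regions."""
    p = Fraction(1)
    for i in range(1, k + 1):
        p *= p_next_page_conflict_free(n_regions, i)
    return p
```

With 32 regions, two hot pages avoid each other with probability 31/32. Five hot pages are all conflict-free with probability of only about 0.72.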
With a random memory page management allocation scheme, the chance for a hot page to be loaded into an area in cache memory in which resides another hot page increases with the number of "hot" pages used.
This displacement of one hot page by another is commonly referred to as thrashing. The effect of thrashing is that additional loads from and stores to main memory are required to access data in both hot pages, since each repeatedly evicts the other from the cache. As the number of loads and stores (i.e., accesses to main memory) increases, the traffic on the memory bus increases, thereby decreasing the throughput of the computer system.
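Thrashing can be reproduced with a toy cache model. The Python sketch below rests on several assumptions that are not in the text:

- The FIG. 2B layout is modeled as a direct-mapped cache of 4096 sets with 32-byte lines.
- The access pattern, sweeping every line of each hot page in turn, is an illustrative choice.
- The function name is hypothetical.

Each miss stands for one access to main memory.

```python
PAGE_BYTES = 4096
LINE_BYTES = 32
NUM_SETS = 4096   # bits 16:5 of FIG. 2B, modelled as direct-mapped


def count_misses(hot_ppns, passes):
    """Sweep every line of each hot page in turn, `passes` times.

    Returns the number of cache misses, i.e. the number of accesses
    that had to go to main memory.
    """
    tags = [None] * NUM_SETS  # one resident tag per set (direct-mapped)
    misses = 0
    for _ in range(passes):
        for ppn in hot_ppns:
            for offset in range(0, PAGE_BYTES, LINE_BYTES):
                paddr = ppn * PAGE_BYTES + offset
                set_address = (paddr // LINE_BYTES) % NUM_SETS
                tag = paddr // (LINE_BYTES * NUM_SETS)
                if tags[set_address] != tag:
                    tags[set_address] = tag  # fill, evicting any previous line
                    misses += 1
    return misses


disjoint = count_misses([0x40, 0x41], passes=10)     # different cache regions
conflicting = count_misses([0x40, 0x60], passes=10)  # same cache region
```

Physical pages 0x40 and 0x41 map to different cache regions. They incur only the 256 misses needed to load both pages once (128 lines each). Pages 0x40 and 0x60 share bits 16:12, so they evict each other on every pass and incur 2560 misses over the same ten passes.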