This invention relates generally to computer systems and more specifically to memory management systems.
Computer system memory is generally divided into two levels: RAM-based main memory and auxiliary memory, which is stored on a data storage medium, typically a hard disk or drum. For clarity, the term disk will be used throughout this discussion when referring to an auxiliary memory storage device, although the discussion applies equally to other auxiliary memory storage devices.
The basic unit of data is the bit (short for binary digit). Groups of bits are organized into units known as bytes. A byte is generally eight bits long, although other lengths are possible. Bytes can be further organized into groups known as words, or into groups known as objects. Words are generally four, eight, or sixteen bytes in length, although other lengths are not uncommon. Objects have no fixed length and may vary in length, even within a given system.
Memory, whether on disk or in RAM, is generally divided into smaller regions known as blocks, for ease of organizing and manipulating the data. When the blocks are all of the same size, they are called pages. Computer system memory is therefore generally divided into a plurality of pages, each of which contains a plurality of words or objects. The words or objects themselves each consist of a plurality of bytes.
The terminology used to describe computer system memory suggests an obvious analogy. One can think of system memory as a book made up of pages, on which are printed words. In a large book, a single small word would be very difficult to locate. However, by organizing the words into pages, and referencing a word by the page on which it appears, the word can be located with relative ease.
Although it is convenient to think of memory organization in terms of the book analogy, computer memory is quite distinct from printed information. Like a book, memory can be read; unlike a book, however, memory can also be updated and written to. Furthermore, memory is dynamic: the data it contains can be moved from one location to another.
The process of organizing how data is distributed and moved between the main and auxiliary levels of memory is called memory management. Its goal is to provide the greatest system speed and efficiency by keeping in main memory the data that is most likely to be referenced by the processor. Memory management can be broken down into three procedures:
1. Page Fault: This procedure determines when a block of data needs to be moved from auxiliary memory (disk) to main memory (RAM).
2. Placement: This procedure locates unallocated regions of main memory into which the incoming data will be placed.
3. Preempt: This procedure determines which blocks will be removed from main memory and relocated to auxiliary memory, in order to free up main memory space for more incoming data.
Originally these tasks were left to the programmer, who had to keep track of which blocks of auxiliary memory the processor would require and when, and issue move commands to transfer those blocks to main memory (RAM) at the appropriate points in the program. This was a complicated and tedious task, which consumed as much as forty percent of programming costs for complicated programs.
With the advent of virtual memory systems, the burden of memory management was removed from the programmer and placed within the operating system of the computer. A virtual memory system is best described in terms of three concepts: a virtual address space, a physical address space, and a translation function. The virtual address space is the set of addresses that the processor can generate. The physical address space is the set of physical locations at which memory contents are actually stored, whether in RAM or on disk. The translation function translates a virtual address generated by the processor to its physical location in memory, and also translates a physical address to its associated virtual address.
FIG. 1 is a simplified block diagram of a virtual memory system. When a virtual address is issued by the processor 1, it is acted upon by the translation function of the virtual memory system 2. Two outcomes are possible. If the translation results in a physical address in RAM 3, the address is passed on to RAM and processing continues. If the translation results in a physical address in auxiliary memory 4, the virtual memory controller 5 issues the appropriate commands to transfer the block of memory containing this physical address to some unallocated portion of RAM 3, where it can be referenced by the processor. This entire process is "transparent" to the address-generating processor. In this way, the processor can reference a very large virtual memory, "unaware" that it is in fact being served by a relatively small RAM whose contents are constantly changed to meet the demands of the processor. This is accomplished by the Page Fault, Placement, and Preempt procedures.
When the translation from the virtual address generated by the processor to the physical address results in a physical location on disk, a page fault is generated. The purpose of the page fault is to load the desired data to RAM so that it can be referenced by the processor and processing can continue. Normally, the minimum quantum of data transfer is a page. Therefore, when it is desired to load the contents of a given physical address to RAM, the entire page in which the address is located will be loaded, hence the term page fault.
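The translation and page-fault steps can be sketched as follows. This is a minimal illustration only, assuming a hypothetical 4 KB page size, a Python dictionary standing in for the translation tables, and a list of free RAM frames used for placement; the class and method names are illustrative and do not come from the source.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)

@dataclass
class VirtualMemory:
    # Hypothetical translation table: virtual page -> ("RAM", frame) or ("DISK", block).
    table: dict = field(default_factory=dict)
    free_frames: list = field(default_factory=list)  # unallocated RAM frames
    faults: int = 0

    def translate(self, vaddr: int) -> int:
        """Return the RAM address for vaddr, faulting its page in if it is on disk."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        where, loc = self.table[vpage]
        if where == "DISK":
            # Page fault: the whole page is the minimum unit of transfer.
            self.faults += 1
            # Placement: take any free frame. If none is free, preemption
            # (not modeled here) would have to release one first.
            frame = self.free_frames.pop()
            # A real system would now read disk block `loc` into `frame`.
            self.table[vpage] = ("RAM", frame)
            loc = frame
        return loc * PAGE_SIZE + offset
```

After the first reference to a page on disk has faulted it in, later references to any word in the same page resolve directly to RAM without a further fault.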
The time required to physically locate the page in auxiliary memory and transfer it to RAM is called paging time. Paging time involves three factors: the time to position the read/write heads over the appropriate area of the disk, called the access delay; the time for the disk to rotate to the sector containing the data, called the rotational delay; and the time to actually transfer the data to RAM, called the transfer time. Because processing is halted until the data is transferred into RAM, paging time can be a serious bottleneck in system performance.
Prepaging is a common attempt to ease the problem of paging time. In a prepaging scheme, one or more pages located adjacent to the faulted page are loaded into RAM along with the faulted page during the page fault. Since a major portion of paging time is due to access delay and rotational delay, several adjacent pages can be transferred along with the faulted page without significantly affecting the overall paging time. For instance, in a typical hard disk system, access delay may be sixteen ms, rotational delay eight ms, and transfer time one ms per K-byte (K=1024). Therefore, for a typical page size of two K-bytes, the total paging time would be twenty-six ms. If, in the same system, four pages were prepaged along with the faulted page, the total paging time would be thirty-four ms. In this case five times as many pages were loaded into RAM with only about a thirty percent increase in paging time.
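The arithmetic of this example can be checked with a short sketch. It simply adds the three components of paging time using the figures given above (16 ms access delay, 8 ms rotational delay, 1 ms per K-byte transfer); the function name is hypothetical.

```python
ACCESS_DELAY_MS = 16.0      # figures from the example above
ROTATIONAL_DELAY_MS = 8.0
TRANSFER_MS_PER_KB = 1.0

def paging_time_ms(page_kb: float, pages: int = 1) -> float:
    """Paging time for one fault that transfers `pages` contiguous pages."""
    return ACCESS_DELAY_MS + ROTATIONAL_DELAY_MS + TRANSFER_MS_PER_KB * page_kb * pages

single = paging_time_ms(2)              # faulted page only
prepaged = paging_time_ms(2, pages=5)   # faulted page plus four prepages
increase = (prepaged - single) / single
print(single, prepaged, f"{increase:.0%}")  # 26.0 34.0 31%
```

The exact increase is 8/26, or about 31 percent, because the fixed access and rotational delays are paid only once per fault.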
In order for prepaging to be effective, there must be some probability that some of the adjacent pages loaded into RAM will also be referenced by the processor. The only relationship the prepaged pages have to the faulted page is their location in the physical address space. Statistically, however, some fraction of these pages will be referenced by the processor within a reasonable time of the page fault. Since these pages have already been loaded into RAM, no additional paging time delay is incurred when they are referenced.
It is important to note that the pages were located in a contiguous region of the physical address space (the disk), and that they were loaded into a contiguous region of the virtual address space (in RAM). Therefore, the spatial relationship of these pages in both address spaces (called the spatial locality of reference) is fixed and unchanging. In a conventional virtual memory system, contiguous regions of the physical address space are always mapped to contiguous regions of the virtual address space. Also of note is the fact that the page faulted from disk, and the pages prepaged with it, now have two locations: they retain their original locations on the disk, as well as their new locations in RAM. At some future point, when the pages are preempted back to disk, they will be transferred to the same physical locations.
Pages in RAM that are being referenced by the processor are said to be active. A page is faulted into RAM as the result of a reference by the processor, so such a page is naturally active. However, some, or perhaps all, of the pages prepaged into RAM may never be referenced by the processor. These inactive pages occupy valuable RAM space that could be used by other pages. Moreover, processing will eventually progress to some point, or move on to some new task, at which the active page is no longer referenced by the processor and hence becomes inactive.
Preemption is the process whereby inactive pages are removed from RAM and transferred back to disk in order to make room in RAM for new incoming active pages. The first step is to select which pages currently in RAM should be preempted and transferred to disk. The second step is to actually transfer the pages to disk.
A common technique for selecting pages for preemption is the clock algorithm, which sequentially polls each page in RAM, and a reference bit associated with each page, in a circular fashion, just as the hands of a clock sweep around the circular time marks. The reference bit is set when the page is referenced by the processor. As the clock algorithm progresses, it checks the reference bit of each page it polls. If the bit is set, the page has been referenced by the processor since it was last polled; the clock algorithm clears the bit and continues. If the bit is clear, the page has not been referenced since the last rotation of the clock algorithm, which indicates that the page is inactive. The page can then be selected for preemption, thus meeting the goal of selecting those pages that are no longer active.
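The basic clock algorithm described above can be sketched as follows. This is a minimal illustration, assuming the resident pages are held in a list that the hand sweeps circularly; the `Page` and `Clock` names are hypothetical, and replacing the selected page's slot with the incoming page is left to the caller.

```python
from dataclasses import dataclass

@dataclass
class Page:
    name: str
    referenced: bool = False  # reference bit, set by each processor access

class Clock:
    """Basic clock page selection over a non-empty list of resident pages."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.hand = 0

    def select_victim(self) -> Page:
        # Terminates within two sweeps: the first sweep clears every set bit.
        while True:
            page = self.pages[self.hand]
            self.hand = (self.hand + 1) % len(self.pages)
            if page.referenced:
                page.referenced = False  # referenced since last poll: clear and move on
            else:
                return page              # not referenced for a full rotation: inactive
```

For pages A (referenced), B (not referenced), and C (referenced), the first call clears A's bit and selects B; a second call clears C's bit and then selects A.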
Several variations of this algorithm are known. In one, there is a counter associated with each page in addition to the reference bit. In this scheme, when the reference bit is cleared, the counter is set to some initial value N. Each time the page is polled by the clock algorithm, the counter value is decremented by one (provided the reference bit remains clear). The page is not selected for preemption until the reference bit is clear and the counter value is zero. If at any point the page is referenced again before it is preempted, the reference bit is set, and the counter is reset to the initial value N. In this way, a page will reside in RAM for N+1 rotations of the clock algorithm without being referenced by the processor before being selected for preemption.
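The counter variant can be sketched in the same style. In this minimal version (names hypothetical), the counter is reloaded with N whenever the hand finds the reference bit set and clears it, which matches the reset-on-reference behavior described above; a page whose counter is still zero, such as a prepaged page that was never referenced, is selected as soon as the hand reaches it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    name: str
    referenced: bool = False
    counter: int = 0  # rotations remaining before the page may be preempted

class CountingClock:
    def __init__(self, pages, n: int):
        self.pages = list(pages)
        self.n = n        # initial counter value N (a tunable assumption)
        self.hand = 0

    def poll(self) -> Optional[Page]:
        """Advance the hand by one page; return that page if it is selected."""
        page = self.pages[self.hand]
        self.hand = (self.hand + 1) % len(self.pages)
        if page.referenced:
            page.referenced = False
            page.counter = self.n  # referenced again: counter back to N
            return None
        if page.counter == 0:
            return page            # bit clear and counter exhausted
        page.counter -= 1
        return None

    def select_victim(self) -> Page:
        while True:
            victim = self.poll()
            if victim is not None:
                return victim
```

With a single resident page and N=2, the first poll clears the bit, the page then survives N+1 = 3 further rotations unreferenced, and it is selected on the fourth poll.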
After a page has been selected for preemption, its contents must be transferred back to the disk. Recall, however, that the page has two locations, one in RAM and one on disk. Therefore, it may not be necessary to actually transfer the data back to disk; it may suffice to change the translation function for that page so that the page's virtual address corresponds to its disk address. If, however, the data within the page has been modified or updated in any way, the data must be transferred to disk in order to preserve the modifications. This necessitates a modified bit, which is set whenever the data within a page is changed. When a page has been selected for preemption, the modified bit is checked. If the bit is clear, the virtual memory system can simply update the page's physical address index to point to the disk address. If the modified bit is set, the page must be transferred to auxiliary memory to retain the modifications. After the page has been transferred to auxiliary memory, if necessary, and its address has been updated, the page of RAM it occupied is released to a free page pool, which is used to satisfy the storage requirements of incoming pages of data.
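The preemption step, including the modified-bit check, can be sketched as follows. This is an assumption-laden illustration: each hypothetical `Entry` records the page's home disk block and its current RAM frame, and the `writes` list merely records which disk blocks would be written, standing in for real disk I/O.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Entry:
    disk_block: int               # the page's home location on disk
    frame: Optional[int] = None   # RAM frame while resident, else None
    modified: bool = False        # modified bit

@dataclass
class Pager:
    table: Dict[int, Entry]
    free_frames: List[int] = field(default_factory=list)  # free page pool
    writes: List[int] = field(default_factory=list)       # stand-in for disk writes

    def preempt(self, vpage: int) -> None:
        entry = self.table[vpage]
        if entry.frame is None:
            return                                # page is not resident
        if entry.modified:
            self.writes.append(entry.disk_block)  # copy the changes back to disk
            entry.modified = False
        # The disk copy is now current: release the frame and let the
        # translation for this page point at its disk location.
        self.free_frames.append(entry.frame)
        entry.frame = None
```

A clean page costs no disk transfer at all on preemption; only its translation entry and the free page pool change.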
In the preemption procedure, just as in the page fault procedure, a contiguous region of virtual address space is mapped to a contiguous region of the physical address space on the disk. This is a static process, which makes no adjustment for patterns of access or usage of the pages.
To illustrate, consider the region of memory shown in FIG. 2a, in a system that prepages two additional pages with each page faulted in. The page located at physical address P100 is referenced by the processor. The conventional memory system faults in page P100, i.e., it transfers the page into virtual memory at address V100. Additionally, the system transfers pages P99 and P101 by virtue of their location relative to P100. Note that the contiguously located pages in the physical address space are transferred to contiguously located pages in the virtual address space. Page V100 is referenced by the processor for a given period of time, whereas in this instance pages V99 and V101 are not referenced at all. In this case, prepaging did not improve system performance at all, because the additional pages were not referenced. At other times, however, V99, V101, or both may be referenced, which is what motivates prepaging.
Eventually, V100 will no longer be referenced by the processor, its reference bit will be cleared by the clock algorithm, and it will be selected for preemption. Pages V99 and V101, which, in this case, were never referenced by the processor, also have clear reference bits, and may also be selected for preemption. When selected for preemption, the contiguously located virtual pages V99, V100, and V101 will be transferred back to contiguously located physical pages P99, P100, and P101. This is a static process because the relationship between pages, whether on the virtual address space or on the physical address space, is unchanging.
The shortcoming of the conventional systems described above is that they do not take advantage of a characteristic of data usage in a system. In a typical application, certain areas of memory will be frequently referenced by the processor during certain periods of time. These areas of memory are considered dynamically active. Additionally, there are patterns in memory activity. In other words, if during a period of time Page X is dynamically active, and Pages Y and Z are also dynamically active, it is probable that the next time Page X is active, Pages Y and Z will also be dynamically active again. A further characteristic of memory usage is that these dynamically active pages are randomly scattered throughout the virtual address space. Refer again to FIG. 2a, and consider the case in which pages V100, V200, and V300 are dynamically active. First note that since pages P100, P200, and P300 are not contiguous in the physical address space, the three pages must have been loaded into RAM in three separate page faults (each along with its two adjacent pages). Further note that, as explained above, it is highly probable that at some future time, when V100 is again dynamically active, V200 and V300 will also be dynamically active. However, with conventional memory systems, once they have become inactive, V100 will be mapped back to P100, V200 to P200, and V300 to P300 in separate preemption routines. Furthermore, if at some future point in time the three pages again become dynamically active simultaneously (as is probable given typical memory usage patterns), they will again be faulted into RAM in three separate page faults. This static, non-adaptive approach to memory management therefore causes a great deal of inherent inefficiency within the system.
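The fault count in this example can be reproduced with a short sketch. It assumes physical page numbers as in FIG. 2a, that the hot pages are referenced in the order listed, and that each fault also prepages the physically adjacent page on each side; the function name is hypothetical.

```python
def conventional_faults(hot_pages, prepage_radius: int = 1) -> int:
    """Count page faults needed to bring `hot_pages` (in reference order) into
    RAM when each fault also prepages `prepage_radius` neighbors on each side."""
    resident = set()
    faults = 0
    for page in hot_pages:
        if page not in resident:
            faults += 1
            resident.update(range(page - prepage_radius, page + prepage_radius + 1))
    return faults

print(conventional_faults([100, 200, 300]))  # 3: scattered hot spots
print(conventional_faults([100, 99, 101]))   # 1: contiguous hot spots
```

Prepaging helps only when the hot pages happen to be physical neighbors; scattered hot spots still cost one fault each.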
FIG. 3 is a memory reference trace pattern of a typical application. The horizontal axis represents the virtual address space, and time progresses down the vertical axis. Each pixel represents 1 KB of virtual address space and is marked if there was any reference within the 1 KB range it covers, and clear otherwise. The dynamically active addresses are referred to as "hot spots." For any given duration of time, one can define a working set as the group of addresses that are dynamically active during that duration. Naturally, the size of a working set depends on the duration of time chosen, with the working set growing larger as the duration increases.
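The working set for a window of time can be sketched from a reference trace. This assumes the common window-based definition (pages referenced in the interval (t - tau, t]) and a hypothetical trace of (time, page) pairs; neither the function name nor the trace comes from FIG. 3.

```python
from typing import Sequence, Set, Tuple

def working_set(trace: Sequence[Tuple[int, int]], t: int, tau: int) -> Set[int]:
    """Pages referenced in the window (t - tau, t]; trace holds (time, page) pairs."""
    return {page for time, page in trace if t - tau < time <= t}

# Hypothetical reference trace, for illustration only.
trace = [(1, 5), (2, 9), (3, 5), (4, 12), (5, 9), (6, 30)]
print(sorted(working_set(trace, t=6, tau=2)))  # [9, 30]
print(sorted(working_set(trace, t=6, tau=6)))  # [5, 9, 12, 30]
```

Lengthening the window can only add pages, which is why the working set grows with the duration chosen.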
To aid in understanding the data in FIG. 3, we can identify three different classes of locality of reference.
First, there is spatial locality of reference. This form of locality says that if a given virtual address is referenced, then it is very likely that an address close to it will be referenced. This is the only form of locality that can be leveraged by prepaging in a conventional paging system. In FIG. 3, high spatial locality would appear as horizontal lines, indicating that closely spaced virtual addresses have been referenced within a short period of time. Examination of FIG. 3 does not show a pattern of horizontal lines, so there is not much spatial locality of reference, and conventional prepaging will not yield much gain.
Second, there is temporal locality of reference. Temporal locality says that if a given address is referenced, then it is very likely that the same address will be referenced again in the future. In FIG. 3, temporal locality of reference appears as vertical lines. Examination of FIG. 3 does show a number of vertical lines, indicating a great deal of temporal locality of reference. Temporal locality of reference is the foundation for all forms of caching, including the caching implied in any demand-paged virtual memory system.
Third, there is structural locality of reference. Structural locality of reference says that if a set of addresses is referenced within a short time period, and one element of the set is referenced in the future, then it is very likely that other members of the set will also be referenced within a short time period in the future. In FIG. 3, the presence of structural locality of reference is indicated by patterns that repeat from time to time. Examination of FIG. 3 shows a substantial amount of structural locality. Notice the reference patterns circled at times A, B, and C in FIG. 3. The dynamically active addresses, or hot spots, in each pattern are not localized to one area of the address space, but are spread across much of it. However, it is clear that the access pattern at time A is highly correlated with the access patterns at times B and C. Most of the hot spots in pattern A are repeated in pattern B. Patterns B and C are almost identical, i.e., all the pages that are dynamically active in pattern B are also dynamically active in pattern C. Note, however, that there are intervening periods of time between the patterns in which the access pattern is quite different. Therefore, some or all of the pages in pattern A may be preempted to disk (to make room for incoming pages) before pattern B occurs. The way in which this preemption is carried out can have a serious impact on system performance.
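One simple way to quantify how strongly two reference patterns are correlated is the Jaccard similarity of their hot-spot sets. This measure is an assumption chosen for illustration; the text does not prescribe one, and the page sets below are hypothetical rather than read from FIG. 3.

```python
def jaccard(a: set, b: set) -> float:
    """Fraction of hot spots shared by two patterns (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical hot-spot sets (page numbers) loosely mimicking patterns A, B, and C.
pattern_a = {3, 17, 40, 41, 88, 120}
pattern_b = {3, 17, 40, 88, 120, 150}
pattern_c = {3, 17, 40, 88, 120, 150}
other = {60, 61, 62, 63}  # an unrelated intervening pattern

print(round(jaccard(pattern_a, pattern_b), 3))  # 0.714
print(jaccard(pattern_b, pattern_c))            # 1.0
print(jaccard(pattern_a, other))                # 0.0
```

High similarity between widely separated patterns, despite low similarity with the intervening ones, is the signature of structural locality.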
FIG. 2a can be thought of as an idealized representation of a portion of one of the patterns shown in FIG. 3. Note that in a conventional virtual memory system, contiguous regions of the virtual address space are mapped to contiguous regions of the physical address space when pages are preempted to disk. This causes the scattered, non-localized pattern of the hot spots to be carried over to the disk. FIG. 2b, on the other hand, is an idealized representation of a portion of one of the working sets when the virtual memory system of a preferred embodiment is employed. In this case the hot spots are grouped together in the physical address space when preempted to disk. In the preferred embodiment, the hot spots, or dynamically active pages, are grouped together in clusters on the disk. At the future point in time when working set B is needed, the entire working set of hot spots can be faulted into RAM with just one or a few cluster faults. Contrast this with prior art systems, in which each hot spot of the pattern would have to be faulted into RAM separately by virtue of the scattered pattern of the addresses.
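The benefit of clustering can be illustrated with a toy model. This is not the preferred embodiment's mechanism, which this section does not specify; it assumes hypothetical functions, a free disk region starting at block 500, and that each disk read fetches a fixed number of consecutive blocks starting at the first block needed.

```python
from typing import Dict, List

def cluster_layout(hot_pages: List[int], start_block: int) -> Dict[int, int]:
    """Assign dynamically active virtual pages to consecutive disk blocks."""
    return {vpage: start_block + i for i, vpage in enumerate(hot_pages)}

def cluster_reads(layout: Dict[int, int], wanted: List[int], cluster_blocks: int) -> int:
    """Count reads when each read fetches `cluster_blocks` consecutive blocks.
    Assumes non-negative block numbers."""
    reads = 0
    covered_until = -1  # last block fetched by the most recent read
    for block in sorted(layout[v] for v in wanted):
        if block > covered_until:
            reads += 1  # start a new read at this block
            covered_until = block + cluster_blocks - 1
    return reads

hot = [100, 200, 300]                             # hot spots from the FIG. 2a example
home = {v: v for v in hot}                        # conventional: each page returns home
clustered = cluster_layout(hot, start_block=500)  # hypothetical free region
print(cluster_reads(home, hot, cluster_blocks=3))       # 3
print(cluster_reads(clustered, hot, cluster_blocks=3))  # 1
```

Once the hot spots sit in consecutive blocks, a single read returns the whole group.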
The example shown in FIG. 3 is not meant to define or delimit the scope of how the preferred embodiments can adapt to patterns of memory usage. Rather, it illustrates simply one way in which the selective grouping of pages into clusters, as in a preferred embodiment, improves system performance.
A fundamental aspect of any virtual memory system is the process by which the virtual address is translated to the physical address, and the physical address is translated to the virtual address. It is this process by which the virtual memory system knows what page to fault into RAM in response to a virtual address reference by the processor, and also where on disk to place a page which has been preempted from the virtual memory. The process actually entails two separate functions.
The translation from virtual address to physical address can be represented as a V→P function. It is desirable for the V→P function to yield a valid physical address as quickly as possible in order to avoid a performance bottleneck. A wide variety of implementations of the V→P function are known in the art. Typically the V→P function is implemented as a series of look-up tables.
In a common V→P technique known in the art, the first-level implementation of the V→P function is a very high-speed associative memory called a translation lookaside buffer (TLB). This structure contains the physical addresses of those pages in RAM that have been most recently referenced by the processor. The TLB holds addresses for only a very few pages, which is necessary for quick execution; however, the page addresses it contains, having been recently referenced, have a high probability of being referenced again.
If the TLB is not able to resolve the virtual address to a physical address, a main-memory-based structure is consulted. Typically this structure is a series of look-up tables, either direct-mapped tables or hash tables, or a tree structure. When the main-memory-based table resolves the virtual address to a physical address located in RAM, the TLB is updated to reflect this new translation and processing continues. When the virtual-to-physical translation results in a physical address located on the disk, a page fault is generated as detailed in the above discussion.
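The two-level V→P lookup can be sketched in software. This is only an analogy: a real TLB is associative hardware, whereas here it is modeled as a small dictionary with least-recently-used replacement (an assumption), backed by a page table that maps each virtual page to ("RAM", frame) or ("DISK", block). The `Translator` and `PageFault` names are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Tuple

class PageFault(Exception):
    """Raised when translation resolves to a page that is on disk."""

class Translator:
    def __init__(self, page_table: Dict[int, Tuple[str, int]], tlb_size: int = 4):
        self.page_table = page_table  # vpage -> ("RAM", frame) or ("DISK", block)
        self.tlb = OrderedDict()      # vpage -> frame, kept in recency order
        self.tlb_size = tlb_size
        self.tlb_hits = 0

    def v_to_p(self, vpage: int) -> int:
        if vpage in self.tlb:                  # fast path: TLB hit
            self.tlb.move_to_end(vpage)
            self.tlb_hits += 1
            return self.tlb[vpage]
        where, loc = self.page_table[vpage]    # slow path: main-memory table
        if where == "DISK":
            raise PageFault(vpage)
        self.tlb[vpage] = loc                  # record the new translation
        if len(self.tlb) > self.tlb_size:
            self.tlb.popitem(last=False)       # evict the least recently used entry
        return loc
```

Only translations that resolve to RAM enter the TLB; a translation that resolves to disk raises a page fault instead.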
A different type of translation function, known as the P→V function, is required when pages are to be preempted from main memory, as detailed above. When a page is selected for preemption, its physical address in RAM is known. The P→V tables translate this address to the virtual address currently associated with that location, so that the page can be properly tracked in the subsequent preemption operations.
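One way to provide the P→V function is to maintain a reverse (frame-to-page) map alongside the forward map and keep the two in step. The sketch below assumes each RAM frame holds at most one page; the class and method names are hypothetical.

```python
from typing import Dict, Optional

class InvertedTable:
    """Keeps V->P and P->V maps consistent so a preempted frame can be traced to its page."""

    def __init__(self):
        self.v_to_p: Dict[int, int] = {}  # virtual page -> RAM frame
        self.p_to_v: Dict[int, int] = {}  # RAM frame -> virtual page

    def install(self, vpage: int, frame: int) -> None:
        old_frame = self.v_to_p.get(vpage)
        if old_frame is not None:
            del self.p_to_v[old_frame]    # drop the page's stale reverse entry
        old_vpage = self.p_to_v.get(frame)
        if old_vpage is not None:
            del self.v_to_p[old_vpage]    # the frame's previous page is displaced
        self.v_to_p[vpage] = frame
        self.p_to_v[frame] = vpage

    def unmap_frame(self, frame: int) -> Optional[int]:
        """P->V lookup for a frame chosen for preemption; removes both entries."""
        vpage = self.p_to_v.pop(frame, None)
        if vpage is not None:
            del self.v_to_p[vpage]
        return vpage
```

Since the clock algorithm sweeps physical frames, the reverse map is what tells the preemption step which virtual page it is evicting.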
Disk bandwidth, the amount of data transferred from or to the disk per unit of time, is a significant factor in overall system performance. If disk bandwidth is low, the processor must repeatedly wait for needed information to be loaded into RAM. It is therefore desirable to maximize disk bandwidth.
The bandwidth of a storage device is defined as the amount of information that can be transferred per unit of time, usually expressed in K-bytes per second (KB/s). For example, in a system in which the page size is four K-bytes, the following paging time applies:
______________________________________
AVERAGE ACCESS DELAY              16 ms
AVERAGE ROTATIONAL DELAY           8 ms
TIME TO TRANSFER 4 KB OF DATA      4 ms
TOTAL                             28 ms
______________________________________
As shown, only fourteen percent (4 ms/28 ms) of the total paging time is spent actually transferring data. The bandwidth of this system is 4 KB/28 ms, or 143 KB/s.
Raw disk bandwidth is improved by prepaging schemes. For instance, if the above system prepaged four adjacent pages with each faulted page, the following paging time would apply:
______________________________________
AVERAGE ACCESS DELAY                       16 ms
AVERAGE ROTATIONAL DELAY                    8 ms
TIME TO TRANSFER 20 KB OF DATA (5 PAGES)   20 ms
TOTAL                                      44 ms
______________________________________
In this case forty-five percent of total paging time is spent actually transferring data, giving a raw disk bandwidth of 454 KB/s (20 KB/44 ms). This appears to be a great improvement, and shows why prepaging is frequently used.
However, the extra prepages improve system performance only if they are used. Since the pages that can be brought in by prepaging are those contiguous with the faulted page in the virtual address space, the probability of using the prepages depends on the spatial locality of reference in the access pattern. Often this form of locality is quite low. In a later section we show that the measured average number of prepages used is 0.6 for a specific test case. We also show that the best one can reasonably expect is one useful prepage per page fault with typical access patterns. Since raw disk bandwidth does not directly affect system performance, we define EFFECTIVE DISK BANDWIDTH as KB of useful data transferred per second. In other words, RAW DISK BANDWIDTH is the amount of information transferred per second, whereas EFFECTIVE DISK BANDWIDTH is the amount of useful information transferred per second. In the example above, if the average number of prepages used is 1, then the EFFECTIVE DISK BANDWIDTH is 182 KB/s (8 KB/44 ms). This is a real improvement over the 143 KB/s obtained without prepaging, but much less than the raw disk bandwidth associated with prepaging.
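The raw and effective bandwidth figures above follow from one formula: useful K-bytes divided by paging time. The sketch below uses the delays and the 4 KB page size from the tables; the function name is hypothetical.

```python
ACCESS_MS, ROTATION_MS, MS_PER_KB = 16.0, 8.0, 1.0  # delays from the tables above
PAGE_KB = 4.0

def bandwidth_kb_per_s(pages_transferred: int, pages_used: float) -> float:
    """KB/s of used data for one fault that transfers `pages_transferred`
    pages, of which `pages_used` are actually referenced."""
    time_ms = ACCESS_MS + ROTATION_MS + MS_PER_KB * PAGE_KB * pages_transferred
    return PAGE_KB * pages_used / time_ms * 1000.0

no_prepaging = bandwidth_kb_per_s(1, 1)  # about 142.9 KB/s
raw = bandwidth_kb_per_s(5, 5)           # about 454.5 KB/s: every page counted
effective = bandwidth_kb_per_s(5, 2)     # about 181.8 KB/s: faulted page + 1 useful prepage
```

Counting every transferred page as used gives the raw figure; counting only the pages actually referenced gives the effective one.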
A second issue concerns the assumption that all of the data in a page is useful data. As noted above, a page is a grouping of words, and although it is convenient to transfer and organize memory in units of pages, the processor actually references memory in units of words within the pages. For any given page transferred to RAM, only some of the words within that page represent truly useful data, i.e., data that the processor will actually use. The rest of the words in the page are simply loaded by default, because the page is the atomic unit of transfer. The ratio of useful data per page to total data per page is called packing density. Packing density is a crucial factor in how effectively system RAM is utilized.
Packing density is significant because of internal fragmentation in computer memory data storage. For instance, in a typical application the memory content is composed of a very large number of very small objects. A typical object size is about thirty bytes, while typical page sizes range from 512 bytes to 8K-bytes. Consider the system discussed above, which uses 4K-byte pages. Loading a single desired object into main memory would require 4K-bytes of scarce RAM. Were dynamically active objects localized to certain regions of the address space, packing density would not be a problem, as these localized regions would exhibit good packing. However, dynamically active objects are generally scattered over the entire address space, leading to severe internal fragmentation and hence poor packing density.
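The effect on packing density can be quantified directly. This sketch assumes the worst case implied above: exactly one useful 30-byte object per page, evaluated at the page sizes mentioned; the function name is hypothetical.

```python
def packing_density(useful_bytes: int, page_size: int) -> float:
    """Fraction of a transferred page occupied by data the processor will use."""
    return useful_bytes / page_size

# One 30-byte object per page, for the range of page sizes mentioned above.
for size in (512, 4096, 8192):
    print(size, f"{packing_density(30, size):.2%}")
```

This prints 5.86%, 0.73%, and 0.37% for 512-byte, 4K-byte, and 8K-byte pages respectively, showing why smaller pages pack scattered objects much better.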
Moving to a smaller page size would seem to alleviate the problems caused by the internal fragmentation of dynamically active objects, and in fact smaller pages do provide much better RAM utilization and packing density, as is shown in a later section. However, conventional memory systems cannot yield any real performance improvement with small pages because of the way in which pages are transferred to and from RAM on a per-page basis. Conventional memory systems transfer one page to RAM on a demand basis each time a page is faulted, plus whatever pages might be prepaged in anticipation of future demand. However, for small pages the bandwidth of the auxiliary memory storage device is approximately proportional to page size. The formula for bandwidth is:
PAGE SIZE/(ACCESS DELAY+ROTATIONAL DELAY+TRANSFER TIME)
The first two terms of the denominator are constant with respect to page size, and the third term, which depends on page size, is small relative to the overall denominator for small pages. For all intents and purposes, therefore, the denominator can be considered relatively constant with page size, and hence the bandwidth drops off roughly in proportion to decreasing page size.
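Evaluating the bandwidth formula over a range of page sizes makes the drop-off concrete. The sketch reuses the earlier example's delays (16 ms access, 8 ms rotational, 1 ms per K-byte) as an assumption; the function name is hypothetical.

```python
ACCESS_MS, ROTATION_MS, MS_PER_KB = 16.0, 8.0, 1.0  # delays from the earlier examples

def page_bandwidth_kb_per_s(page_kb: float) -> float:
    """PAGE SIZE / (ACCESS DELAY + ROTATIONAL DELAY + TRANSFER TIME), in KB/s."""
    return page_kb / (ACCESS_MS + ROTATION_MS + MS_PER_KB * page_kb) * 1000.0

for kb in (0.5, 1, 2, 4, 8):
    print(f"{kb:>4} KB page: {page_bandwidth_kb_per_s(kb):6.1f} KB/s")
```

The results are roughly 20, 40, 77, 143, and 250 KB/s: each halving of page size nearly halves the bandwidth, because the fixed 24 ms of access and rotational delay dominates the denominator.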
Therefore, attempts to improve the packing density factor of effective RAM utilization run afoul of the bandwidth constraint of conventional memory systems. These two opposing forces tend to offset each other: the reduction in internal fragmentation loss gained from small pages is offset by a loss in effective disk bandwidth, so that, to a first order, the overall performance of a conventional paging system is reasonably invariant with page size.
The present invention ameliorates the inherent inefficiencies of conventional virtual memory systems by detecting which pages are in fact dynamically active, independent of their location in the virtual address space, and storing these pages as groups in contiguous locations on the storage device. This makes it possible to retrieve them as groups, with high effective disk bandwidth, when they are needed in the future. Because pages are transferred to and from RAM as groups, independent of page size, small pages can be used to improve packing density without a corresponding loss of disk bandwidth. The invention further adapts to changes in the pattern of frequently referenced pages, because it actively groups and stores pages at all times during the preemption procedure.