1. Field of the Invention
This invention relates generally to memory management in computer systems and, more particularly, to methods for managing concurrent access to virtual memory data structures.
2. Description of the Related Art
Modern computer systems employ operating systems to manage the computer systems' resources and provide a foundation for application programs running on the computer systems. Some of the popular operating systems include DOS, Microsoft Windows®, Microsoft Windows NT®, Microsoft Windows 98™, UNIX, and LINUX™. The operating system provides a base for writing and running application programs, thereby freeing programmers from the details of computer system hardware. In addition, the operating system manages processes, memory, file systems, I/O systems, and the like.
In an operating system, a process refers to a running program with input, output, and a state. For example, a process includes the current values of the program counter, the registers, and the variables of an executing program. Each process has a thread, which is associated with an address space. The thread is sometimes referred to as a lightweight process. Processes and threads are well known in the art and are described, for example, in Modern Operating Systems, Andrew S. Tanenbaum (1992). Hence, running a process generally requires executing a thread by accessing the address space.
The operation of accessing an address space typically involves managing a memory system in the operating system. In particular, the operating system implements a virtual memory system to map a virtual address associated with a thread from a large virtual address space to a physical address of a physical memory, which is typically a RAM. A computer system is not limited to a single virtual address space. Indeed, it may implement as many virtual address spaces as its operating system is capable of supporting. For example, modern operating systems often support multiple processors and multiple threads of execution, thereby allowing the sharing of the system resources and further providing multiple concurrent processes and threads that execute simultaneously.
FIG. 1A illustrates an exemplary conventional memory mapping method for mapping one or more virtual address spaces to a physical memory. A plurality of virtual address spaces 102 (VAS0), 104 (VAS1), and 106 (VASN) are provided. Each of the virtual address spaces 102, 104, and 106 is provided with a page table for mapping. Specifically, the virtual address spaces 102, 104, and 106 are associated with page tables 110, 112, and 114, respectively. Each of the virtual address spaces has a plurality of virtual pages 116. A physical memory 108 also includes a plurality of physical pages 118. The virtual pages 116 and physical pages 118 are typically of the same size, which typically ranges from 4 kilobytes (KB) up to 16 KB. Nevertheless, computer systems may employ any suitable page size, which can be selected by the operating system based on supporting hardware.
In this configuration, pages in the virtual address spaces 102, 104, and 106 are mapped to pages in the physical memory 108 via page tables 110, 112, and 114, respectively. For example, a virtual page 120 in the virtual address space 102 is mapped via page table 110 to physical page 126. Likewise, a virtual page 122 in the virtual address space 104 is mapped to physical page 128 through page table 112 while virtual page 124 of the virtual address space 106 is mapped to physical page 130 via page table 114. In those instances where a page is not present in the physical memory, a page fault is generated to load the page from a secondary storage device such as a hard drive, optical drive, tape drive, etc. Page mapping and page faults are well known in the art. It should be noted that page tables may be shared among several virtual address spaces. Indeed, even a portion of a page table may be shared among different address spaces.
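The mapping described above can be illustrated with a minimal sketch. It assumes 4 KB pages and models each page table as a dictionary from virtual page number to physical page number; the names `translate`, `PageFault`, `page_table_vas0`, and `page_table_vas1` are hypothetical and chosen only for illustration. A real system would respond to the fault by loading the page from secondary storage rather than simply raising an error.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class PageFault(Exception):
    """Raised when a virtual page has no resident physical page."""


def translate(page_table, vaddr):
    """Map a virtual address to a physical address through one page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number and offset
    frame = page_table.get(vpn)              # physical page number, if resident
    if frame is None:
        raise PageFault(vpn)                 # the OS would load the page here
    return frame * PAGE_SIZE + offset


# Two virtual address spaces, each with its own (hypothetical) page table.
page_table_vas0 = {0: 7, 1: 3}
page_table_vas1 = {0: 5}
```

For example, virtual address 4100 in the first address space falls in virtual page 1 at offset 4, so it resolves to offset 4 of physical page 3.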
A virtual address space, in abstract terms, is typically divided into a plurality of regions in accordance with data types. FIG. 1B shows a more detailed diagram of the exemplary virtual address space 102. The virtual address space 102 is comprised of a plurality of regions 130, 132, 134, 136, 138, and 140. Each of the regions 130 through 140 is a contiguous region and the virtual pages within each region share common attributes. For example, the regions 130, 134, and 138 are empty regions that can be used to accommodate new data (e.g., files) from a secondary storage device or data from other contiguous regions 132, 136, and 140. The code region 132 corresponds to the address space of codes (e.g., text in Unix) such as programs, instructions, and the like. On the other hand, the data region 136 includes a pair of sub-regions 142 and 144 that corresponds to address spaces of data and uninitialized data (e.g., HEAP), respectively. Likewise, the stack region 140 corresponds to the address space of a stack. The operating system maintains attributes such as the start address and the length of each region so that each region can be tracked accurately.
As mentioned above, the virtual pages in each region share common attributes. For example, the code region 132 may have an attribute specifying a file on a hard drive from which instructions can be fetched. The stack region 140, on the other hand, usually grows dynamically and automatically downwards toward lower addresses and has an attribute that identifies it as a stack. Other common attributes include read and write attributes. For instance, the code region 132 is generally given an attribute of read only while data is associated with both read and write attributes. Other attributes also may be applied to any of the regions in a virtual address space.
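As a rough illustration of how a region's start address, length, and shared attributes might be tracked, consider the following sketch. The `Region` class, its field names, the `find_region` helper, and the example addresses are all assumptions made for illustration and are not drawn from any particular operating system.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A contiguous range of virtual addresses whose pages share attributes."""
    name: str
    start: int             # start address of the region
    length: int            # length of the region in bytes
    readable: bool = True
    writable: bool = False
    is_stack: bool = False

    def contains(self, vaddr):
        return self.start <= vaddr < self.start + self.length


def find_region(regions, vaddr):
    """Return the region holding vaddr, or None if it lies in an empty region."""
    for region in regions:
        if region.contains(vaddr):
            return region
    return None


# Hypothetical layout: read-only code, read/write data, and a stack.
regions = [
    Region("code", 0x1000, 0x4000),
    Region("data", 0x8000, 0x2000, writable=True),
    Region("stack", 0xF000, 0x1000, writable=True, is_stack=True),
]
```

An address that falls between the code and data regions in this layout belongs to no region, which corresponds to one of the empty regions described above.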
In modern computer systems, operating systems generally allow multiple threads to execute virtually simultaneously in the virtual address space 102. For example, UNIX and LINUX™ operating systems allow multiple threads to concurrently execute in a single virtual address space. In such instances, several threads may perform operations that affect the address space at the same time. For example, multiple threads on multiple CPUs could simultaneously perform page faults. Multiple threads may also execute a system call (e.g., MMAP in Unix) to map a file from a secondary storage device into the address space. To accommodate the new file, the operating system may create a region in one of the empty regions 130, 134, or 138 of the virtual address space 102.
However, when multiple threads are attempting to access the same region in a virtual address space, a problem of contention arises. For example, if two threads are allowed to operate on the kernel data associated with the same virtual page in a region, the data may not be synchronized or updated properly. To address the contention problem, conventional techniques have used a "lock" to synchronize access by providing exclusive access to a thread such that other threads are not allowed to change the data accessed by the thread. In this manner, the lock ensures mutual exclusion of multiple threads for updates.
Conventional methods typically have provided a lock for each region in a virtual address space. The virtual memory system portion of the operating system generally maintains the regions of a virtual address space as a data structure, which is kept in a memory. FIG. 1C shows a simplified data structure 150 using locks 162, 164, and 166 to provide exclusive access to regions 152, 154, and 156, respectively. The regions 152, 154, and 156 may correspond to a code region, data region, and stack region, respectively, and may be shared among different address spaces. It is noted that the word region is used herein in its most general form. In fact, it may actually be composed of multiple data structures within the kernel. The data structure 150 also includes an address space 158 that heads the virtual address space and maintains a pointer to the first region 152. In addition, the address space 158 includes a pointer to a page table 160 associated with the data structure 150. The data structure 150 may be provided for each virtual address space where the operating system provides multiple virtual address spaces. The data structures for all the virtual address spaces are stored in kernel memory in the operating system.
The regions 152, 154, and 156 are arranged as a linked list where the region 152 points to region 154, which in turn points to region 156. However, the data structure 150 may be implemented by using any suitable arrangement such as arrays, trees, and the like. Each of regions 152, 154, and 156 is also a data structure and provides a pointer to locations such as files on a disk, flags for read/write permission, a flag for a stack, etc.
The data structures for the regions 152, 154, and 156 include the locks 162, 164, and 166, respectively. The lock 162 is used to provide a thread with exclusive access to the kernel data structures for the pages in the region 152. For example, the lock 162 is obtained and held to enable the thread to perform an operation that affects the kernel data structures corresponding to the virtual addresses in the region 152. When the thread finishes its operation, the lock 162 is released so that another thread can access the data structures. Similarly, the locks 164 and 166 are used to provide exclusive access to the data structures for the pages in the regions 154 and 156, respectively. As is well known in the art, the locks 162, 164, and 166 may be implemented using a binary semaphore, a monitor, etc.
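The conventional arrangement of FIG. 1C can be sketched as follows, under stated assumptions: the class names `LockedRegion` and `AddressSpace`, the `touch_page` helper, and the per-page counters are hypothetical, and a Python `threading.Lock` stands in for whatever semaphore or monitor a kernel would use. The point of the sketch is that every update to any page in a region passes through that region's single lock.

```python
import threading


class LockedRegion:
    """A region node in the linked list, guarded by one lock for all its pages."""

    def __init__(self, name, num_pages):
        self.name = name
        self.lock = threading.Lock()        # one lock for the entire region
        self.page_state = [0] * num_pages   # stand-in for per-page kernel data
        self.next = None                    # pointer to the next region


class AddressSpace:
    """Heads the list: points to the first region and to the page table."""

    def __init__(self, first_region, page_table):
        self.first_region = first_region
        self.page_table = page_table


def touch_page(region, page_index):
    """Update one page's state while holding the region-wide lock."""
    with region.lock:                       # lock obtained, then released on exit
        region.page_state[page_index] += 1


# Build the list code -> data -> stack, as in the arrangement described above.
stack = LockedRegion("stack", 4)
data = LockedRegion("data", 4)
code = LockedRegion("code", 4)
code.next, data.next = data, stack
space = AddressSpace(code, page_table={})

# Eight threads update pages of the data region; all of them serialize on its lock,
# even when they touch different pages.
threads = [threading.Thread(target=touch_page, args=(data, i % 4)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```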
Unfortunately, the conventional method of providing one lock per region to protect against changes to the region data structures has several drawbacks. For example, providing a single lock in a region creates a contention problem and a bottleneck when multiple threads need to perform page faults or make other changes to the state of the pages in the region. In such situations, the threads are typically placed in a queue and executed one after another, thereby causing the bottleneck. The problem is exacerbated in proportion to the scale of a computer system. As an example, a large-scale computer system with dozens or even hundreds of processors may have hundreds of threads competing for the lock to a region, which can be gigabytes in size.
One solution has divided each region into multiple sub-regions, with each sub-region being assigned a lock. Although this solution somewhat improves the performance, it merely provides finer granularity without substantially correcting the fundamental problem stemming from having one lock per region or sub-region. That is, the contention problem will continue to exist for a sub-region having a plurality of pages unless the granularity of each sub-region is equal to the size of a virtual page.
Making the granularity of the sub-regions equal to the size of a virtual page, however, leads to other problems. By way of example, a two-gigabyte region of a virtual address space can be split into four contiguous sub-regions of 512 megabytes each. Each sub-region will have its own lock. Although four threads may execute concurrently, they must be accessing different sub-regions. In other words, they may not access the data structures for the same sub-region at the same time. Furthermore, splitting a multi-gigabyte region into small pieces would result in a prohibitive number of regions. For instance, using four sub-regions effectively quadruples space and memory overhead to support the sub-regions. Since each region is represented by a data structure, creating multiple sub-regions in place of a single one increases the operating system kernel's memory overhead. Hence, this conventional approach increases both the time and space overhead of the kernel.
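The scale of the overhead can be checked with simple arithmetic. The sketch below assumes a 4 KB page size for the page-granularity case; the helper name `subregion_count` is hypothetical.

```python
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30


def subregion_count(region_bytes, subregion_bytes):
    """Number of sub-regions (and therefore locks and data structures) needed."""
    return region_bytes // subregion_bytes


coarse = subregion_count(2 * GB, 512 * MB)   # four 512 MB sub-regions
per_page = subregion_count(2 * GB, 4 * KB)   # one sub-region per 4 KB page
```

With these figures, the coarse split needs 4 sub-region data structures, while page-sized sub-regions would need 524,288 of them for a single two-gigabyte region.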
These problems are further accentuated in multi-processor computer systems having many processors due to the use of a large number of threads. For example, in distributed shared memory (DSM) systems, providing multiple locks for a region still suffers from the space overhead problem and also suffers from a lack of locality. Furthermore, the conventional methods do not scale easily for various region sizes and require substantial memory spaces to accommodate the data structures.
In view of the foregoing, what is needed is a method for managing concurrent access to virtual memory data structures without the attendant cost in space. What is also needed is a method that can provide locks that are scalable for multi-processor computer systems.
Broadly speaking, the present invention fills these needs by providing methods for managing concurrent access to the kernel data structures for a virtual page in memory. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium. Several inventive embodiments of the present invention are described below.
The present invention provides methods for providing concurrent access to a virtual page data structure in a computer system. A lock bit for locking a virtual page data structure is provided in a page table entry of a page table. The page table is configured to map virtual pages to physical pages. Then, a first thread specifying an operation on the virtual page data structure is received. The first thread is provided exclusive access to the virtual page data structure by setting the lock bit in the page table entry such that other threads are prevented from accessing the virtual page data structure.
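A minimal sketch of the lock-bit idea follows. It is an illustration under assumptions rather than the claimed implementation: the bit position `LOCK_BIT`, the `FRAME_SHIFT` layout, and the `PageTable` class with its `try_lock` and `unlock` methods are hypothetical, and a Python `threading.Lock` stands in for the atomic test-and-set operation that hardware would typically supply.

```python
import threading

LOCK_BIT = 1 << 0    # assumption: lock bit kept in bit 0 of the entry
FRAME_SHIFT = 12     # assumption: physical page number stored above the flag bits


class PageTable:
    """Page table whose entries carry a per-page lock bit."""

    def __init__(self, num_entries):
        self.entries = [0] * num_entries
        # Stand-in for a hardware atomic instruction; not a per-region lock.
        self._atomic = threading.Lock()

    def try_lock(self, vpn):
        """Atomically test and set the lock bit; True if the caller now owns it."""
        with self._atomic:
            if self.entries[vpn] & LOCK_BIT:
                return False                 # another thread holds this page
            self.entries[vpn] |= LOCK_BIT
            return True

    def unlock(self, vpn):
        """Clear the lock bit, leaving the rest of the entry untouched."""
        with self._atomic:
            self.entries[vpn] &= ~LOCK_BIT


table = PageTable(4)
table.entries[2] = 9 << FRAME_SHIFT   # virtual page 2 maps to physical page 9
first = table.try_lock(2)             # first thread obtains exclusive access
second = table.try_lock(2)            # a second attempt on the same page fails
other = table.try_lock(3)             # a different page is locked independently
table.unlock(2)
```

Because the lock lives in the entry itself, locking page 2 does not block a thread working on page 3, and the mapping bits of the entry are unchanged after unlocking.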
Preferably, a wait bit also is provided in the page table entry to indicate that one or more of the other threads are in a wait queue when the first thread has exclusive access to the page. When the first thread no longer needs exclusive access to the page, a second thread is selected from among the other threads and is provided with exclusive access to the page. Alternatively, the waiting threads may be placed in a spin loop to wait for the lock bit to become available. In this case, the wait bit need not be used.
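The interplay of the lock bit and the wait bit can be sketched with a small single-threaded model. Everything here is an assumption made for illustration: the bit positions, the `PageEntry` class, keeping the wait queue alongside the entry, and choosing the next thread in first-in, first-out order (the text above says only that a second thread is selected from among the waiting threads).

```python
from collections import deque

LOCK_BIT = 1 << 0   # assumption: bit positions are illustrative only
WAIT_BIT = 1 << 1


class PageEntry:
    """Models one page table entry's lock and wait bits plus a wait queue."""

    def __init__(self):
        self.bits = 0
        self.owner = None
        self.waiters = deque()   # hypothetical wait queue for this page

    def acquire(self, thread_id):
        """Take the lock if free; otherwise queue the thread and set the wait bit."""
        if not self.bits & LOCK_BIT:
            self.bits |= LOCK_BIT
            self.owner = thread_id
            return True
        self.waiters.append(thread_id)
        self.bits |= WAIT_BIT
        return False

    def release(self):
        """Hand the lock to a waiter if the wait bit is set, else clear the lock bit."""
        if self.bits & WAIT_BIT:
            self.owner = self.waiters.popleft()   # lock bit stays set for new owner
            if not self.waiters:
                self.bits &= ~WAIT_BIT            # nobody else is waiting
        else:
            self.owner = None
            self.bits &= ~LOCK_BIT
        return self.owner
```

In this model the releasing thread only needs to inspect the wait bit to decide whether a hand-off is required; when the bit is clear, release reduces to clearing the lock bit.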
By thus providing a lock in each page table entry, the present invention allows the locks to scale on a one-to-one basis with page table entries and therefore with virtual pages as well. Furthermore, the methods of the present invention employ a single lock bit in each of the existing page table entries, thereby substantially reducing space requirements and eliminating the need for additional data structures. The use of a wait bit in the page table entry facilitates efficient locking when more than one thread is waiting to access a data structure associated with the page table entry. Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.