The present invention relates to operating systems for multi-processor computer systems and more particularly to a system and method for an input/output (I/O) operation of a multi-processor computer system, more specifically a system and method for an I/O operation in which a system-wide global page frame database (PFD) lock is not a requisite to performing the I/O operation.
Many current computer systems employ a multi-processor configuration that includes two or more processing units interconnected by a bus system, each being capable of independent or cooperative operation. Such a multi-processor configuration increases the total system processing capability and allows the concurrent execution of multiple related or separate tasks by assigning each task to one or more processors. Such systems also typically include a plurality of mass storage units, such as disk drive devices, to provide adequate storage capacity for the number of tasks executing on the systems.
One type of multi-processor computer system embodies a symmetric multiprocessing (SMP) computer architecture which is well known in the art as overcoming the limitations of single or uni-processors in terms of processing speed and transaction throughput, among other things. Typical, commercially available SMP systems are generally "shared memory" systems, characterized in that multiple processors on a bus, or a plurality of busses, share a single global memory or shared memory. In shared memory multiprocessors, all memory is uniformly accessible to each processor, which simplifies the task of dynamic load distribution. Processing of complex tasks can be distributed among various processors in the multiprocessor system while data used in the processing is substantially equally available to each of the processors undertaking any portion of the complex task. Similarly, programmers writing code for typical shared memory SMP systems do not need to be concerned with issues of data partitioning, as each of the processors has access to and shares the same, consistent global memory.
There is shown in FIG. 3 a block diagram of an exemplary multiprocessor system that implements a SMP architecture. For further details regarding this system, reference shall be made to U.S. Ser. No. 09/309,012, filed Sep. 3, 1999, the teachings of which are incorporated herein by reference.
Another computer architecture known in the art for use in a multi-processor environment is the Non-Uniform Memory Access (NUMA) architecture or the Cache Coherent Non-Uniform Memory Access (CCNUMA) architecture, which is known in the art as an extension of SMP but which supplants SMP's "shared memory" architecture. NUMA and CCNUMA architectures are typically characterized as having distributed global memory. Generally, NUMA/CCNUMA machines consist of a number of processing nodes connected through a high bandwidth, low latency interconnection network. The processing nodes are each comprised of one or more high-performance processors, associated cache, and a portion of a global shared memory. Each node or group of processors has near and far memory, near memory being resident on the same physical circuit board and directly accessible to the node's processors through a local bus, and far memory being resident on other nodes and accessible over a main system interconnect or backbone. Cache coherence, i.e., the consistency and integrity of shared data stored in multiple caches, is typically maintained by a directory-based, write-invalidate cache coherency protocol, as known in the art. To determine the status of caches, each processing node typically has a directory memory corresponding to its respective portion of the shared physical memory. For each line or discrete addressable block of memory, the directory memory stores an indication of remote nodes that are caching that same line.
There is shown in FIG. 4 a high level block diagram of another exemplary multiprocessor system but which implements a CCNUMA architecture. For further details regarding this system, reference shall be made to U.S. Pat. No. 5,887,146, the teachings of which are incorporated herein by reference.
Almost all modern computer systems, including multi-processor systems employing the above-described computer architectures, use Virtual Memory, in which addressable ranges of computer memory may or may not be present in, or connected to, equivalent physical memory at any given time. It is the responsibility of the operating system, or of the computer hardware, to ensure that the Virtual Memory is mapped, or connected, to corresponding physical memory when the Virtual Memory is being accessed by either an executing processor or another device (usually an I/O device). Virtual Memory that is not currently being accessed is free to be "unmapped", or disconnected from its corresponding physical memory. The contents of Virtual Memory that is unmapped or disconnected are generally maintained in a so-called "backing store" (usually a disk drive).
Because of the asynchronous nature of I/O operations, it is vitally important that the Virtual Memory involved with an I/O operation not be unmapped in the time between the initiation of the I/O operation (such as a read, or input, from a disk device) and the completion of that same I/O operation (such as the actual storing into memory of the retrieved disk device data). Thus, it is necessary for Virtual Memory pages involved in an I/O operation to be "locked", "wired", or "pinned" to their corresponding physical memory during the entire time that data is moving between the physical memory and the I/O device. In this way, the pinned pages of physical memory cannot be unmapped during the I/O operation.
There are two techniques that have been implemented for ensuring the proper mapping of virtual memory to physical memory for I/O operations. The first and simplest method, and the technique generally implemented, involves individually querying the status of each and every physical memory page participating in the I/O operation, ensuring it is "paged in" and "marked as locked", "pinned", or "wired", processing the I/O operation, individually unlocking each physical memory page, and exiting the I/O operation. This solution is expensive and inefficient in that it typically requires the operating system to obtain exclusive access to the physical memory control structures through a broad-based system lock, such as a spinlock.
More specifically, there is shown in FIG. 1A a flow diagram of the high level process for implementing an I/O operation according to the first technique in a computer system employing two or more processors, where the physical memory can be a shared global memory as with SMP architectures or a distributed global memory as with NUMA/CCNUMA architectures. In accordance with this technique, when a program or process running on one or more processors requires information, for example, to be read to the physical memory from a storage device, written to a storage device from the physical memory or outputted from the physical memory to a communications device or network, an I/O request is made by the processor to the operating system. Pursuant to this request, the operating system initiates an I/O operation, Step 100. As discussed hereinafter, the initiation of the I/O operation can be delayed because of other ongoing activities.
Once the I/O operation is initiated, the operating system proceeds to ensure that the physical memory pages are pinned, Step 102. As is known in the art, physical memory is typically subdivided into pages; these pages can be mapped or pinned individually so as to correspond to pages or addresses of the Virtual Memory for a given application. This pinning process ensures that the page frame database (PFD) for the physical memory is arranged so as to lock each page of physical memory corresponding to the virtual memory pages or addresses to be read/inputted to or written/outputted from. The PFD covers or describes all of the physical memory of a given machine or computer system regardless of where it resides in the machine (e.g., centralized global memory, or distributed memory as with CCNUMA architectures). Once a page of physical memory is marked as locked or pinned in the PFD, the operating system cannot re-map the physical memory page(s) as a Virtual Memory page for another application until the I/O operation in process is completed and the physical memory page is marked as unpinned or unlocked.
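By way of illustration only, the per-page lock reference counting of the PFD might be modeled as follows. This is a minimal sketch, not an actual operating system implementation; the names PageFrameDatabase, pin, unpin and is_pinned are hypothetical and not taken from the disclosure:

```python
class PageFrameDatabase:
    """Simplified model of a PFD: one lock reference count per
    physical page. A nonzero count means the page is pinned and
    may not be remapped or have its contents swapped out."""

    def __init__(self, num_pages):
        self.ref_counts = [0] * num_pages

    def pin(self, page):
        # Mark the page as locked by incrementing its counter.
        self.ref_counts[page] += 1

    def unpin(self, page):
        # Decrement; a count of zero frees the page for remapping.
        assert self.ref_counts[page] > 0, "unpin of an unpinned page"
        self.ref_counts[page] -= 1

    def is_pinned(self, page):
        return self.ref_counts[page] != 0
```

The reference count, rather than a simple flag, allows the same physical page to participate in several concurrent I/O operations and be freed only when the last one completes.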
Now referring also to FIG. 1B, there is shown a process for the "pinning" of the physical memory pages under Step 102. The pinning process is initiated by having the operating system first obtain a system-wide global PFD lock, Step 200. This system-wide lock locks the PFD so that the mapping or re-mapping of physical memory pages as addresses or pages of Virtual Memory cannot be done for any application other than the one involved with the I/O operation while the PFD is being modified. If there are many I/O operations being initiated simultaneously, acquisition of the PFD lock becomes costly in terms of wasted CPU cycles due to the bus and cache contention typically associated with global system locks, typically spinlocks. Consequently, any processes or programs attempting to initiate I/O operations will be pended and delayed until the existing I/O operation releases the system-wide global PFD lock.
After locking the PFD, the next page (Virtual Memory page) in the IO buffer is identified, Step 202. When the process is initiated, the next page is the first page in the IO buffer. After identifying the next page, the operating system references the page of Virtual Memory so as to ensure mapping, Step 204. This process step also generally includes initiating a process, if needed, to bring the page contents in from the backing store. This retrieval of data or information (i.e., virtual memory page contents) from the backing store makes the existing I/O operation a time-intensive operation overall, because two I/O operations, including two pinning operations, are in effect required: one I/O operation to retrieve the page contents from the backing store, and the other the I/O operation that had been requested and initiated by the program or process.
After ensuring the mapping between the virtual memory pages and physical memory pages, the operating system updates the PFD so as to lock the appropriate physical memory pages, Step 206. This generally means that a reference counter in the PFD, used to indicate the locked or unlocked status of each page, is incremented to indicate that the associated page is locked or pinned so that the contents cannot be swapped or the physical page remapped during the I/O operation. Typically, any nonzero reference counter value indicates that the associated page is locked or pinned. After locking the physical memory page, the operating system determines if this is the last page in the IO buffer, Step 208. If it is not the last page (NO, Step 208), then the operating system identifies the next page in the IO buffer, Step 202, and repeats the above-described process, Steps 202-208.
If this is the last page in the IO buffer (YES, Step 208) then the operating system releases the system wide global PFD lock, Step 210, thereby unlocking the PFD. This is a precursor to the transfer of data or information. At this point the memory mapping or unmapping functions for another I/O operation (i.e., an operation waiting to perform steps 102 or 106) can proceed, while the existing I/O operation proceeds as follows.
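The pinning loop of Steps 200-210 can be sketched as follows. This is a simplified model under stated assumptions, not an operating system implementation; pin_io_buffer and ensure_mapped are illustrative names, with ensure_mapped standing in for the mapping and fault-in work of Step 204:

```python
import threading

pfd_lock = threading.Lock()   # stands in for the system-wide global PFD lock
ref_counts = {}               # physical page number -> lock reference count

def pin_io_buffer(io_buffer_pages, ensure_mapped):
    """Pin every physical page backing an I/O buffer (Steps 200-210).

    io_buffer_pages: virtual page numbers of the IO buffer.
    ensure_mapped: callback modeling Step 204 - reference the virtual
    page, bringing its contents in from backing store if needed, and
    return the backing physical page number.
    """
    with pfd_lock:                            # Step 200: acquire global lock
        for vpage in io_buffer_pages:         # Steps 202/208: walk the buffer
            ppage = ensure_mapped(vpage)      # Step 204: ensure mapping
            # Step 206: increment the lock reference counter in the PFD.
            ref_counts[ppage] = ref_counts.get(ppage, 0) + 1
    # Step 210: the global PFD lock is released on leaving the `with` block.
```

Note that the single `with pfd_lock` spanning the whole loop is what makes every other pinning or unpinning operation in the system wait, which is precisely the contention the specification describes.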
Referring back to FIG. 1A, following the unlocking of the PFD and the release of the system wide global PFD lock, the operating system sends the appropriate signals to the appropriate system hardware so the data or information is transferred to/from the pinned physical memory pages, Step 104. In other words the actual I/O operation, namely the requested read/write operation, is performed. For example, the operating system sends the appropriate signals to an adapter of the computer system so that the data is transferred to/from a disk device such as a hard disk, floppy disc or CD. While the I/O operation is being performed, control is returned to the processor involved with the I/O request so that it can do other things.
As is known in the art, after the data/information transfer is complete, the system hardware (e.g., adapter) outputs a signal(s) to the operating system indicating that the transfer process is complete. After receiving this signal(s), the operating system un-pins the physical memory pages, Step 106. The unlocked pages are thus now free to be remapped as required by the operating system.
Now referring also to FIG. 1C, there is shown a process for the "un-pinning" of the physical memory pages under Step 106. The un-pinning of the physical memory pages is similar to that described above for the pinning of the physical memory pages, and thus the following is limited to a brief description of the common steps. As such, reference shall be made to the foregoing discussion for Steps 200-202 and Steps 208-210 for further details regarding Steps 250-252 and Steps 256-258 referred to hereinafter.
The un-pinning process is initiated by having the operating system again obtain a system-wide global PFD lock, Step 250, to lock the PFD. After locking the PFD, the next page in the IO buffer is identified, Step 252, and after identifying the next page, the operating system updates the PFD so as to un-lock the appropriate physical memory pages, Step 254. This generally means that the locked/unlocked page reference counter in the PFD is decremented; when the reference counter value reaches zero, the associated physical memory page is un-locked or un-pinned, such that the contents can be swapped or the physical page remapped at a later time.
After un-locking the physical memory page, the operating system determines if this is the last page in the IO buffer, Step 256. If it is not the last page (NO, Step 256), then the operating system identifies the next page in the IO buffer, Step 252, and repeats the above-described process, Steps 252-256. If this is the last page in the IO buffer (YES, Step 256), then the operating system releases the system-wide global PFD lock, Step 258.
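The un-pinning loop of Steps 250-258 can likewise be sketched. Again this is a simplified model with hypothetical names; the lock and counter dictionary mirror the pinning description:

```python
import threading

pfd_lock = threading.Lock()    # stands in for the system-wide global PFD lock
ref_counts = {100: 1, 101: 1}  # pages pinned by a prior, now-completed I/O

def unpin_io_buffer(physical_pages):
    """Un-pin the pages of a completed I/O operation (Steps 250-258)."""
    with pfd_lock:                    # Step 250: acquire the global PFD lock
        for ppage in physical_pages:  # Steps 252/256: walk the buffer's pages
            ref_counts[ppage] -= 1    # Step 254: decrement toward zero
    # Step 258: the global PFD lock is released on leaving the `with` block.

unpin_io_buffer([100, 101])
```

As with pinning, the entire walk happens under the one global lock, so even the clean-up phase of an I/O operation delays every other I/O operation contending for the PFD.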
Referring now back to FIG. 1A, following the unlocking of the PFD and the release of the system wide global PFD lock, the existing I/O operation is completed, Step 110. At this point the operating system is again available to perform the memory mapping or unmapping functions for another I/O operation (i.e., an operation waiting to perform steps 102 or 106). At the same time control also is returned to the initiator so that the applications program or process involved with the just completed I/O operation can proceed with and/or be available to perform the next task.
Although this particular technique is simple, acquiring the PFD lock for memory mapping, unmapping, and locking becomes increasingly time-consuming as more concurrent I/O operations are initiated and therefore contend for the PFD lock. As also indicated above, while the memory mapping, unmapping, and locking functions of a given I/O operation are being performed, other I/O operation requests cannot perform their respective memory mapping, unmapping, and locking functions, and thus the applications programs/processors involved with such other I/O operation requests are unable to proceed (i.e., are pended or delayed).
If one or more applications programs being run on a multi-processor system require or involve frequent I/O access to disk devices and/or communications devices (for example, a database or transaction processing application), then the I/O requests of the multiple processors can come into competition with each other. As a consequence, the competing I/O operations can cause the processing of one or more I/O operations to be delayed. Consequently, the time for an applications program held in this standby mode to perform a task is in effect increased. The second technique is essentially the same as the above-described method and has thus been incorporated into the above description.
It thus would be desirable to provide new methods, operating systems and multi-processor computer systems that would allow data/information to be transferred to/from the physical memory without having to employ a system-wide global PFD lock and which take advantage of advances in software programming and the wide availability of inexpensive memory. It would be particularly desirable to provide such methods and operating systems that operate at the processor level, instead of the system level, to verify mapping of the Virtual Memory and pinning of the physical memory prior to initiating the data/information transfer of an I/O operation. It also would be particularly desirable to provide such methods and operating systems that simplify the cleaning-up (i.e., un-pinning) process following the completion of a data transfer. Further, it would be desirable to provide such methods and devices that reduce the amount of time to perform an I/O operation in comparison to prior art methods and systems.
The instant invention is most clearly understood with reference to the following definitions:
A computer readable medium shall be understood to mean any article of manufacture that contains data that can be read by a computer or a carrier wave signal carrying data that can be read by a computer. Such computer readable media includes but is not limited to magnetic media, such as a floppy disk, a flexible disk, a hard disk, reel-to-reel tape, cartridge tape, cassette tape or cards; optical media such as CD-ROM and writeable compact disc; magneto-optical media in disc, tape or card form; paper media, such as punched cards and paper tape; or a carrier wave signal received through a network, wireless network or modem, including radio-frequency signals and infrared signals.
The present invention features methods and applications programs for reducing the overhead associated with system I/O in a computer system employing multiple processors and with either a global physical memory or a distributed physical memory. Such methods and applications programs advantageously reduce the amount of contention between the I/O operations of the various processors, in particular contention for the system wide global PFD lock, so as to improve the effective processing time overall for the applications programs being run on the computer system when at least one of such programs generates with great frequency a significant number of I/O operations.
A method for inputting and outputting data/information, in a computer system having a plurality of processors and a physical memory for use by the plurality of processors, includes creating a pinned virtual memory range database in which is stored virtual memory address information corresponding to pinned physical memory for each applications program being run on the computer system. The method further includes determining, using the pinned virtual memory range database, that the virtual memory address for data/information to be transferred thereto/therefrom corresponds to pinned physical memory; and transferring data/information to/from pinned physical memory, when said determining determines that the virtual memory address corresponds to pinned physical memory.
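As a rough illustration of this method, a pinned virtual memory range database might be modeled as a set of pinned virtual address ranges queried on each I/O request. The class and method names below are hypothetical, chosen only to mirror the steps just described:

```python
class PinnedRangeDatabase:
    """Per-application record of virtual address ranges whose backing
    physical memory is already pinned (a hypothetical structure)."""

    def __init__(self):
        self.ranges = []   # list of (start, end) pinned virtual ranges

    def add_range(self, start, end):
        # Creating/extending the database as memory is pinned.
        self.ranges.append((start, end))

    def covers(self, buf_start, buf_end):
        # The determining step: is the whole I/O buffer inside some
        # pinned range? If so, data can be transferred immediately,
        # with no global PFD lock.
        return any(s <= buf_start and buf_end <= e
                   for s, e in self.ranges)

db = PinnedRangeDatabase()
db.add_range(0x1000, 0x9000)   # range pinned when the program was loaded
```

An I/O buffer falling inside a recorded range can proceed straight to the transfer; only a buffer outside every range needs further work.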
In more specific embodiments, the method further includes pinning a portion, an arbitrary portion, of the physical memory corresponding to identified virtual memory addresses of each applications program being loaded onto the computer system; and wherein said step of creating includes creating a pinned virtual memory range database that includes these virtual memory addresses. Additionally included in the method are the related steps of un-pinning the pinned physical memory of each applications program when it is being unloaded from the computer system; and removing the virtual address information corresponding to the pinned physical memory being unpinned from the pinned virtual memory range database.
According to one aspect of the present invention said step of determining includes the steps of looking-up an address range of an I/O buffer in the pinned virtual memory range database, determining if the I/O buffer address range corresponds to a virtual memory address range that is pinned physical memory, recording the mapping of the physical memory address range that corresponds to the identified virtual memory address range, and marking the I/O buffer as pre-pinned. In a more specific embodiment, said steps of looking-up, determining if the I/O buffer address range corresponds to a virtual memory address range that is pinned physical memory, recording and marking are performed by each microprocessor responsive to the initiation of an I/O operation by said each microprocessor. Also included are steps of obtaining a local/non-global lock of the pinned virtual memory range database prior to said step of looking-up, and releasing the local lock on the pinned virtual memory range database following said step of marking the I/O buffer.
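The look-up, record and mark steps performed under a local (non-global) lock might be sketched as follows. This is a hypothetical model; check_prepinned and the fields it returns are illustrative only:

```python
import threading

class PinnedRangeDatabase:
    def __init__(self):
        # A local, per-database lock - not the system-wide PFD lock.
        self.lock = threading.Lock()
        # (virtual_start, virtual_end) -> physical base address
        self.ranges = {}

    def check_prepinned(self, buf_start, buf_end):
        """The determining step for one I/O buffer, local lock only."""
        with self.lock:  # obtain local/non-global lock before look-up
            for (s, e), phys_base in self.ranges.items():   # look-up
                if s <= buf_start and buf_end <= e:
                    # record the physical mapping of the buffer
                    mapping = phys_base + (buf_start - s)
                    # mark the buffer as pre-pinned
                    return {"pre_pinned": True, "phys_addr": mapping}
            return {"pre_pinned": False, "phys_addr": None}
        # local lock released automatically on return
```

Because each processor consults only this local lock when initiating its own I/O, concurrent I/O operations on different processors no longer serialize on one system-wide lock in the common case.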
Although the pinned ranges of physical memory should generally be sufficient to accommodate the anticipated needs of a given applications program, it is possible that at a given time there may be a desire to pin additional physical memory to additional virtual memory at the discretion of either the application or the operating system. Thus, according to another aspect of the present invention, when a determination is made that the I/O buffer address range does not correspond to a virtual memory address range for pinned physical memory, said step of determining further includes: obtaining a system-wide memory lock (e.g., the system-wide global PFD lock); mapping the physical memory address range corresponding to the virtual memory address range that was not pinned; pinning the mapped physical memory; and releasing the system-wide memory lock. In general terms, the physical memory not already pinned when the I/O operation was initiated is what should be pinned in the foregoing process. The method further includes the step of unpinning the pinned physical memory for the virtual memory address range that had not been pinned when the I/O operation was initiated, this step of unpinning being performed upon the completion of the transfer of data/information.
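This fallback path, taken only when the buffer is not covered by the pinned ranges, might be sketched as below. The names are hypothetical, and a simple set stands in for the PFD's per-page state:

```python
import threading

global_pfd_lock = threading.Lock()   # system-wide lock, slow path only
pinned_pages = set()                 # stands in for pinned state in the PFD

def pin_unpinned_range(pages):
    """Slow path: take the system-wide lock and map/pin only those
    pages not already pinned when the I/O operation was initiated.

    Returns the newly pinned pages, which are the ones to un-pin
    once the data/information transfer completes."""
    newly_pinned = []
    with global_pfd_lock:                 # obtain system-wide memory lock
        for page in pages:
            if page not in pinned_pages:  # only what was not pinned
                pinned_pages.add(page)    # map and pin
                newly_pinned.append(page)
    return newly_pinned                   # global lock already released
```

The key point of the aspect described above is that this global lock is paid only for the uncovered remainder of the buffer, not for every page of every I/O operation as in the prior technique.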
In accordance with yet another aspect of the present invention, the method further includes a pinned virtual memory range database updating mechanism by which the database is updated to include the added physical memory and virtual memory addresses that were needed to perform a given I/O operation. In a more specific embodiment, this updating mechanism includes determining if the pinned virtual memory range database should be updated to include the additional virtual memory addresses that were not pinned when the I/O operation was initiated. This determination process would follow completion of the data/information transfer. If it is determined that the pinned virtual memory range database should be updated, then the database is updated to include the additional virtual memory addresses. If it is determined that the pinned virtual memory range database should not be updated, then the pinned physical memory for the virtual memory address range that had not been pinned when the I/O operation was initiated is unpinned in the manner described herein. Leaving the pages pinned allows future I/O operations to be conducted in accordance with the methods of the present invention.
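The post-transfer decision of this updating mechanism, either folding the extra range into the database so future I/O takes the fast path, or unpinning it, might be sketched as follows. finish_io and keep_pinned are illustrative names; the actual policy is at the discretion of the operating system or application:

```python
def finish_io(range_db, extra_range, keep_pinned):
    """After the data/information transfer completes, decide the fate
    of a range that was pinned on demand for this I/O operation.

    range_db: list of (start, end) ranges, modeling the pinned
    virtual memory range database.
    extra_range: the (start, end) range pinned on the slow path.
    keep_pinned: the policy decision described in the specification.
    """
    if keep_pinned:
        range_db.append(extra_range)  # update database; pages stay pinned
        return "kept"
    # Otherwise the pages would be un-pinned here (Steps 250-258 analogue)
    # and the database is left unchanged.
    return "unpinned"
```

Keeping the range pinned trades physical memory, which the specification notes is inexpensive and widely available, for avoiding the global-lock slow path on subsequent I/O to the same buffer.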
Also featured are applications programs and/or operating systems, as well as multiprocessor computer systems, that embody the above-described methodology. Typically, the program code for such operating systems and applications programs is contained in some form of computer readable medium so that the program code is capable of being loaded onto a computer system either automatically by the computer system (e.g., when the computer is started or booted up) or by action of a user.
Other aspects and embodiments of the invention are discussed below.