First, the structure of a computer system will be explained. FIG. 8 is a block diagram showing an example of the structure of the computer system. The computer system shown in FIG. 8 includes a computer including a CPU (Central Processing Unit) 1 and a memory 2, an input/output device (a disk controller) 4, an input/output device (an NIC (Network Interface Card)) 5, a disk 3, and a communication path 6.
The CPU 1 controls the computer by executing an OS (Operating System). The input/output devices 4 and 5 are devices for inputting data to and outputting data from media and perform input/output processing by executing an input/output control program. Here, the media are the disk 3 and the communication path 6. The input/output device 4 is a device for performing data input/output between the memory 2 and the disk 3. The input/output device 5 is a device for performing data input/output between the memory 2 and the communication path 6. It is assumed that the computer system performs virtual storage. Here, the storages used for the virtual storage include a main storage and an external storage. The main storage is the memory 2 and the external storage is the disk 3.
In a computer system having virtual storage, a process designates the memory area that it accesses using virtual addresses. Therefore, the OS manages the storage areas of the storages.
Here, actual addresses and virtual addresses will be explained. The actual addresses are addresses allocated to the storages. The virtual addresses are addresses that are used by a process and an input/output control program instead of the actual addresses. The OS converts the virtual addresses to the actual addresses using an address conversion table as required, whereby reading and writing at the actual addresses of the storages are performed. The address conversion table is a table associating the virtual addresses with the actual addresses. In the address conversion table, when the actual addresses corresponding to the virtual addresses indicate physical pages, this represents that the pages corresponding to the virtual addresses are paged in. The physical pages are those pages that are actually present in the memory. When the actual addresses corresponding to the virtual addresses are identifiers of swap areas of a disk, this represents that the pages corresponding to the virtual addresses are paged out. In addition, when the actual addresses corresponding to the virtual addresses are NULL, this indicates that there are no entities of pages corresponding to the virtual addresses.
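The three states of an entry in the address conversion table described above can be illustrated with a minimal sketch. The page size, the class names PhysicalPage and SwapSlot, and the functions page_state and translate are assumptions introduced only for illustration and do not appear in the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the description)


@dataclass(frozen=True)
class PhysicalPage:
    frame: int  # number of a physical page present in the memory


@dataclass(frozen=True)
class SwapSlot:
    slot_id: int  # identifier of a swap area on the disk


# An entry is a physical page, a swap area identifier, or None (NULL).
Entry = Optional[Union[PhysicalPage, SwapSlot]]


def page_state(table: Dict[int, Entry], vaddr: int) -> str:
    """Classify the page that contains the virtual address vaddr."""
    entry = table.get(vaddr // PAGE_SIZE)
    if isinstance(entry, PhysicalPage):
        return "paged-in"
    if isinstance(entry, SwapSlot):
        return "paged-out"
    return "no-entity"  # NULL entry: no page exists for this address


def translate(table: Dict[int, Entry], vaddr: int) -> Optional[int]:
    """Return the actual address for vaddr, or None if it is not paged in."""
    entry = table.get(vaddr // PAGE_SIZE)
    if isinstance(entry, PhysicalPage):
        return entry.frame * PAGE_SIZE + vaddr % PAGE_SIZE
    return None


# Hypothetical table: virtual page 0 is paged in, 1 is paged out, 2 is NULL.
table: Dict[int, Entry] = {0: PhysicalPage(frame=7), 1: SwapSlot(slot_id=3), 2: None}
```

In this sketch only a paged-in page yields an actual address, which is why an input/output device that works with actual addresses cannot use a paged-out page directly.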
On the other hand, the input/output device designates the memory areas using the actual addresses in accessing the memory areas. Therefore, when the input/output device inputs data to and outputs data from a memory area designated by a program, it is necessary to use one of two input/output methods described below.
First, a first input/output method will be explained. The OS prepares an input/output dedicated area serving as a memory area dedicated to input/output, which is not paged out at the time of initialization of the OS, and notifies the input/output control program of an actual address of the input/output dedicated area. The input/output control program performs input/output using only the notified input/output dedicated area.
When the input/output device performs output, the CPU copies output data from the memory area designated by the program to the input/output dedicated area. The input/output control program reads the output data from the input/output dedicated area. On the other hand, when the input/output device performs input, the input/output control program writes input data in the input/output dedicated area. The CPU copies the input data from the input/output dedicated area to the memory area designated by the program.
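The output path of the first input/output method can be sketched as follows. The size of the input/output dedicated area, the class Device (a stand-in for the input/output control program), and the function output_via_dedicated_area are hypothetical names chosen for illustration. The sketch also assumes that output data larger than the dedicated area is transferred in several passes.

```python
DEDICATED_SIZE = 8  # assumed size of the input/output dedicated area, in bytes


class Device:
    """Hypothetical stand-in for the input/output control program."""

    def __init__(self) -> None:
        self.medium = bytearray()  # data that has reached the medium

    def read_from(self, area: bytearray, length: int) -> None:
        # The device reads only from the dedicated area, never from the
        # memory area designated by the program.
        self.medium += area[:length]


def output_via_dedicated_area(program_area: bytes, device: Device) -> int:
    """Copy output data through the dedicated area; return the transfer count."""
    dedicated = bytearray(DEDICATED_SIZE)  # never paged out in the real system
    transfers = 0
    for offset in range(0, len(program_area), DEDICATED_SIZE):
        chunk = program_area[offset:offset + DEDICATED_SIZE]
        dedicated[:len(chunk)] = chunk           # copy performed by the CPU
        device.read_from(dedicated, len(chunk))  # device reads the copy
        transfers += 1
    return transfers
```

Every byte is copied once by the CPU before the device sees it, and the number of transfers grows with the ratio of the data size to the size of the dedicated area.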
Next, a second input/output method will be explained. The OS pins down (page-locks) a page including a memory area used for input/output of data every time input/output processing is performed such that the page is not paged out and notifies the input/output control program of an actual address of this memory area. The input/output control program applies input/output to the memory area directly.
The reasons for pinning down the page including the memory area used for data input/output in the second input/output method are as described below. First, in a computer system that performs virtual storage, physical pages are not always present for storage areas in the virtual address space. In addition, the correspondence between the virtual addresses and the actual addresses may change. If a page were paged out or its correspondence changed during input/output, the input/output device would access an actual address that no longer holds the designated memory area. However, when all the memory areas are pinned down, it is impossible to free memory areas by page-out processing when the physical pages are insufficient. The quantity of physical pages that can be pinned down is therefore limited to some extent. Consequently, it is basically necessary to pin down memory areas every time input/output processing is performed.
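The per-request pin-down of the second input/output method can be sketched as follows. The class ToyOS, its pin limit, the page size, and the functions pages_of and direct_io are assumptions made for illustration only; an actual OS would perform these steps inside its page management.

```python
from typing import Callable, Iterable, List, Set

PAGE_SIZE = 4096  # assumed page size in bytes


class ToyOS:
    """Hypothetical model of the OS side of pin-down processing."""

    def __init__(self, pin_limit: int) -> None:
        self.pin_limit = pin_limit   # limit on pages that may be pinned down
        self.pinned: Set[int] = set()
        self.pin_calls = 0           # number of pin-down operations performed

    def pin_down(self, pages: Iterable[int]) -> None:
        new_pages = set(pages) - self.pinned
        if len(self.pinned) + len(new_pages) > self.pin_limit:
            raise MemoryError("pin-down limit exceeded")
        self.pinned |= new_pages
        self.pin_calls += 1

    def release(self, pages: Iterable[int]) -> None:
        self.pinned -= set(pages)


def pages_of(vaddr: int, length: int) -> List[int]:
    """Return the page numbers covered by the area [vaddr, vaddr + length)."""
    first = vaddr // PAGE_SIZE
    last = (vaddr + length - 1) // PAGE_SIZE
    return list(range(first, last + 1))


def direct_io(os_: ToyOS, vaddr: int, length: int,
              transfer: Callable[[List[int]], None]) -> None:
    """Pin the pages, let the device transfer directly, then release the pin."""
    pages = pages_of(vaddr, length)
    os_.pin_down(pages)      # performed for every input/output request
    try:
        transfer(pages)      # the device accesses the memory area directly
    finally:
        os_.release(pages)   # pages become eligible for page-out again
```

No copy is made in this sketch, but pin_calls increases by one for every request, which corresponds to the pin-down processing that the second input/output method performs each time.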
The first input/output method is implemented in many computer systems for two reasons. First, since the time for copying input/output data in a memory is short compared with the time required for input/output of the data, the copy processing does not cause a large overhead in terms of performance. Second, if memory areas are pinned down every time input/output processing is performed as in the second input/output method, the input/output processing becomes complicated.
On the other hand, in a high-speed input/output device, in particular, a high-speed communication path with speed exceeding 1 Gbps, since an overhead for copying input/output data is large compared with time required for input/output, there is an example in which data transfer is implemented by the second input/output method.
As an implementation example, there is the PM library (reference 1), a communication library for Myrinet, which is one of the high-speed communication paths. In addition to using the second input/output method, the PM library caches a pinned-down memory area instead of releasing it immediately after input/output, which reduces the number of times the pin-down processing is performed. The PM library also imposes a restriction that a memory area in which input processing such as data reception is possible must be pinned down in advance, which suppresses the occurrence of interruption (reference 2).
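The idea of caching pinned-down memory areas can be sketched as follows. This is not the implementation of the PM library; the class PinDownCache, its capacity, and the choice of releasing the least recently used page when the cache is full are assumptions made for illustration.

```python
from collections import OrderedDict


class PinDownCache:
    """Hypothetical cache that keeps pages pinned down after input/output."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity       # assumed limit on pinned-down pages
        self.pinned = OrderedDict()    # page number -> None, oldest use first
        self.pin_operations = 0        # pin-down processing actually performed
        self.unpin_operations = 0      # releases forced by the capacity limit

    def acquire(self, page: int) -> bool:
        """Make sure page is pinned down; return True if a pin-down was needed."""
        if page in self.pinned:
            self.pinned.move_to_end(page)  # still pinned: no pin-down processing
            return False
        if len(self.pinned) >= self.capacity:
            self.pinned.popitem(last=False)  # release the least recently used page
            self.unpin_operations += 1
        self.pinned[page] = None
        self.pin_operations += 1
        return True
```

When the same memory area is used repeatedly for input/output, acquire returns False and no pin-down processing is performed. The sketch omits tracking of pages with input/output still in progress, which a real cache would have to exclude from release.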
Reference 1: Hiroshi Tezuka, Atsushi Hori, and Yuu Ishikawa: “Design and Implementation of Work Station Cluster Communication Library PM”, Parallel Processing Symposium JSPP '96, pp. 41 to 48 (1996).
Reference 2: H. Tezuka, F. O'Carroll, A. Hori, and Y. Ishikawa, “Pin-down Cache: A Virtual Memory Management Technique for Zero-copy Communication”, First Merged Symposium IPPS/SPDP 1998 12th International Parallel Processing Symposium & 9th Symposium on Parallel and Distributed Processing, 1998.
In the conventional techniques, the OS holds the management information of the address conversion table. In other words, the OS takes the initiative in performing page operations such as page-in and page-out of the respective pages and page management such as pin-down and release of pin-down. Therefore, it is generally difficult to detect, from the input/output device side, to which actual address an input/output object memory area designated by a virtual address corresponds and whether a physical page is currently allocated to that memory area. This makes it necessary to use the first input/output method or the second input/output method.
However, the first input/output method has the problems described below. First, in particular when input/output is performed using a high-speed input/output device, the copy processing performed in the first input/output method causes a large overhead compared with the time required for the input/output itself. In addition, when the size of the data to be inputted or outputted at one time is larger than the size of the input/output dedicated area, it is necessary to transfer the data in a plurality of separate transfers.
The second input/output method has the problems described below. First, the pin-down processing is basically performed every time the input/output processing is performed. In addition, when a request for input/output is issued from the input/output device side, it is generally necessary to cause an interruption in order to request the OS to perform the pin-down processing.
In many computer systems, when page-out processing is performed, computer access information recording the state of access of the computer to pages is acquired, significance is given to the respective pages in accordance with the computer access information, and the pages are selected as objects of page-out in ascending order of significance. However, since the state of access of the input/output device to the pages is not taken into account, it is impossible to select objects of page-out in accordance with the state of access of the input/output device.
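The conventional selection of page-out objects can be sketched as follows. The use of an access count as the significance of a page, the tie-break by page number, and the names select_page_out, cpu_access, and device_access are assumptions made for illustration.

```python
from typing import Dict, List


def select_page_out(cpu_access_counts: Dict[int, int], count: int) -> List[int]:
    """Select page-out objects in ascending order of significance.

    Significance is assumed here to be the number of accesses by the
    computer; ties are broken by page number.
    """
    ranked = sorted(cpu_access_counts, key=lambda page: (cpu_access_counts[page], page))
    return ranked[:count]


# Hypothetical access records. The accesses by the input/output device exist
# but are not consulted by the conventional selection.
cpu_access = {10: 5, 11: 0, 12: 2, 13: 0}
device_access = {11: 9}

victims = select_page_out(cpu_access, 2)
```

In this sketch page 11 is selected as an object of page-out although the input/output device accesses it frequently, because only the computer access information contributes to the significance.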
The invention has been devised to solve such problems and it is an object of the invention to provide an input/output device, a computer, a computer system, an input/output control program, an OS, a page management program, and a page management method that can improve performance of the computer system after execution of page-out processing by reducing an overhead of the pin-down processing and selecting page-out object pages by reflecting a state of access of an input/output device to pages on the selection.