1. Field of the Invention
The present invention relates to a file input/output (I/O) control method, and more particularly to a high-speed file I/O control method for controlling parallel access to one file from a plurality of related processors.
2. Description of the Related Art
A system in which a file is divided into subfiles that are distributively stored in a plurality of file devices and accessed in parallel is known, as taught in N. Nieuwejaar and David Kotz, “The Galley Parallel File System”, the Conference Proceedings of the 1996 International Conference on Supercomputing, pp. 374 to 381, and in JP-A-8-292905.
It is an object of the present invention to provide a file I/O control method capable of setting, for each file, a file structure matching an access pattern desired by a user, to thereby enhance the effect of parallel file access.
It is another object of the present invention to make it possible to set various attributes to each region of a file.
It is a further object of the present invention to provide a file I/O control method capable of collectively scheduling parallel file accesses by collecting access requests from a plurality of processes for each physical device and issuing the collected access requests to each physical device.
It is still another object of the present invention to improve the performance of data transfer between a disk device and a network.
In accordance with the invention, there is provided a file input/output control system comprising:
a plurality of first computers each having a plurality of disks and connected to a network; and
at least one second computer connected to the network for accessing the plurality of disks connected to the plurality of first computers,
said second computer comprising:
a retriever for retrieving a plurality of first data access requests issued from a plurality of processes of an application and comparing the plurality of first data access requests with correspondence relation defining information to thereby confirm that the plurality of first data access requests are accesses to a plurality of disks, the correspondence relation defining information being entered by a user in advance and indicating a correspondence relation between the plurality of disks in said first computers and each of a plurality of regions in a file accessed by said second computer; and
a scheduler for creating a plurality of second data access requests to the plurality of disks from a plurality of first data access requests coming from a plurality of second computers confirmed to be accesses to a plurality of disks, in accordance with the correspondence relation defining information between the plurality of disks and each of the plurality of regions in a file stored in the disks of said first computers, and transmitting the plurality of second access requests to the network,
wherein the plurality of second computers access the disks in accordance with the plurality of second data access requests received via the network.
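The retriever and scheduler described above can be illustrated with a minimal sketch. It assumes that the correspondence relation defining information is a simple list of byte ranges, each mapped to one disk of one first computer. The names `Region`, `FileRequest`, `DiskRequest`, and `schedule` are hypothetical and do not appear in the specification.

```python
from collections import defaultdict
from typing import NamedTuple

class Region(NamedTuple):
    start: int    # first byte offset of the region within the file
    length: int   # region size in bytes
    server: str   # hypothetical name of the first computer holding the disk
    disk: int     # hypothetical disk index on that computer

class FileRequest(NamedTuple):
    process: int  # application process that issued the request
    offset: int   # byte offset within the file
    length: int   # number of bytes requested

class DiskRequest(NamedTuple):
    process: int
    disk_offset: int  # byte offset relative to the start of the region
    length: int

def schedule(requests, regions):
    """Split file-level requests along region boundaries and group them per disk."""
    per_disk = defaultdict(list)
    for req in requests:
        req_end = req.offset + req.length
        for reg in regions:
            lo = max(req.offset, reg.start)
            hi = min(req_end, reg.start + reg.length)
            if lo < hi:  # the request overlaps this region
                per_disk[(reg.server, reg.disk)].append(
                    DiskRequest(req.process, lo - reg.start, hi - lo))
    return dict(per_disk)
```

In this sketch, a request that spans two regions yields two second data access requests, one per disk, and requests from different processes that touch the same disk end up in the same list.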
Each of the first computers may include a rearranger for rearranging a plurality of second data access requests for each of the plurality of disks in the order of block numbers in each of the plurality of disks.
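As an illustration only, the rearranger could be sketched as follows, assuming each second data access request carries a disk identifier, a starting block number, and a block count. The names `BlockRequest` and `rearrange` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class BlockRequest(NamedTuple):
    disk: int   # disk identifier on this first computer
    block: int  # starting block number on that disk
    count: int  # number of blocks to access

def rearrange(requests):
    """Group requests per disk and order each group by ascending block number."""
    per_disk = defaultdict(list)
    for req in requests:
        per_disk[req.disk].append(req)
    return {disk: sorted(reqs, key=lambda r: r.block)
            for disk, reqs in per_disk.items()}
```

Ordering by block number lets each disk be swept in one direction rather than seeking back and forth in arrival order.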
Each of the first computers may include a merger for merging, upon detecting that the plurality of second data access requests to each of the plurality of disks contain a plurality of data access requests to a continuous disk field, the plurality of disk access requests into one disk access request.
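A minimal sketch of such merging is given below. It assumes that the requests for one disk are non-overlapping `(start_block, block_count)` pairs and that a "continuous disk field" means one request ends exactly where the next begins. The function name `merge_contiguous` is hypothetical.

```python
def merge_contiguous(requests):
    """Merge requests for one disk whose block ranges are exactly adjacent.

    requests: iterable of (start_block, block_count) pairs, assumed non-overlapping.
    """
    merged = []
    for start, count in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # The previous request ends where this one starts: extend it.
            prev_start, prev_count = merged[-1]
            merged[-1] = (prev_start, prev_count + count)
        else:
            merged.append((start, count))
    return merged
```

For example, three four-block requests starting at blocks 0, 4, and 8 would be issued to the disk as a single twelve-block request.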
The merger includes a disk driver for controlling the plurality of disks and a network driver for holding data obtained by said disk driver by accessing the plurality of disks and transmitting the data to said at least one second computer via the network.
The network driver may include a memory for storing the data obtained by said disk driver by accessing the plurality of disks for each of said at least one second computer and transferring the data separately stored in each of said at least one second computer.
The first computers may be connected via a second network to said at least one second computer, and said network drivers of the plurality of first computers transfer the separately stored data to said at least one second computer via the network and said second network.
Although two-dimensional array data, distributively stored in a file a plurality of whose regions are stored in the plurality of first computers, is defined in a row direction, in response to an access command for referring to the two-dimensional array data in a column direction, said network driver reads data containing unnecessary data from the plurality of disks and transmits the data containing unnecessary data to each of the plurality of second computers, and each of the plurality of second computers filters the data containing unnecessary data to discard the unnecessary data and obtain the data necessary for that second computer.
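The column access described above can be sketched under the assumption that the array is stored row by row as a flat sequence of elements. The disk side reads whole rows, including elements the requester does not need, and the receiving side keeps only the requested column. The names `read_rows` and `filter_column` are hypothetical.

```python
def read_rows(disk, first_row, n_rows, n_cols):
    """Disk side: return whole rows as one flat list, unneeded elements included."""
    return disk[first_row * n_cols:(first_row + n_rows) * n_cols]

def filter_column(row_data, n_cols, col):
    """Receiving side: keep every n_cols-th element, starting at index `col`."""
    return row_data[col::n_cols]

# A 3 x 4 array stored in row order; column 1 holds the elements 1, 5, and 9.
disk = list(range(12))
rows = read_rows(disk, first_row=0, n_rows=3, n_cols=4)
column = filter_column(rows, n_cols=4, col=1)
```

Reading whole rows keeps the disk access sequential, at the cost of transferring elements that are then discarded by the receiver.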
According to one aspect of the present invention, a file input/output control system is provided which comprises: a plurality of first computers each having a plurality of disks and connected to a network; and at least one second computer connected to the network for accessing the plurality of disks connected to the plurality of first computers, the second computer comprising: a retriever for retrieving a plurality of first data access requests issued from a plurality of processes of an application and comparing the plurality of first data access requests with correspondence relation defining information to thereby confirm that the plurality of first data access requests are accesses to a plurality of disks, the correspondence relation defining information being entered by a user in advance and indicating a correspondence relation between the plurality of disks and each of a plurality of files stored in a disk of the second computer; and a scheduler for creating a plurality of second data access requests to the plurality of disks from a plurality of first data access requests confirmed to be accesses to a plurality of disks, in accordance with the correspondence relation defining information between the plurality of disks and each of the plurality of files stored in the disk of the second computer, and transmitting the plurality of second access requests to the network, wherein the plurality of first computers access the plurality of disks in accordance with the plurality of second data access requests received via the network.
Each of the plurality of first computers may comprise a rearranger for rearranging a plurality of second data access requests for each of the plurality of disks in the order of block numbers in each of the plurality of disks.
Each of the plurality of first computers may comprise a merger for merging, upon detecting that the plurality of second data access requests to each of the plurality of disks contain a plurality of data access requests to a continuous disk field, the plurality of disk access requests into one disk access request.
The merger may comprise a disk driver for controlling the plurality of disks and a network driver for holding data obtained by the disk driver by accessing the plurality of disks and transmitting the data to the at least one second computer via the network.
The network driver may comprise a memory for storing the data obtained by the disk driver by accessing the plurality of disks for each of the at least one second computer and transferring the data separately stored in each of the at least one second computer.
The plurality of first computers may be connected via a second network to the at least one second computer, and the network drivers of the plurality of first computers may transfer the separately stored data to the at least one second computer via the network and the second network.
Although two-dimensional array data distributively stored in the plurality of first computers is defined in a row direction, in response to an access command for referring to the two-dimensional array data in a column direction, the network driver may read data also containing unnecessary data from the plurality of disks and transmit the data also containing unnecessary data to each of the plurality of first computers, and each of the plurality of computers may filter the data also containing unnecessary data to discard the unnecessary data and obtain the data necessary for the first computer.
According to the present invention, a file structure table is created that holds a file structure definition designated in a file structure setting request issued from an application program, the request asking that a plurality of physical devices be distributively allocated to a plurality of divided regions of a file. In response to I/O requests for parallel accesses to a plurality of regions of the file, the I/O requests are collected for each physical device by referring to the file structure table thus set. A high-speed file I/O control method can thereby be provided which controls parallel data transfers between the physical devices and a plurality of processes executing the application program. Each region of the file can be given various attributes, such as a data transfer path, data striping, and data caching.
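One possible shape for such a file structure table is sketched below. The specification names striping and caching only as region attributes, so the round-robin striping rule used here is an assumption, as are the names `RegionDef` and `locate`.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RegionDef:
    start: int           # first byte offset of the region within the file
    length: int          # region size in bytes
    devices: List[str]   # physical devices allocated to the region
    stripe_size: int     # bytes per stripe unit (assumed round-robin striping)
    cached: bool = False # whether data caching is enabled for the region

def locate(table, offset):
    """Return (device, offset within that device's share of the region)."""
    for reg in table:
        if reg.start <= offset < reg.start + reg.length:
            rel = offset - reg.start
            stripe = rel // reg.stripe_size
            device = reg.devices[stripe % len(reg.devices)]
            dev_offset = ((stripe // len(reg.devices)) * reg.stripe_size
                          + rel % reg.stripe_size)
            return device, dev_offset
    raise ValueError("offset is outside every defined region")
```

With a table of this kind, each I/O request can be resolved to a physical device, so that requests from several processes may be grouped per device before being issued.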
Further, according to the present invention, a high-speed file I/O control method is provided in which one of a plurality of processes executing the application program collects I/O requests for parallel accesses to a plurality of regions of the file and issues the collected I/O requests to each physical device.
Furthermore, according to the present invention, a high-speed file I/O control method is provided in which, for data transfer via a network between a process executing the application program and a physical device, a device driver of the physical device is first set up by a file management program, and data is then transferred directly between the network and the physical device via the device driver of the physical device and a network driver.