The present invention relates to a file system for processing files distributed and managed in a plurality of disk devices, and more particularly to a file system in which when a plurality of I/O paths are provided to access a disk device, it is possible to control switching among the plurality of I/O paths so as to access the disk device through one of the plurality of I/O paths.
In the UNIX file system, which is one of the conventional file systems, a number (a file ID) is defined to uniquely identify each file, and a file server can specify a file on which to perform read/write operation by supplying its file ID. The file server registers and manages a matching relationship between each file ID and an I/O path used to access a disk device storing the file, in a file management table (referred to as an inode in UNIX) stored in a memory. An I/O path is composed of such information as a node number, an I/O interface number, a device number, etc. This management system is described in, for example, a book entitled “The Design of The Unix Operating System” authored by Maurice J. Bach (pp. 60-72).
Upon receiving a read/write access request with a file ID specified, the file server refers to the above file management table, determines an I/O path used to access the disk device based on the file ID, and accesses the disk device using the I/O path. In addition to the I/O path information, the file management table stores file management information such as each file size, the date of last update of each file, etc., and the file management information is read out from a disk device when a file is opened and it is written back to the disk device regularly or when the file is closed. The file server converts a file name supplied by the user to its file ID.
A known method for handling a plurality of disk devices in a file system is to add a name tree managed by a disk device B to a directory, for example, X within a name tree managed by a disk device A so as to show a plurality of disk devices (that is, a plurality of name trees) as if they existed within a single name tree. In this method, the user can access a file in the disk device B by accessing the directory X. This method is called “mount operation”. At system start-up, the file server carries out one mount operation after another using a specific disk device (a root device) as a starting point so as to show a plurality of disk devices to the user as if they were a single name tree. A mount construction file in the root device stores information on a matching relationship between each disk device to be subjected to mount operation at system start-up and a directory name (a mount point) of a name tree onto which the disk device is mounted. The file server performs mount operation according to the information stored in the mount construction file at system start-up.
The mount construction file includes information on each I/O path used to specify a disk device to access it. The file server reads the matching relationships between I/O paths and mount points registered in the mount construction file into memory as mount construction information at the time of performing mount operation. When the user opens a file by specifying its file name, the file server obtains an I/O path used to access the physical disk device storing the file based on the above mount construction information, and creates a file management table. Accordingly, when the configuration of a system has been changed as a result of, for example, connecting a new disk device to the system, the system administrator must set new construction information in the computer system by rewriting the mount construction file.
On the other hand, to enhance reliability of computer systems, such a patent publication as Japanese Laid-Open Patent Publication No. 10-275090 (1998) describes a method which physically connects two different nodes to a physical disk device so as to be able to access the disk device through two different I/O paths. With this arrangement, one of the I/O paths is used in normal operation, and when a node fault has occurred and as a result, it is no longer possible to use the current I/O path, the other I/O path is used to access the disk device from another node in order to maintain availability of the disk device in case of a fault.
Another well-known method for enhancing reliability of disk devices is to multiplex and store a file in a plurality of disk devices (mirroring). A concept of “a logical volume” is generally used for mirroring. A logical volume is a mechanism which shows a plurality of physical disk devices to the user as a single volume. The user creates beforehand a logical volume in which information on a plurality of physical disk devices is registered. When the user accesses the logical volume for a file operation in the same way as a physical disk device is accessed, file mirroring operation is performed on the plurality of disk devices. By using a logical volume, it is also possible to carry out striping, which distributes and stores a file in a plurality of disk devices.
In order to dynamically switch from a current I/O path to another I/O path to access a physical disk device in the conventional UNIX file system when the current I/O path can no longer be used, it is necessary to search file management tables and mount construction information to rewrite each entry of the unavailable I/O path name with the entry of a new one. The above operation to rewrite an entry in each file management table with a new entry must be carried out for each open file. As a result, in a conventional UNIX file system to which the above technique for switching I/O paths is applied, it takes time to rewrite entries in file management tables, causing a problem that it is not possible to perform I/O operation on the target physical disk device during such rewriting time.
Furthermore, if two I/O paths are simply switched when a fault has occurred in one of them, the node which was accessing a physical disk device before the fault cannot properly write back to the physical disk device the contents of the caches it currently holds, raising a problem that important data may be lost. These caches include the buffer cache (an area in which data is temporarily stored when it is read from or written to a physical disk device, in order to reduce the number of input/output operations on the physical disk device, whose processing speed is slow compared with the memory), the file management tables, and the disk cache (a cache memory held by a physical disk device for the same purpose as that of the buffer cache) in the physical disk device. Furthermore, since this compromises integrity of the file system, it is necessary to restore the compromised file system to its proper state based on information on the file system redundantly stored in a physical disk device. This restoring operation requires checking of the entire disk device and therefore takes a long time, making it impossible to perform I/O operation on the physical disk device during the restoring operation.
In addition, since, after switching to the new I/O path, the new I/O path is used to access the disk device, it is necessary for the system administrator to update the mount construction file so that a matching relationship between the new I/O path and the mount point of the disk device is registered in the mount construction file, in order to properly perform the mount operation at the time of restarting the system after switching to the new I/O path. Further, in the case where mirroring of files is employed, the system administrator needs to create a logical volume and carry out a complicated procedure for managing the logical volume.
A first object of the present invention is to provide a file system capable of reducing time taken to switch I/O paths, and hiding as much of the I/O-path switching operation as possible from the general user. A second object of the present invention is to provide a file system capable of switching I/O paths without losing data stored in a buffer cache, file management tables, and a disk cache in a disk device, thereby eliminating the need for checking integrity of files. A third object of the present invention is to provide a file system capable of automatically updating a mount construction file at the time of switching I/O paths so as to reduce a burden on the system administrator. A fourth object of the present invention is to provide a file system which has a function of mirroring files without making the user aware of the logical volume.
To achieve the above objects, a file system according to the present invention includes at least one node having a file server for processing files distributed and managed in a plurality of physical disk devices, said files each having a defined file ID. Each node comprises: a file management table including records each composed of a file ID and a logical disk ID of a logical disk storing a file corresponding to the file ID; and a logical disk management table including records each composed of the logical disk ID and one or more I/O paths for accessing one or more physical disk devices corresponding to the logical disk. Upon receiving from a user a request for accessing a file specifying a file ID, the file server refers to the file management table, and determines a logical disk ID of a logical disk storing the file based on the file ID. The file server then refers to the logical disk management table to determine an I/O path for accessing a physical disk device corresponding to the logical disk based on the logical disk ID, and accesses the physical disk device using the determined I/O path. It should be noted that an I/O path is composed of such information as a node number, an I/O interface number, and a disk controller number.
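The two-step lookup described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `IOPath`, `file_management_table`, `logical_disk_management_table`, and `resolve_io_paths`, as well as the sample IDs, are hypothetical and chosen only to show how a file ID is first mapped to a logical disk ID and then to the I/O paths registered for that logical disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOPath:
    """Hypothetical I/O path record: node, I/O interface, disk controller."""
    node: int
    io_interface: int
    disk_controller: int


# File management table (illustrative contents): file ID -> logical disk ID.
file_management_table = {100: 1, 101: 2}

# Logical disk management table (illustrative contents):
# logical disk ID -> I/O paths registered for that logical disk.
logical_disk_management_table = {
    1: [IOPath(1, 0, 0), IOPath(2, 0, 1)],
    2: [IOPath(3, 0, 0)],
}


def resolve_io_paths(file_id):
    """Map a file ID to a logical disk ID, then to that disk's I/O paths."""
    logical_disk_id = file_management_table[file_id]
    return logical_disk_management_table[logical_disk_id]
```

Because the file management table holds only a logical disk ID in this sketch, changing the I/O path of a logical disk touches one record in the logical disk management table and no per-file record.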
A logical disk management table according to the present invention includes status flags each indicating the operational state (one of three states: “operational”, “standby” (also called “waiting”), or “unavailable”) of each I/O path registered in the logical disk management table, and in normal operation the file server accesses a physical disk device using an I/O path whose status flag is set to “operational” (an operational I/O path). When a fault has occurred in an operational I/O path, the file server in a node which has detected the fault updates the logical disk management table in the node by setting the status flag of the faulty I/O path to “unavailable” and the status flag of an I/O path which currently indicates “standby” to “operational” in order to designate a new operational path. The file server then communicates with the file servers in all remote nodes to copy contents of the updated logical disk management table to the logical disk management tables in all remote nodes. After that, the file server switches from the current (faulty) operational I/O path to the new operational I/O path for accessing the physical disk device.
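The status-flag update on a fault might be sketched as follows. This is only an illustration under stated assumptions: the `Status` enumeration, the `fail_over` and `replicate` functions, and the representation of a table as a dictionary are hypothetical, and the choice of the first standby path found is an assumption, since the text does not say how a standby path is selected when several exist.

```python
from enum import Enum


class Status(Enum):
    OPERATIONAL = "operational"
    STANDBY = "standby"
    UNAVAILABLE = "unavailable"


def fail_over(table, logical_disk_id, faulty_path):
    """Mark the faulty path unavailable and promote a standby path.

    `table` maps logical disk ID -> {I/O path: Status}. Returns the new
    operational path. Picking the first standby path is an assumption.
    """
    paths = table[logical_disk_id]
    if paths.get(faulty_path) is not Status.OPERATIONAL:
        raise ValueError("faulty_path is not the operational path")
    standby = next((p for p, s in paths.items() if s is Status.STANDBY), None)
    if standby is None:
        raise RuntimeError("no standby I/O path is registered")
    paths[faulty_path] = Status.UNAVAILABLE
    paths[standby] = Status.OPERATIONAL
    return standby


def replicate(local_table, remote_tables):
    """Copy the updated local table into every remote node's table."""
    for remote in remote_tables:
        remote.clear()
        remote.update({disk: dict(paths) for disk, paths in local_table.items()})
```

In this sketch the remote tables are plain in-memory dictionaries; in a real system the copy would be carried out by communication between file servers, which is outside the scope of the example.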
During the process of switching the I/O paths, the file server included in the faulty I/O path holds requests for accessing the current (faulty) operational I/O path, and transmits the held requests to the file server included in the new operational I/O path after the I/O-path switching has been completed. This makes it possible to dynamically perform the process of switching I/O paths and thereby eliminate the need for searching and updating file management tables at the time of switching the I/O paths, reducing time taken to switch the I/O paths.
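The hold-and-forward behavior described above can be sketched as a simple queue. The `FileServer` class and its methods are hypothetical names introduced for illustration, and requests are modeled as opaque values rather than real I/O operations.

```python
from collections import deque


class FileServer:
    """Hypothetical file server that can hold requests during a switch."""

    def __init__(self, name):
        self.name = name
        self.switching = False
        self.held = deque()   # requests held while the switch is in progress
        self.processed = []   # requests this server has handled

    def submit(self, request):
        """Handle a request, or hold it if a path switch is in progress."""
        if self.switching:
            self.held.append(request)
        else:
            self.processed.append(request)

    def begin_switch(self):
        self.switching = True

    def complete_switch(self, new_server):
        """Forward held requests, in arrival order, to the new server."""
        while self.held:
            new_server.submit(self.held.popleft())
        self.switching = False
```

A first-in, first-out queue is used here so that held requests reach the new file server in the order they arrived; the text does not state an ordering requirement, so this is a design assumption of the sketch.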
Further, according to the present invention, to maintain integrity of a file system, data stored in a cache of a disk controller provided in a physical disk device which was being accessed using an operational I/O path no longer available at the time of switching I/O paths is written back to the physical disk device if the data needs to be written back. In the present invention, this is done by another controller provided in the physical disk device. Furthermore, the file server included in the currently unavailable operational I/O path communicates with the file server included in the new operational I/O path. At that time, contents of the buffer cache and the file management table which reside in the main memory of the node included in the currently unavailable operational I/O path are transferred to the node included in the new operational I/O path if those contents need to be written back to the physical disk device. Thus, the present invention is capable of preventing loss of data existing in the disk cache of the disk device, the buffer cache, and the file management table, eliminating the need for checking integrity of the file system.
Furthermore, a mount construction file according to the present invention includes availability information which is set for each I/O path and indicates whether the I/O path is available. A file server reads the mount construction file at system start-up, and sets “operational” or “standby” for each of the status flags in the logical disk management table corresponding to I/O paths whose availability information is set to “available”, whereas the file server sets “unavailable” for each of the status flags in the logical disk management table corresponding to I/O paths whose availability information is set to “unavailable”. The file server then carries out access settings so as to access physical disk devices using only I/O paths whose availability information is set to “available” in the mount construction file. After switching of I/O paths has been completed (or an I/O path has been disconnected), the file server updates the mount construction file by rewriting the availability information on the currently unavailable operational I/O path with information indicating “unavailable”. When a currently unavailable I/O path has become available again, the file server updates the mount construction file by rewriting the availability information on the I/O path with information indicating “available”. Thus, the present invention automates rewriting of the mount construction file performed when I/O paths have been switched or an I/O path has been restored, making it possible to reduce a burden on the system administrator.
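As a rough illustration of how availability information could drive the initial status flags, consider the following Python sketch. The format of the mount construction file is not given in this passage, so the in-memory representation (a list of path and availability pairs for one mount point) and the function names `initial_status_flags` and `set_availability` are hypothetical. The rule that the first available path becomes “operational” and the remaining available paths become “standby” is likewise an assumption.

```python
def initial_status_flags(mount_entry):
    """Derive status flags from one mount entry.

    `mount_entry` is a list of (io_path, availability) pairs, where
    availability is "available" or "unavailable". Assumption: the first
    available path is made operational and the rest are made standby.
    """
    flags = {}
    operational_assigned = False
    for path, availability in mount_entry:
        if availability == "unavailable":
            flags[path] = "unavailable"
        elif not operational_assigned:
            flags[path] = "operational"
            operational_assigned = True
        else:
            flags[path] = "standby"
    return flags


def set_availability(mount_entry, path, availability):
    """Return a copy of the entry with one path's availability rewritten."""
    return [(p, availability if p == path else a) for p, a in mount_entry]
```

With such a representation, rewriting the entry after a switch and re-deriving the flags at the next start-up would keep the faulty path out of use without any manual edit by the administrator.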
Furthermore, the present invention is capable of mirroring files by using a plurality of disk devices accessed through a plurality of I/O paths registered in one entry in the mount construction file, making it possible to carry out mirroring of files without use of a logical volume by the user.