The present invention is generally directed to systems and methods for accessing data in a multinode, shared storage data processing network. In particular, the present invention is directed to the use of what is, in effect, a plurality of metadata controllers (also referred to herein as metadata controller nodes or metadata control nodes) which provide application systems with time-limited control for accessing individual files and file structures. Even more particularly, the present invention is directed to systems and methods for use in conjunction with storage area networks so as to allow them to operate in a manner which alleviates certain bottlenecks which are especially associated with access to and transmission of large files such as those relating to real-time video images and/or complex visualization data. In a second aspect of the present invention, since the present invention employs multiple metadata controllers, with control exercised at the granularity of individual files and subject to temporal limitations, methods and systems for recovery from various forms of node failure are also provided which are consistent with this arrangement. In a third aspect of the present invention a file locking mechanism is provided which permits the running of application programs on nodes which also operate as metadata control nodes; in particular, these application programs are thus provided with the ability to access, in a consistent manner, the same file data as is accessed from application nodes. The locking mechanism herein provides for the more efficient use of numerically intensive applications running on parallel metadata control nodes while visualization operations providing “views into the existing data” are provided by less critical application programs running on the other nodes (that is, on application nodes, which are also referred to herein as non-metadata controller nodes).
In a fourth aspect of the present invention, a method of access is provided which involves the use of a storage gateway which exists as an independent mechanism for verifying the appropriateness of access from application nodes which have received metadata control information from metadata controller nodes as part of their time-limited grant of more direct access. This latter aspect increases the level of security by having one of the metadata controller nodes provide the access information directly, and in an independent manner, to the storage gateway (or, if you will, storage node). At the gateway, this affords an opportunity for a check or comparison to be made to ensure that the “lease” has not expired, that the enumeration of blocks and their locations is accurate, and that no tampering has occurred.
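By way of illustration only, the gateway check described above may be sketched as follows. This is a minimal sketch and not the implementation of the invention: the names `Grant`, `StorageGateway`, `register` and `verify`, the representation of blocks as (disk, block number) pairs, and the use of a numeric expiry time are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical access information for one file, issued by a metadata controller node."""
    node_id: str         # application node that received the grant
    file_id: str
    blocks: tuple        # enumerated (disk, block_number) pairs
    lease_expiry: float  # absolute time at which the lease ends


class StorageGateway:
    """Hypothetical gateway that checks grants independently of the application node."""

    def __init__(self):
        # (node_id, file_id) -> Grant received directly from a metadata controller node
        self._registered = {}

    def register(self, grant):
        """Called by the metadata controller node, not by the application node."""
        self._registered[(grant.node_id, grant.file_id)] = grant

    def verify(self, presented, now):
        """Compare a grant presented by an application node with the controller's copy."""
        expected = self._registered.get((presented.node_id, presented.file_id))
        if expected is None:
            return False  # the controller never reported this grant
        if now >= expected.lease_expiry:
            return False  # the lease has expired
        # Any altered block, location or expiry time makes the comparison fail.
        return presented == expected
```

In this sketch the gateway trusts only the copy it received from the metadata controller node, so a grant altered by the application node fails the comparison even while the lease is still running.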
Since the present invention is closely involved with the concepts surrounding files, file systems and metadata, it is useful to provide a brief description of at least some of the pertinent terms. A more complete list is found in U.S. Pat. No. 6,032,216 which is assigned to the same assignee as the present invention. This patent is hereby incorporated herein by reference. However, the following glossary of terms from this patent is provided below since these terms are the ones that are most relevant for an easier understanding of the present invention:
Data/File system Data: These are arbitrary strings of bits which have meaning only in the context of a specific application.
File: A named string of bits which can be accessed by a computer application. A file has certain standard attributes such as length, a modification time and a time of last access.
Metadata: These are the control structures created by the file system software to describe the structure of a file and the use of the disks which contain the file system. Specific types of metadata which apply to file systems of this type are more particularly characterized below and include directories, inodes, allocation maps and logs.
Directories: these are control structures which associate a name with a set of data represented by an inode.
Inode: a data structure which contains the attributes of the file plus a series of pointers to areas of disk (or other storage media) which contain the data which make up the file. An inode may be supplemented by indirect blocks which provide additional pointers, for example when the file is large.
Allocation maps: these are control structures which indicate whether specific areas of the disk (or other control structures such as inodes) are in use or are available. This allows software to effectively assign available blocks and inodes to new files.
Logs: these are a set of records used to keep the other types of metadata in synchronization (that is, in consistent states) to guard against loss in failure situations. Logs contain single records which describe related updates to multiple structures.
File system: a software component which manages a defined set of disks (or other media) and provides access to data in ways that facilitate consistent addition, modification and deletion of data and data files. The term is also used to describe the set of data and metadata contained within a specific set of disks (or other media). While the present invention is most frequently used in conjunction with rotating magnetic disk storage systems, it is usable with any data storage medium which is capable of being accessed by name with data located in nonadjacent blocks; accordingly, where the terms “disk” or “disk storage” or the like are employed herein, this more general characterization of the storage medium is intended.
Metadata controller: a node or processor in a networked computer system (such as the pSeries of scalable parallel systems offered by the assignee of the present invention) through which all access requests to a file are processed. The present invention is particularly directed to systems and methods of operation employing a plurality of metadata controllers together with a mechanism for their coordinated usage.
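For ease of understanding, the relationship among the metadata types defined in the glossary above (directories, inodes, allocation maps and logs) may be sketched as follows. This is a toy, in-memory sketch under stated assumptions: the class names `Inode`, `AllocationMap` and `TinyFileSystem`, the four-byte block size and the log record layout are hypothetical and are not taken from the referenced patents.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """File attributes plus pointers to the blocks that hold the file's data."""
    length: int = 0
    modification_time: float = 0.0
    access_time: float = 0.0
    block_pointers: list = field(default_factory=list)


class AllocationMap:
    """Records which blocks are in use and which are available."""

    def __init__(self, n_blocks):
        self.in_use = [False] * n_blocks

    def allocate(self):
        for i, used in enumerate(self.in_use):
            if not used:
                self.in_use[i] = True
                return i
        raise RuntimeError("no free blocks")


class TinyFileSystem:
    BLOCK_SIZE = 4  # assumed block size in bytes, chosen only to keep the example small

    def __init__(self, n_blocks):
        self.disk = [b""] * n_blocks
        self.alloc = AllocationMap(n_blocks)
        self.inodes = {}     # inode number -> Inode
        self.directory = {}  # name -> inode number
        self.log = []        # one record per group of related metadata updates

    def create(self, name, data, now=0.0):
        inode = Inode(length=len(data), modification_time=now, access_time=now)
        for offset in range(0, len(data), self.BLOCK_SIZE):
            block = self.alloc.allocate()
            self.disk[block] = data[offset:offset + self.BLOCK_SIZE]
            inode.block_pointers.append(block)
        number = len(self.inodes)
        self.inodes[number] = inode
        self.directory[name] = number
        # A single log record describes the related updates to several structures.
        self.log.append(("create", name, number, tuple(inode.block_pointers)))
        return number

    def read(self, name):
        inode = self.inodes[self.directory[name]]
        return b"".join(self.disk[b] for b in inode.block_pointers)[:inode.length]
```

The sketch shows a file being located by name through a directory entry and then assembled from blocks that need not be adjacent, which is the property relied upon in the definition of a file system above.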
The data processing systems described in U.S. Pat. No. 6,161,104 and U.S. Pat. No. 5,950,203 illustrate a mechanism in which two or more computing systems, which share a network path to a storage device, effectively share fast access to files contained on the storage device(s). This is achieved by one of the systems serving as the metadata controller for the file system with the other systems acquiring metadata from the metadata controller to allow direct access to the blocks which make up the files. Only a single metadata controller is present in the systems shown in these two patents. This single metadata controller (MDC) interprets and creates metadata which describes the locations of files on the shared disks. This method allows non-metadata nodes to bypass the metadata controller when accessing data. This procedure has the potential for increasing data access performance for applications such as video streaming or for certain scientific applications which access large files. It is, nonetheless, characterized by the limitation of having but one metadata controller. Thus, even though metadata is made available to other nodes or computer systems in the network, ultimately there is but a single source for this information, and this remains so even at a point in time when more immediate sources for this information would have been able to alleviate a bottleneck.
In systems of the present invention, this bottleneck problem is alleviated through the use of a special locking mechanism and the granting of temporary permission for direct file access from a class of nodes whose function is principally directed to running application programs. Another class of nodes is capable of obtaining these locks from a node containing a file system manager. However, it is noted that, in general, locks may be obtained from any central lock issuing authority or mechanism, not just from a node containing a file system manager, even though this is the preferred approach in systems of the present invention. These locks do not have a temporal limitation. However, this class of nodes (referred to herein as Class A nodes, as metadata controller nodes or, equivalently, as members of a first plurality of nodes) is capable of granting temporary access to one or more nodes in the set of nodes used for running application programs. For the duration of the permission grant (referred to herein as the “lease term” or “lease period”), consistent access to file level data is guaranteed to be available from an application node which is provided with metadata information from one of the Class A nodes. The other class of nodes, namely the ones which are capable of directly accessing an individually specified file, typically constitutes what is referred to herein as a second plurality of nodes, also referred to herein as Class B nodes or application nodes, since that is their typical role, namely the running of user application programs requiring file access.
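The two-level arrangement described above, in which a Class A node holds a lock without temporal limitation and in turn grants a time-limited lease to a Class B node, may be sketched as follows. This is an illustrative sketch only: the names `LockAuthority`, `MetadataController`, `grant_lease` and `lease_valid`, the dictionary returned to the application node, and the numeric lease term are assumptions made here and do not describe the actual protocol.

```python
class LockAuthority:
    """Stands in for the file system manager node; its locks carry no time limit."""

    def __init__(self):
        self._holder = {}  # file_id -> name of the Class A node holding the lock

    def acquire(self, file_id, controller):
        # The first requester becomes the holder; a repeat request by the holder succeeds.
        return self._holder.setdefault(file_id, controller) == controller


class MetadataController:
    """A Class A node that grants time-limited leases to Class B (application) nodes."""

    def __init__(self, name, authority, lease_term):
        self.name = name
        self.authority = authority
        self.lease_term = lease_term
        self._leases = {}  # (file_id, app_node) -> expiry time

    def grant_lease(self, file_id, app_node, metadata, now):
        if not self.authority.acquire(file_id, self.name):
            return None  # another Class A node holds the lock for this file
        expiry = now + self.lease_term
        self._leases[(file_id, app_node)] = expiry
        # The application node receives the metadata needed for direct access.
        return {"file_id": file_id, "metadata": metadata, "expiry": expiry}

    def lease_valid(self, file_id, app_node, now):
        return now < self._leases.get((file_id, app_node), float("-inf"))
```

In this sketch different files may be controlled by different Class A nodes at the same time, while each individual file is controlled by only one of them.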
Accordingly, at any given time it is now possible to have a plurality of files in an open state with each file being accessed directly from an application node and with a first plurality of nodes actively operating as metadata controller nodes for various ones of these open files. As a result of this new state of affairs, the situation of node failure is also considered herein since failure recovery modalities should now consider the fact that a metadata controller node has surrendered at least some of its authority over file access, albeit temporarily. For example, one of the problems considered and solved herein is the failure of a single node (a Class A node) which acts as a metadata controller node. Also addressed is the problem that occurs if and when there are multiple node failures, and the failed nodes are all metadata controller nodes (Class A nodes) but none of the failed nodes is the node acting as the file system manager. Yet another problem addressed herein relates to the use of multiple metadata controller nodes and the specific circumstance that at least two nodes have failed and the failed nodes include one of the (Class A) metadata controller nodes and the node acting as the file system manager. In all three of these cases, if the only failure is at a metadata controller node (Class A node), the scope of recovery is limited to the files known to be locked at that node. If failure occurs at the node acting as the file system manager, the scope of possible locking is considered to be the entire file system. While the node classes have been referred to above as having a plurality of members, as is typically and preferably the case, it is still within the scope of the present invention that there be a single node in each class. 
It is noted, though, that while such a configuration (that is, a single node in each class) is possible within the scope of activity contemplated for the procedures of the present invention, the advantages of being able to rely on a plurality of nodes for metadata controller operations are not available in this very limited mode of operation.
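The rule stated above for the scope of recovery in the three failure cases may be sketched as follows. This is a sketch under stated assumptions: the function name `recovery_scope` and its parameters are hypothetical, and the sketch determines only which files are to be examined, not the recovery steps themselves.

```python
def recovery_scope(failed_nodes, file_system_manager, locks_by_node, all_files):
    """Return the set of files to be examined during recovery.

    failed_nodes        -- names of the nodes that have failed
    file_system_manager -- name of the node acting as the file system manager
    locks_by_node       -- mapping of Class A node name to the files locked at that node
    all_files           -- every file in the file system
    """
    failed = set(failed_nodes)
    if file_system_manager in failed:
        # The lock state is no longer known, so the entire file system is in scope.
        return set(all_files)
    scope = set()
    for node in failed:
        # Only the files known to be locked at each failed Class A node are in scope.
        scope |= set(locks_by_node.get(node, ()))
    return scope
```

For example, with files "f1" and "f2" locked at node "A1", "f3" locked at node "A2", and node "M" acting as the file system manager, a failure of "A1" alone limits recovery to "f1" and "f2", whereas a failure of both "A1" and "M" places every file in scope.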