This invention is generally related to shared file systems and techniques for managing meta-data in such file systems.
Electronic storage typically features a file system program which implements a set of functions related to the storage and retrieval of data. The file system program associates the physical structures in which files and directories are stored, for instance one or more direct access devices (DADs) such as hard disk drives, with logical structures such as file names. In this way, data can be retrieved using the file name, without having to know the physical location of the data.
To allow multiple users access to the same user data, a shared file system program allows client software running on each one of a number of computers to write to or read from the same file stored in one or more DADs. The computers and DADs are connected by high-speed links. For instance, each of the computers and the DADs may be agents on a bus, such as a Fibre Channel (FC) bus. Each computer may communicate with the DADs using the Small Computer System Interface (SCSI) protocol. When a computer instructs the DAD to read or write data, it is the "initiator" of the request, and the DAD is the "target". The Fibre Channel bus allows multiple "initiators" to coexist on the same bus. This allows the computers to access the same data stored on the DADs, i.e. allows the computers to implement a "shared file system".
A file system needs two classes of meta-data, which provide (1) a description of the available (free) space on the DADs and (2) a description of the users' data (files). A file system uses meta-data to provide information about and/or documentation of user data managed by the file system. Meta-data may document data about (1) elements or attributes (name, size, data type, etc.), and (2) where the user data is located, how it is associated, its ownership, etc. Meta-data may include descriptive information about the context, quality and condition, or characteristics of the user data.
The meta-data for a file may include (1) the file name, including the file's path in a hierarchical directory, (2) file attributes, ownership, security descriptors, and (3) descriptors of the DAD addresses where the user data is stored. Whenever a file is written to, renamed, or deleted, its meta-data changes.
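By way of illustration only, the three kinds of per-file meta-data listed above might be represented as in the following Python sketch. The names `FileMetadata`, `rename`, and `append_extent`, the `(dad_id, start_block, block_count)` extent layout, and the `size_blocks` attribute are all hypothetical and are not taken from any particular file system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FileMetadata:
    # (1) File name, including its path in the hierarchical directory.
    path: str
    # (2) Attributes, ownership, security descriptors (free-form here).
    attributes: Dict[str, object] = field(default_factory=dict)
    # (3) Descriptors of the DAD addresses holding the user data, assumed
    #     here to be (dad_id, start_block, block_count) tuples.
    extents: List[Tuple[int, int, int]] = field(default_factory=list)


def rename(meta: FileMetadata, new_path: str) -> None:
    """Renaming a file changes only its meta-data, not the user data."""
    meta.path = new_path


def append_extent(meta: FileMetadata, dad_id: int,
                  start_block: int, block_count: int) -> None:
    """Writing more data adds an address descriptor and updates the size."""
    meta.extents.append((dad_id, start_block, block_count))
    current = meta.attributes.get("size_blocks", 0)
    meta.attributes["size_blocks"] = current + block_count
```

In this sketch, both a rename and a write leave the user data untouched by the meta-data code itself, yet each one alters the meta-data record, which is why such changes must be kept consistent across computers.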
For acceptable performance in accessing files in a shared file system, each computer maintains a copy of the file system meta-data in its own internal solid state random access memory (RAM). Any changes made to the file system meta-data need to be carefully managed across all of the computers, so that each computer has a consistent view of the file system. In essence, a computer needs to (1) lock the file system meta-data (the whole meta-data or just specific portions) in order to gain exclusive control of it, (2) change it in order to perform the specific operation (create a file, delete a file, write data into a file, etc.), (3) share the changes with all other computers so that each computer maintains in its RAM the same image of the meta-data, and (4) unlock the file system meta-data to allow other computers to perform their own operations.
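The four-step cycle just described can be sketched in Python as follows. This is a minimal single-process illustration, assuming that a `threading.Lock` stands in for whatever locking mechanism the computers actually share; the `Computer` class and the `modify_shared_metadata` function are hypothetical names introduced only for this example.

```python
import threading


class Computer:
    """One participant holding its own in-RAM copy of the meta-data."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}  # path -> meta-data entry


def modify_shared_metadata(lock, writer, all_computers, path, entry):
    """Lock, change, share, unlock, in that order."""
    with lock:  # (1) lock; (4) the unlock happens when this block exits
        # (2) Change the writer's own copy of the meta-data.
        writer.metadata[path] = dict(entry)
        # (3) Share the change so every other copy holds the same image.
        for peer in all_computers:
            if peer is not writer:
                peer.metadata[path] = dict(entry)
```

The sketch makes the cost visible: every modification holds the lock while all other copies are updated, so no other computer can begin its own change until the sharing step completes.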
One way of managing access to, and thereby maintaining the consistency of, meta-data in a shared file system is to provide locking and unlocking primitives using the bus that all computers share to access the DADs. In the Fibre Channel system (running the SCSI protocol) described above, a computer that wishes to change the meta-data attempts to lock a semaphore stored in the memory of one of the DADs. If successful, the computer is allowed to change the meta-data (or a specific portion thereof associated with the particular semaphore). The computer will share the changes with all other computers either by writing the changes to a DAD and instructing the other computers to read them, or by using a network protocol to communicate its changes directly. Finally, the semaphore is unlocked after the update to the meta-data has been reflected in each computer. This technique, however, can hamper the performance of the shared file system, and, in cases of high file system activity, such as during the creation and deletion of a large number of files, considerable delays may be encountered during the locking-unlocking procedure.
Another technique for maintaining the consistency of meta-data uses a reliable multicast or broadcast protocol to update the copies of the meta-data in each computer. Once again, however, this technique may not scale well when more than a few computers are participating, as meta-data management tends to diminish the performance of the file system as a whole.
According to an embodiment of the invention, a shared file system is disclosed having one or more shared storage nodes and a number of processing nodes. The processing nodes are connected to each other in a logical ring and to the shared storage nodes. The storage nodes are used to store user data and file system meta-data (FSMD). Each respective processing node has a processor and a memory. The memory contains a number of instructions which, when executed by the processor, cause the respective processing node to (1) disallow modifications to the FSMD until it receives a token, (2) update the FSMD based upon the content of the token received from one of the processing nodes, (3) perform its own changes to the FSMD, (4) append information to the token that describes these changes, and then (5) send the token to the next processing node in the logical ring. Such a mechanism performs two functions in a shared file system: (1) the locking of the meta-data (or a portion of it) and (2) the sharing and updating, in each computer, of an image or copy of the meta-data. Such a scheme has several advantages. For instance, the method may be implemented using relatively high-level software protocols, thus obviating the need for hardware support such as low level networking primitives that implement locking and unlocking of a bus to which the storage nodes are coupled. In addition, there is no need for a reliable multicast or reliable broadcast protocol, thereby providing the potential to scale well when more than a few computers are participating in the shared file system.
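The five steps performed by each processing node may be illustrated by the following Python sketch. It is a simplified, in-memory model and not a definitive implementation: the class `ProcessingNode`, its method names, and the representation of the token as a list of `(origin, path, entry)` records are assumptions made for this example. In particular, the rule that a node removes its own records once they have travelled a full lap of the ring is an assumption of the sketch; the text above does not specify how records leave the token.

```python
class ProcessingNode:
    def __init__(self, name):
        self.name = name
        self.fsmd = {}         # this node's local image of the FSMD
        self.pending = []      # local changes waiting for the token
        self.next_node = None  # successor in the logical ring

    def request_change(self, path, entry):
        # (1) Modifications are disallowed until the token arrives,
        #     so a requested change is only queued here.
        self.pending.append((path, entry))

    def _apply(self, path, entry):
        if entry is None:      # None is used here to mean "file deleted"
            self.fsmd.pop(path, None)
        else:
            self.fsmd[path] = entry

    def on_token(self, token):
        # Assumption: records this node appended on the previous lap have
        # now been seen by every other node, so they are dropped.
        token = [rec for rec in token if rec[0] != self.name]
        # (2) Update the local FSMD from the changes made by other nodes.
        for _origin, path, entry in token:
            self._apply(path, entry)
        # (3) Perform this node's own changes and
        # (4) append a description of each change to the token.
        for path, entry in self.pending:
            self._apply(path, entry)
            token.append((self.name, path, entry))
        self.pending.clear()
        # (5) Hand the token to the next node in the logical ring.
        return self.next_node, token
```

A caller would link the nodes into a ring through `next_node` and repeatedly pass the returned token to the returned node. Holding the token serves as the lock, and the records it carries serve as the update channel, so no separate locking primitive or broadcast is modelled.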
For the particular embodiment in which the changes to the FSMD are journaled in non-volatile memory at each processing node, the file system need not update the FSMD to the storage node very often, thereby further improving the performance of the file system while at the same time providing a reliable system in the event of a catastrophic power failure.
A token is a packet of data (of variable length) that is sent from one computer to another. The computers that cooperate in implementing a shared file system establish a logical ring, i.e. a logical sequence for sending this token from one computer to another. For example, if there are three computers A, B, and C, a token can be sent on this logical ring from computer A to computer B, from computer B to computer C and from computer C to computer A. The token performs two functions in the implementation of the shared file system: (1) it allows the locking of file system meta-data, i.e. a computer is allowed to modify the file system meta-data only if it owns the token (the token may be parked in the computer's internal memory) and (2) by appending the descriptions of the modifications to the file system meta-data that each computer performs, the token allows sharing the updates between all computers.
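The logical sequence in the three-computer example can be expressed compactly. The following Python sketch assumes the ring is held as an ordered list of computer names; the functions `next_in_ring` and `token_route` are hypothetical helpers introduced only to show the wrap-around from the last computer back to the first.

```python
def next_in_ring(ring, current):
    """Return the computer that receives the token after `current`."""
    i = ring.index(current)
    return ring[(i + 1) % len(ring)]  # the last member wraps to the first


def token_route(ring, start, hops):
    """List the successive owners of the token over `hops` transfers."""
    route = [start]
    for _ in range(hops):
        route.append(next_in_ring(ring, route[-1]))
    return route


ring = ["A", "B", "C"]
print(token_route(ring, "A", 3))  # ['A', 'B', 'C', 'A']
```

Because the order is purely logical, it need not match the physical arrangement of the computers on the bus or network.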