1. Limited Copyright Waiver
A portion of the disclosure of this patent document contains computer code listings to which the claim of copyright protection is made. The copyright owner has no objection to the facsimile reproduction by any person of the patent document or the patent disclosure, as it appears in the U.S. Patent and Trademark Office patent file or records, but reserves all other rights whatsoever.
2. Field of the Invention
The present invention relates generally to file servers, and more particularly to management of a file system cache.
3. Background Art
Network applications have required increasingly large amounts of data storage. Network data storage has been provided by a file server having at least one processor coupled to one or more disk drives. The file server executes a file system program that maps file names and block offsets in the files to physical addresses of data blocks on the disk drives. Typically the file system program maintains a UNIX-based file system having a hierarchical file system structure including directories and files, and each directory and file has an “inode” containing metadata of the directory or file. Popular UNIX-based file systems are the UNIX file system (ufs), which is a version of Berkeley Fast File System (FFS) integrated with a vnode/vfs structure, and the System V file system (s5fs). The implementation of the ufs and s5fs file systems is described in Chapter 9, pp. 261-289, of Uresh Vahalia, Unix Internals: The New Frontiers, 1996, Prentice Hall, Inc., Simon and Schuster, Upper Saddle River, N.J. 07458.
Of concern to users is not only the capacity and availability of network storage but also the integrity of data in the event of a system crash. Traditionally, users have relied on transaction processing techniques to maintain database consistency in the presence of a system crash. A common transaction processing technique is to subdivide an application program of a host processor into a series of transactions. Each transaction includes a set of read-write instructions that change the database from one consistent state to another. The set of read-write instructions for each transaction is terminated by an instruction that specifies a transaction commit operation. During the execution of the transaction, the database may become inconsistent. For example, in an accounting application, a transaction may have the effect of transferring funds from a first account to a second account. The application program has a first read-write instruction that debits the first account by a certain amount, and a second read-write instruction that credits the second account by the same amount. Before and after the transaction, the database has consistent states, in which the total of the funds in the two accounts is constant. In other words, the total of the funds in the two accounts at the beginning of the transaction is the same as the total at the end of the transaction. During the transaction, the database will have an inconsistent state, in which the total of the funds in the two accounts will not be the same as at the beginning or at the end of the transaction.
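The accounting example can be illustrated with a minimal sketch. The account names, the balances, and the `transfer` function are hypothetical and serve only to show that the total is preserved before and after the transaction but not between the debit and the credit.

```python
def transfer(accounts, src, dst, amount):
    """Move funds in two steps and return the total observed between them."""
    accounts[src] -= amount               # first read-write instruction: debit
    mid_total = sum(accounts.values())    # database is inconsistent at this point
    accounts[dst] += amount               # second read-write instruction: credit
    return mid_total


# Hypothetical balances chosen for illustration.
accounts = {"first": 100, "second": 50}
total_before = sum(accounts.values())
total_during = transfer(accounts, "first", "second", 30)
total_after = sum(accounts.values())
```

Here `total_before` and `total_after` are equal, while `total_during` is lower by the transferred amount. A crash between the two instructions would leave the database in that intermediate state unless the change can be undone.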
The operating system responds to the transaction commit operations in such a way that it is possible to recover from a system failure by restoring the database to its consistent state existing just after commitment of the last completed transaction. A typical way of providing such recovery is to maintain a log file of the database changes and the commit commands. The log includes a sufficient amount of information (such as “before” and “after” images) in order to undo the changes made to the database since the last commit command.
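A log of before and after images might be sketched as follows. This is an in-memory simulation under stated assumptions: the class `LoggedDatabase` and its method names are hypothetical, and a real system would keep the log in stable storage so that it survives the crash.

```python
_MISSING = object()  # marks a key that did not exist before the write


class LoggedDatabase:
    """Toy key-value store that logs before/after images of each change."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.log = []  # (key, before_image, after_image) since the last commit

    def write(self, key, value):
        before = self.data.get(key, _MISSING)
        self.log.append((key, before, value))
        self.data[key] = value

    def commit(self):
        # Changes up to this point no longer need to be undone.
        self.log.clear()

    def recover(self):
        # Undo uncommitted changes, newest first, using the before images.
        for key, before, _after in reversed(self.log):
            if before is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = before
        self.log.clear()


db = LoggedDatabase({"first": 100, "second": 50})
db.write("first", 70)
db.write("second", 80)
db.commit()            # first transfer is committed
db.write("first", 40)  # second transfer is interrupted after the debit
db.recover()           # restores the state just after the last commit
```

After `recover()`, the store again holds the balances that existed just after the last commit.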
Network clients typically use a network file system access protocol to access one or more file systems maintained by the file server. One popular network file system access protocol is the Network File System (NFS). NFS is described in “NFS: Network File System Protocol Specification,” RFC 1094, Sun Microsystems, Inc., Mar. 1, 1989. NFS Version 2 has synchronous writes. When a client wants to write, it sends a string of write requests to the server. For each write request, the server writes data and attributes to disk before returning to the client an acknowledgment of completion of the write request. The attributes include the size of the file, the client owning the file, the time the file was last modified, and pointers to locations on the disk where the new data resides. This synchronous write operation is very slow, because the server has to wait for disk I/O before beginning a next write request.
NFS Version 3 has asynchronous writes. In the asynchronous write protocol, the client sends a string of write requests to the server. For each write request, the server does a “fast write” to random access memory, and returns to the client an acknowledgment of completion before writing attributes and data to the disk. At some point, the client may send a commit request to the server. In response to the commit request, the server checks whether all of the preceding data and attributes are written to disk, and once all of the preceding data and attributes are written to disk, the server returns to the client an acknowledgment of completion. This asynchronous write protocol is much faster than a synchronous write protocol.
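The difference between the two write protocols can be sketched with a toy server. This is not the NFS wire protocol: the class `ToyServer`, its methods, and the dictionaries standing in for disk and random access memory are all hypothetical. As a simplifying assumption, the sketch flushes everything at commit time, whereas a real server may also write to disk in the background before the commit arrives.

```python
class ToyServer:
    """Toy model of synchronous versus asynchronous write handling."""

    def __init__(self):
        self.disk = {}   # stands in for disk storage
        self.cache = {}  # stands in for random access memory

    def write_sync(self, offset, data):
        # Synchronous style: the data reaches disk before the acknowledgment.
        self.disk[offset] = data
        return "ack"

    def write_async(self, offset, data):
        # Asynchronous style: "fast write" to memory, acknowledge immediately.
        self.cache[offset] = data
        return "ack"

    def commit(self):
        # Acknowledge only once all preceding writes are on disk.
        self.disk.update(self.cache)
        self.cache.clear()
        return "ack"


server = ToyServer()
server.write_async(0, b"aaaa")
server.write_async(4, b"bbbb")
on_disk_before_commit = dict(server.disk)  # nothing has reached disk yet
server.commit()
```

Between the acknowledged writes and the commit, the new data exists only in memory, which is the window in which a crash can lose or corrupt it.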
The asynchronous write protocol introduces difficulties if users are permitted to access files that have been corrupted by a system crash. For example, NFS version 3 permits file attributes and file data to be written to the file server in any order. If the new attributes are written before the new data and the server crashes, then upon recovery, the new attributes are found and decoded to obtain pointers to data. The file may be corrupted if not all of the new data were written to disk. In addition, the pointers for the new data not yet written may point to blocks of data from an old version of a different file. Therefore, a data security problem may occur, since the client may not have access privileges to the old version of the different file.
A solution to this data consistency problem is described in Vahalia et al., U.S. Pat. No. 5,893,140 issued Apr. 6, 1999, incorporated herein by reference. The file server is provided with a file system cache. Data and attributes are stored in the file system cache and are not written down to storage until receipt of a commit request from the client. When the commit request is received, the data are sent before the attributes from the file system cache to the storage layer.
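The effect of sending data before attributes can be sketched as follows. The dictionary layout, the block numbers, and the functions `commit` and `read_file` are hypothetical and are not taken from the referenced patent; the `crash_after_data` flag simulates a crash between the two steps.

```python
def commit(cache, storage, crash_after_data=False):
    """Flush cached data first, then attributes. Return True if completed."""
    # Step 1: write the new data blocks to storage.
    for block_no, data in cache["data"].items():
        storage["blocks"][block_no] = data
    if crash_after_data:
        return False  # simulated crash: attributes were never written
    # Step 2: write the attributes that point at the new blocks.
    storage["attributes"] = dict(cache["attributes"])
    cache["data"].clear()
    cache["attributes"].clear()
    return True


def read_file(storage):
    """Follow the stored pointers and return the file contents."""
    attrs = storage["attributes"]
    raw = b"".join(storage["blocks"][b] for b in attrs["pointers"])
    return raw[: attrs["size"]]


storage = {"blocks": {7: b"old data"},
           "attributes": {"size": 8, "pointers": [7]}}
cache = {"data": {9: b"new data!"},
         "attributes": {"size": 9, "pointers": [9]}}

completed = commit(cache, storage, crash_after_data=True)
visible_after_crash = read_file(storage)   # old attributes, old block

# The client resends its writes after recovery, and the commit completes.
cache = {"data": {9: b"new data!"},
         "attributes": {"size": 9, "pointers": [9]}}
commit(cache, storage)
visible_after_commit = read_file(storage)
```

Because the attributes are written last, a crash after step 1 leaves the stored pointers referring only to blocks that were fully written earlier, so a reader never follows a new pointer to stale data.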
Although the introduction of a file system cache solves some problems associated with an asynchronous write protocol, it is insufficient to fully restore a file that has been corrupted by a system crash. Conventional transaction processing techniques at the application level and operating system level are sufficient to fully restore a file that has been corrupted, but these techniques are too burdensome to be used for all applications.
In accordance with one aspect of the invention, there is provided a method of operating a file server having a file system cache memory and storage containing a file system. The method includes the file server receiving at least one write request from at least one client, and in response, writing new file system attributes and new file system data to the file system cache memory. The new file system attributes include new links between file system objects and file system blocks. The method further includes the file server receiving a commit request from the client, the new file system attributes and the new file system data not being written into the file system in storage until receipt of the commit request, and in response to the commit request, writing the new file system attributes and the new file system data to the file system in storage. The file server further maintains in memory a directory and file mapping data structure for the file system. The directory and file mapping data structure permits file system block allocations and linkages between file system objects and the file system blocks to change during read/write access to the file system by the client prior to receiving the commit request. The file system block allocations include allocated blocks having block allocations that are the same as block allocations in the file system as stored in the storage, and preallocated blocks having block allocations that are different from block allocations in the file system as stored in the storage.
In accordance with another aspect, the invention provides a file server including a file system cache memory and storage. The file server further includes means responsive to a write request from a client for writing new file system attributes and new file system data to the file system cache memory, the new file system attributes including linkages between file system objects and file system blocks. The file server further includes means responsive to a commit request from the client for writing the new file system data and new file system attributes to a file system in the storage. Moreover, the file server includes means for maintaining in memory a directory and file mapping data structure for the file system. The directory and file mapping data structure permits file system block allocations and linkages between file system objects and file system blocks to change during read/write access to the file system by the client prior to receiving the commit request. The file system block allocations include allocated blocks having block allocations that are the same as block allocations in the file system as stored in the storage, and preallocated blocks having block allocations that are different from block allocations in the file system as stored in the storage.
In accordance with yet another aspect, the invention provides a file server including a file system layer for mapping file names to data storage locations in response to a write request from a client; a file system cache connected to the file system layer for storing new file system attributes and new file system data in response to the write request from the client; and nonvolatile storage connected to the file system layer for storing the new file system attributes and the new file system data in response to a commit request from the client. The file system layer is programmed for responding to the write request from the client by writing the new file system attributes and the new file system data to the file system cache and not writing the new file system attributes and the new file system data to the file system in storage until receipt of the commit request from the client. The file system layer is programmed for responding to the commit request from the client by writing the new file system data and the new file system attributes from the file system cache to the nonvolatile storage. The file system layer is further programmed to maintain in memory a directory and file mapping data structure for the file system. The directory and file mapping data structure permits file system block allocations and linkages between file system objects and the file system blocks to change during read/write access to the file system by the client prior to receiving the commit request. The file system block allocations include allocated blocks having block allocations that are the same as block allocations in the file system as stored in the storage, and preallocated blocks having block allocations that are different from block allocations in the file system as stored in the storage.
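The distinction between allocated and preallocated blocks in the in-memory mapping data structure can be illustrated with a minimal sketch. The class `FileMapping`, its methods, and the block numbers are hypothetical; the sketch assumes a single file whose linkages are modeled as a dictionary from file offset to block number, and it is not the implementation described in this disclosure.

```python
class FileMapping:
    """Toy in-memory mapping of file offsets to file system blocks."""

    def __init__(self, storage_links):
        # Allocated: linkages identical to those in the file system in storage.
        self.allocated = dict(storage_links)
        # Preallocated: linkages that differ from storage until a commit.
        self.preallocated = {}

    def write(self, file_offset, block_no):
        # A write changes linkages in memory only.
        self.preallocated[file_offset] = block_no

    def lookup(self, file_offset):
        # Reads before the commit see the newest linkage.
        if file_offset in self.preallocated:
            return self.preallocated[file_offset]
        return self.allocated.get(file_offset)

    def commit(self, storage_links):
        # On commit the preallocated linkages are written to storage and
        # from then on match it, so they become allocated.
        storage_links.update(self.preallocated)
        self.allocated.update(self.preallocated)
        self.preallocated.clear()


storage_links = {0: 100, 1: 101}      # linkages as stored in storage
mapping = FileMapping(storage_links)
mapping.write(1, 205)                 # relink an existing offset
mapping.write(2, 206)                 # extend the file
seen_before_commit = mapping.lookup(1)
storage_before_commit = dict(storage_links)
mapping.commit(storage_links)
```

Until `commit` is called, the file system in storage is unchanged, while lookups through the mapping already reflect the new linkages.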