This invention relates generally to digital data processing, and, more particularly, relates to systems for efficient writing of data in local area networks (LANs) utilizing file servers.
The use of storage-intensive computer applications such as high-performance, high-resolution graphics has grown significantly in recent years, with indications that it will continue to grow through the next decade. Fueling user demand has been the introduction of lower cost 32-bit workstations and an increase in the base of applications software available for those systems. Because of their computational and graphics power, these workstations are employed in data-intensive applications such as electronic publishing, computer-aided design (CAD) and scientific research.
Paralleling these developments has been the emergence of industry standard communication protocols which permit users to operate in a multi-vendor environment. Each protocol defines the format of messages exchanged between devices in a network, such that the devices cooperatively execute selected operations and perform given tasks. In particular, file access protocols permit at least two machines to cooperate with a file server. The file server stores files and selectively enables remote client devices to read and write these files.
One such protocol is the Network File System (NFS) protocol, developed by Sun Microsystems, which allows users to share files across a network configuration such as Ethernet. It is most frequently used on UNIX systems, but implementations of NFS are utilized on a wide range of other systems. The NFS protocol can be described as a request-response protocol. More particularly, it is structured as a set of interactions, each of which consists of a request sent by the client to the server, and a response sent by the server back to the client. Generally, the response indicates any errors resulting from processing the request, provides data sought by the request, or indicates that the request has been completed successfully. Requests are reissued by the client until a response is received. A response which indicates that a request has been performed is referred to as an acknowledgement.
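By way of illustration only, the reissue-until-response behavior described above can be sketched as follows. The names `send_until_acknowledged`, `send_once` and `make_flaky_server` are hypothetical and form no part of the NFS protocol; the `max_attempts` bound is an assumption added so that the sketch terminates, whereas the text above describes reissue continuing until a response is received.

```python
from typing import Callable, Optional


def send_until_acknowledged(
    send_once: Callable[[bytes], Optional[bytes]],
    request: bytes,
    max_attempts: int = 10,
) -> bytes:
    """Reissue `request` until a response arrives (hypothetical helper)."""
    for _ in range(max_attempts):
        # A return value of None models a lost request or a timeout.
        response = send_once(request)
        if response is not None:
            return response
    raise TimeoutError("no response received within max_attempts")


def make_flaky_server(failures_before_success: int) -> Callable[[bytes], Optional[bytes]]:
    """Simulated transport that fails to answer the first few requests."""
    remaining = failures_before_success

    def send_once(request: bytes) -> Optional[bytes]:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return None  # server down or response lost
        return b"ACK:" + request

    return send_once
```

In this sketch the client keeps no record of the server's condition; it simply resends the same request, which is the simplification the request-response structure is intended to permit.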
Moreover, the NFS protocol is a stateless protocol: the client is not required to retain information about requests to which the server has responded. Any failure of the server, including a crash, can be handled by the client by continuing to reissue unanswered requests until the server is again operational. Consequently, stateless protocols require that the server reliably effect state changes called for by a given request before responding to the request.
A write operation is a typical example. Under a stateless protocol, a write must be executed in a manner which preserves the resulting state changes, even in the event of subsequent server failure. By acknowledging the request, the server implicitly "guarantees" that the write operation has been executed and that a subsequent server failure will not destroy the effects of the write. This assurance simplifies the client machine's task in handling a server failure. The client need merely reissue unacknowledged requests periodically until receiving an acknowledgement. The client device need not consider server failure and re-initialization. This feature of stateless protocols, such as NFS, greatly simplifies their implementation and use.
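The commit-before-acknowledge rule can be sketched, under simplifying assumptions, as a toy server in which a dictionary stands in for stable storage and a second dictionary stands in for volatile memory. The class `StatelessServer` and its methods are hypothetical names used for illustration and do not represent any actual NFS server implementation.

```python
class StatelessServer:
    """Toy model: state changes reach stable storage before the acknowledgement."""

    def __init__(self):
        self.stable = {}    # models disk; contents survive a crash
        self.volatile = {}  # models in-memory state; contents are lost on a crash

    def handle_write(self, file_handle, offset, data):
        key = (file_handle, offset)
        self.volatile[key] = data
        # Commit to stable storage BEFORE returning the acknowledgement.
        self.stable[key] = data
        return "ACK"

    def crash_and_restart(self):
        # A crash discards volatile state only.
        self.volatile.clear()

    def read(self, file_handle, offset):
        key = (file_handle, offset)
        if key in self.volatile:
            return self.volatile[key]
        return self.stable.get(key)
```

Because every acknowledged write is already in `stable`, a crash followed by a restart leaves acknowledged data readable, and a reissued copy of the same write leaves the stored state unchanged.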
Certain networks that utilize UNIX and NFS employ the Fast File System, a relatively high-speed file system for UNIX that includes support for operations used in performing NFS requests. This support implements certain operations, such as file creation, in conformity with the statelessness requirements of NFS. For write operations, the Fast File System provides a synchronous write operation which conforms to NFS statelessness requirements.
Unfortunately, the NFS requirement for providing assurances in write operations is extremely burdensome for the server, resulting in low write throughput. In particular, the assurances required by stateless protocols are typically enabled by the time-consuming process of writing the changed states to stable storage--i.e., memory considered reliable enough to serve as a repository for the persistent state of a given application. Generally, disk media are considered stable, as are battery-backed RAM devices. The latter are semiconductor memory devices having sufficient battery-based reserve power to preserve the validity of stored data, notwithstanding external power interruptions. Battery-backed RAM devices can provide stable storage operating at higher speeds than disk, but at the expense of increased system complexity and cost.
Conventional NFS server configurations, including those utilizing the Fast File System, have been unable to provide high rates of throughput in handling write operations. Write operation speeds typical of conventional practice are on the order of 50-100 kilobytes/second. This speed limitation markedly increases write response time throughout the network.
In conventional systems, higher speeds are attainable only by employing additional hardware, such as battery-backed RAM. This additional hardware significantly increases the cost and complexity of the system.
Accordingly, there exists a need for file server systems that can operate in accordance with stateless protocols, while providing higher write speeds and avoiding the requirement for additional hardware, such as battery-backed RAM.
Examination of an NFS write request will illustrate the temporal issues involved in a write operation. A block of data is transmitted from the client to the server, together with certain control information. This control information includes three parameters: (i) the "file handle," which identifies the file into which the data is to be written; (ii) the length of the data; and (iii) the target displacement of the data within the file.
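For illustration, the three control parameters and the accompanying data might be represented as shown below. The class name `WriteRequest` and its field names are hypothetical; in this sketch the length is derived from the data rather than carried as a separate field, which is a simplification and not a statement of the NFS wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRequest:
    """Hypothetical in-memory form of a write request's contents."""

    file_handle: bytes  # (i) identifies the file to be written
    offset: int         # (iii) target displacement of the data within the file
    data: bytes         # the block of data transmitted with the request

    @property
    def length(self) -> int:
        # (ii) length of the data, derived here from the data itself
        return len(self.data)

    @property
    def end_offset(self) -> int:
        # Ending displacement of the write; compared against the file size
        # to decide whether the write extends the file.
        return self.offset + self.length
```

A request carrying 8K bytes at displacement 8192, for example, has an ending displacement of 16384.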
Before acknowledging the request in a stateless protocol, the server must commit to stable storage all file changes that constitute the write operation, or sufficient data to reliably reconstruct the changes. The stable storage is typically a disk. As noted above, by acknowledging the request, the server implicitly guarantees that the write operation has been executed and that a subsequent server failure will not negate the effects of the write. Thus, all file states changed by a write request, including the data sent in the request itself, must be written to stable storage, and all disk data blocks modified by the request must be written synchronously to disk.
In recovering from a system crash, the server must be able to locate the modified data blocks on disk, using only the file handle and data structures on disk, but not data in memory. Therefore, if the data is being written to an area for which disk space has not been allocated, or for which the on-disk pointer to the disk data block has not yet been written, then all blocks containing pointers to the new data blocks must be synchronously written as well. Additionally, if the write operation extends the file, as many write operations do, the disk block containing file size information must also be written to disk synchronously.
In a UNIX system, the term "file size" refers to the ending displacement, in bytes, of the last data present in a file. In accordance with UNIX practice, no read operation is permitted to cross this boundary. Writing to an area beyond the current file size increases the file size to the ending displacement of the write operation. If a write leaves a gap between the old file size and the start of the newly written data, the gap area must appear as zero bytes. In UNIX systems, this is implemented in part by leaving zero mapping pointers and having them represent blocks completely filled with zeroes. An area consisting of such blocks is referred to as a "hole."
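The file size and hole rules described above can be modeled in a few lines. The sketch below is an in-memory illustration only: the class `SparseFile` is hypothetical, a missing dictionary entry plays the role of a zero mapping pointer, and the 8K-byte block size is an assumption taken from the example block size used elsewhere in this description.

```python
BLOCK_SIZE = 8192  # assumed block size for illustration


class SparseFile:
    """Toy model of UNIX file size and hole semantics."""

    def __init__(self):
        self.size = 0
        self.blocks = {}  # block index -> bytearray; a missing index is a hole

    def write(self, offset: int, data: bytes) -> None:
        if not data:
            return
        pos = offset
        remaining = memoryview(data)
        while len(remaining) > 0:
            index, within = divmod(pos, BLOCK_SIZE)
            chunk = min(BLOCK_SIZE - within, len(remaining))
            # Allocate a zero-filled block the first time it is written.
            block = self.blocks.setdefault(index, bytearray(BLOCK_SIZE))
            block[within:within + chunk] = remaining[:chunk]
            pos += chunk
            remaining = remaining[chunk:]
        # Writing past the old size raises the size to the end of the write.
        self.size = max(self.size, offset + len(data))

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.size)  # no read crosses the file size
        out = bytearray()
        pos = offset
        while pos < end:
            index, within = divmod(pos, BLOCK_SIZE)
            chunk = min(BLOCK_SIZE - within, end - pos)
            block = self.blocks.get(index)
            # An unallocated block reads back as zero bytes.
            out += block[within:within + chunk] if block is not None else bytes(chunk)
            pos += chunk
        return bytes(out)
```

Writing two bytes at displacement 0 and two more at displacement 3 x 8192 allocates only blocks 0 and 3 in this model; blocks 1 and 2 remain a hole and read back as zeroes.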
These rules of file server behavior have significant effects on write operations in the Fast File System. Consider, for example, a new file being written sequentially. Assume that each write operation provides 8K bytes of data to write to one disk block of the new file, with each block newly allocated in the request which writes it. Because the file is being extended by each request, the file size must be updated for each operation. This requires a synchronous write of the file inode--i.e., the main disk structure which represents a file in UNIX. The inode contains values indicating the size of the file, access and modify times, the locations of some of the blocks in which the file's data is located and, if necessary, locations of indirect blocks from which the locations of the remaining data blocks may be determined.
The Fast File System attempts to enhance access speed by establishing cylinder groups, each group being a contiguous region of disk cylinders treated as a unit for purposes of file allocation. To increase locality of file access, the set of inodes on each disk is divided among the cylinder groups, with files assigned to individual cylinder groups based on the directory to which each corresponds. Large files are allocated to multiple cylinder groups, with allocation periodically switched to a new cylinder group as the size of the file passes certain defined limits.
In the Fast File System, for files smaller than 96K bytes, each NFS write operation which extends the file requires two disk writes, one for the data block and one for the inode containing the size and the data pointer. Generally, the data block and the inode will be on different disk cylinders (the collection of disk blocks accessible without moving the arm supporting the disk read-write heads). Each request will therefore require two movements of the disk arm from one cylinder to another--referred to as SEEKs--as well as two writes. SEEKs slow I/O operations dramatically, and become even more time-consuming as the length of the SEEK path increases.
For files larger than 96K bytes, the Fast File System is configured so that data pointers are contained in indirect blocks. These indirect blocks must be written on each request which writes a previously unwritten data block, in addition to the inode with the updated size, if the file is being extended. Therefore, each such NFS write request for files larger than 96K bytes will generally require three disk writes. Moreover, as the file becomes larger, the distances among the three blocks to be written will tend to be larger. This causes the required SEEKs to become slower.
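The write counts described above for small and large files can be summarized in a short sketch. It assumes, for simplicity, that each request writes exactly one 8K-byte disk block and that data pointers for displacements below 96K bytes reside in the inode; the function name and its parameters are hypothetical.

```python
BLOCK_SIZE = 8 * 1024
DIRECT_LIMIT = 96 * 1024  # below this displacement, data pointers live in the inode


def blocks_to_write_synchronously(offset, length, old_size, newly_allocated):
    """List the disk blocks one single-block write request must commit."""
    writes = ["data block"]
    extends_file = offset + length > old_size
    beyond_direct = offset >= DIRECT_LIMIT
    if newly_allocated and beyond_direct:
        # The pointer to the new data block lives in an indirect block.
        writes.append("indirect block")
    if extends_file or (newly_allocated and not beyond_direct):
        # The inode holds the file size and, for small files, the data pointer.
        writes.append("inode")
    return writes
```

Under these assumptions, an extending write within the first 96K bytes yields two disk writes, an extending write beyond that point yields three, and an overwrite of an already allocated block within the existing file size yields one.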
Thus, each NFS write operation for large files will typically require three disk writes, each requiring a long SEEK. Moreover, each write must be synchronous, necessitating operating system scheduling delays before each successive I/O operation is executed. The result is very poor performance.
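The effect of multiple synchronous disk writes on throughput can be estimated with simple arithmetic. The sketch below is illustrative only: the latency per disk write (covering SEEK, rotation, transfer and scheduling delay) is a hypothetical input chosen by the caller and is not a measured value from this description, and the function name is likewise hypothetical.

```python
KILOBYTE = 1024


def estimated_write_throughput_kb_per_s(
    bytes_per_request: int,
    disk_writes_per_request: int,
    latency_ms_per_disk_write: float,
) -> float:
    """Rough throughput when every request waits for all of its disk writes."""
    seconds_per_request = disk_writes_per_request * latency_ms_per_disk_write / 1000
    return (bytes_per_request / KILOBYTE) / seconds_per_request


# Assumed figures: 8K-byte requests and 30 ms per synchronous disk write.
large_file = estimated_write_throughput_kb_per_s(8192, 3, 30)
small_file = estimated_write_throughput_kb_per_s(8192, 2, 30)
```

With the assumed 30 ms per disk write, a request needing three disk writes completes in about 90 ms, giving roughly 89 kilobytes/second, which is of the same order as the 50-100 kilobytes/second cited above for conventional practice. Different latency assumptions will shift the estimate accordingly.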
Accordingly, it is an object of the invention to provide improved file server systems having enhanced operational speed.
It is a further object of the invention to provide such file server systems characterized by high reliability and low cost. Other general and specific objects of the invention will in part be obvious and will in part appear hereinafter.