Many large-scale data processing systems now employ a multiplicity of independent computer systems, often referred to as a cluster, all of which operate concurrently on discrete problems or portions of problems. Each independent computer system is called a node of the multi-processing system cluster. In such systems, some of the nodes may be used for storage and maintenance of data files. These file-serving nodes may comprise a single file server or a collection of file servers. In such a large-scale data processing system, it is desirable to have data files distributed across the system so as to balance nodal workloads and storage loads. It is also desirable to protect against significant losses of critical data should one or more nodes malfunction. It is further desirable to enable several servers to share a large pool of storage (e.g., disks) without having to partition the storage and preassign ownership of it to particular ones of the sharing servers.
Generally, a node refers to a workstation connected to a local area network (LAN). Specifically, a node is a computer, a repeater, a file server, or a similar peripheral device used to create, receive, or repeat a message. For example, a personal computer (PC) may be used as a node in a data processing network. Further, in a network, data communication links tie together various computer systems to allow the sharing of information and resources. For example, a LAN that ties together all PCs in a word processing department can enable users to access a common set of template files or to print on a single high-speed laser printer. A PC may also serve as a node in a wide area network (WAN) in which mainframes and PCs are remotely connected. As well as functioning as a node, a PC may serve as a network host. Generally, in the context of the present invention, the word "node" is used to refer to a point where one or more functional units interconnect channels or data circuits in a data network. The word may also refer to the point at an end of a branch in a network.
A striped network file system with multiple servers offers the potential to achieve very high performance using multiple collections of inexpensive computers and disks. Also, distributing file data across a plurality of servers and storage devices provides the potential for improved data recovery in the event of a failure of any server or storage device if redundancy is added to critical data.
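As an illustration of how striping combined with redundancy can support recovery, the following minimal sketch splits file data into fixed-size fragments, one per server, and adds an XOR parity fragment to each stripe. The function names, the zero-byte padding, and the choice of XOR parity are assumptions made for illustration only; the text above does not specify a particular redundancy scheme.

```python
from functools import reduce


def xor_fragments(fragments):
    """Byte-wise XOR of equally sized fragments."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*fragments))


def stripe_with_parity(data, num_servers, fragment_size):
    """Split data into stripes of num_servers fragments plus one parity fragment.

    Hypothetical layout: the final stripe is padded with zero bytes.
    """
    stripe_bytes = num_servers * fragment_size
    stripes = []
    for start in range(0, len(data), stripe_bytes):
        chunk = data[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")
        fragments = [
            chunk[i * fragment_size:(i + 1) * fragment_size]
            for i in range(num_servers)
        ]
        stripes.append((fragments, xor_fragments(fragments)))
    return stripes


def recover_fragment(fragments, parity, lost_index):
    """Rebuild the fragment held by a failed server from the survivors and parity."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return xor_fragments(survivors + [parity])


stripes = stripe_with_parity(b"abcdefghij", num_servers=3, fragment_size=2)
fragments, parity = stripes[0]
rebuilt = recover_fragment(fragments, parity, lost_index=1)
```

In this sketch, the loss of any single fragment in a stripe can be tolerated, at the cost of one additional fragment of storage per stripe.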
A striped network file system implemented over multiple servers in a distributed computing environment raises design issues such as how and where to store a file's resource information and how to allocate space to files. File resource information includes information as to the allocation of physical disk space, the mapping of logical file blocks to physical disk space, data integrity mechanisms such as parity checks, and data security measures such as access control mechanisms.
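The kinds of file resource information listed above could be grouped into a single per-file record, as in the following sketch. The class name, field names, and single-bit parity check are hypothetical and chosen only to mirror the categories named in the text; they do not describe any particular file system.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


def parity_bit(data: bytes) -> int:
    """Even-parity bit over all bits in data (illustrative integrity check)."""
    return sum(bin(b).count("1") for b in data) % 2


@dataclass
class FileResourceInfo:
    """Hypothetical per-file record of resource information."""
    name: str
    # logical block number -> (disk identifier, physical block number)
    block_map: Dict[int, Tuple[str, int]] = field(default_factory=dict)
    # logical block number -> parity bit recorded at allocation time
    parity: Dict[int, int] = field(default_factory=dict)
    # users permitted to read the file (simplified access control)
    allowed_readers: Set[str] = field(default_factory=set)

    def allocate(self, logical_block, disk_id, physical_block, data):
        """Record where a logical block lives and its parity."""
        self.block_map[logical_block] = (disk_id, physical_block)
        self.parity[logical_block] = parity_bit(data)

    def can_read(self, user):
        return user in self.allowed_readers


info = FileResourceInfo("report.dat", allowed_readers={"alice"})
info.allocate(0, "disk1", 42, b"\x03")
```

Where such records are stored, and which node is allowed to update them, is the design question the remainder of this section turns on.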
The Zebra Striped Network File System (Hartman et al. 1995) describes a striped network file system that batches small files together into a sequential log, divides the log into stripes, and writes the larger, more efficient stripes to the servers. Each client creates its own log, so that each stripe in the file system contains data written by a single client.
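The batching behavior described above can be sketched as a per-client append-only log that is cut into fixed-size stripes. This is a simplified illustration, not Zebra's actual implementation: the `ClientLog` class, its method names, and the in-memory `flushed` list standing in for the storage servers are all assumptions.

```python
class ClientLog:
    """Hypothetical per-client log that batches small writes into stripes."""

    def __init__(self, client_id, stripe_size):
        self.client_id = client_id
        self.stripe_size = stripe_size
        self.buffer = bytearray()   # log data not yet written as a stripe
        self.log_offset = 0         # total bytes appended to the log so far
        self.index = {}             # file name -> (log offset, length)
        self.flushed = []           # stand-in for stripes sent to servers

    def write_file(self, name, data):
        """Append a small file to the log; emit every full stripe."""
        self.index[name] = (self.log_offset, len(data))
        self.buffer += data
        self.log_offset += len(data)
        while len(self.buffer) >= self.stripe_size:
            self._flush(self.stripe_size)

    def _flush(self, nbytes):
        stripe = bytes(self.buffer[:nbytes])
        del self.buffer[:nbytes]
        # Each stripe carries data from exactly one client.
        self.flushed.append((self.client_id, stripe))

    def sync(self):
        """Force out a final, possibly partial, stripe."""
        if self.buffer:
            self._flush(len(self.buffer))


log = ClientLog("client-1", stripe_size=8)
log.write_file("a", b"1234")
log.write_file("b", b"56789")
log.sync()
```

Because each client owns its log, two small files written by the same client can share a stripe, while data from different clients never does.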
However, the Zebra file system has several drawbacks. First, the Zebra file system implements a single file manager that provides a centralized resource for data block pointers and handles cache consistency operations. Use of a centralized file manager is a potential performance bottleneck. In addition, the Zebra file system stripes each segment across all of the system's storage servers, which limits the maximum number of storage servers that Zebra can use efficiently and thus limits its scalability.
In addition, Zebra is designed to support UNIX workloads as found in office and engineering environments. Such workloads are characterized by short file lifetimes, sequential file accesses, infrequent write-sharing of files by different clients, and many small files. Zebra is not optimized to run database applications, which tend to update and read large files randomly.
The Serverless Network File Systems (Anderson et al. 1996) resolves the centralized file resource manager problem by creating many copies of the file resource information for all files and distributing them to each of the servers in the striped file system. More specifically, in a Serverless system, the file resource information for all files is stored in four key maps: the manager map, the imap, the file directories, and the stripe group map (using file index numbers). These maps are globally replicated into the memory of each server. Thus, file resource information is available to all the servers in the striped file system.
In such a system, the difficulties in maintaining consistency across these map copies are monumental. Any change in the file resource information must be incorporated into each map at each location in a manner that makes all the changes appear to be simultaneous, in order to maintain file system consistency. Writing and updating file resource information for each map at each location any time a change is made incurs substantial file system overhead.
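The scale of this overhead can be illustrated with a toy model in which every server holds a full copy of a map and every change is applied to every copy. The `ReplicatedMap` class and its update counter are hypothetical; the sketch deliberately ignores the harder problem of making the updates appear simultaneous and only counts the work performed.

```python
class ReplicatedMap:
    """Toy model of one map that is fully replicated on every server."""

    def __init__(self, num_servers):
        self.copies = [dict() for _ in range(num_servers)]
        self.updates_applied = 0    # one per copy touched

    def update(self, key, value):
        """Apply a change to every copy; cost grows with the server count."""
        for copy in self.copies:
            copy[key] = value
            self.updates_applied += 1

    def read(self, server_index, key):
        """Any server can answer from its local copy."""
        return self.copies[server_index].get(key)


replicated = ReplicatedMap(num_servers=16)
for block in range(100):
    replicated.update(("file-7", block), "server-%d" % (block % 16))
```

In this model, 100 changes on a 16-server system produce 1,600 individual copy updates, before accounting for any coordination needed to keep the copies consistent while the updates are in flight.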