The problem of network data file storage began when computers were first linked together. Traditionally, one solution to the problem of storing data has been to allocate services to a network computer or file server [See Distributed Operating Systems by Andrew S. Tanenbaum; 1994 Prentice Hall; ISBN:0132199084]. Software, installed at other network client computers, permitted access to various network servers by copying the files of the network servers locally or by emulating access to files on network servers from a virtual local disk. FIG. 1 illustrates one prior art method for shared access to a file at a file server 10 as developed for personal DOS-based IBM compatible computers. Client software for DOS-based IBM compatible computers, if properly connected to the local network 20 and the corresponding file server 10, permitted viewing of the network drive. Software running on client computers 30 made files located at a remote file server 10 appear to be local. Thus, the allocation of services to a network computer or file server requires a dedicated file server and the client-server access model in order to access network files. [See CHARLES CROWLEY, OPERATING SYSTEMS: A DESIGN-ORIENTED APPROACH (Irwin, 1997) ISBN 0256151512].
This allocation of services to a network computer or file server has several disadvantages. In the case of shared access, several clients may view the same data file locally at the client computer. Users of client computers may be unaware of the shared access to a data file and start writing pseudo-local files which are stored to the same location. The result is file distortion. Multiple failures are bound to occur. Because pseudo-local files are physically located at the same network server, the pseudo-local files are entirely dependent on that network server. This means that any hardware, software or network failure at that network server makes file access impossible. Even properly functioning network servers may cause such a problem while rebooting their operating system. Any scheduled reboot of an operating system inevitably blocks data file access and service.
Clustering is one solution to the problem of file distortion or inability to gain access to data files. Digital Equipment Corporation (DEC) developed and implemented a well-known hardware and software concept in the field of clustering. Specifically, clustering is the creation of a special disk array linked up to several computer processor units. [See Roy G. Davis, VAXcluster Principles (Digital Press) ISBN 1555581129]. When a special disk array is linked up to several computer processor units, special task hardware, not a normal computer, provides shared access and guarantees absolute interchangeability of all participating computers. Being less complex, clustering hardware provides higher reliability in comparison to a standalone computer. However, a clustering configuration requires the installation of corresponding software on all of the operating systems of the linked client computers. This method provides flexible independent client computer services, but failure of the clustering hardware again causes loss of service.
Several similar network servers, interacting with client computers, may provide identical service and data access to every client computer. Data replication at every network server together with identical service, independent of the location of the client computer and service center, may be regarded as the easiest solution to this problem. However, some inconveniences, such as complex data synchronizing processes, remain.
Another solution to the problem of file distortion or the inability to gain access to files is the creation of customized distributed data storage. Service distribution implies that all service processes of the operating system are performed at the network nodes (servers) instead of at a local computer. Such service distribution reduces response time and improves provider-to-client channel capacity. Simultaneously, this distribution solves the problem of limited single network server processor power, because, for example, a service request can be processed by a larger number of computers. Incoming requests are spread across a larger number of network servers. Thus, because of request distribution, network server overloading is decreased even when individual requests are not processed in parallel on a cluster node. Customized distributed data storage also enhances the service fault-tolerance level. Specifically, when a network server fails or the network is inaccessible, a client computer may switch over to a similar network server and receive the same service. The symmetry of the network servers in the computer network determines service availability.
Such customized distributed data storage service requires distributed data storage to enforce symmetry of services provided for client computers. There is a need for the development of special-purpose distribution and storage algorithms to yield optimum distributed data storage with respect to both data content and resource requirements. Such algorithms would maintain consistent network server content at the different network servers in a computer network to provide service symmetry to client computers.
Currently available methods and algorithms for distributed data storage are complex. The data duplication or mirroring approach is frequently used, in which the server at every network node possesses a complete copy of all stored data files. Mirroring systems of FTP servers have been arranged in such a manner, as discussed in the following references (See U.S. Pat. No. 5,835,911, Nakagawa; U.S. Pat. No. 5,434,994, Shaheen; U.S. Pat. No. 5,155,847, Kirouac; U.S. Pat. No. 5,742,792, Yanai).
Regular network data systems, such as NFS (Network File System) [See BRIAN PAWLOWSKI, NFS VERSION 3 DESIGN AND IMPLEMENTATION (USENIX Summer 1994)] on UNIX (developed by Sun Microsystems), usually include a pre-defined network server and client computers that access that particular network server to obtain a necessary data file. Such network data file systems are generally used with a minimum number of network servers (See U.S. Pat. No. 5,513,314, Kandasamy, et al.).
Network distributed file systems are arranged in a more complicated manner. Such network distributed file systems generally permit users to work with the distributed file system as a whole (not with just a selected server as in the NFS case) in a shared uniform namespace, regardless of whether a specific file server is accessible. A namespace is a collection of unique names, where a name is an arbitrary identifier, usually an integer or a character string. Usually the term “name” is applied to such objects as files, directories, devices, computers, etc.
Another approach to creating a distributed data file storage access model is the hierarchical system of file naming combined with local data caching on the client computer. The AFS system of Transarc Corporation (now IBM Transarc Labs) [See RICHARD CAMPBELL et al. MANAGING AFS: THE ANDREW FILE SYSTEM (Prentice Hall 1997) ISBN 0138027293] and the Coda system [See P. J. Braam, The Coda Distributed File System (#74, Linux Journal #50 Jun. 1998); M. SATYANARAYANAN, CODA: A HIGHLY AVAILABLE FILE SYSTEM FOR A DISTRIBUTED WORKSTATION ENVIRONMENT (Proceedings of the Second IEEE Workshop on Workstation Operating Systems September 1989)] are examples of such distributed data file storage systems. For optimal data access, these distributed data file storage systems intensively cache data at the local file system of a client computer and fully utilize this cache to reduce the number and size of requests to the system file server.
AFS transmits all of the data file requests to the system file server (even requests for files within the cache of a local data file system) but permits access to the requested data files only after it is determined that the data files were not altered after the copying process was finished. In case of file server disconnection, AFS usually does not allow data file access. Coda, in contrast, assumes that such data files tend to stay intact, and permits working on these data files without complete recovery of the file server connection. The fault tolerance level under this approach is higher than with the regular use of pre-defined network servers, which requires being permanently online. However, such an approach permits several client computers to concurrently access the same data file, with the potential for errors.
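The difference between the two cache policies described above can be illustrated with a minimal sketch. It is not the actual AFS or Coda protocol; the class names `FileServer` and `CachingClient`, the version counter, and the `allow_disconnected` flag are hypothetical names introduced only for illustration. A strict client refuses to serve a cached file when the server cannot confirm it is current, while an optimistic client falls back to its cached copy.

```python
class FileServer:
    """Hypothetical file server holding (version, data) per file name."""

    def __init__(self):
        self.online = True
        self.files = {}  # name -> (version, data)

    def _check_online(self):
        if not self.online:
            raise ConnectionError("file server unreachable")

    def current_version(self, name):
        self._check_online()
        return self.files[name][0]

    def fetch(self, name):
        self._check_online()
        return self.files[name]


class CachingClient:
    """Hypothetical client that caches whole files locally."""

    def __init__(self, server, allow_disconnected):
        self.server = server
        # False: validate with the server on every read (strict policy).
        # True: serve the cached copy when the server is unreachable.
        self.allow_disconnected = allow_disconnected
        self.cache = {}  # name -> (version, data)

    def read(self, name):
        try:
            version = self.server.current_version(name)
            cached = self.cache.get(name)
            if cached is None or cached[0] != version:
                # Cache miss or stale copy: load the whole file again.
                self.cache[name] = self.server.fetch(name)
            return self.cache[name][1]
        except ConnectionError:
            if self.allow_disconnected and name in self.cache:
                return self.cache[name][1]
            raise


server = FileServer()
server.files["a.txt"] = (1, b"v1")
strict = CachingClient(server, allow_disconnected=False)
optimistic = CachingClient(server, allow_disconnected=True)
strict.read("a.txt")      # both clients now hold a cached copy
optimistic.read("a.txt")

server.online = False     # simulate file server disconnection
offline_data = optimistic.read("a.txt")  # served from the local cache
try:
    strict.read("a.txt")
    strict_failed = False
except ConnectionError:
    strict_failed = True
```

The optimistic policy keeps the client working during a disconnection, at the cost that the cached copy may have been changed on the server in the meantime.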
Both the AFS and the Coda approaches cache entire data files and possess multiple file copies with various modifications. The possession of multiple file copies with various modifications complicates file system support for data coherence and reduces its efficiency. Moreover, access to data files outside the cache is possible only after those data files have been fully loaded to the cache. Thus, in a model where different data is stored at different servers, data accessibility remains susceptible to failure in case of a server disconnection.
The namespace of these AFS and Coda file systems is hierarchical; that is, it stems from a shared point, i.e., the root of a data file system. Nevertheless, every AFS/DFS/Coda name corresponds to a specific file server. Loss of a specific file server results in loss of access to certain data files. When this occurs, data files get split apart. A special function is used to search the namespace, recognize the server, and access the data files. Thus, potential file interchangeability exists, for example, by direct substitution of another file for a data file which is not found. But, even if properly organized, such a system does not offer any improvement in fault tolerance level.
Distributed access to data files may also be achieved by a distributed storage of network data blocks, rather than distributed storage of entire data files. In this approach, the file system is built over such a set of network data blocks. The server software emulates a powerful virtual low-level disk which is accessible by software running on the client's computer. A regular data file system is built up over the storage of network data blocks as if it were working with a local disk. If there is a need to synchronize records in the same network data blocks, e.g., when two independent client computers request write access to the same directory, special locking algorithms would be required. Such a distributed data storage system would be rather expensive with respect to both scalability and efficiency.
Another method of data storage distribution, RAID Level 5 [See GREGORY F. PFISTER, IN SEARCH OF CLUSTERS (Prentice Hall 1998) ISBN 0138997098], allows data acquisition even if a server or disks containing data are not accessible. RAID Level 5 is extensively used to deliver higher fault-tolerance efficiency of data files stored on disk. Using a similar algorithm, the Serverless File System [See TOM ANDERSON et al., SERVERLESS NETWORK FILE SYSTEMS (15th Symposium on Operating Systems Principles, ACM Transactions on Computer Systems 1995)] was developed at UC-Berkeley. The Serverless File System uses a group of network servers rather than a single dedicated server. The Serverless File System is based on distributed storage of data blocks, wherein a RAID algorithm can successfully restore every data block (provided that at most one server is stopped at a time). According to the Serverless File System, the file system asymmetrically divides supporting data blocks between different network servers and possesses two different states: a normal state when all the network servers are accessible, and a failure state when a special recovery procedure is required for an unavailable network server. The system does not allow use of network servers with unequal efficiency and connection quality, since data accessibility depends on access to all of the network servers.
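The single-failure recovery property of a RAID Level 5 style algorithm rests on parity: the parity block is the bytewise XOR of the data blocks, so any one missing block equals the XOR of all the blocks that remain. The following is a minimal sketch of that idea only. The function names are hypothetical, the blocks are assumed to be of equal length, and the parity block is kept in a fixed last position, whereas an actual RAID Level 5 layout rotates parity across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Append one parity block to the data blocks (one block per server)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover_block(stripe, lost_index):
    """Rebuild the block at lost_index from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = make_stripe(data)          # three data blocks plus one parity block
rebuilt = recover_block(stripe, 1)  # pretend the server holding block 1 failed
```

Because only one parity block exists per stripe, the loss of two blocks at once cannot be repaired, which matches the restriction to at most one stopped server at a time.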
All file system developers inevitably come across the problem of dynamic file content changes. It is well known that almost all data storage files eventually require some content changes. Various methods of changing data file content have been proposed to solve this problem. The most common method of providing for content changes in data storage files includes changing the file content at the file location, i.e. in the file system. Most of the old MS-DOS and UNIX operating systems are arranged in such a manner. Changing the data file content at the location of the file has certain disadvantages, since any errors made during file recording can influence the content of the data file. For instance, if the computer stops working while a data file is being recorded, the file may be irreparably damaged or irretrievably lost. Thus, it is preferable to have an operating system with unmodifiable files of a fixed size and location.
To solve the data file modification problem, some systems support different versions of the same file. The VAX VMS file system [See KIRBY MCCOY, VMS FILE SYSTEM INTERNALS (Digital Press 1990) ISBN 1555580564] records every data file modification as a whole data file under a new name, while keeping the previous version of that data file accessible. Then every data file modification, or version, is sent to the data file directory. The data file versions share the same data file name, but differ in data file numbering, temporally ranked during the process of data file modification. FIG. 4 illustrates prior art data file storage 90 with the form versions 100 ranked by time. The new version 110 goes in full to data storage 120 after the file has been edited 130. Of course, this method of data storage yields numerous, virtually redundant, data file copies. Moreover, this data file modification method is very inefficient in that the operating system first reads the final file modification and then saves it to a new location, thus requiring disk space and disk I/O bandwidth nearly equal to the size of a doubled file.
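The storage cost of whole-file versioning can be seen in a minimal in-memory sketch. It is not the VMS implementation; the class `VersionedStore` and its methods are hypothetical names used only to illustrate that every save stores a complete new copy under the next version number while earlier versions stay readable.

```python
class VersionedStore:
    """Hypothetical store that keeps every saved version of a file in full."""

    def __init__(self):
        self._versions = {}  # file name -> list of complete contents

    def save(self, name, content):
        """Store a complete new copy and return its version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(bytes(content))
        return len(versions)

    def read(self, name, version=None):
        """Read the given version, or the newest one when version is None."""
        versions = self._versions[name]
        if version is None:
            version = len(versions)
        return versions[version - 1]

    def bytes_stored(self, name):
        """Total space consumed by all versions of the file."""
        return sum(len(v) for v in self._versions[name])


store = VersionedStore()
first = store.save("report.txt", b"hello world")
second = store.save("report.txt", b"hello world!")  # one-byte edit
total = store.bytes_stored("report.txt")            # both copies are kept
```

A one-byte edit here roughly doubles the space consumed, which is the redundancy noted above.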
Recording all changes to a data file in a special journal is another potential solution to the problem of data file modification. As later discussed, this technique was developed for databases to assure data safety and accessibility to data files in case of system failure. In this approach, changes to a data file are recorded in a special standard form usually called a log. From that log, records are gradually put into the current data file. FIG. 3 illustrates the process by which discrete changes 80a, 80b and 80c to the original data file are entered in the log, and then step-by-step copied to file 60. Such a transactional method exposes either all the changes to a data file or none of them, with no intermediate states. The log contains a detailed indivisible stream of structured changes to every file. Data file systems based on this method are characterized by fast failure recovery. Changes to the data file system are highly coherent, and it is not necessary to check all available data to assure data file system consistency. This method, however, does not permit recording variances, as contrasted with an undo/redo log recording database technique.
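The all-or-nothing behavior of a log can be sketched as follows. This is an illustrative assumption rather than the format of any particular journaling file system: the log is modeled as a list whose records are hypothetical `(offset, data)` tuples, and a `"COMMIT"` marker closes each group of changes. During replay, changes are copied into the file only when their commit marker is present, so a group interrupted by a failure leaves no trace.

```python
def replay(original, log):
    """Apply committed groups of (offset, data) changes to a copy of the file."""
    content = bytearray(original)
    pending = []  # changes seen since the last commit marker
    for record in log:
        if record == "COMMIT":
            # The whole group becomes visible at once.
            for offset, data in pending:
                content[offset:offset + len(data)] = data
            pending = []
        else:
            pending.append(record)
    # Records left in `pending` were never committed and are discarded.
    return bytes(content)


log = [
    (0, b"H"),   # first change of a group
    (6, b"W"),   # second change of the same group
    "COMMIT",    # group is complete
    (0, b"J"),   # a later change cut off by a failure: no commit marker
]
recovered = replay(b"hello world", log)
```

Recovery only needs to scan the log, not the whole file system, which is why systems built this way recover quickly after a failure.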
What is needed is a fault tolerant data storage system which will optimize distributed data storage with respect to both data content and resource requirements. The same content should be available at different servers in order to provide client computer symmetry and promote data synchronization.