1. Field of the Invention
The present invention relates to data management and, more specifically, to the management of data on a server shared by multiple users. Still more specifically, the present invention relates to the management of data management servers, possibly in a distributed environment, that support multiple sets of clients, where each set of clients communicates with the server via a specific data management protocol. In addition, the present invention provides a mechanism for processes executing locally on the data server machine to access the same data.
2. Background and Related Art
Computer systems can consist of one or more computers. Distributed computer systems are created by linking a number of computer systems by a private communication mechanism, local area network (LAN), or wide area network (WAN). Each of the linked computers typically has a processor, I/O devices, volatile storage, and non-volatile storage. Certain ones of the computers are designated as "servers". A server provides services for one or more other computers which are labelled "clients". A server usually provides non-volatile (e.g. hard disk) data storage that can be shared by a number of computers. Servers may also provide shared processing resources and shared access to expensive peripherals such as high speed printers or scanners.
The sharing of data resident on a server or of devices has certain advantages for departments or workgroups with common data processing requirements. Data can be organized as file trees called "volumes" or "filesets". Files in the fileset can be accessed by local users on the server machine or by remote users on client machines. Access to files and directories in a fileset is synchronized by the file system to avoid conflicting updates to the data and to ensure consistency of data read from the file system.
When files are accessed only by local users on the server machine, the central synchronization mechanism of the local file system is sufficient for synchronizing user access to files. Usually, only one user is allowed to have write (update) access to a byte range of a file, while multiple users can access the same data for "read". A "write" access blocks all "read" access to the data. Synchronizing access to shared data by remote users is a more complex problem.
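The local synchronization rule described above (one writer per byte range, or any number of readers) can be illustrated with a minimal sketch. The class `ByteRangeLocks` and its method names are hypothetical and chosen only for illustration; they do not correspond to any particular file system's interface.

```python
from dataclasses import dataclass, field


@dataclass
class ByteRangeLocks:
    """Toy lock table for one file: many readers or one writer per byte range."""

    held: list = field(default_factory=list)  # entries: (owner, start, end, mode)

    def try_lock(self, owner, start, end, mode):
        """Grant a lock on bytes [start, end) unless a conflicting lock overlaps."""
        for other, s, e, m in self.held:
            overlaps = start < e and s < end
            # Two locks conflict when they overlap, belong to different
            # owners, and at least one of them is a write lock.
            if overlaps and other != owner and "write" in (mode, m):
                return False
        self.held.append((owner, start, end, mode))
        return True

    def unlock(self, owner):
        """Release every lock held by `owner`."""
        self.held = [h for h in self.held if h[0] != owner]


locks = ByteRangeLocks()
print(locks.try_lock("user1", 0, 100, "read"))    # readers may share a range
print(locks.try_lock("user2", 50, 150, "read"))
print(locks.try_lock("user3", 90, 120, "write"))  # refused: overlaps readers
```

In this sketch a refused request simply returns `False`; a real file system would typically block the caller or queue the request instead.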
The mechanisms for sharing files across a network have evolved over time. The simplest form of sharing allows a client to request data from a file server. The data is sent to the client processor and any changes or modifications to the data are returned immediately to the server. There is no caching of the data at the client. In this case, synchronization of access to the data is similar to the case of local user access.
Distributed file systems enhance file sharing by adding mechanisms to more effectively distribute data to clients and to more effectively control sharing of files. Many distributed file systems exist. Popular distributed file systems are the Network File System (NFS) from SUN MICROSYSTEMS INC. (NFS and SUN MICROSYSTEMS are trademarks of Sun Microsystems Inc.), Andrew File System (AFS) from Carnegie Mellon University (CMU) and the Distributed Computing Environment (DCE) Distributed File System (DFS) from the Open Software Foundation (OSF).
NFS allows client machines to cache data from server files, for read access, for a limited time (usually 30 seconds). During that time, users on the client machine can read the data from a local cache without communicating with the file server. All updates are propagated immediately to the server. Since the server does not keep records of which clients have cached data, there is no guarantee that the users on the client machine will have the latest data. However, since the window for reading stale cache data is small, this is usually satisfactory to most users and application programs.
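The time-limited read caching described above can be sketched as follows. This is an illustration under stated assumptions, not the NFS implementation: the class `TimedReadCache`, the `fetch` callable standing in for a request to the server, and the injectable `clock` are all hypothetical names introduced here.

```python
import time


class TimedReadCache:
    """Client-side read cache whose entries expire after a fixed interval."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable: path -> data from server
        self._ttl = ttl_seconds  # cache lifetime (the text cites about 30 seconds)
        self._clock = clock
        self._entries = {}       # path -> (data, time the data was fetched)

    def read(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[1] < self._ttl:
            # Served locally without contacting the server; may be stale.
            return entry[0]
        data = self._fetch(path)
        self._entries[path] = (data, now)
        return data


# A dict stands in for the server, and a fake clock makes the window visible.
server = {"/a": "v1"}
now = [0.0]
cache = TimedReadCache(server.__getitem__, clock=lambda: now[0])
first = cache.read("/a")    # fetched from the server
server["/a"] = "v2"         # another client updates the file
now[0] = 10.0
stale = cache.read("/a")    # still inside the window: old data returned
now[0] = 31.0
fresh = cache.read("/a")    # window expired: refetched from the server
```

Because the server keeps no record of which clients hold cached copies, nothing notifies the client of the update; the stale copy is discarded only when its lifetime runs out.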
AFS allows clients to cache data for both read and write access. The server keeps track of data cached by a client, say client A, and when a different client B requests access to the same data, the server sends a "callback" request to client A requesting return of the data.
DCE DFS allows client machines to cache data for read and write access. It uses a server-based token mechanism to synchronize access to the server data and ensure client cache consistency. Clients acquire "read" tokens to ensure that the data in their caches are valid. If a client is to change data in a file, it acquires a "write" token for the data. Granting a "write" token to one client invalidates the "read" tokens for all other clients. The token invalidation renders the cached data in these clients invalid.
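The token rule described above, in which granting a "write" token invalidates the "read" tokens of all other clients, can be sketched as follows. The class `TokenManager` and its methods are hypothetical and are not the DCE DFS interface. The sketch also assumes that a conflicting "write" token is revoked before a "read" token is granted to another client, a detail the description above does not state.

```python
from collections import defaultdict


class TokenManager:
    """Toy server-side token table; one entry per file path."""

    def __init__(self):
        self.read_holders = defaultdict(set)  # path -> clients holding a read token
        self.write_holder = {}                # path -> client holding the write token

    def grant_read(self, client, path):
        """Grant a read token; return the set of clients whose tokens were revoked."""
        revoked = set()
        writer = self.write_holder.get(path)
        if writer is not None and writer != client:
            # Assumption of this sketch: a conflicting write token is revoked first.
            del self.write_holder[path]
            revoked.add(writer)
        self.read_holders[path].add(client)
        return revoked

    def grant_write(self, client, path):
        """Grant the write token; all other clients' tokens are invalidated."""
        revoked = self.read_holders[path] - {client}
        writer = self.write_holder.get(path)
        if writer is not None and writer != client:
            revoked.add(writer)
        self.read_holders[path] -= revoked
        self.write_holder[path] = client
        return revoked

    def cache_valid(self, client, path):
        """A client's cached data is valid only while it holds a token."""
        return client in self.read_holders[path] or self.write_holder.get(path) == client


tokens = TokenManager()
tokens.grant_read("A", "/f")
tokens.grant_read("B", "/f")
revoked = tokens.grant_write("A", "/f")  # B's read token is invalidated
```

After the last call, `cache_valid("B", "/f")` is false, which corresponds to client B having to discard its cached copy and reacquire a token before reading again.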
As mentioned above, a distributed file system consists of a server and multiple clients. Clients communicate with the server according to a pre-defined protocol. The server "exports" volumes (sometimes referred to as file sets) residing on the local server machine for the remote client use. The server for a distributed file system synchronizes access by all of its remote clients to files in the volume it has exported.
The following problem arises when a volume is exported by a distributed file system and is accessed by local users on the server machine as well as by remote clients of the distributed file system: How to synchronize access to shared files between those two sets of users? Different mechanisms are currently used. When an NFS server exports a volume, it does not actually guarantee cache consistency for its remote clients. As mentioned above, all updates from clients are propagated to the server. Thus, the server's local file system mechanism is used for access synchronization between all users: local and remote. AFS and DFS use a different approach: Since the AFS and DFS servers guarantee cache consistency for their clients, direct access to exported volumes by local users on the server machine is not allowed. A distributed file system client (for AFS or DFS) must be installed on the server machine and local users must access exported volumes via this "local" client. This approach lengthens the response time for the local users, and increases the load on the server for the distributed file system.
A more complicated problem arises when the same volume (fileset) needs to be exported to clients of more than one distributed file system, e.g., to both AFS and DFS distributed file systems. Both of those distributed file systems allow caching of server data on the client machines and server synchronization of access to the server files by its clients, but the different distributed file systems have different synchronization mechanisms. The problem is: how to synchronize access to shared files by all those diverse clients as well as the local users?
Existing distributed file systems do not currently provide a solution to this problem. The only solution available is to export each volume to clients of no more than one distributed file system with cache consistency guarantees.
A still more complicated problem arises when the protocol for one or more distributed file systems allows potentially disconnected clients. Such file systems are usually called "Mobile Distributed File Systems". (See "System and Method for Efficient Caching in a Distributed File System", filed Mar. 7, 1994 and having U.S. Ser. No. 08/206,706, and also "Disconnected Operation in the Coda File System" by M. Satyanarayanan, Carnegie Mellon University, Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991). This scenario may arise in two possible cases:
1. When a client is connected to the server via a "wireless" network. Such networks are usually less reliable than wired networks, with frequent intermittent disconnections; and
2. When a client caches data and then intentionally disconnects from the server. This is the case for users of portable computers who use their machines in the office (connected to the server) and in a remote location (at home, out of town, or on the road). While disconnected, the client reads, and potentially updates, the cached data. Upon reconnecting with the server, the client propagates all the updates to the server.
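The second case can be illustrated with a minimal sketch of a client that works from its cache while disconnected and propagates its logged updates on reconnection. The class `DisconnectedClient` is hypothetical, a plain dictionary stands in for the server, and the sketch deliberately performs no conflict detection, which is the gap the synchronization problem concerns.

```python
class DisconnectedClient:
    """Toy client that logs updates made while disconnected and replays them later."""

    def __init__(self, server):
        self.server = server   # a dict standing in for the file server
        self.cache = {}        # path -> locally cached data
        self.log = []          # updates made while disconnected, in order
        self.connected = True

    def fetch(self, path):
        """Copy a file from the server into the local cache."""
        self.cache[path] = self.server[path]

    def read(self, path):
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        if self.connected:
            self.server[path] = data       # propagate immediately
        else:
            self.log.append((path, data))  # defer until reconnection

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        """Replay all logged updates to the server; return how many were sent."""
        self.connected = True
        for path, data in self.log:
            self.server[path] = data
        replayed = len(self.log)
        self.log.clear()
        return replayed


server = {"/doc": "v1"}
laptop = DisconnectedClient(server)
laptop.fetch("/doc")
laptop.disconnect()
laptop.write("/doc", "edited offline")  # server still holds "v1"
server["/doc"] = "edited by another user"
laptop.reconnect()                      # the other user's update is overwritten
```

In this sketch the update made by another user during the disconnection is silently lost when the log is replayed.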
The present invention also addresses the problems of synchronization between potentially disconnected clients and other users accessing the same fileset.
Thus, three technical problems exist: synchronizing file access by local and distributed clients without degrading local client access response time; synchronizing file access under multiple distributed file system protocols; and synchronizing file access by potentially disconnected clients.