As shown in FIG. 1, a distributed networking environment 1 consists of two or more nodes A, B, C, connected through a communication link or a network 3. The network 3 can be either a local area network (LAN), or a wide area network (WAN).
At any of the nodes A, B, C, there may be a processing system 10A, 10B, 10C, such as a workstation. Each of these processing systems 10A, 10B, 10C, may be a single user system or a multi-user system with the ability to use the network 3 to access files located at a remote node. For example, the processing system 10A at local node A is able to access the files 5B and 5C at the remote nodes B and C, respectively.
Within this document, the term "server" will be used to indicate the processing system where the file is permanently stored, and the term "client" will be used to mean any other processing system having processes accessing the file. It is to be understood, however, that the term "server" does not mean a dedicated server as that term is used in some local area network systems. The distributed services system in which the invention is implemented is truly a distributed system supporting a wide variety of applications running at different nodes in the system which may access files located anywhere in the system.
As mentioned, the invention to be described hereinafter is directed to a distributed data processing system in a communication network. In this environment, each processor at a node in the network potentially may access all the files in the network no matter at which nodes the files may reside.
Other approaches to supporting a distributed data processing system are known. For example, IBM's Distributed Services for the AIX operating system is disclosed in U.S. Pat. No. 4,887,204, "A System and Method for Accessing Remote Files in a Distributed Networking Environment", filed Feb. 13, 1987 in the name of Johnson et al. In addition, Sun Microsystems has released a Network File System (NFS) and Bell Laboratories has developed a Remote File System (RFS). The Sun Microsystems NFS has been described in a series of publications including S.R. Kleiman, "Vnodes: An Architecture for Multiple File System Types in Sun UNIX", Conference Proceedings, USENIX 1986 Summer Technical Conference and Exhibition, pp. 238 to 247; Russel Sandberg et al., "Design and Implementation of the Sun Network Filesystem", Conference Proceedings, USENIX 1985, pp. 119 to 130; Dan Walsh et al., "Overview of the Sun Network File System", pp. 117 to 124; JoMei Chang, "Status Monitor Provides Network Locking Service for NFS"; JoMei Chang, "SunNet", pp. 71 to 75; and Bradley Taylor, "Secure Networking in the Sun Environment", pp. 28 to 36. The AT&T RFS has also been described in a series of publications including Andrew P. Rifkin et al., "RFS Architectural Overview", USENIX Conference Proceedings, Atlanta, Ga. (June 1986), pp. 1 to 12; Richard Hamilton et al., "An Administrator's View of Remote File Sharing", pp. 1 to 9; Tom Houghton et al., "File Systems Switch", pp. 1 to 2; and David J. Olander et al., "A Framework for Networking in System V", pp. 1 to 8.
One feature that distinguishes the distributed services system in which the subject invention is implemented from the Sun Microsystems NFS, for example, is that Sun's approach was to design what is essentially a stateless server. This means that the server does not store any information about client nodes, including such information as which client nodes have a server file open or whether client processes have a file open in read_only or read_write modes. Such an implementation simplifies the design of the server because the server does not have to deal with error recovery situations which may arise when a client fails or goes off-line without properly informing the server that it is releasing its claim on server resources.
An entirely different approach was taken in the design of the distributed services system in which the present invention is implemented. More specifically, the distributed services system may be characterized as a "stateful implementation". A "stateful" server, such as that described here, does keep information about who is using its files and how the files are being used. This requires that the server have some way to detect the loss of contact with a client so that accumulated state information about that client can be discarded. The cache management strategies described here cannot be implemented unless the server keeps such state information.
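By way of illustration only, the kind of bookkeeping a stateful server performs can be sketched in Python as follows. The class name `StatefulServer`, its methods, and the timeout-based test for loss of contact are hypothetical and are assumed here for the example; they are not taken from the distributed services system itself.

```python
import time


class StatefulServer:
    """Hypothetical sketch: records which clients have which files open, and how."""

    def __init__(self, contact_timeout=30.0):
        self.contact_timeout = contact_timeout  # assumed policy, in seconds
        self.opens = {}       # file name -> {client id: "read_only" | "read_write"}
        self.last_heard = {}  # client id -> time of the last message from that client

    def open(self, client, name, mode, now=None):
        if mode not in ("read_only", "read_write"):
            raise ValueError(mode)
        self.opens.setdefault(name, {})[client] = mode
        self.last_heard[client] = time.monotonic() if now is None else now

    def close(self, client, name):
        self.opens.get(name, {}).pop(client, None)
        if not self.opens.get(name):
            self.opens.pop(name, None)

    def writers(self, name):
        """Clients that currently hold the file open for writing."""
        return sorted(c for c, m in self.opens.get(name, {}).items()
                      if m == "read_write")

    def discard_lost_clients(self, now=None):
        """Drop all state held for clients not heard from within the timeout."""
        now = time.monotonic() if now is None else now
        lost = [c for c, t in self.last_heard.items()
                if now - t > self.contact_timeout]
        for c in lost:
            del self.last_heard[c]
            for name in list(self.opens):
                self.opens[name].pop(c, None)
                if not self.opens[name]:
                    del self.opens[name]
        return sorted(lost)
```

A stateless server would keep neither table, which is why it has nothing to clean up when a client disappears.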
The problems encountered in accessing data at remote nodes can be better understood by first examining how a stand-alone system accesses files. In a stand-alone system, such as 10 as shown in FIG. 2, a local buffer 12 in the operating system 11 is used to buffer the data transferred between the permanent storage 2, such as a hard file or a disk in a workstation, and the user address space 14. The local buffer 12 in the operating system 11 is also referred to as a local cache or kernel buffer.
In the stand-alone system, the kernel buffer 12 is divided into blocks 15 which are identified by device number and logical block number within the device. When a read system call 16 is issued, it is issued with a file descriptor of the file 5 for a byte range within the file 5, as shown in step 101, FIG. 3. The operating system 11 takes this information and converts it to a device number and logical block numbers within the device, step 102, FIG. 3. If the block is in the cache, step 103, the data is obtained directly from the cache, step 105. If, at step 103, the cache does not already contain the block that is sought, the data is first read into the cache in step 104 before proceeding to step 105, where the data is obtained from the cache.
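The read path of steps 102 to 105 can be sketched, purely as an illustration, in Python. The names `BufferCache` and `read`, the 4096-byte block size, and the assumption that a file occupies consecutive logical blocks starting at `first_block` are all simplifications introduced for this example; an actual operating system maps byte ranges to blocks through its file system structures.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


class BufferCache:
    """Hypothetical kernel buffer: blocks keyed by (device number, logical block number)."""

    def __init__(self, read_block_from_disk):
        # read_block_from_disk(device, block_no) -> bytes; stands in for the disk driver
        self.read_block_from_disk = read_block_from_disk
        self.blocks = {}
        self.disk_reads = 0

    def get_block(self, device, block_no):
        key = (device, block_no)
        if key not in self.blocks:  # step 103: block is not in the cache
            self.blocks[key] = self.read_block_from_disk(device, block_no)  # step 104
            self.disk_reads += 1
        return self.blocks[key]     # step 105: data is obtained from the cache


def read(cache, device, first_block, offset, length):
    """Return `length` bytes starting at byte `offset` of the file."""
    out = bytearray()
    end = offset + length
    while offset < end:
        block_no = first_block + offset // BLOCK_SIZE  # step 102: byte range -> block
        start = offset % BLOCK_SIZE
        n = min(BLOCK_SIZE - start, end - offset)
        out += cache.get_block(device, block_no)[start:start + n]
        offset += n
    return bytes(out)
```

In this sketch a second read of the same byte range finds every block already cached and causes no further disk reads.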
Any data read from the disk 2 is kept in the cache block 15 until the cache block 15 is needed for some other purpose. Consequently, any successive read requests from an application 4 running on the processing system 10 for data previously read are satisfied from the cache 12 and not the disk 2. Reading from the cache is far less time consuming than reading from the disk.
Similarly, data written from the application 4 is not saved immediately on the disk 2, but is written to the cache 12. This saves disk accesses if another write operation is issued to the same block. Modified data blocks in the cache 12 are saved on the disk 2 periodically.
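The deferred-write behavior just described can be sketched, under stated assumptions, as follows. The class `WriteBackCache`, the use of a dictionary to stand in for the disk 2, and the explicit `sync` method standing in for the periodic save are hypothetical and serve only to illustrate the idea.

```python
class WriteBackCache:
    """Hypothetical sketch of a cache that defers writes to permanent storage."""

    def __init__(self, disk):
        self.disk = disk      # dict: block number -> bytes, stands in for the disk
        self.blocks = {}      # cached copies of blocks
        self.dirty = set()    # blocks modified in the cache but not yet saved
        self.disk_writes = 0

    def write_block(self, block_no, data):
        # The write goes to the cache only; the disk is not touched here.
        self.blocks[block_no] = data
        self.dirty.add(block_no)

    def read_block(self, block_no):
        if block_no not in self.blocks:
            self.blocks[block_no] = self.disk[block_no]
        return self.blocks[block_no]

    def sync(self):
        # Stands in for the periodic save of modified blocks.
        for block_no in sorted(self.dirty):
            self.disk[block_no] = self.blocks[block_no]
            self.disk_writes += 1
        self.dirty.clear()
```

Two writes to the same block between calls to `sync` cost a single disk write in this sketch, which is the saving the paragraph above describes.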
Use of a cache in a stand-alone system that utilizes an AIX operating system improves the overall performance of the system since disk accesses are avoided for successive reads and writes to the same blocks. Overall performance is enhanced because accessing permanent storage is slower and more expensive than accessing a cache.
As described above, local buffers in the operating system can be used to improve the performance of stand-alone access to files. These local buffers are kept in fast memory while files are usually kept in slower permanent storage such as disk drives. Larger buffer caches can enhance a data processing system's performance because the cache can hold more of the data belonging to the system's files and hence will reduce the need to use the slower disk drives. A system's fast, physical memory is of limited size. Rather than partitioning physical memory by setting aside a fixed fraction for the operating system's kernel buffers, virtual memory techniques can be used to speed up access to the system's disk files. In this virtual memory technique, there is no fixed cache of disk blocks. Instead, data is cached in virtual, not physical, memory.
Virtual memory provides memory space larger than the available physical memory. This virtual memory space is divided into pages and used by programs as if the virtual memory space were true physical memory. A system's virtual memory pages reside in either actual physical memory frames, disk blocks, or both. Whenever a virtual memory page is not present in a physical frame, any attempt to use that page will result in an exception known as a page fault. The program attempting to use such a page generates a page fault and is temporarily suspended while the virtual memory page is retrieved from the disk block where it currently resides and is copied into a physical memory frame. After the virtual memory page has been assigned a physical frame, the original faulting program can be allowed to continue and it will now find that the data in that virtual memory page is available.
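The page fault sequence described above can be modeled, for illustration only, by a small Python simulation. The class `VirtualMemory`, the dictionary standing in for disk blocks, and the first-in, first-out choice of which resident page to displace are assumptions made for this sketch; the text does not specify a replacement policy.

```python
class VirtualMemory:
    """Hypothetical model of demand paging with a fixed number of physical frames."""

    def __init__(self, num_frames, backing_store):
        self.num_frames = num_frames
        self.backing_store = backing_store  # page number -> contents (disk blocks)
        self.frames = {}                    # resident pages, kept in load order
        self.page_faults = 0

    def access(self, page):
        if page not in self.frames:
            # Page fault: the page is not present in any physical frame.
            self.page_faults += 1
            if len(self.frames) >= self.num_frames:
                # Assumed FIFO policy: displace the page that was loaded earliest
                # and save its contents back to the backing store.
                victim = next(iter(self.frames))
                self.backing_store[victim] = self.frames.pop(victim)
            # Copy the page from its disk block into a physical frame.
            self.frames[page] = self.backing_store[page]
        # The faulting program continues and now finds the data available.
        return self.frames[page]
```

With more frames available, fewer accesses fault, which is how caching file data in virtual memory lets the amount of physical memory devoted to file data vary with demand.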
In the AIX operating system, programs can access the contents of files through system calls such as read or write or directly through mapped access. With mapped access a file is mapped into a portion of the program's virtual address space, causing each load or store to that portion of the program's address space to be reflected as an access to the file. Mapped access to a file has the advantage of allowing direct manipulation of the file or the file contents simply by addressing the bytes to be accessed or modified directly. However, in the case in which multiple programs have the file open and mapped at the same time, the coordination of access to the file has to be performed by the programs themselves. That is, one program may attempt to write a series of ten bytes to the file by simply storing the ten bytes sequentially while the other program is attempting to read these bytes by simply loading from the bytes sequentially. It is possible that the second program will be scheduled to execute before the first program is finished and will see only half of the modified data in the data that it loads from the file. These problems are solved by cooperation between the programs sharing the file or by avoiding the use of mapped files and confining the operation to the use of the read and write system calls. The read system call and write system call in AIX are designed to operate in a serializable fashion, that is, if both are attempted at the same time by two programs, one executes completely before the other is allowed to execute.
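Mapped access of this general kind can be illustrated with Python's standard `mmap` module, which is used here only as a stand-in for the AIX facility; the function name `mapped_demo` and the temporary file are hypothetical. The sketch stores only the first five of ten bytes through the mapping and then reads the file with an ordinary read call, showing the half-updated contents that a second program could observe if it ran between the two halves of an unsynchronized ten-byte update.

```python
import mmap
import os
import tempfile


def mapped_demo():
    """Store through a mapping, then observe the file through read()."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        with mmap.mmap(fd, 10) as m:
            # A plain store into the mapped region modifies the file contents.
            # Only the first half of the intended ten-byte update is made.
            m[0:5] = b"ABCDE"
            m.flush()
            seen_through_map = bytes(m[0:10])
        os.lseek(fd, 0, os.SEEK_SET)
        seen_through_read = os.read(fd, 10)
        return seen_through_map, seen_through_read
    finally:
        os.close(fd)
        os.remove(path)
```

Both views returned by `mapped_demo` contain new data in the first five bytes and old data in the last five, which is the inconsistency that cooperating programs must guard against themselves when they share a mapped file.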
In a distributed environment, processes running on one machine may be accessing files on another machine. It is important under these circumstances to ensure that read and write operations to a file are performed in a serializable fashion just as they are in a stand-alone environment where all processes are running on a single machine. The difficulty occurs because performance needs dictate that files be allowed to be cached or buffered on client machines.
In a distributed environment, as shown in FIG. 1, there are two ways the processing system 10C in local node C could read the file 5A from node A. In one way, the processing system 10C could copy the whole file 5A, and then read it as if it were a local file 5C residing at node C. Reading a file in this way creates a problem if another processing system 10A at another node A modifies the file 5A after the file 5A has been copied at node C as file 5C. The processing system 10C would not have access to these latest modifications to the file 5A.
Another way for processing system 10C to access a file 5A at node A is to read one block, e.g., N1, at a time as the processing system at node C requires it. A problem with this method is that every read has to go across the network communication link 3 to the node A where the file resides. Sending the data for every successive read is time consuming.
Thus, accessing files across a network presents the two competing problems illustrated above. One problem involves the time required to transmit data across the network for successive reads and writes. On the other hand, if the file data is stored in the node to reduce network traffic, the file integrity may be lost. For example, if one of the several nodes is also writing to the file, the other nodes accessing the file may not be accessing the latest updated data that has just been written. As such, the file integrity is lost since a node may be accessing incorrect and outdated files.
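The two competing problems can be made concrete with a short Python sketch. The class `FileServer`, its request counter, and the representation of a file as a list of blocks are hypothetical and are introduced only for this illustration.

```python
class FileServer:
    """Hypothetical server holding one file as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.requests = 0  # number of requests that crossed the network

    def fetch_all(self):
        # Whole-file copy: one request, but the copy can go stale afterwards.
        self.requests += 1
        return list(self.blocks)

    def fetch_block(self, i):
        # Block-at-a-time access: always current, but one request per read.
        self.requests += 1
        return self.blocks[i]

    def write_block(self, i, data):
        # A modification made at the node where the file resides.
        self.blocks[i] = data


server = FileServer(["N1", "N2", "N3"])
local_copy = server.fetch_all()        # node C copies the whole file
server.write_block(0, "N1 (updated)")  # node A then modifies the file
stale = local_copy[0]                  # node C still holds the old block
fresh = server.fetch_block(0)          # a remote read returns the latest data
```

In this sketch `stale` holds the old block while `fresh` holds the updated one, and every additional call to `fetch_block` adds one more network request.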
Summarizing, in a distributed data processing system, data can be accessed by a plurality of nodes. The data may be controlled by one node within this data processing system known as the server. The other nodes that access this data are known as the clients. Clients gain access to the data by sending a request to the server. The server returns data to the clients that requested access to the data. The client may then read and additionally, in some instances, modify the requested data. It would, therefore, be of great benefit to the users of such systems for a server to be able to provide identical data to multiple users, while being able to assure each user that the data being processed by that user remains valid, reflecting the latest changes made by all users.