1. Field of the Invention
This invention relates to a plurality of data processing systems connected by a communications link, and more particularly to the accessing of files between local and remote processing systems in a distributed networking environment.
2. Description of the Related Art
As shown in FIG. 1, a distributed networking environment 1 consists of two or more nodes A, B, C connected through a communication link or a network 3. The network 3 can be either a local area network (LAN) or a wide area network (WAN).
At any of the nodes A, B, C there may be a processing system 10A, 10B, 10C, such as a workstation. Each of these processing systems 10A, 10B, 10C may be a single-user system or a multi-user system with the ability to use the network 3 to access files located at a remote node. For example, the processing system 10A at local node A is able to access the files 5B, 5C at the remote nodes B, C, respectively.
Within this document, the term "server" will be used to indicate the processing system where the file is permanently stored, and the term "client" will be used to mean any other processing system having processes accessing the file. It is to be understood, however, that the term "server" does not mean a dedicated server as that term is used in some local area network systems. The distributed services system in which the invention is implemented is truly a distributed system supporting a wide variety of applications running at different nodes in the system which may access files located anywhere in the system.
As mentioned, the invention to be described hereinafter is directed to a distributed data processing system in a communication network. In this environment, each processor at a node in the network potentially may access all the files in the network no matter at which nodes the files may reside.
Other approaches to supporting a distributed data processing system are known. For example, IBM's Distributed Services for the AIX operating system is disclosed in Ser. No. 014,897, "A System and Method for Accessing Remote Files in a Distributed Networking Environment", filed Feb. 13, 1987 in the name of Johnson et al. In addition, Sun Microsystems has released a Network File System (NFS), and Bell Laboratories has developed a Remote File System (RFS). The Sun Microsystems NFS has been described in a series of publications including S.R. Kleiman, "Vnodes: An Architecture for Multiple File System Types in Sun UNIX", Conference Proceedings, USENIX 1986 Summer Technical Conference and Exhibition, pp. 238 to 247; Russel Sandberg et al., "Design and Implementation of the Sun Network Filesystem", Conference Proceedings, USENIX 1985, pp. 119 to 130; Dan Walsh et al., "Overview of the Sun Network File System", pp. 117 to 124; JoMei Chang, "Status Monitor Provides Network Locking Service for NFS"; JoMei Chang, "SunNet", pp. 71 to 75; and Bradley Taylor, "Secure Networking in the Sun Environment", pp. 28 to 36. The AT&T RFS has also been described in a series of publications including Andrew P. Rifkin et al., "RFS Architectural Overview", USENIX Conference Proceedings, Atlanta, Ga. (June 1986), pp. 1 to 12; Richard Hamilton et al., "An Administrator's View of Remote File Sharing", pp. 1 to 9; Tom Houghton et al., "File Systems Switch", pp. 1 to 2; and David J. Olander et al., "A Framework for Networking in System V", pp. 1 to 8.
One feature that distinguishes the distributed services system in which the subject invention is implemented from, for example, the Sun Microsystems NFS is that Sun's approach was to design what is essentially a stateless server. This means that the server does not store any information about client nodes, including such information as which client nodes have a server file open or whether client processes have a file open in read_only or read_write mode. Such an implementation simplifies the design of the server because the server does not have to deal with error recovery situations which may arise when a client fails or goes off-line without properly informing the server that it is releasing its claim on server resources.
An entirely different approach was taken in the design of the distributed services system in which the present invention is implemented. More specifically, the distributed services system may be characterized as a "stateful implementation". A "stateful" server, such as that described here, does keep information about who is using its files and how the files are being used. This requires that the server have some way to detect the loss of contact with a client so that accumulated state information about that client can be discarded. The cache management strategies described here cannot be implemented unless the server keeps such state information.
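A stateful server needs, at minimum, a record of which clients hold which files open and in what mode, plus a way to discard that record when a client disappears. The following is a minimal Python sketch under those assumptions only; the class and method names (`StatefulServer`, `client_lost`, and so on) are hypothetical, and the mechanism by which the server actually detects loss of contact is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulServer:
    """Hypothetical server that records which clients have which files open."""

    # client id -> {file path: open mode}
    opens: dict = field(default_factory=dict)

    def open(self, client: str, path: str, mode: str) -> None:
        if mode not in ("read_only", "read_write"):
            raise ValueError(f"unknown mode {mode!r}")
        self.opens.setdefault(client, {})[path] = mode

    def close(self, client: str, path: str) -> None:
        files = self.opens.get(client, {})
        files.pop(path, None)
        if not files:
            # Client has nothing open any more; drop its entry entirely.
            self.opens.pop(client, None)

    def writers(self, path: str) -> list:
        """Clients that currently hold `path` open read_write."""
        return sorted(c for c, files in self.opens.items()
                      if files.get(path) == "read_write")

    def client_lost(self, client: str) -> None:
        """Called once loss of contact is detected: discard that client's state."""
        self.opens.pop(client, None)
```

A stateless server could answer a question like `writers` only by asking every client; here it is a local lookup, at the price of having to clean up after clients that vanish.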
The problems encountered in accessing remote nodes can be better understood by first examining how a stand-alone system accesses files. In a stand-alone system, such as system 10 shown in FIG. 2, a local buffer 12 in the operating system 11 is used to buffer the data transferred between the permanent storage 2, such as a hard file or a disk in a workstation, and the user address space 14. The local buffer 12 in the operating system 11 is also referred to as a local cache or kernel buffer.
In the stand-alone system, the kernel buffer 12 is divided into blocks 15, which are identified by device number and by logical block number within the device. When a read system call 16 is issued, it is issued with a file descriptor of the file 5 and a byte range within the file 5, as shown in step 101, FIG. 3. The operating system 11 takes this information and converts it to a device number and logical block numbers in the device, step 102, FIG. 3. If the block is in the cache, step 103, the data is obtained directly from the cache, step 105. If the cache does not hold the sought-for block at step 103, the data is read into the cache in step 104 before proceeding with step 105, where the data is obtained from the cache.
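The read path of FIG. 3 can be sketched as follows. This is an illustrative model, not the AIX kernel's implementation: the disk is a hypothetical dictionary keyed by (device number, logical block number), the file's block map is passed in as a list, and the block size is an assumed parameter.

```python
class BufferCache:
    """Sketch of the FIG. 3 read path through a kernel block cache."""

    def __init__(self, disk, block_size=512):
        self.disk = disk              # hypothetical: (device, logical block) -> bytes
        self.block_size = block_size  # assumed; not given in the text
        self.blocks = {}              # cached blocks, same keys as the disk
        self.disk_reads = 0

    def _get_block(self, device, lblock):
        key = (device, lblock)
        if key not in self.blocks:             # step 103: block not in cache
            self.blocks[key] = self.disk[key]  # step 104: read it into the cache
            self.disk_reads += 1
        return self.blocks[key]                # step 105: obtain data from cache

    def read(self, device, block_map, offset, length):
        """Steps 101-102: turn a file byte range into (device, block) lookups.

        block_map[i] is the logical block that holds the file's i-th block.
        """
        out = bytearray()
        end = offset + length
        while offset < end:
            index, within = divmod(offset, self.block_size)
            block = self._get_block(device, block_map[index])
            take = min(self.block_size - within, end - offset)
            out += block[within:within + take]
            offset += take
        return bytes(out)
```

With a 4-byte block size and a file stored in logical blocks 7 and 3 of device 1, reading bytes 2 through 5 touches both blocks; a second read of the same range is served entirely from `blocks` and leaves `disk_reads` unchanged.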
Any data read from the disk 2 is kept in the cache block 15 until the cache block 15 is needed for some other purpose. Consequently, any successive read request from an application 4 running on the processing system 10 for the same data is satisfied from the cache 12 rather than from the disk 2. Reading from the cache is far less time consuming than reading from the disk.
Similarly, data written from the application 4 is not saved immediately on the disk 2, but is written to the cache 12. This saves disk accesses if another write operation is issued to the same block. Modified data blocks in the cache 12 are saved on the disk 2 periodically.
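The delayed-write behavior can be modeled by keeping a set of modified (dirty) blocks that are written out only when a periodic flush runs. In this hypothetical sketch the disk is a plain dictionary and `flush` stands in for the periodic save; two writes to the same block before a flush cost a single disk write.

```python
class WriteBackCache:
    """Sketch: writes land in cache blocks; dirty blocks reach disk on flush."""

    def __init__(self, disk):
        self.disk = disk      # hypothetical disk: block number -> bytes
        self.blocks = {}      # cached block contents
        self.dirty = set()    # blocks modified since the last flush
        self.disk_writes = 0

    def write_block(self, lblock, data):
        self.blocks[lblock] = data  # no disk I/O on the write itself
        self.dirty.add(lblock)

    def read_block(self, lblock):
        if lblock not in self.blocks:
            self.blocks[lblock] = self.disk[lblock]
        return self.blocks[lblock]

    def flush(self):
        """Periodic save: write each dirty block to disk exactly once."""
        for lblock in sorted(self.dirty):
            self.disk[lblock] = self.blocks[lblock]
            self.disk_writes += 1
        self.dirty.clear()
```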
Use of a cache in a stand-alone system that utilizes an AIX operating system improves the overall performance of the system, since disk accesses are avoided for successive reads and writes of the same blocks. Overall performance is enhanced because accessing permanent storage is slower and more expensive than accessing a cache.
As described above, local buffers in the operating system can be used to improve the performance of stand-alone access to files. These local buffers are kept in fast memory, while files are usually kept in slower permanent storage such as disk drives. Larger buffer caches can enhance a data processing system's performance because the cache can hold more of the data belonging to the system's files and hence will reduce the need to use the slower disk drives. A system's physical fast memory, however, is of limited size. Rather than partitioning physical memory by setting aside a fixed fraction for the operating system's kernel buffers, virtual memory techniques can be used to speed up access to a system's disk files. In this virtual memory technique, there is no fixed cache of disk blocks. Instead, data is cached in virtual, not physical, memory.
Virtual memory provides memory space larger than the available physical memory. This virtual memory space is divided into pages and used by programs as if the virtual memory space were true physical memory. A system's virtual memory pages reside in actual physical memory frames, in disk blocks, or in both. Whenever a virtual memory page is not present in a physical frame, any attempt to use that page will result in an exception known as a page fault. The program attempting to use such a page generates a page fault and is temporarily suspended while the virtual memory page is retrieved from the disk block where it currently resides and is copied into a physical memory frame. After the virtual memory page has been assigned a physical frame, the original faulting program can be allowed to continue, and it will now find that the data in that virtual memory page is available.
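A minimal model of demand paging is shown below: a page is copied from its backing disk block into a physical frame the first time it is touched, and later accesses find it already resident. The names are hypothetical, and the sketch assumes a free frame is always available; no page replacement policy is modeled.

```python
class DemandPager:
    """Sketch of demand paging: a page is copied into a frame on first touch."""

    def __init__(self, backing, num_frames):
        self.backing = backing                    # page number -> bytes on disk
        self.free_frames = list(range(num_frames))
        self.frames = {}                          # frame number -> page contents
        self.page_table = {}                      # page number -> frame number
        self.page_faults = 0

    def load(self, page, offset):
        if page not in self.page_table:
            # Page fault: the page has no physical frame yet. Assumes a free
            # frame exists (pop raises IndexError otherwise; no eviction here).
            self.page_faults += 1
            frame = self.free_frames.pop()
            self.frames[frame] = bytearray(self.backing[page])
            self.page_table[page] = frame
        # The faulting access now continues and finds the data resident.
        return self.frames[self.page_table[page]][offset]
```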
Another way to take advantage of the flexibility provided by virtual memory is to allow processes to map files into their virtual address space. In this way, a process can access the contents of a file without executing a read or write system call. The reading and writing of the file is still performed, but it is performed by the virtual memory manager in response to the process's loads and stores executed against the addresses within its address space into which the file has been mapped. As an example, a short file of 100 bytes might be mapped into the address range 4,800 to 4,899. When the process loads a byte from location 4,805, the process obtains the byte at offset 5 within the file. When the process stores to location 4,800, the process changes the contents of the byte at offset 0, i.e. the first byte of the file. This allows a process to access and modify the contents of the file without any read or write system calls.
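The address arithmetic in the 100-byte example reduces to subtracting the mapping's base address. The sketch below is a hypothetical user-level model, not a real memory-mapping API; it treats addresses outside the mapped range as errors and does not model stores that extend the file.

```python
class MappedFile:
    """Sketch: a file's bytes mapped at a fixed range of addresses."""

    def __init__(self, contents: bytes, base: int):
        self.data = bytearray(contents)
        self.base = base  # first address of the mapping, e.g. 4,800

    def _offset(self, address: int) -> int:
        off = address - self.base  # file offset = address - base
        if not 0 <= off < len(self.data):
            raise IndexError(f"address {address} is outside the mapping")
        return off

    def load(self, address: int) -> int:
        return self.data[self._offset(address)]

    def store(self, address: int, value: int) -> None:
        self.data[self._offset(address)] = value
```

Mapping `bytes(range(100))` at base 4800 makes each byte's value equal its offset, so `load(4805)` returns 5 and `store(4800, ...)` changes `data[0]`, matching the example in the text.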
As illustrated in FIG. 2B, a store to a memory segment 91 which contains a mapped file 92 might extend the file 92, but file system logic 93 is not invoked for each store. Therefore, if an application 94 maps a file and extends the file with stores, the file size attribute in the inode data structure 95 is not synchronously updated. At any particular instant, the file system's view of the file's size (the size stored in the inode) may not reflect the most recent virtual memory stores. The inode 95, which is a data structure containing information for the file 92, is brought up to date by system calls such as sync, fsync, and close, and by periodic sync operations performed by the operating system 96. When a file is modified by traditional system calls (e.g. write), the file system logic 93 is invoked and it uses mapped file stores to update the file's data. The file system logic 93 knows, via the system call's parameters, what is being done to the file and thus updates the file size found in the inode data structure 95 synchronously. The stat system call returns the current inode file size value; it does not query the virtual memory manager 97. As a result, if a file 92 is operated on exclusively by system calls, the file size found in the inode data structure 95 and returned by stat is always up to date. However, if the file is operated on by mapped stores, the file size returned by stat may not reflect the results of the most recent stores. Applications 94 which use mapped access (rather than system calls) may issue fsync if they want to ensure that an ensuing stat will reflect their most recent modifications. If an application extends a file 92 with stores (as opposed to system calls), the virtual memory manager 97 knows which page is the "rightmost" page of the file, but it does not know which byte within that page holds the last byte of the file.
When system calls are used, however, the file system knows, with byte granularity, the size of the file.
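The divergence between stat and mapped stores can be modeled in a few lines. In this hypothetical sketch, `write` plays the role of the system call path, which updates the inode size synchronously, while `mapped_store` bypasses the file system logic, so only the page-granular `rightmost_page` moves. The 4096-byte page size is an assumption, and sync, fsync, and close are not modeled.

```python
PAGE_SIZE = 4096  # assumed page size for illustration


class MappedFileModel:
    """Sketch of why stat can lag behind mapped stores (FIG. 2B)."""

    def __init__(self):
        self.data = bytearray()
        self.inode_size = 0        # size recorded in the inode; stat returns this
        self.rightmost_page = -1   # all the VM manager tracks about the file's end

    def _store(self, offset, buf):
        end = offset + len(buf)
        if end > len(self.data):
            self.data.extend(bytes(end - len(self.data)))
        self.data[offset:end] = buf
        self.rightmost_page = max(self.rightmost_page, (end - 1) // PAGE_SIZE)

    def write(self, offset, buf):
        """System call path: file system logic updates the inode synchronously."""
        self._store(offset, buf)
        self.inode_size = max(self.inode_size, offset + len(buf))

    def mapped_store(self, offset, buf):
        """Load/store path: no file system logic runs, so the inode is untouched."""
        self._store(offset, buf)

    def stat_size(self):
        return self.inode_size
```

After `write(0, b"abc")` followed by `mapped_store(5000, b"xy")`, the data is 5002 bytes long, but `stat_size()` still reports 3, and the page-granular view knows only that page 1 is the rightmost page, not that byte 5001 is the last byte.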
In a distributed environment, as shown in FIG. 1, there are two ways the processing system 10C in local node C could read the file 5A from node A. In one way, the processing system 10C could copy the whole file 5A, and then read it as if it were a local file 5C residing at node C. Reading a file in this way creates a problem if another processing system 10A at another node A modifies the file 5A after the file 5A has been copied at node C as file 5C. The processing system 10C would not have access to these latest modifications to the file 5A.
Another way for processing system 10C to access a file 5A at node A is to read one block, e.g. N1, at a time as the processing system at node C requires it. A problem with this method is that every read has to go across the network communication link 3 to the node A where the file resides. Sending the data for every successive read is time consuming.
Accessing files across a network presents two competing problems, as illustrated above. One problem involves the time required to transmit data across the network for successive reads and writes. The other is that if file data is stored at the accessing node to reduce network traffic, file integrity may be lost. For example, if one of several nodes is also writing to the file, the other nodes accessing the file may not be accessing the latest data that has just been written. File integrity is then lost, since a node may be accessing incorrect and outdated data.
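The two remote access methods, and the trade-off between them, can be contrasted with a small simulation. It is a sketch under simplifying assumptions: the server is a hypothetical in-memory object, each server call counts as one network message, and copying the whole file is counted as a single message.

```python
class FileServer:
    """Hypothetical server at node A holding file 5A as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.messages = 0  # network messages served

    def read_block(self, n):
        self.messages += 1
        return self.blocks[n]

    def read_all(self):
        self.messages += 1  # simplification: whole file in one message
        return list(self.blocks)


class CopyingClient:
    """First method: copy the whole file once, then read the local copy."""

    def __init__(self, server):
        self.copy = server.read_all()

    def read(self, n):
        return self.copy[n]  # no network traffic, but may be stale


class BlockClient:
    """Second method: fetch the requested block from the server on every read."""

    def __init__(self, server):
        self.server = server

    def read(self, n):
        return self.server.read_block(n)  # always current, one message per read
```

If another node changes block N1 after the copy is made, the copying client keeps returning the old contents, while the block-at-a-time client sees the change but pays one message for every read, including repeated reads of the same block.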
In addition to the difficulty of managing the data belonging to a file in a distributed environment, there is the problem of managing the attributes of a file that is being accessed in a distributed processing environment. Files have three important attributes that change frequently: the file size, the time of last modification, and the time of last access to the file. Each time a process appends data to the end of a file, the file size changes, along with the time of last modification and the time of last access. Each time a file is read by a process, the time of last access changes.
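The attribute update rules just stated can be written down directly. In this sketch the current time is passed in explicitly so the result is deterministic; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileAttributes:
    size: int = 0
    mtime: float = 0.0  # time of last modification
    atime: float = 0.0  # time of last access

    def on_append(self, nbytes: int, now: float) -> None:
        """Appending changes the size, the modification time, and the access time."""
        self.size += nbytes
        self.mtime = now
        self.atime = now

    def on_read(self, now: float) -> None:
        """Reading changes only the access time."""
        self.atime = now
```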
One way to maintain this information accurately is to maintain the information at the file server. Each time a file is accessed or a file size is changed by a client, the client sends a message to the server informing the server of the changes. Each time an attribute is required by a client, the client sends a message to the server requesting the values of the attributes. This solution maintains the correct file attributes, but at too high a performance cost, since it requires messages to and from the server each time a file is read or written at any client machine.
On the other hand, if the attributes are kept at the client machines, the server and other clients in the distributed environment will not have the correct values of the attributes. Since multiple clients may be accessing the file at the same time, inconsistent values of the attributes may exist at the various clients at the same time.
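The two attribute strategies can be compared with a short simulation. It is a hypothetical model: each call to the server counts as one message, and each caching client starts from the same size and updates only its own copy.

```python
class AttributeServer:
    """Server-held attributes: correct values, but every access is a message."""

    def __init__(self):
        self.size = 0
        self.messages = 0

    def report_append(self, nbytes):
        self.messages += 1  # client informs the server of the change
        self.size += nbytes

    def query_size(self):
        self.messages += 1  # client asks the server for the current value
        return self.size


class CachingClient:
    """Client-held attributes: no messages, but other nodes never see updates."""

    def __init__(self, initial_size):
        self.size = initial_size

    def append(self, nbytes):
        self.size += nbytes  # nobody else learns about this change
```

With the server strategy, two appends and one size query cost three messages. With the caching strategy, two clients that start at size 100 and append 10 and 20 bytes to the same file end up holding 110 and 120, while the file's true size would be 130.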
An additional complication is introduced by allowing processes to map files into their virtual address space. When this is done, a process can manipulate the contents of a file, possibly changing its size and the time that it was last accessed or modified, without using a system call such as read or write. This access occurs through load and store instructions. In this type of situation, the operating system has no opportunity to keep track of the time of last access and the other file attributes as it does with system calls. Likewise, in a distributed environment that allows processes to map remote files into their virtual address space, these complications must be addressed if useful file attributes are to be maintained.