The present invention relates generally to the technical field of multi-processor digital computer systems and, more particularly, to multi-processor computer systems in which:
1. the processors are loosely coupled or networked together;
2. data needed by some of the processors is controlled by a different processor that manages the storage of and access to the data;
3. processors needing access to data request such access from the processor that controls the data;
4. the processor controlling data provides requesting processors with access to it.
Within a digital computer system, processing data stored in a memory, e.g., a Random Access Memory ("RAM"), or on a storage device such as a floppy disk drive, a hard disk drive, a tape drive, etc., requires copying the data from one location to another prior to processing. Thus, for example, prior to processing data stored in a file in a comparatively slow speed storage device such as a hard disk, the data is first copied from the computer system's hard disk to its much higher speed RAM. After data has been copied from the hard disk to the RAM, the data is again copied from the RAM to the computer system's processing unit where it is actually processed. Each of these copies of the data, i.e., the copy of the data stored in the RAM and the copy of the data processed by the processing unit, can be considered to be an image of the data stored on the hard disk. Each of these images of the data may be referred to as a projection of the data stored on the hard disk.
In a loosely coupled or networked computer system having several processors that operate autonomously, the data needed by one processor may be accessed only by communications passing through one or more of the other processors in the system. For example, in a Local Area Network ("LAN") such as Ethernet, one of the processors may be dedicated to operating as a file server that receives data from other processors via the network for storage on its hard disk, and supplies data from its hard disk to the other processors via the network. In such networked computer systems, data may pass through several processors in being transmitted from its source at one processor to the processor requesting it.
In some networked computer systems, images of data are transmitted directly from their source to a requesting processor. One operating characteristic of networked computer systems of this type is that, as the number of requests for access to data increases and/or the amount of data being transmitted in processing each request increases, ultimately the processor controlling access to the data or the data transmission network becomes incapable of responding to requests within an acceptable time interval. Thus, in such networked computer systems, an increasing workload on the processor controlling access to data or on the data transmission network ultimately causes unacceptably long delays between a processor's request to access data and completion of the requested access.
In an attempt to reduce delays in providing access to data in networked computer systems, there presently exist systems that project an image of data from its source into an intermediate storage location in which the data is more accessible than at the source of the data. The intermediate storage location in such systems is frequently referred to as a "cache," and systems that project images of data into a cache are referred to as "caching" systems.
An important characteristic of caching systems, frequently referred to as "cache consistency" or "cache coherency," is their ability to simultaneously provide all processors in the networked computer system with identical copies of the data. If several processors concurrently request access to the same data, one processor may be updating the data while another processor is in the process of referring to the data being updated. For example, in commercial transactions occurring on a networked computer system one processor may be accessing data to determine if a customer has exceeded their credit limit while another processor is simultaneously posting a charge against that customer's account. If a caching system lacks cache consistency, it is possible that one processor's access to data to determine if the customer has exceeded their credit limit will use a projected image of the customer's data that has not been updated with the most recent charge. Conversely, in a caching system that possesses complete or absolute cache consistency, the processor that is checking the credit limit is guaranteed that the data it receives incorporates the most recent modifications.
One presently known system that employs data caching is the Berkeley Software Distribution ("BSD") 4.3 version of the Unix timesharing operating system. The BSD 4.3 system includes a buffer cache located in the host computer's RAM for storing projected images of blocks of data, typically 8 k bytes, from files stored on a hard disk drive. Before a particular item of data may be accessed on a BSD 4.3 system, the requested data must be projected from the hard disk into the buffer cache. However, before the data may be projected from the disk into the buffer cache, space must first be found in the cache to store the projected image. Thus, for data that is not already present in a BSD 4.3 system's buffer cache, the system must perform the following steps in providing access to the data:
1. Locate the buffer in the RAM that contains the Least Recently Used ("LRU") block of disk data.
2. Discard the LRU block of data, which may entail writing that block of data back to the hard disk.
3. Project an image of the requested block of data into the now empty buffer.
4. Provide the requesting processor with access to the data.
If the data being accessed by a processor is already present in a BSD 4.3 system's data cache, then responding to a processor's request for access to data requires only the last operation listed above. Because accessing data stored in RAM is much faster than accessing data stored on a hard disk, a BSD 4.3 system responds to requests for access to data that is present in its buffer cache in approximately 1/250th the time that it takes to respond to a request for access to data that is not already present in the buffer cache.
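The four steps listed above, together with the cache-hit shortcut, can be illustrated with a minimal sketch. It is not BSD 4.3 code: the class name `BufferCache`, the dictionary standing in for the hard disk, and the `disk_reads` counter are all assumptions introduced only for illustration.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU buffer cache with write-back (hypothetical, not BSD code)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity      # number of buffers available in RAM
        self.disk = disk              # dict: block number -> data (stands in for the hard disk)
        self.buffers = OrderedDict()  # block number -> [data, dirty]; least recently used first
        self.disk_reads = 0           # counts projections from disk into the cache

    def _load(self, block_no):
        if block_no in self.buffers:
            # Cache hit: only the final step (providing access) is needed.
            self.buffers.move_to_end(block_no)
            return self.buffers[block_no]
        if len(self.buffers) >= self.capacity:
            # Locate and discard the LRU block, writing it back if modified.
            victim, (data, dirty) = self.buffers.popitem(last=False)
            if dirty:
                self.disk[victim] = data
        # Project an image of the requested block into the free buffer.
        self.disk_reads += 1
        entry = [self.disk[block_no], False]
        self.buffers[block_no] = entry
        return entry

    def read(self, block_no):
        # Provide the requester with access to the data.
        return self._load(block_no)[0]

    def write(self, block_no, data):
        entry = self._load(block_no)
        entry[0] = data
        entry[1] = True               # image now differs from the disk copy
```

In this sketch a modified block reaches the disk only when it is evicted, which mirrors the write-back behavior attributed to the buffer cache in the surrounding text.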
The consistency of data images projected into the buffer cache in a BSD 4.3 system is excellent. Since the only path from processors requesting access to data on the hard disk is through the BSD 4.3 system's buffer cache, out of date blocks of data in the buffer cache are always overwritten by their more current counterpart when that block's data returns from the accessing processor. Thus, in the BSD 4.3 system an image of data in the system's buffer cache always reflects the true state of the file. When multiple requests contend for the same image, the BSD 4.3 system queues the requests from the various processors and sequences the requests such that each request is completely serviced before any processing commences on the next request. Employing the preceding strategy, the BSD 4.3 system ensures the integrity of data at the level of individual requests for access to segments of file data stored on a hard disk.
Because the BSD 4.3 system provides access to data from its buffer cache, blocks of data on the hard disk frequently do not reflect the true state of the data. That is, in the BSD 4.3 system, frequently the true state of a file exists in the projected image in the system's buffer cache that has been modified since being projected there from the hard disk, and that has not yet been written back to the hard disk. In the BSD 4.3 system, images of data that are more current than and differ from their source data on the hard disk may persist for very long periods of time, finally being written back to the hard disk just before the image is about to be discarded due to its "death" by the LRU process. Conversely, other caching systems exist that maintain data stored on the hard disk current with its image projected into a data cache. Network File System ("NFS") is one such caching system.
In many ways, NFS's client cache resembles the BSD 4.3 system's buffer cache. In NFS, each client processor that is connected to a network may include its own cache for storing blocks of data. Furthermore, similar to BSD 4.3, NFS uses the LRU algorithm for selecting the location in the client's cache that receives data from an NFS server across a network such as Ethernet. However, perhaps one of NFS's most significant differences is that images of blocks of data are not retrieved into NFS's client cache from a hard disk attached directly to the processor as in the BSD 4.3 system. Rather, in NFS images of blocks of data come to NFS's client cache from a file server connected to a network such as Ethernet.
The NFS client cache services requests from a computer program executed by the client processor using the same general procedures described above for the BSD 4.3 system's buffer cache. If the requested data is already projected into the NFS client cache, it will be accessed almost instantaneously. If requested data is not currently projected into NFS's client cache, the LRU algorithm must be used to determine the block of data to be replaced, and that block of data must be discarded before the requested data can be projected over the network from the file server into the recently vacated buffer.
In the NFS system, accessing data that is not present in its client cache takes approximately 500 times longer than accessing data that is present there. About one-half of this delay is due to the processing required for transmitting the data over the network from an NFS file server to the NFS client cache. The remainder of the delay is the time required by the file server to access the data on its hard disk and to transfer the data from the hard disk into the file server's RAM.
In an attempt to reduce this delay, client processors read ahead to increase the probability that needed data will have already been projected over the network from the file server into the NFS client cache. When NFS detects that a client processor is accessing a file sequentially, blocks of data are asynchronously pre-fetched in an attempt to have them present in the NFS client cache when the client processor requests access to the data. Furthermore, NFS employs an asynchronous write behind mechanism to transmit all modified data images present in the client cache back to the file server without delaying the client processor's access to data in the NFS client cache until NFS receives confirmation from the file server that it has successfully received the data. Both the read ahead and the write behind mechanisms described above contribute significantly to NFS's reasonably good performance. Also contributing to NFS's good performance is its use of a cache for directories of files present on the file server, and a cache for attributes of files present on the file server.
Several features of NFS reduce the consistency of its projected images of data. For example, images of file data present in client caches are re-validated every 3 seconds. If an image of a block of data about to be accessed by a client is more than 3 seconds old, NFS contacts the file server to determine if the file has been modified since the file server originally projected the image of this block of data. If the file has been modified since the image was originally projected, the image of this block in the NFS client cache and all other projected images of blocks of data from the same file are removed from the client cache. When this occurs, the buffers in RAM thus freed are queued at the beginning of a list of buffers (the LRU list) that are available for storing the next data projected from the file server. The images of blocks of data discarded after a file modification are re-projected into NFS's client cache only if the client processor subsequently accesses them.
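The re-validation behavior described above, and the window of staleness it leaves open, can be sketched as follows. This is a simplified model rather than NFS code: the classes `FileServer` and `ClientCache`, the per-file modification counter, the injected `clock` callable, and the choice to measure an image's age from its last validation are all assumptions made for illustration.

```python
REVALIDATE_AFTER = 3.0  # seconds, per the interval stated in the text


class FileServer:
    """Hypothetical server holding blocks and a per-file modification counter."""

    def __init__(self):
        self.mtime = {}   # file name -> modification counter
        self.blocks = {}  # (file name, block number) -> data

    def write(self, name, block_no, data):
        self.blocks[(name, block_no)] = data
        self.mtime[name] = self.mtime.get(name, 0) + 1


class ClientCache:
    """Hypothetical client cache that re-validates images older than 3 seconds."""

    def __init__(self, server, clock):
        self.server = server
        self.clock = clock  # callable returning the current time in seconds
        self.images = {}    # (name, block_no) -> (data, mtime seen, time validated)

    def read(self, name, block_no):
        key = (name, block_no)
        now = self.clock()
        if key in self.images:
            data, mtime, validated = self.images[key]
            if now - validated <= REVALIDATE_AFTER:
                return data  # served without contacting the server; may be stale
            if self.server.mtime[name] == mtime:
                self.images[key] = (data, mtime, now)  # still valid; restart the timer
                return data
            # File was modified: discard every cached block of this file.
            self.images = {k: v for k, v in self.images.items() if k[0] != name}
        data = self.server.blocks[key]
        self.images[key] = (data, self.server.mtime[name], now)
        return data
```

A read issued within the 3 second window returns the cached image even if the server copy has changed, which is the source of the reduced consistency noted in the text.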
If a client processor modifies a block of image data present in the NFS client cache, NFS asynchronously transmits the modified data image back to the file server to update the file there. Only when another client processor subsequently attempts to access a block of data in that file will its cache detect that the file has been modified.
Thus, NFS provides client processors with data images of poor consistency at reasonably good performance. However, NFS works only for those network applications in which client processors don't share data or, if they do share data, they do so under the control of a file locking mechanism that is external to NFS. There are many classes of computer application programs that execute quite well if they access files directly using the Unix File System, but that cannot use NFS because of the degraded images projected by NFS.
Another limitation imposed by NFS is the relatively small size (8 k bytes) of data that can be transferred in a single request. Because of this small transfer size, processes executing on a client processor must continually request additional data as they process a file. The client cache, which typically occupies only a few megabytes of RAM in each client processor, at best, reduces this workload to some degree. However, the NFS client cache cannot mask NFS's fundamental character that employs constant, frequent communication between a file server and all of the client processors connected to the network. This need for frequent server/client communication severely limits the scalability of an NFS network, i.e., severely limits the number of processors that may be networked together in a single system.
Andrew File System ("AFS") is a data caching system that has been specifically designed to provide very good scalability. Now used at many universities, AFS has demonstrated that a few file servers can support thousands of client workstations distributed over a very large geographic area. The major characteristics of AFS that permit its scalability are:
The unit of cached data increases from NFS's 8 k disk block to an entire file. AFS projects complete files from the file server into the client workstations.
Local hard disk drives, required on all AFS client workstations, hold projected file images. Since AFS projects images of complete files, a client's RAM would quickly be occupied by image projections. Therefore, AFS projects complete files onto a client's local hard disk, where they can be locally accessed many times without requiring any more accesses to the network or to the file server.
In addition to projecting file images onto a workstation's hard disk, similar to BSD 4.3, AFS also employs a buffer cache located in the workstation's RAM to store images of blocks of data projected from the file image stored on the workstation's hard disk.
Under AFS, when a program executing on the workstation opens a file, a new file image is projected into the workstation from the file server only if the file is not already present on the workstation's hard disk, or if the file on the file server supersedes the image stored on the workstation's hard disk. Thus, assuming that an image of a file has previously been projected from a network's file server into a workstation, a computer program's request to open that file requires, at a minimum, that the workstation transmit at least one message back to the server to confirm that the image currently stored on its hard disk is the most recent version. This re-validation of a projected image requires a minimum of 25 milliseconds for files that haven't been superseded. If the image of a file stored on the workstation's hard disk has been superseded, then it must be re-projected from the file server into the workstation, a process that may require several seconds. After the file image has been re-validated or re-projected, programs executed by the workstation access it via AFS's local file system and its buffer cache with response times comparable to those described above for BSD 4.3.
The consistency of file images projected by AFS starts out as being "excellent" for a brief moment, and then steadily degrades over time. File images are always current immediately after the image has been projected from the file server into the client processor, or re-validated by the file server. However, several clients may receive the same file projection at roughly the same time, and then each client may independently begin modifying the file. Each client remains completely unaware of any modifications being made to the file by other clients. As the computer program executed by each client processor closes the file, if the file has been modified the image stored on the processor's hard disk is transmitted back to the server. Each successive transmission from a client back to the file server overwrites the immediately preceding transmission. The version of the file transmitted from the final client processor to the file server is the version that the server will subsequently transmit to client workstations when they attempt to open the file. Thus at the conclusion of such a process the file stored on the file server incorporates only those modifications made by the final workstation to transmit the file, and all modifications made at the other workstations have been lost. While the AFS file server can detect when one workstation's modifications to a file overwrite modifications made to the file by another workstation, there is little the server can do at this point to prevent this loss of data integrity.
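The loss of modifications described above can be reproduced with a small whole-file caching model. The sketch below is not AFS code: `AfsServer`, `AfsClient`, and the representation of a file as a list of records are hypothetical names and simplifications chosen only to show the last-close-wins effect.

```python
class AfsServer:
    """Hypothetical file server that stores one whole file."""

    def __init__(self, contents):
        self.contents = list(contents)

    def fetch(self):
        return list(self.contents)   # project a complete file image

    def store(self, image):
        self.contents = list(image)  # overwrite with the whole image received


class AfsClient:
    """Hypothetical workstation that caches the whole file while it is open."""

    def __init__(self, server):
        self.server = server
        self.image = None
        self.modified = False

    def open(self):
        self.image = self.server.fetch()
        self.modified = False

    def append(self, record):
        self.image.append(record)    # modifies only the local image
        self.modified = True

    def close(self):
        if self.modified:
            self.server.store(self.image)  # transmit the image back on close


server = AfsServer(["base"])
a, b = AfsClient(server), AfsClient(server)
a.open()
b.open()
a.append("from A")
b.append("from B")
a.close()
b.close()  # overwrites the image that client A just stored
print(server.contents)
```

After both clients close the file, the server copy holds only the record written by the client that closed last; the record appended by the other client is gone.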
AFS, like NFS, fails to project images with absolute consistency. If computer programs don't employ a file locking mechanism external to AFS, the system can support only applications that don't write to shared files. This characteristic of AFS precludes using it for any application that demands high integrity for data written to shared files.
An object of the present invention is to provide a digital computer system capable of projecting larger data images, over greater distances, at higher bandwidths, and with much better consistency than the existing data caching mechanisms.
Another object of the present invention is to provide a generalized data caching mechanism capable of projecting multiple images of a data structure from its source into sites that are widely distributed across a network.
Another object of the invention is to provide a generalized data caching mechanism in which an image of data always reflects the current state of the source data structure, even when it is being modified concurrently at several remote sites.
Another object of the present invention is to provide a generalized data caching mechanism in which a client process may operate directly upon a projected image as though the image were actually the source data structure.
Another object of the present invention is to provide a generalized data caching mechanism that extends the domain over which data can be transparently shared.
Another object of the present invention is to provide a generalized data caching mechanism that reduces delays in responding to requests for access to data by projecting images of data that may be directly processed by a client site into sites that are "closer" to the requesting client site.
Another object of the present invention is to provide a generalized data caching mechanism that transports data from its source into the projection site(s) efficiently.
Another object of the present invention is to provide a generalized data caching mechanism that anticipates future requests from clients and, when appropriate, projects data toward the client in anticipation of the client's request to access data.
Another object of the present invention is to provide a generalized data caching mechanism that maintains the projected image over an extended period of time so that requests by a client can be repeatedly serviced from the initial projection of data.
Another object of the present invention is to provide a generalized data caching mechanism that employs an efficient consistency mechanism to guarantee absolute consistency between a source of data and all projected images of the data.
Briefly, the present invention in its preferred embodiment includes a plurality of digital computers operating as a network. Some of the computers in the network function as Network Distributed Cache ("NDC") sites. Operating in the digital computer at each NDC site is an NDC that includes NDC buffers. The network of digital computers also includes one or more client sites, which may or may not be NDC sites. Each client site presents requests to an NDC to access data that is stored at an NDC site located somewhere within the network. Each item of data that may be requested by the client sites belongs to a named set of data called a dataset. The NDC site storing a particular dataset is called the NDC server terminator site for that particular dataset. The NDC site that receives requests to access data from the client site is called the NDC client terminator site. A single client site may concurrently request to access different datasets that are respectively stored at different NDC sites. Thus, while there is only a single NDC client terminator site for each client site, simultaneously there may be a plurality of NDC server terminator sites responding to requests from a single client site to access datasets stored at different NDC server terminator sites.
Each NDC in the network of digital computers receives requests to access the data in the named datasets. If this NDC site is an NDC client terminator site for a particular client site, it will receive requests from that client. However, the same NDC site that is an NDC client terminator site for one client may also receive requests to access data from other NDC sites that may or may not be NDC client terminator sites for other client sites.
An NDC client terminator site, upon receiving the first request to access a particular named dataset from a client site, assigns a data structure called a channel to the request and stores information about the request into the channel. Each channel functions as a conduit through the NDC site for projecting images of data to sites requesting access to the dataset, or, if this NDC site is an NDC client terminator site for a particular request, the channel may store an image of the data in the NDC buffers at this NDC site. In addition to functioning as part of a conduit for transmitting data between an NDC server terminator site and an NDC client terminator site, each channel also stores data that provides a history of access patterns for each client site as well as performance measurements both for client sites and for the NDC server terminator site.
When an NDC site receives a request to access data, regardless of whether the request is from a client site or from another NDC site, the NDC first checks the NDC buffers at this NDC site to determine if a projected image of the requested data is already present in the NDC buffers. If the NDC buffers at this NDC site do not contain a projected image of all data requested from the dataset, and if the NDC site receiving the request is not the NDC server terminator site for the dataset, the NDC of this NDC site transmits a single request for all of the requested data that is not present at this NDC site from this NDC site downstream to another NDC site closer to the NDC server terminator site for the dataset than the present NDC site. If the NDC buffers of this NDC site do not contain a projected image of all data requested from the dataset, and if the NDC site receiving the request is the NDC server terminator site for the dataset, the NDC of this NDC site accesses the stored dataset to project an image of the requested data into its NDC buffers. The process of checking the NDC buffers to determine if a projected image of the requested data is present there, and if one is not completely present, requesting the additional required data from a downstream NDC site or accessing the stored dataset repeats until the NDC buffers of the site receiving the request contain a projected image of all requested data.
The process of one NDC site requesting data from another downstream NDC site establishes a chain of channels respectively located in each of the NDC sites that provides a conduit for returning the requested data back to the NDC client terminator site. Thus, each successive NDC site in this chain of NDC sites, having obtained a projected image of all the requested data, either by accessing the stored dataset or from its downstream NDC site, returns the data requested from it upstream to the NDC site from which it received the request. This sequence of data returns from one NDC site to its upstream NDC site continues until the requested data arrives at the NDC client terminator site. When the requested data reaches the NDC client terminator site for this request, that NDC site returns the requested data to the client site.
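The request and return flow through a chain of NDC sites can be sketched in a few lines. The following is only an illustrative model under stated assumptions: the class `NdcSite`, its single-dataset scope, the block-number keys, the `requests_seen` counter, and the choice to retain images in the buffers of every site along the chain are hypothetical and are not taken from the specification.

```python
class NdcSite:
    """Hypothetical model of one NDC site handling a single dataset."""

    def __init__(self, name, downstream=None, dataset=None):
        self.name = name
        self.downstream = downstream  # next site toward the server terminator site
        self.dataset = dataset        # stored dataset; set only at the server terminator site
        self.buffers = {}             # block number -> projected image of the data
        self.requests_seen = 0        # illustration only: requests received by this site

    def request(self, blocks):
        self.requests_seen += 1
        # Check the local buffers for images of the requested data.
        missing = [b for b in blocks if b not in self.buffers]
        if missing:
            if self.dataset is not None:
                # Server terminator site: access the stored dataset.
                fetched = {b: self.dataset[b] for b in missing}
            else:
                # Send a single request downstream for all data not present here.
                fetched = self.downstream.request(missing)
            self.buffers.update(fetched)
        # Return the requested data upstream to whoever asked for it.
        return {b: self.buffers[b] for b in blocks}


server_site = NdcSite("server terminator", dataset={0: "a", 1: "b", 2: "c"})
middle_site = NdcSite("intermediate", downstream=server_site)
client_site = NdcSite("client terminator", downstream=middle_site)

first = client_site.request([0, 1])   # travels the whole chain
second = client_site.request([1, 2])  # only block 2 is requested downstream
third = client_site.request([0, 2])   # served entirely from local buffers
```

In this model the third request never leaves the client terminator site, because the two earlier requests already projected images of both blocks into its buffers.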
Thus, the network of digital computers, through the NDCs operating in each of the NDC sites in the network, may project images of a stored dataset from an NDC server terminator site to a plurality of client sites in response to requests to access such dataset transmitted from the client sites to NDC client terminator sites. Furthermore, each NDC includes routines called channel daemons that operate in the background in each NDC site. The channel daemons use historical data about accesses to the datasets, that the NDCs store in the channels, to pre-fetch data from the NDC server terminator site to the NDC client terminator site in an attempt to minimize any delay between the receipt of a request to access data from the client site and the response to that request by the NDC client terminator site.
In addition to projecting images of a stored dataset, the NDCs detect a condition for a dataset, called a concurrent write sharing ("CWS") condition, whenever two or more client sites concurrently access a dataset, and one or more of the client sites attempts to write the dataset. If a CWS condition occurs, one of the NDC sites declares itself to be a consistency control site ("CCS") for the dataset, and imposes restrictions on the operation of other NDC sites upstream from the CCS. The operating restrictions that the CCS imposes upon the upstream NDC sites guarantee client sites throughout the network of digital computers the same level of file consistency the client sites would have if all the client sites operated on the same computer. That is, the operating conditions that the CCS imposes ensure that modifications made to a dataset by one client site are reflected in the subsequent images of that dataset projected to other client sites no matter how far the client site modifying the dataset is from the client site that subsequently requests to access the dataset.
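The CWS condition as defined above is a simple predicate over the set of client sites currently accessing a dataset. A minimal sketch follows; the class `DatasetState`, the `"r"`/`"w"` mode strings, and the open/close interface are assumptions made for illustration and say nothing about how the NDCs actually track accesses or how a CCS enforces its restrictions.

```python
class DatasetState:
    """Hypothetical tracker of which client sites are accessing one dataset."""

    def __init__(self):
        self.open_modes = {}  # client site id -> "r" (read) or "w" (write)

    def open(self, client, mode):
        self.open_modes[client] = mode
        return self.cws()

    def close(self, client):
        self.open_modes.pop(client, None)
        return self.cws()

    def cws(self):
        # CWS: two or more client sites access the dataset concurrently
        # and at least one of them is writing it.
        any_writer = any(m == "w" for m in self.open_modes.values())
        return len(self.open_modes) >= 2 and any_writer


state = DatasetState()
state.open("site A", "r")          # one reader: no CWS
state.open("site B", "r")          # two readers: still no CWS
cws = state.open("site C", "w")    # a writer joins: CWS condition arises
```

A single writer with no other client sites accessing the dataset does not satisfy the predicate, consistent with the requirement that two or more client sites access the dataset concurrently.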