1. Field of the Invention
The present invention pertains to a method and apparatus for asynchronous file writes in a distributed file system and, more particularly, to a method and apparatus for delaying asynchronous file writes in such a system.
2. Description of the Related Art
As information technology has matured, computing systems have evolved into what are now known as “enterprise computing systems.” An enterprise computing system typically comprises a large number of computing and storage devices, all of which are employed by users from a single concern, or “enterprise.” One popular type of enterprise computing system is an “intranet,” which is a computing system that operates like the Internet but requires special authorization to access. Such access is typically granted only to employees and/or contractors of the enterprise. However, not all enterprise computing systems are intranets or operate along the principles of the Internet. One of the defining characteristics of the Internet, shared by intranets, is that communications among the computing devices use the Transmission Control Protocol/Internet Protocol (“TCP/IP”). Many other protocols, some of them proprietary, may instead be employed in enterprise computing systems for security and other reasons.
One common characteristic of enterprise computing systems is that they employ a “client/server architecture.” A client/server architecture is one in which each computing device or process is either a “client” or a “server.” Servers usually are powerful computing devices or processes dedicated to providing services such as managing disk drives (file servers), printers (print servers), or traffic (general servers). Clients usually are personal computers or workstations on which users run applications. Clients rely on servers for resources such as files, devices, and even processing power. For instance, if two networked users send print jobs to the same printer, the jobs go to the printer through the server, and the server may decide the order in which they are printed. While this example is simplistic, it demonstrates the role of the server. The server also manages the use of processing resources, shared memory, and shared software.
Another common characteristic of enterprise computing systems is that they may be conceptualized as groups, or “clusters,” of constituent computing systems. In an enterprise computing system, the number of users is typically so large that several, sometimes dozens or hundreds, of servers are necessary to manage all the computing resources of the system. These computing resources are grouped into clusters. Each cluster has at least one server that administers the cluster's computing resources. Some enterprise computing systems might also have a “master” server that controls operations across the entire computing system.
Frequently, the system's architects imbue an enterprise computing system with “single system semantics.” This means that, ideally, the network structure is transparent to users, who are completely unaware that they are working in any particular system or cluster, or even that the network is grouped into clustered systems. All the users know is that a network of computing resources is at their disposal.
One feature found in a clustered enterprise computing system is a “distributed file system.” In such a computing system, users typically do not read from or write directly to long-term, or “disk,” storage. In this context, “files” constitute data stored in a predefined format, structure, or model. A file system (“FS”) usually organizes data currently being used, or recently used, into various files in temporary storage, or “cache.” When a user needs new or additional data, the FS provides it from cache or, if the data is not in cache, from disk storage. The FS also decides when to write data from the cache to disk storage. One important quality for a FS is efficient use of storage. It is therefore important for a FS to organize the cache efficiently, to retrieve data from disk storage into cache, and to store data from cache to disk storage. Note that data is typically manipulated in groups called “pages,” so that reads and writes between cache and disk storage are usually done in pages.
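The cache-then-disk behavior described above can be illustrated with a minimal sketch. It assumes a dictionary standing in for disk storage, an unrealistically small page size of 8 bytes, and a hypothetical `PageCache` class; it is not the FS of any particular system. Reads are served from cache when possible, and writes are applied in cache and later written to disk a whole page at a time.

```python
PAGE_SIZE = 8  # assumption: tiny page size so the example is easy to trace


class PageCache:
    """Hypothetical page-granular cache in front of a simulated disk."""

    def __init__(self, disk):
        self.disk = disk          # simulated disk storage: (file, page_no) -> bytes
        self.cache = {}           # pages currently held in cache
        self.modified = set()     # pages changed in cache since they were loaded
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, file, page_no):
        key = (file, page_no)
        if key not in self.cache:                 # miss: load the whole page from disk
            self.disk_reads += 1
            self.cache[key] = bytearray(self.disk.get(key, bytes(PAGE_SIZE)))
        return bytes(self.cache[key])             # hit (or freshly loaded): serve from cache

    def write(self, file, offset, data):
        """Modify bytes in cache only; disk is updated later, page by page."""
        for i, b in enumerate(data):
            page_no, pos = divmod(offset + i, PAGE_SIZE)
            self.read(file, page_no)              # make sure the page is cached
            self.cache[(file, page_no)][pos] = b
            self.modified.add((file, page_no))

    def flush(self):
        """The FS decides when to write: here, every modified page, one page at a time."""
        for key in sorted(self.modified):
            self.disk[key] = bytes(self.cache[key])
            self.disk_writes += 1
        self.modified.clear()


disk = {}
fs = PageCache(disk)
fs.write("a.txt", 6, b"hello")   # five bytes spanning pages 0 and 1
fs.flush()                       # both touched pages are written as whole pages
```

Even though only five bytes changed, two full pages are read into cache and two full pages are written back, reflecting the page-granular transfers described in the text.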
A distributed file system (“DFS”) is simply a FS in which the accessible files may be shared simultaneously by multiple computing resources. Thus, multiple users can use the same data at the same time. Files in a DFS may be distributed across the entire computing system. More commonly, however, files are grouped and segregated into the same clusters into which the rest of the computing resources are grouped. Such a cluster-wide DFS shall, for present purposes, be referred to as a cluster FS (“CFS”).
At any given time, one or more of the computing resources in a cluster will usually be running an “application.” The application(s) operate(s) on the data in the files of the CFS. The CFS manages the reading and writing of data between the computing resources and the cache and between the cache and the disk storage. Applications may also sometimes reach beyond their cluster into the CFS of another cluster. The grouping of files into a particular CFS is generally predicated on the commonality of their use by the application(s) running in a cluster. In a system employing single system semantics, the users are unaware of all this activity in the computing system that executes the various tasks they direct.
Because multiple applications may access the same file, and even the same page in the same file, a computing system employing a DFS devotes considerable effort to ensuring data integrity, i.e., that the data is up to date and accurate. Applications frequently retrieve a page and alter the data on the page. This data alteration must be tracked and stored at some point so that further use of the page will involve the “correct” data. The computing system includes a “virtual memory subsystem” (“VMS”) that cooperates with the DFS to track which applications are accessing which pages of which files. The VMS keeps two lists of pages that have been accessed by applications: a list of “dirty pages” and a list of “clean pages.”
More particularly, in a typical scenario, an application will request a page from a server, i.e., read the page. The VMS places the page on the clean list. Sometimes, the application alters data on the page. Once the data is altered, the page is considered “dirty” and the VMS deletes it from the clean list and adds it to the dirty list. At some point, the dirty page is written back to the server. This write might result from the application finishing with the dirty page. Alternatively, another application on a different computing system might request the dirty page, whereupon the server will force the client on which the first application resides to flush its dirty page to the server. Either way, the server writes the dirty pages to disk immediately upon receipt. Once the dirty page is written to disk, it is then “clean.” The VMS deletes the page from the dirty list and adds it to the clean list.
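The clean/dirty bookkeeping just described can be sketched as follows. The `Server`, `Client`, and `VMS` classes are hypothetical stand-ins, pages are identified by strings, and the "disk" is a dictionary. The sketch shows only the list transitions and the server forcing another client to flush a dirty page; it does not model any real VMS interface.

```python
class VMS:
    """Hypothetical virtual memory subsystem keeping clean and dirty page lists."""

    def __init__(self):
        self.clean = set()
        self.dirty = set()

    def mark_clean(self, page_id):
        self.dirty.discard(page_id)    # remove from the dirty list (if present)
        self.clean.add(page_id)        # and add to the clean list

    def mark_dirty(self, page_id):
        self.clean.discard(page_id)
        self.dirty.add(page_id)


class Server:
    """Hypothetical file server; writes received pages to disk immediately."""

    def __init__(self):
        self.disk = {}                 # page_id -> data on disk
        self.clients = []

    def read(self, requester, page_id):
        # Before serving a page, force any other client holding it dirty to flush.
        for client in self.clients:
            if client is not requester and page_id in client.vms.dirty:
                client.flush(page_id)
        return self.disk.get(page_id, "")

    def receive_write(self, page_id, data):
        self.disk[page_id] = data      # written to disk upon receipt


class Client:
    def __init__(self, server):
        self.server = server
        self.vms = VMS()
        self.cache = {}
        server.clients.append(self)

    def read(self, page_id):
        if page_id in self.vms.dirty:  # keep our own unflushed changes
            return self.cache[page_id]
        self.cache[page_id] = self.server.read(self, page_id)
        self.vms.mark_clean(page_id)   # a freshly read page goes on the clean list
        return self.cache[page_id]

    def modify(self, page_id, data):
        self.cache[page_id] = data
        self.vms.mark_dirty(page_id)   # altered page moves to the dirty list

    def flush(self, page_id):
        self.server.receive_write(page_id, self.cache[page_id])
        self.vms.mark_clean(page_id)   # once on disk, the page is clean again
```

For example, if client `a` reads and modifies page `"p1"` and client `b` then reads `"p1"`, the server first makes `a` flush, so `b` receives `a`'s data and `"p1"` returns to `a`'s clean list.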
Typically, the dirty page is written to disk, i.e., “cleaned,” before the file is closed. This “forced write” is very inefficient because it requires the applications to be put on hold while the write occurs. Furthermore, the write occurs regardless of how many pages need to be written.
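To make the cost of the forced write concrete, the following sketch simulates how long an application is held up when a file is closed. `OpenFile` is a hypothetical class, and the costs are invented constants for illustration only, including an assumed fixed synchronization overhead that is paid on every close however few pages are dirty.

```python
from dataclasses import dataclass, field

# Assumed, illustrative costs in milliseconds; not measurements of any system.
SYNC_OVERHEAD_MS = 5.0
PAGE_WRITE_MS = 2.0


@dataclass
class OpenFile:
    """Hypothetical open file whose close() performs a forced write."""
    name: str
    cache: dict = field(default_factory=dict)
    disk: dict = field(default_factory=dict)
    dirty_pages: set = field(default_factory=set)

    def write(self, page_no, data):
        self.cache[page_no] = data
        self.dirty_pages.add(page_no)

    def close_with_forced_write(self):
        """Synchronously clean every dirty page before close returns.

        Returns the simulated time the calling application is put on hold.
        """
        stall_ms = SYNC_OVERHEAD_MS            # assumed fixed cost of every forced write
        for page_no in sorted(self.dirty_pages):
            self.disk[page_no] = self.cache[page_no]
            stall_ms += PAGE_WRITE_MS          # application waits for each page write
        self.dirty_pages.clear()               # all pages are now clean
        return stall_ms
```

Under these assumed costs, closing a file with three dirty pages holds the application for 11 ms, and even a file with no dirty pages still pays the 5 ms overhead.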
In an enterprise computing system, therefore, an inordinate amount of time is spent writing dirty pages to disk for the sole purpose of meeting requests by applications. The inefficiency is tolerated, however, to ensure information integrity in the event the server fails. If the server fails, it will invariably fail in the middle of some operation. When the server is brought back up and proceeds through its state recovery, knowing which data is dirty and which is clean is very important; hence the importance of the forced write to data integrity. Some alternative approaches have attempted to mitigate these inefficiencies by employing “write behinds” that require the altered data to be on disk before the file is closed. However, these attempts have achieved only minimal improvements in efficiency relative to forced writes. Nor have they adhered to the strict guidelines for single system semantics with respect to out-of-space handling, cache consistency, or modification-time handling.
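A write-behind of the kind mentioned above can be sketched with a background thread draining a queue of dirty pages. This is a hypothetical illustration built on Python's standard `queue` and `threading` modules; `WriteBehindFile` is not the design of any particular prior system. Individual writes return immediately, but because `close()` must still wait until every queued page is on disk, the application is again held up at close time, which suggests why such schemes gain relatively little over forced writes.

```python
import queue
import threading


class WriteBehindFile:
    """Hypothetical write-behind: pages are written by a background thread,
    but close() still waits until every queued page has reached disk."""

    def __init__(self):
        self.disk = {}                           # simulated disk: page_no -> data
        self._pending = queue.Queue()            # dirty pages awaiting write-behind
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            item = self._pending.get()
            if item is None:                     # sentinel: stop the worker
                self._pending.task_done()
                return
            page_no, data = item
            self.disk[page_no] = data            # background write to "disk"
            self._pending.task_done()

    def write(self, page_no, data):
        # Returns immediately; the application does not wait for the disk write.
        self._pending.put((page_no, data))

    def close(self):
        # Write-behind constraint: altered data must be on disk before close.
        self._pending.join()                     # block until all queued pages are written
        self._pending.put(None)
        self._worker.join()
```

Using a single worker thread keeps writes to the same page in submission order. A real system would also need to report write errors, such as running out of space, back to the application; this sketch ignores them.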
The present invention is directed to resolving, or at least reducing the effects of, one or all of the problems mentioned above.