1. Field of the Invention
The present invention pertains to a method and apparatus for asynchronous file writes in a distributed file system and, more particularly, to a method and apparatus for delaying asynchronous file writes in such a system.
2. Description of the Related Art
As information technology has matured, computing systems have evolved into what are now known as “enterprise computing systems.” An enterprise computing system is typically a large number of computing and storage devices, all of which are employed by users from a single concern, or “enterprise.” One popular type of enterprise computing system is an “intranet,” which is a computing system that operates like the Internet, but requires special authorization to access. Such access is typically only granted to employees and/or contractors of the enterprise. However, not all enterprise computing systems are intranets or operate along the principles of the Internet. One of the defining characteristics of the Internet is that communications among the computing devices utilize the Transmission Control Protocol/Internet Protocol (“TCP/IP”) as do intranets. However, there are many protocols, some of them proprietary, that may instead be employed in enterprise computing systems for, among other reasons, security purposes.
One common characteristic of enterprise computing systems is that they employ a “client/server architecture.” A client/server architecture is one in which each computing device or process is either a “client” or a “server.” Servers usually are powerful computing devices or processes dedicated to providing services such as managing disk drives (file servers), printers (print servers), or traffic (general servers). Clients usually are personal computers or workstations on which users run applications. Clients rely on servers for resources, such as files, devices, and even processing power. For instance, if two networked users send a print job to the same printer, they will go to the printer through the server and the server may decide the order in which they are printed. While this example is simplistic, it demonstrates the role of the server. The server also manages the use of processing resources, shared memory, and shared software.
Another common characteristic of enterprise computing systems is that they may be conceptualized as groups, or “clusters,” of constituent computing systems. In an enterprise computing system, the number of users is typically so large that several, sometimes dozens or hundreds, of servers are necessary to manage all the computing resources of the system. These computing resources are grouped into clusters. Each cluster has at least one server that administers the cluster's computing resources. Some enterprise computing systems might also have a “master” server that controls operations across the entire computing system.
Frequently, the system's architects imbue an enterprise computing system with “single system semantics.” This means that, ideally, the network structure is transparent to the user so that the user is completely unaware they are working in any particular system or cluster, or even that the network is grouped into clustered systems. All the users will know is that they are interfaced with a network of computing resources at their disposal.
One feature found in a clustered enterprise computing system is a “distributed file system.” In such a computing system, users typically do not read and/or write directly to long-term, or “disk” storage. In this context, “files” constitute data stored in a predefined format, structure, or model. A file system (“FS”) usually organizes data currently being used or that has been recently used into various files in temporary storage, or “cache.” When a user needs new or more data, the FS provides it from cache or, if the data is not in cache, from disk storage. The FS also decides when to write data from the cache to disk storage. One important quality for a FS is efficient use of storage. It is therefore important for a FS to efficiently organize the cache, retrieve from disk storage to cache, and to store from cache to disk storage. Note that data is typically manipulated in groups called “pages,” so that reads and writes between cache and disk storage are usually done in pages.
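The read path described above, in which the FS serves a page from cache and falls back to disk storage only on a miss, can be illustrated with a minimal Python sketch. The class name `PageCache`, the dictionary standing in for disk storage, and the `disk_reads` counter are assumptions introduced purely for illustration; they are not part of any particular file system.

```python
class PageCache:
    """Toy read-through page cache (hypothetical, for illustration only)."""

    def __init__(self, disk):
        self.disk = disk        # page number -> bytes; stands in for disk storage
        self.cache = {}         # page number -> bytes; stands in for the cache
        self.disk_reads = 0     # counts how often disk storage had to be consulted

    def read(self, page_no):
        # Serve from cache when possible; otherwise fetch the whole page from disk.
        if page_no not in self.cache:
            self.disk_reads += 1
            self.cache[page_no] = self.disk[page_no]
        return self.cache[page_no]


disk = {0: b"alpha", 1: b"beta"}
fs = PageCache(disk)
fs.read(0)   # miss: page 0 is fetched from disk
fs.read(0)   # hit: served from cache
fs.read(1)   # miss: page 1 is fetched from disk
```

In this sketch only the first access to each page touches the simulated disk, which is the efficiency the passage attributes to the cache.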
A distributed file system (“DFS”) is simply a FS in which the various files that may be accessed may be shared simultaneously by the other computing resources. Thus, multiple users can use the data at the same time. Files in a DFS may be distributed across the entire computing system. More commonly, however, files are grouped and segregated into the clusters into which the rest of the computing resources are grouped. Such a cluster-wide DFS shall, for present purposes, be referred to as a cluster FS (“CFS”).
Thus, one or more of the computing resources in a cluster will usually be running an “application” at any given time. The application(s) operate(s) on the data in the files of the CFS. The CFS manages the reading and writing of data between the computing resources and the cache and between the cache and the disk storage. Applications may also sometimes reach beyond their cluster into the CFS of another cluster. The grouping of files into a particular CFS is generally predicated on the commonality of their use by application(s) running in a cluster. In a system employing single system semantics, the users are unaware of all this activity in the computing system that executes the various tasks directed by the user.
Because multiple applications may access the same file, and even the same page in the same file, a computing system employing a DFS dedicates a lot of effort to ensuring data integrity, i.e., that the data is up to date and accurate. Applications frequently retrieve a page and alter the data on the page. This data alteration must be tracked and stored at some point so that further use of the page will involve the “correct” data. The computing system includes a “virtual memory subsystem” (“VMS”) that cooperates with the DFS to track what applications are accessing what pages of what files. The VMS keeps two lists of pages that have been accessed by applications. The first list is of “dirty pages” and the second of “clean pages.”
More particularly, in a typical scenario, an application will request a page from a server, i.e., read the page. The VMS places the page on the clean list. Sometimes, the application alters data on the page. Once the data is altered, the page is considered “dirty” and the VMS deletes it from the clean list and adds it to the dirty list. At some point, the dirty page is written back to the server. This write might result from the application finishing with the dirty page. Alternatively, another application on a different computing system might request the dirty page, whereupon the server will force the client on which the first application resides to flush its dirty page to the server. Either way, the server writes the dirty pages to disk immediately upon receipt. Once the dirty page is written to disk, it is then “clean.” The VMS deletes the page from the dirty list and adds it to the clean list.
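The movement of a page between the clean list and the dirty list described above can be sketched as a small state tracker. This is a simplified illustration only: the class `PageLists` and its method names are hypothetical, and sets are used in place of whatever list structure an actual VMS maintains.

```python
class PageLists:
    """Toy model of the clean and dirty page lists (hypothetical names)."""

    def __init__(self):
        self.clean = set()
        self.dirty = set()

    def on_read(self, page):
        # A freshly read page is clean unless it is already being tracked as dirty.
        if page not in self.dirty:
            self.clean.add(page)

    def on_modify(self, page):
        # Altering data moves the page from the clean list to the dirty list.
        self.clean.discard(page)
        self.dirty.add(page)

    def on_written_to_disk(self, page):
        # Once the server has written the page to disk, it is clean again.
        self.dirty.discard(page)
        self.clean.add(page)


lists = PageLists()
lists.on_read("p1")
lists.on_modify("p1")
state_after_modify = (set(lists.clean), set(lists.dirty))
lists.on_written_to_disk("p1")
```

At every step in this sketch a page appears on exactly one of the two lists, mirroring the delete-then-add behavior attributed to the VMS.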
However, the dirty page is typically written to disk, i.e., “cleaned,” before the file is closed. This “forced write” is very inefficient because it requires the applications to be put on hold while the write occurs. Furthermore, the write occurs regardless of how many pages need to be written.
In an enterprise computing system, therefore, an inordinate amount of time is spent writing dirty pages to disk for the sole purpose of meeting requests by applications. The inefficiency is tolerated, however, to ensure information integrity in the event the server fails. If the server fails, it will invariably fail in the middle of some operation. When the server is brought back and proceeds through its state recovery, knowing which data is dirty and which is clean is very important. Hence the importance of the forced write to data integrity. Some alternative approaches have attempted to mitigate these inefficiencies by employing “write behinds” that require the altered data to be on disk before the file is closed. However, these attempts have achieved minimal improvements in efficiency relative to the forced writes. These attempts have also not adhered to the strict guidelines for single system semantics with respect to out-of-space handling, cache consistency, or modification-time handling.
The present invention is directed to resolving, or at least reducing the effects of, one or all of the problems mentioned above.
The invention includes a method and apparatus for delaying asynchronous writes in a distributed file system, wherein the file system includes a unique identifier (“UID”). The method comprises buffering a page of dirty data with the unique identifier upon writing to the server; changing the unique identifier to create a current unique identifier upon a failure of the server; comparing the buffered unique identifier with the current unique identifier when the page is requested while the page is in a written state; and handling the request responsive to the comparison. In other aspects of the invention, the invention comprises a computer programmed to perform the method and a program storage medium encoded with instructions that, when executed by a computer, perform the method.
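The four steps of the method recited above (buffer with the UID, change the UID on server failure, compare on request, handle accordingly) can be illustrated with a minimal Python sketch. Several details here are assumptions and are not stated in the summary: the UID is modeled as an integer counter, the server is assumed to lose pages it has received but not yet written to disk when it fails, and a mismatch is assumed to be handled by rewriting the buffered page to the server. The names `Server`, `Client`, `pending`, and `request_page` are hypothetical.

```python
import itertools


class Server:
    """Toy server whose UID changes on every failure (assumed representation)."""

    def __init__(self):
        self._uids = itertools.count(1)
        self.uid = next(self._uids)
        self.pending = {}   # pages received but not yet on disk (assumed lost on failure)

    def write(self, page_no, data):
        self.pending[page_no] = data
        return self.uid     # the client buffers this UID alongside the page

    def fail_and_recover(self):
        self.pending.clear()            # assumption: delayed writes are lost
        self.uid = next(self._uids)     # a new current UID marks the failure


class Client:
    def __init__(self, server):
        self.server = server
        self.written = {}   # page number -> (data, UID buffered at write time)

    def write_page(self, page_no, data):
        uid = self.server.write(page_no, data)
        self.written[page_no] = (data, uid)

    def request_page(self, page_no):
        data, buffered_uid = self.written[page_no]
        if buffered_uid == self.server.uid:
            return "reuse"              # no failure since the write was issued
        self.write_page(page_no, data)  # assumed handling: rewrite after a failure
        return "rewritten"


server = Server()
client = Client(server)
client.write_page(7, b"new data")
first = client.request_page(7)    # UIDs match
server.fail_and_recover()
second = client.request_page(7)   # UIDs differ, so the page is rewritten
third = client.request_page(7)    # UIDs match again after the rewrite
```

The point of the sketch is that the client can detect an intervening server failure by a cheap comparison of identifiers, without the server having forced the page to disk at the time of the original write.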