File systems manage files and other data objects stored on computer systems. File systems were originally built into a computer's operating system to facilitate access to files stored locally on resident storage media. As computers became networked, some file storage capabilities were offloaded from individual user machines to special storage servers that stored large numbers of files on behalf of the user machines. When a file was needed, the user machine simply requested the file from the server. In this server-based architecture, the file system was extended to facilitate management of and access to files stored remotely at the storage server over a network.
One problem that arises in distributed file systems concerns storage of identical files on the server. While some file duplication normally occurs on an individual user's personal computer, duplication tends to be far more prevalent on networks where a server centrally stores the contents of multiple personal computers. For example, with a remote boot facility on a computer network, each user boots from that user's private directory on a file server. Each private directory thus ordinarily includes a number of files that are identical to files in other users' directories. Storing the private directories on traditional file systems consumes a large amount of disk space and server file buffer cache space. From a storage management perspective, it is desirable to minimize file duplication to reduce the amount of storage space wasted on redundant files. However, any such efforts need to be reconciled with the file system that tracks the multiple duplicated files on behalf of the associated users.
To address the problems associated with storing multiple identical files on a computer, Microsoft developed a single instance store (SIS) system that is packaged as part of the Windows 2000 operating system. The SIS system reduces file duplication by automatically identifying common identical files of a file system, and then merging the files into a single instance of the data. One or more logically separate links are then attached to the single instance to represent the original files to the user machines. In this way, the storage impact of duplicate files on a computer system is greatly reduced.
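The merge-and-link idea described above can be illustrated with a minimal sketch. This is not the SIS implementation. It assumes, purely for illustration, that identical files are detected by comparing SHA-256 digests of their contents and that everything is held in memory. The class name `SingleInstanceStore`, its methods, and the file paths are all hypothetical.

```python
import hashlib


class SingleInstanceStore:
    """Toy in-memory store that keeps one copy of each distinct file content.

    Illustrative assumption: identical files are detected by SHA-256 digest.
    Stale instances are never reclaimed when a path is overwritten.
    """

    def __init__(self):
        self._instances = {}  # content digest -> file bytes (the single instance)
        self._links = {}      # user-visible path -> content digest

    def write(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self._instances.setdefault(digest, data)
        # The path becomes a logically separate link to the shared instance.
        self._links[path] = digest

    def read(self, path):
        # Follow the link to the single stored instance.
        return self._instances[self._links[path]]

    def instance_count(self):
        return len(self._instances)

    def link_count(self):
        return len(self._links)


# Two users hold the same boot file; a third file is unique (hypothetical paths).
store = SingleInstanceStore()
store.write("/users/alice/boot/kernel.sys", b"identical boot image")
store.write("/users/bob/boot/kernel.sys", b"identical boot image")
store.write("/users/bob/notes.txt", b"private notes")
print(store.link_count(), "links backed by", store.instance_count(), "instances")
```

In this sketch each user still sees a separate file under that user's own path, while the duplicated content is stored only once.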
Today, file storage is migrating toward a model in which files are stored on various networked computers, rather than on a central storage server. The serverless architecture poses new challenges to file systems. One particular challenge concerns managing files that are distributed over many different computers in a manner that allows a user to quickly access a file, verify that it is indeed the requested file, and read or write that file, all while ensuring that the files are stored and accessed in a secure way that prevents access by unauthorized users.
The invention addresses these challenges and provides solutions that are effective for distributed file systems, and in particular, serverless distributed file systems.