Today's computers require memory to hold or store both the steps or instructions of computer programs and the data that those programs take as input or produce as output. This memory is conventionally divided into two types, primary storage and secondary storage. Primary storage is that which is immediately accessible by the computer or microprocessor, and is typically though not exclusively used as temporary storage. It is, in effect, the short term memory of the computer. Secondary storage is the long-term computer memory. This form of memory maintains information that must be kept for a long time, and may be orders of magnitude larger and slower. Secondary memory is typically provided by devices such as magnetic disk drives, optical drives, and so forth. These devices present to the computer's operating system a low-level interface in which individual storage subunits may be individually addressed. These subunits are often generalized by the computer's operating system into “blocks,” and such devices are often referred to as “block storage devices.”
Block storage devices are not typically accessed directly by users or (most) programs. Rather, programs or other components of the operating system organize block storage in an abstract fashion and make this higher-level interface available to other software components. The most common higher-level abstraction thus provided is a “filesystem.” In a filesystem, the storage resource is organized into directories, files, and other objects. Associated with each file, directory, or other object is typically a name, some explicit/static metadata such as its owner, size, and so on, its contents or data, and an arbitrary and open set of implicit or “dynamic” metadata such as the file's content type, checksum, and so on. Directories are containers that provide a mapping from directory-unique names to other directories and files. Files are containers for arbitrary data. Because directories may contain other directories, the filesystem client (human user, software application, etc.) perceives the storage to be organized into a quasi-hierarchical structure or “tree” of directories and files. This structure may be navigated by providing the unique names necessary to identify a directory inside another directory at each traversed level of the structure. Hence, the organizational structure of names is sometimes said to constitute a “filesystem namespace.”
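The navigation of a filesystem namespace described above can be illustrated with a minimal sketch. The class names `File` and `Directory` and the function `resolve` are hypothetical, chosen only for illustration; the sketch assumes slash-separated paths and models only names and contents, not metadata.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class File:
    # A file is a container for arbitrary data.
    data: bytes = b""


@dataclass
class Directory:
    # A directory maps directory-unique names to other directories or files.
    entries: dict[str, "Directory | File"] = field(default_factory=dict)


def resolve(root: Directory, path: str):
    """Walk the namespace from the root, consuming one name per level."""
    node = root
    for name in (part for part in path.split("/") if part):
        if not isinstance(node, Directory):
            raise NotADirectoryError(name)
        if name not in node.entries:
            raise FileNotFoundError(name)
        node = node.entries[name]
    return node


# A small example tree: /home/alice/notes.txt
root = Directory({"home": Directory({"alice": Directory({"notes.txt": File(b"hello")})})})
```

Calling `resolve(root, "/home/alice/notes.txt")` returns the `File` object at the end of the path, while a name missing at any level raises `FileNotFoundError`.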
Conventional filesystems support a finite set of operations (such as create, open, read, write, close, delete) on each of the abstract objects which the filesystem contains. For each of these operations, the filesystem takes a particular action in accordance with the operation in question and the data provided in the operation. The sequence of these operations over time effects changes to the filesystem structure, data, and metadata in a predictable way. The set of filesystem abstractions, operations, and predictable results for particular actions is said to constitute a “semantic” for the filesystem.
In some cases, a storage resource is accessed by a computer over a network connection. Various mechanisms exist for allowing software or users on one computing device to access storage devices that are located on another remote computer or device. While there are several remote storage access facilities available, they generally fall into one of two classes: block-level and file-level. File-level remote storage access mechanisms extend the filesystem interface and namespace across the network, enabling clients to access and utilize the files and directories as if they were local. Such systems are therefore typically called “network file systems.” One example of this type of storage access mechanism is the Network File System (“NFS”) originally developed by Sun Microsystems. Note that the term “network file system” is used herein generally to refer to all such systems, and the term “NFS” will be used when discussing the Network File System developed by Sun Microsystems.
Networked file systems enable machines to access the filesystems that reside on other machines. Architecturally, this leads to the following distinctions. In the context of a given filesystem, one machine plays the role of a filesystem “origin server” (alternatively either “fileserver” or simply “server”) and another plays the role of a filesystem client. The two are connected via a data transmission network. The client and server communicate over this network using standardized network protocols. The high-level protocols which extend the filesystem namespace and abstractions across the network are referred to as “network filesystem protocols.” There are many such protocols, including the Common Internet File System or CIFS, the aforementioned NFS, Novell's Netware filesharing system, Apple's Appleshare, the Andrew File System (AFS), the Coda Filesystem (Coda), and others. CIFS and NFS are by far the most prevalent. All of these network filesystem protocols share approximately equivalent semantics and sets of abstractions, but differ in their details and are noninteroperable. In order to use a filesystem from some fileserver, a client must “speak the same language,” i.e., have software that implements the same protocol that the server uses.
A fileserver indicates which portions of its filesystems are available to remote clients by defining “exports” or “shares.” In order to access a particular remote fileserver's filesystems, a client must then make those exports or shares of interest available by including them by reference as part of its own filesystem namespace. This process is referred to as “mounting” or “mapping (to)” a remote export or share. By mounting or mapping, a client establishes a tightly coupled relationship with the particular fileserver. The overall architecture can be characterized as a “two-tier” client-server system, since the client communicates directly with the server which has the resources of interest to the client.
Current network file system architectures suffer from several shortcomings. In large network settings (e.g., those with large numbers of clients and servers), the architecture itself creates administrative problems for the management and maintenance of filesystems. The inflexibility of the two-tier architecture manifests itself in two distinct ways. First, the tight logical coupling of client and server means that changes to the servers (e.g., moving a directory and its [recursive] contents from one server to another) require changes (e.g., to the definitions of mounts or mappings) on all clients that access that particular resource, and thus must be coordinated and executed with care. This is a manual and error-prone process that must be continuously engaged and monitored by the system administrators who manage and maintain such networked filesystems. Second, the overall complexity of the environment grows at a non-linear rate. The complexity of a system of networked filesystem clients and servers can be characterized by the total number of relationships (mounts, mappings) between clients and servers, i.e., it grows as, or is bounded by:
Complexity ≈ #Clients × #Servers
Two-tier networked filesystems therefore ultimately fail to scale in an important sense: the overall cost of managing a networked filesystem environment is proportional to this complexity, and as the complexity grows the costs quickly become untenable. This can be referred to as “the mapping problem.” The mapping problem may be understood as the direct result of an architectural deficiency in networked filesystems, namely the inflexibility of the two-tier architecture.
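As a rough illustration of the mapping problem, the bound on the number of client-server relationships can be computed directly. The function name `mount_count` and the environment sizes below are hypothetical, and the sketch assumes the worst case in which every client mounts an export from every server.

```python
def mount_count(clients: int, servers: int) -> int:
    """Upper bound on mounts/mappings in a two-tier environment."""
    return clients * servers


# Hypothetical environment sizes, for illustration only.
small = mount_count(clients=100, servers=10)    # 1,000 relationships
large = mount_count(clients=1000, servers=50)   # 50,000 relationships
```

In this example, a tenfold growth in clients combined with a fivefold growth in servers yields a fiftyfold growth in the number of relationships to be administered.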
Existing attempts to address the problems of unconstrained complexity growth in the networked filesystem environment generally take one of two general forms: automation of management tasks; and minimization of the number of mounts through storage asset virtualization. The automation approach seeks to provide better administrative tools for managing network file storage. The virtualization approach takes two forms: abstraction; and delegation. The abstraction approach aggregates low-level storage resources across many servers so that they appear to be a single resource from a single server from a client's perspective. The delegation approach designates a single server as “owning” the filesystem namespace, but upon access by a client the delegation server instructs the client to contact the origin server for the resource in question to carry out the request. None of these approaches alone fully addresses the architectural deficiencies that cause complexity growth.
“Directory services” can be used to centralize the definition and administration of both lists of server exports and lists of mounts between clients and servers. Automation schemes can then allow clients to automatically look up the appropriate server for a given filesystem in a directory service and mount the filesystem in their own namespaces on demand.
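The on-demand lookup described above can be sketched as follows. This is a simplified model and not the interface of any particular directory service or automounter; the class names `DirectoryService` and `Client`, the method names, and the server and export names are all hypothetical.

```python
class DirectoryService:
    """Central table mapping a filesystem name to (server, export path)."""

    def __init__(self):
        self._exports = {}

    def publish(self, name, server, export_path):
        self._exports[name] = (server, export_path)

    def lookup(self, name):
        return self._exports[name]


class Client:
    """Mounts a filesystem on first access by consulting the directory."""

    def __init__(self, directory):
        self.directory = directory
        self.mounts = {}

    def access(self, name):
        if name not in self.mounts:
            # On-demand mount: ask the directory which server to use.
            self.mounts[name] = self.directory.lookup(name)
        return self.mounts[name]

    def unmount(self, name):
        self.mounts.pop(name, None)


directory = DirectoryService()
directory.publish("projects", "fs1", "/export/projects")
```

In this sketch, an administrator who moves an export updates a single directory entry rather than every client's configuration. Each client nevertheless still holds a direct mount on a particular server once it has accessed the filesystem, so the two-tier coupling remains.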
Filesystem virtualization solutions to date have usually taken one of three forms: low-level gateways between networked block-level protocols and file-level protocols; delegation systems; and fully distributed filesystems. Low-level gateways aggregate storage resources which are made available over the network in block (not file) form, and provide a filesystem atop the conjunction of block storage devices thus accessed. This provides some benefit in minimizing the number of exports and servers involved from a client perspective, but creates new complexity in that a new set of protocols (block-level storage protocols) is introduced and must be managed.
Delegation systems centralize namespace management in a single system (i.e., they make it appear that all the files are located on a single server) while actually redirecting each client request to a particular origin server. Delegation systems are relatively new, and support for them must be enabled in new versions of the various filesystem protocols. Delegation systems allow a directory service to appear as a filesystem. One example is Microsoft Corp.'s NT-DFS. Delegation systems typically do not map individual directories to individual directories. In other words, all the directories below a certain point in the filesystem namespace controlled by the delegation system are mapped to a single top-level directory. Another shortcoming is that prior art delegation systems typically respond to a request for a file or directory with the same response, regardless of the client making the request. As another deficiency, the underlying directory service does not handle requests directly, but redirects the requests to be handled by underlying systems.
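The redirection behavior of a delegation system can be illustrated with a short sketch. It is not modeled on NT-DFS or any other specific product; the table `REFERRALS`, the function `refer`, and the server names and paths are hypothetical.

```python
# Hypothetical referral table: each namespace prefix maps an entire
# subtree to one top-level directory on one origin server.
REFERRALS = {
    "/corp/eng": ("fs1", "/export/eng"),
    "/corp/sales": ("fs2", "/export/sales"),
}


def refer(path: str):
    """Return (origin server, path on that server) for a requested path.

    The answer depends only on the path, never on the requesting client,
    and the delegation server does not serve the file itself.
    """
    for prefix in sorted(REFERRALS, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            server, target = REFERRALS[prefix]
            return server, target + path[len(prefix):]
    raise FileNotFoundError(path)
```

For example, `refer("/corp/eng/src/main.c")` returns `("fs1", "/export/eng/src/main.c")`; the client must then contact `fs1` itself to carry out the request.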
Fully distributed filesystems employ distributed algorithms, caching, and so forth to provide a unified and consistent view of a filesystem across all participating machines. While addressing mount management to some extent, distributed filesystems introduce new and significant challenges in terms of maintaining consistency, increased sensitivity to failures, and increased implementation complexity. It should be noted that fully distributed filesystems typically require specialized protocols and software on every participant in the system, in effect making every computer involved both a client and a server. Other distributed filesystems seek to support mobile clients which frequently disconnect from the network, and thus focus on techniques for caching files and operations and ensuring consistency of the distributed filesystem upon reconnection.
Some prior art has focused on mechanisms for taking multiple filesystems and producing a merged logical view of those filesystems on a given filesystem client. This is sometimes referred to as “stack mounting.” Stack mounting to date has been seen as a nondistributed mechanism. It is used by a client to organize and structure their own local filesystem namespace for various purposes, rather than being used to organize and manage a collection of network filesystems on an enterprise basis. Existing stacking filesystems are limited in an important way—among a collection of logically joined filesystems, a single origin filesystem is designated as the primary or “top” filesystem “layer” in the stack. All writes are performed on this filesystem layer. This has incorrectly been perceived as the only way to preserve the “correct” or traditional semantics of filesystems.
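The conventional stacking behavior described above, in which reads search the layers in order and all writes land on the top layer, can be sketched as follows. The class `StackMount` is hypothetical, and each layer is modeled as a flat dictionary of names to contents rather than a full filesystem.

```python
class StackMount:
    """Merged view over several layers; index 0 is the top layer."""

    def __init__(self, layers):
        self.layers = layers

    def read(self, name):
        # The first layer that contains the name wins.
        for layer in self.layers:
            if name in layer:
                return layer[name]
        raise FileNotFoundError(name)

    def write(self, name, data):
        # All writes are performed on the top layer only.
        self.layers[0][name] = data

    def names(self):
        # The merged listing is the union of names across all layers.
        return sorted(set().union(*self.layers))
```

In this model, writing to a name that exists only in a lower layer leaves that lower layer unchanged and creates a copy in the top layer that shadows it on subsequent reads.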
In addition to organizing and maintaining the relationships between filesystem clients and file servers, additional challenges exist in managing access to and utilization of filesystems. While most organizations have and enforce stringent document workflow and retention policies for their paper files, similar policies—while desired and mandated—are rarely enforced for electronic files. As a non-limiting example, many corporations have a policy that prohibits the usage of corporate storage capacity on fileservers for the storage of certain personal files and content types—for instance MP3s, personal digital images, and so on. This “policy” usually takes the form of a memo, email, etc. The administrators in charge of enforcing this policy face significant challenges. Conventional filesystems do not provide mechanisms for configuring a filesystem to only allow particular content types or otherwise automatically make decisions about what should be stored, where, and how. These conventional filesystems are static, and the set of semantics for access and other administrative controls are rather limited. Thus any such policy enforcement that happens is done retroactively and in an ad-hoc manner via manual or mostly-manual processes. The net result is that network file storage fills up with old, duplicated, and garbage files that often violate corporate and administrative utilization policies.
Filesystems are quasi-hierarchical collections of directories and files. The “intelligence” that a filesystem exhibits with respect to access control is typically restricted to a static set of rules defining file owners, permissions, and access control lists. To the extent even this relatively low level of “intelligence” exists, it is typically statically defined as a part of the filesystem implementation and may not be extended. Current filesystems do not allow arbitrary triggers and associated activities to be programmed outside of the permissions hard-coded in the original implementation of the filesystem.
Additional challenges exist for filesystem monitoring and reporting. Filesystem activity produces changes to the state of a filesystem. This activity can effect changes to the structure, the stored metadata, and the stored data of the directories and files. Generally speaking, this activity is not logged in any way. Rather, the filesystem itself holds its current state. Some filesystems, called “journaling” filesystems, maintain transient logs of changes for a short duration as a means of implementing the filesystem itself. These logs, however, are not typically organized in any way conducive to monitoring and reporting on the state of the filesystem and its evolutionary activity over time. These logs are typically not made available to external programs, but are instead internal artifacts of the filesystem implementation. Further, these logs are frequently purged and therefore provide a poor basis for reporting of historical and trend data.
A significant problem is that of collection, redaction, and analysis of high-level data about what a filesystem is being used for, what is stored in it, by whom, and for what purpose. Solutions today involve software programs or users explicitly browsing through the filesystem structure, gathering the data required, and then analyzing it or taking some other action based on it. Proactive collection of filesystem data as operations occur is generally not done, as it is generally not supported by the filesystem itself. Furthermore, the accuracy of data collected by browsing is questionable, as it reflects not an instantaneous state of the filesystem at any given moment but rather an approximate state of the filesystem over the duration of the collection run. Without collecting and maintaining the appropriate statistics as file operations occur, the data at the end of the run cannot typically represent a correct and accurate picture of the contents of the filesystem at that time.
The problem of data collection and reporting is further compounded in the network filesystem environment. Because each server—indeed, each filesystem on each server—is a separate entity, it is therefore necessary to perform each data collection independently on each server. If reporting or monitoring is to be done across the network filesystem environment, significant challenges exist. Namely, because of the parallel and discrete nature of the collection runs, it becomes difficult to sensibly merge the collected data into a consistent snapshot of the state of the filesystem at some time.