A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information. A directory, on the other hand, may be implemented as a specially formatted file in which information about other files and directories is stored.
A filer may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the filer over a direct connection or computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the file system on the filer by issuing file system protocol messages (in the form of packets) to the filer over the network.
A common type of file system is a “write in-place” file system, an example of which is the conventional Berkeley fast file system. By “file system” it is meant generally a structuring of data and metadata on a storage device, such as disks, which permits reading/writing of data on those disks. In a write in-place file system, the locations of the data structures, such as inodes and data blocks, on disk are typically fixed. An inode is a data structure used to store information, such as metadata, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers in the inode, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
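By way of illustration only, the write in-place behavior described above can be sketched as follows. The names `Inode`, `InPlaceDisk` and `BLOCK_SIZE` are hypothetical and are not taken from any particular file system; indirect blocks are omitted for brevity, and the disk is modeled as a simple in-memory mapping.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed block size, for illustration only


@dataclass
class Inode:
    """Hypothetical inode: file metadata plus references to data block locations."""
    owner: str
    mode: int
    size: int = 0
    block_ptrs: list = field(default_factory=list)  # disk block numbers


class InPlaceDisk:
    """Toy disk for a write in-place layout: a block keeps its location when updated."""

    def __init__(self):
        self.blocks = {}
        self.next_free = 0

    def allocate(self):
        blkno = self.next_free
        self.next_free += 1
        self.blocks[blkno] = b""
        return blkno

    def write(self, inode, index, data):
        # Extending the file: allocate additional blocks and reference them from the inode.
        while index >= len(inode.block_ptrs):
            inode.block_ptrs.append(self.allocate())
        # Updating existing data: overwrite the block at its fixed on-disk location.
        self.blocks[inode.block_ptrs[index]] = data
        inode.size = max(inode.size, index * BLOCK_SIZE + len(data))
```

In this sketch, rewriting block 0 of a file leaves the pointer in the inode unchanged, whereas writing past the end of the file appends a newly allocated block number to the inode.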
Another type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block on disk is retrieved (read) from disk into memory and “dirtied” with new data, the data block is stored (written) to a new location on disk to thereby optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. A particular example of a write-anywhere file system that is configured to operate on a filer is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc. of Sunnyvale, Calif. The WAFL file system is implemented within a microkernel as part of the overall protocol stack of the filer and associated disk storage. This microkernel is supplied as part of Network Appliance's Data ONTAP™ software, residing on the filer, that processes file-service requests from network-attached clients.
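The write-anywhere behavior can be contrasted with a minimal sketch under stated assumptions. The class `WriteAnywhereDisk` and its method are hypothetical, the disk is an in-memory mapping, and nothing here is intended to represent the actual WAFL implementation; the sketch only shows that a dirtied block is written to a new location and the reference to it is updated, so that the old block is never overwritten.

```python
class WriteAnywhereDisk:
    """Toy disk that never overwrites a block that has already been written."""

    def __init__(self):
        self.blocks = {}
        self.next_free = 0

    def write_dirty(self, block_ptrs, index, data):
        """Store a dirtied block at a new location and repoint the reference.

        `block_ptrs` is the caller's list of block numbers for one file;
        `index` is assumed to be at most len(block_ptrs).
        """
        new_blkno = self.next_free
        self.next_free += 1
        self.blocks[new_blkno] = data
        if index < len(block_ptrs):
            block_ptrs[index] = new_blkno  # old block is left intact on disk
        else:
            block_ptrs.append(new_blkno)
        return new_blkno
```

Because each dirtied block goes to the next free location in this sketch, a burst of updates is laid down contiguously, which is one way to picture why such a layout favors write performance.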
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a computer that manages data access and may, in the case of a filer, implement file system semantics, such as the Data ONTAP™ storage operating system, implemented as a microkernel, and available from Network Appliance, Inc. of Sunnyvale, Calif., which implements a Write Anywhere File Layout (WAFL™) file system. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
Disk storage is typically implemented as one or more storage “volumes” that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is associated with its own file system and, for purposes hereof, volume and file system shall generally be used synonymously. The disks within a volume are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. In the example of a WAFL file system, a RAID 4 implementation is advantageously employed. This implementation specifically entails the striping of data across a group of disks, and separate parity caching within a selected disk of the RAID group. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
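As a minimal sketch of how dedicated parity protects a stripe, the following assumes the conventional byte-wise XOR parity used by RAID 4 and equally sized blocks. The function names are hypothetical and the blocks are plain byte strings rather than real disk sectors.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_for_stripe(data_blocks):
    """Parity block that would be stored on the dedicated parity disk."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single lost data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one block per data disk
parity = parity_for_stripe(stripe)
recovered = rebuild_missing([stripe[0], stripe[2]], parity)  # equals stripe[1]
```

The sketch tolerates the loss of exactly one block per stripe, which matches the single parity disk per RAID group described above.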
Data storage is an increasingly crucial and central part of many industries dealing in financial transactions and other sensitive tasks, such as banks, government facilities/contractors, defense, health care institutions, pharmaceutical companies and securities brokerages. In many of these environments, it is necessary to store selected data in an immutable and unalterable manner. This need continues to grow in the light of current concerns over institutional fraud and mismanagement, wherein the temptation on the part of wrongdoers to erase or alter incriminating data is always present. Forms of data that require immutable treatment often include e-mails, financial documents and transaction records, and any other record that may act as proof of an important action or decision. Even in less-critical/unregulated environments, the ability to store a secure unalterable data cache is highly desirable. For example, engineering, medical, law and other professional firms may wish to establish a cache of key data (e.g., invention reports or design files, client communications, medical images, etc.) that will remain unaltered and online for long periods of time. These caches can provide reliable references and proofs for clients and other interested parties.
For an example of a highly regulated environment, the United States Securities and Exchange Commission (SEC)—the body that regulates all securities transactions and reporting relative to public corporations—promulgates SEC Rule 17a-4 governing document retention for brokers and investment institutions. This rule requires that these entities store e-mails and other documents in connection with a variety of transactions and trades by clients of the entities unchanged and unchangeable for a number of years and to be able to provide these records to the SEC and other regulators on short notice. Failure to comply with these rules can lead to significant sanctions.
A variety of prior art approaches involving tape drives, electro-optical recordable media and the like have been employed over the years to implement a write-once-read-many (WORM) storage system. Each of these systems has certain drawbacks in terms of storage size, speed, maintenance requirements or a combination of these (and other) factors.
In the above-incorporated-by-reference U.S. patent application Ser. No. 10/391,245, issued as U.S. Pat. No. 7,155,460 on Dec. 26, 2006, entitled WRITE-ONCE-READ-MANY STORAGE SYSTEM AND METHOD FOR IMPLEMENTING THE SAME, by William P. McGovern, et al., a particularly advantageous approach to WORM storage is taught, which employs conventional fault-tolerant (e.g. RAID-based) disk storage (or similar rewritable media) as a platform for a WORM storage system. This described system is advantageous in that such disks are large in storage capacity, relatively inexpensive and easily added to an existing storage implementation. However, these disks are also inherently rewritable and/or erasable, in light of existing operating systems and protocols that are typically designed with semantics that specifically enable the free rewriting and erasure of attached disks. The described WORM storage approach is, therefore, specially configured to absolutely prevent alteration of any WORM-designated data. Also, to maintain longevity of the solution and make it available to as many clients as possible, the described WORM implementation utilizes open protocols such as CIFS and NFS and requires minimal alteration to these protocols or the applications that employ them and a minimal footprint on client applications. The system is, thus, organized around WORM storage volumes that contain files, which when committed to WORM storage, cannot be deleted or modified. Any file path or directory tree structure used to identify the file within the WORM volume is locked and cannot be deleted.
In the described WORM system, an administrator creates a WORM volume (or other WORM-designated data organizational structure), capable of storing designated WORM files (or other “data sets”). The client then creates an appropriate WORM file using the appropriate protocol semantics. The file is written to the volume and committed to WORM state by transitioning the file attributes from a not-read-only state to a read-only state. The file system persistently stores the WORM state of a file with the attributes and metadata for the file and uses this persistent WORM state to recognize WORM files on a WORM volume. Henceforth, any attempt to modify the file attributes, write to the file, or delete the file, by clients, administrators or other entities is rejected and a request denied message is returned to the attempting party. Since the file cannot be deleted, conventional file system semantics prevent deletion of the directory path. Likewise, the file system does not permit renaming of directories in an illustrative embodiment to thereby ensure the reliable and immutable identification of WORM files within the directory structure.
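The commit-by-attribute-transition semantics described above can be illustrated with a short sketch. All names here (`WormVolume`, `WormFile`, `RequestDenied`) are hypothetical, the volume is modeled in memory, and the sketch is an assumption about how the described behavior could be expressed rather than a representation of the referenced implementation.

```python
class RequestDenied(Exception):
    """Stands in for the request-denied message returned to the caller."""


class WormFile:
    def __init__(self, data=b""):
        self.data = data
        self.read_only = False
        self.worm = False  # WORM state, kept with the file's attributes


class WormVolume:
    def __init__(self):
        self.files = {}

    def create(self, path, data=b""):
        if path in self.files:
            raise RequestDenied(path)
        self.files[path] = WormFile(data)

    def set_read_only(self, path, value):
        f = self.files[path]
        if f.worm:
            raise RequestDenied("attributes of a committed file cannot change")
        if value:
            f.worm = True  # not-read-only -> read-only commits the file
        f.read_only = value

    def write(self, path, data):
        f = self.files[path]
        if f.worm:
            raise RequestDenied("committed file cannot be modified")
        f.data = data

    def delete(self, path):
        if self.files[path].worm:
            raise RequestDenied("committed file cannot be deleted")
        del self.files[path]
```

Once `set_read_only(path, True)` has been called in this sketch, every later write, delete or attribute change on that path raises `RequestDenied`, regardless of who issues the request.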
Committing of the WORM file to the WORM storage volume can be performed by the client via a command line interface in an interactive manner. Alternatively, applications, which are familiar with the WORM semantics, can be adapted to commit the file using an appropriate application program interface or other programmatic command structure. Similarly, open protocols, such as NFS or CIFS, through which the clients communicate with the file server/file system can be modified to enable automatic commit of created files upon a key event, such as closing of the file. The protocols and file system can be adapted to enable specialized WORM directories within the volume. An appropriate WORM file extension can be provided so that WORM files within the volume can be readily identified by the client. Also, selected mirroring and backup functions may be allowed, while other backup functions that enable restoration or reversion of the volume to an earlier point in time may be disabled.
Many regulatory schemes governing WORM data storage (for example SEC 240.17a-4) specify provisions for retention periods, after which the WORM data can be discarded. In the absence of a specified retention period, applied to the record on creation, the regulations generally specify permanent retention. In the case of removable media, such as tapes or electro-optical storage, the media are carefully indexed and stored (often in secure sites) during their retention periods. Upon expiration of an applicable retention date, the expired media are retrieved from storage and physically destroyed. Since disk storage has the inherent ability to be rewritten and reused when a particular record is no longer needed, it is contemplated that the WORM protection on various on-disk records may carry a retention date, and when the retention date passes, the expired WORM record and associated data may be erased, thus preserving storage resources and ensuring the orderly and predictable removal of expired WORM data, without the material waste caused by physical media destruction.
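The retention rule outlined above reduces to a simple check, sketched here under stated assumptions: the function name `may_erase` is hypothetical, a missing retention date is represented by `None` and treated as permanent retention, and a record is taken to expire only once the current time is strictly later than its retention date.

```python
from datetime import datetime, timezone


def may_erase(retention_until, now):
    """Return True only if the record's retention date has passed.

    `retention_until` is None when no retention period was specified on
    creation, which this sketch treats as permanent retention.
    """
    if retention_until is None:
        return False
    return now > retention_until


expiry = datetime(2010, 1, 1, tzinfo=timezone.utc)
before = datetime(2009, 12, 31, tzinfo=timezone.utc)
after = datetime(2010, 1, 2, tzinfo=timezone.utc)
```

With these example values, `may_erase(expiry, before)` is false, `may_erase(expiry, after)` is true, and `may_erase(None, after)` is false. The check is only as trustworthy as the source of `now`, which is why the time reference matters for any disk-based retention scheme.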
One commercially available WORM storage system, marketed under the tradename Centera by EMC Corp. of Hopkinton, Mass., enables basic forms of retention dates for record storage. The system utilizes a network-connected cluster of general-purpose computer systems running a customized variant of the Linux operating system. These computers implement a proprietary application programming interface (API) and proprietary protocols for interfacing with the storage system, as opposed to an open-protocol and open, standardized API approach. As such, applications can only access the storage and manipulate records through proprietary mechanisms or through a “gateway” interposed between the users and the storage system, which translates an open protocol to the proprietary protocols supported by the storage system.
This form of WORM storage system utilizes so-called “Content Addressable Storage” (CAS) for management of stored records. CAS relies on computing digital signatures, using an algorithm such as an MD5 hash, of the contents of any WORM-stored records to create a unique key (or “content address”) for each and every record. A representation of the digital signature of a record is used as the “key,” or “content address,” with which any future reference to the stored object must be made. This is often described as similar to a “claim check” system whereby the storage system generates a unique key for every object stored, which it returns to the application. The application is responsible for management and preservation of these content addresses, which must be performed external to the storage system.
To associate retention information with a stored record, the proprietary API permits metadata, in a proprietary format, to be associated with a stored object. This metadata information can include retention information for the record. The API supports the ability to extend retention dates further into the future, and in certain configurations, to assign an infinite retention date to those records submitted without retention information. Because of the CAS architecture, every object written to the system, as long as it has unique contents, is stored as a unique object with a unique content address. To enable WORM functionality, the API prevents deletion of objects prior to the expiration of their associated retention period. Modification of existing objects is impossible because any changes in the contents of an object will result in a new content address, and hence a new object being created in the storage system.
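The content-addressing scheme described in the preceding paragraphs can be pictured with a brief sketch. The class `ContentAddressableStore` and its methods are hypothetical and do not reflect the proprietary API of any commercial product; MD5 is used only because it is the example algorithm named above, and the store is an in-memory mapping.

```python
import hashlib


class ContentAddressableStore:
    """Toy store in which the key for an object is derived from its contents."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The hex digest serves as the "claim check" returned to the application.
        address = hashlib.md5(content).hexdigest()
        self._objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]


store = ContentAddressableStore()
original = store.put(b"trade confirmation #1")
edited = store.put(b"trade confirmation #1 (amended)")
```

In this sketch `original` and `edited` are different addresses, so the amended contents become a second object and the first remains retrievable under its own address. The application must keep both addresses itself, since the store offers no other way to name an object.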
To track retention time and other time-dependent functions, this system is believed to simply draw time values from the system hardware clocks within the nodes (computers) of the cluster for time reference and rely on the physical security of the system to prevent tampering.