The present invention relates to management of file systems and large files.
This application incorporates by reference herein as follows:
U.S. application Ser. No. 10/264,603, Systems and Methods of Multiple Access Paths to Single Ported Storage Devices, filed on Oct. 3, 2002, now abandoned;
U.S. application Ser. No. 10/354,797, Methods and Systems of Host Caching, filed on Jan. 29, 2003, now U.S. Pat. No. 6,965,979 B2;
U.S. application Ser. No. 10/397,610, Methods and Systems for Management of System Metadata, filed on Mar. 26, 2003, now U.S. Pat. No. 7,216,253 B2;
U.S. application Ser. No. 10/440,347, Methods and Systems of Cache Memory Management and Snapshot Operations, filed on May 16, 2003, now U.S. Pat. No. 7,124,243 B2;
U.S. application Ser. No. 10/600,417, Systems and Methods of Data Migration in Snapshot Operations, filed on Jun. 19, 2003, now U.S. Pat. No. 7,136,974 B2;
U.S. application Ser. No. 10/616,128, Snapshots of File Systems in Data Storage Systems, filed on Jul. 8, 2003, now U.S. Pat. No. 6,959,313 B2;
U.S. application Ser. No. 10/677,560, Systems and Methods of Multiple Access Paths to Single Ported Storage Devices, filed on Oct. 1, 2003, now abandoned;
U.S. application Ser. No. 10/696,327, Data Replication in Data Storage Systems, filed on Oct. 28, 2003, now U.S. Pat. No. 7,143,122 B2;
U.S. application Ser. No. 10/837,322, Guided Configuration of Data Storage Systems, filed on Apr. 30, 2004, now U.S. Pat. No. 7,216,192 B2;
U.S. application Ser. No. 10/975,290, Staggered Writing for Data Storage Systems, filed on Oct. 27, 2004;
U.S. application Ser. No. 10/976,430, Management of I/O Operations in Data Storage Systems, filed on Oct. 29, 2004, now U.S. Pat. No. 7,222,223 B2; and
U.S. application Ser. No. 11/122,495, Quality of Service for Data Storage Volumes, filed on May 4, 2005.
Data storage systems today must handle larger and more numerous files for longer periods of time than in the past. As a result, active data makes up a shrinking share of the entire data set of a file system, which leads to inefficient use of expensive high performance storage. This in turn affects data storage backups and lifecycle management/compliance.
As background, a file is a unit of information stored on and retrieved from storage devices (e.g., magnetic disks). A file has a name, data, and attributes (e.g., the last time it was modified, its size, etc.). A file system is that part of the operating system that handles files. To keep track of the files, the file system has directories. A directory contains directory entries, which in turn consist of file names, file attributes, and addresses of the data blocks. Unix operating systems split this information into two separate structures: an i-node containing the file attributes and addresses of the data blocks, and directory entries containing file names and where to find the i-nodes. If the file system uses i-nodes, the directory entry contains just a file name and an i-node number. An i-node is a data structure associated with exactly one file that lists that file's attributes and the addresses of its data blocks. File systems are often organized as a tree of directories, and each file may be specified by giving the path from the root directory to the file name.
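The i-node and directory entry arrangement described above can be illustrated with a minimal in-memory sketch. It is a toy model for explanation only, not the layout of any particular file system; the names `Inode`, `TinyFS`, `create`, and `lookup` are hypothetical and chosen for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Attributes and data block addresses for exactly one file."""
    size: int = 0
    mtime: float = 0.0
    is_dir: bool = False
    block_addrs: list = field(default_factory=list)
    # For a directory, each entry holds just a file name and an i-node number.
    entries: dict = field(default_factory=dict)


class TinyFS:
    """Hypothetical toy file system: a table of i-nodes keyed by i-node number."""
    ROOT = 0

    def __init__(self):
        self.inodes = {self.ROOT: Inode(is_dir=True)}
        self._next_ino = 1

    def create(self, parent_ino, name, is_dir=False, block_addrs=(), size=0):
        """Allocate an i-node and add a directory entry for it in the parent."""
        ino = self._next_ino
        self._next_ino += 1
        self.inodes[ino] = Inode(size=size, is_dir=is_dir,
                                 block_addrs=list(block_addrs))
        self.inodes[parent_ino].entries[name] = ino
        return ino

    def lookup(self, path):
        """Walk the path from the root directory and return the i-node number."""
        ino = self.ROOT
        for name in path.strip("/").split("/"):
            if not name:
                continue  # the path "/" resolves to the root itself
            node = self.inodes[ino]
            if not node.is_dir:
                raise NotADirectoryError(path)
            if name not in node.entries:
                raise FileNotFoundError(path)
            ino = node.entries[name]
        return ino
```

In this sketch, resolving a path such as `/home/a.txt` reads the root directory's entries to find the i-node number of `home`, then reads that directory's entries to find the i-node number of `a.txt`; the file's attributes and block addresses are then available from its i-node.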
To address inefficient use of expensive high performance data storage, third-party archiving and hierarchical storage management (HSM) software migrates data from expensive high performance storage devices (e.g., Fibre Channel) to lower cost storage devices such as tape or Serial ATA storage devices.
Archival and HSM software must manage separate storage volumes and file systems. Archival software not only physically moves old data but also removes the file from the original file namespace. Although symbolic links can simulate the original namespace, this approach requires that the target storage be provisioned as another file system, thus increasing the IT administrator workload.
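The symbolic link approach mentioned above can be sketched as follows. This is only an illustration of the general technique under the assumption of a POSIX-style system where creating symbolic links is permitted; the function name `archive_with_symlink` and the directory arguments are hypothetical, and the archive directory stands in for a separately provisioned file system.

```python
import os
import shutil


def archive_with_symlink(path, archive_dir):
    """Move `path` into `archive_dir` and leave a symbolic link in its place.

    Hypothetical helper: `archive_dir` stands in for a mount point on the
    lower cost storage, which must already exist as its own file system.
    """
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(path))
    shutil.move(path, target)                  # physically relocate the data
    os.symlink(os.path.abspath(target), path)  # keep the old name resolvable
    return target
```

Applications that open the original path still reach the data through the link, but the administrator now has a second file system to provision and maintain, which is the overhead noted above.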
Archival and HSM software also does not integrate well with snapshots. The older the data, the more likely it is to be part of multiple snapshots. Archival software that moves old data does not free snapshot space on high performance storage. HSM software works at the virtual file system and i-node level, and is unaware of the block layout of the underlying file system or the block sharing among snapshots when it truncates the file in the original file system. With the two-data-store approach, the user quota is typically enforced on only one data store, namely the primary data store. Also, each data store usually has its own snapshots, and these snapshots are not coordinated.
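Why truncating a file need not free space when snapshots share its blocks can be shown with a small reference-counting model. Reference counting is an assumption made here for illustration; the text above does not say how any particular snapshot implementation tracks shared blocks, and the `BlockStore` class and block addresses are hypothetical.

```python
from collections import Counter


class BlockStore:
    """Hypothetical model: each block is freed only when nothing references it."""

    def __init__(self):
        self.refs = Counter()  # block address -> number of references

    def reference(self, blocks):
        for b in blocks:
            self.refs[b] += 1

    def release(self, blocks):
        """Drop one reference per block; return the blocks actually freed."""
        freed = []
        for b in blocks:
            self.refs[b] -= 1
            if self.refs[b] == 0:
                del self.refs[b]
                freed.append(b)
        return freed


store = BlockStore()
live_file = [10, 11, 12]          # blocks of a file in the live file system
store.reference(live_file)

snapshot = list(live_file)        # a snapshot shares the same blocks
store.reference(snapshot)

# Truncating the live file after migration frees nothing: the snapshot
# still references every block.
freed_by_truncate = store.release(live_file)

# Space on high performance storage is reclaimed only once the snapshot goes.
freed_after_snapshot_delete = store.release(snapshot)
```

In this model the truncation returns an empty list of freed blocks, and the three blocks are released only when the snapshot's references are dropped.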
Archival software also does not control initial file placement and is inefficient for a large class of data that ultimately ends up being archived. Since archival software is not privy to initial placement decisions, it cannot provide different quality of service (QoS) in a file system to multiple users and data types.
Archiving software also consumes production bandwidth to migrate the data. To minimize interference with production, archiving software is typically scheduled to run during non-production hours. It is not optimized to leverage the idle bandwidth of a storage system.
Network attached storage (NAS) applications may create large files with small active data sets. Examples include large databases and digital video post-production storage. The large file occupies high performance storage even if only a small part of its data is active.
Archiving software has integration issues and high administrative overhead, and may even require application redesign. It may also require reconsideration of system issues such as high availability, interoperability, and upgrade processes. It would be desirable to eliminate this cost and administrative overhead and to provide different QoS in an integrated manner.