1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to data lifecycle management in data storage systems.
2. Description of the Related Art
One requirement of data lifecycle management in file systems is to ensure that all copies of a lifecycle-managed object are deleted in a timely manner. Placing managed files into a repository (e.g., EMC Centera or a KVS vault) may ensure that the files in the repository can be deleted as desired. However, it may be difficult to manage files, and copies of files, that reside in a network of file systems spread across arbitrary machines, or in the archives and backup images of those file systems. For example, if a copy of a lifecycle-managed document is made, moved to a local directory, and then renamed, how can a compliance officer be sure that the copy is destroyed along with the original? In some environments, destroying all pertinent copies in a timely manner may be just as important as retaining the documents for the specified period.
Regulatory compliance is a major concern for for-profit entities such as corporations, as well as for non-profit entities. For example, the financial industry, public companies, and even non-profit organizations are subject to increasing scrutiny of how they perform financial transactions. As part of this scrutiny, various entities may be required to conform to data retention regulations covering both paper and electronic data. For example, an entity may be required to keep certain data, and perhaps even all copies of that data, for a specified period, such as five years or, for some data, even decades. The regulatory organization may also change these regulations to require either shorter or longer retention periods. Even entities not subject to such regulation may establish rules and guidelines for retaining at least some of their data for a specified period. Generally, entities also establish rules and guidelines for deleting much, if not all, of their documents, both electronic and paper, once the retention period for those documents, whether internally or externally imposed, has expired.
One problem facing those responsible for compliance with internal or external data retention regulations is the difficulty of deleting all copies of documents that no longer need to be kept. For example, an entity's electronic documents may need to be retained for a certain period for legal reasons, but the entity may wish to delete them once that period has elapsed. Conventionally, it is difficult if not impossible to completely delete these electronic documents and all of their copies. For one thing, an original electronic document that has been retained and has now expired may have been copied, renamed, modified, backed up to multiple backup images, copied to other systems including laptops and employees' home computers, and so on, making the copies difficult if not impossible to locate. Even if an IT department were able to identify where all the copies of the document are located, it is unlikely that the IT department would be able to delete all of them, because the copies may reside on many people's personal disks, in backup images stored in an offsite vault, and so on.
File Systems
A file system may be defined as a collection of files and file system metadata (e.g., directories and inodes) that, when arranged into a logical hierarchy, make up an organized, structured set of information. File systems organize and manage information stored in a computer system. File systems may support the organization of user data by providing and tracking organizational structures such as files, folders, and directories. A file system may interpret and access information stored in a variety of storage media, abstracting the complexities of locating, retrieving, and writing data to the storage media. File systems may be mounted from a local or a remote system. File system software may include the system-level or application-level software used to create, manage, and access file systems.
File system metadata may be defined as information that file system software maintains about the files stored in the file system. File system metadata may include, but is not limited to, definitions and descriptions of the data it references, such as inodes, directories, mapping information in the form of indirect blocks, superblocks, and extended attributes of files or their equivalents. In some cases, the file system metadata for a file includes the path of the file as seen by applications and the corresponding location of the file within the file system (e.g., device:block number(s)). File system metadata may itself be stored on a logical or physical device within a file system.