1. Field of the Invention
This invention relates to computer systems and, more particularly, to storage systems.
2. Description of the Related Art
Computer systems often process large quantities of information, including application data and executable code configured to process such data. In numerous embodiments, computer systems provide various types of mass storage devices configured to store data, such as magnetic and optical disk drives, tape drives, etc. To provide a regular and systematic interface through which to access stored data, the contents of such storage devices are frequently organized into hierarchies of files by software such as an operating system. Often a file defines the minimum level of data granularity that a user can manipulate within a storage device, although various applications and operating system processes may operate on data within a file at a lower level of granularity than the entire file.
In many conventional file-based computer systems, files may be created, destroyed and manipulated with relatively few constraints. Typically, files may be arbitrarily named, subject to operating system conventions, and often, unlimited numbers of exact copies of existing files may be made with ease, subject only to available storage capacity. While such ease of data proliferation may simplify system operation for the user, it may also result in inefficient use of storage devices and difficulties in data management. For example, storage devoted to multiple identical copies of a given file may be redundant and therefore wasted, but it may not be apparent that the copies are in fact identical. Similarly, two given files may be mostly identical in content without being apparently related on the basis of, e.g., file name. In some cases, files may be similar in information content but encoded in different formats, rendering a simple bitwise comparison of files uninformative. Generally speaking, although files may be created and their content modified arbitrarily, useful content relationships among various files may exist, even though such relationships may not be obvious from conventional file characteristics such as file names.
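By way of illustration only, the difficulty of recognizing identical copies from file names alone can be sketched as follows. The sketch groups files by a digest of their content rather than by name. The function names content_digest and find_identical_files, the use of SHA-256, and the chunk size are assumptions made for this example and are not taken from the description above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's bytes (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(root):
    """Group regular files under root by content digest.

    Returns a list of groups, each a sorted list of paths whose bytes
    produced the same digest. Groups with a single member are omitted.
    """
    groups = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            groups[content_digest(p)].append(p)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Under these assumptions, two files named report.txt and notes_backup.dat that hold the same bytes would fall into one group even though their names suggest no relationship. A digest of this kind reflects only bitwise identity, so files that are mostly identical, or that carry similar information encoded in different formats, would not be grouped by this sketch.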