1. The Field of the Invention
The present invention relates to software, hardware, systems, and methods for de-duplicating redundant data. More particularly, embodiments of the invention relate to software, hardware, systems, and methods for globally de-duplicating data across a plurality of storage systems implementing traditional copy-on-write snapshot technology or the Write Anywhere File Layout (WAFL) file system.
2. The Relevant Technology
Economic, political, and social power are increasingly managed by data. Transactions and wealth are represented by data. Political power is analyzed and modified based on data. Human interactions and relationships are defined by data exchanges. Hence, the efficient distribution, storage, and management of data are expected to play an increasingly vital role in human society.
The quantity of data that must be managed, in the form of computer programs, databases, files, and the like, continues to increase rapidly. As computer processing power increases, operating systems and application software become larger. Moreover, the desire to access larger data sets such as multimedia files and large databases further increases the quantity of data that is managed. Additionally, this increasingly large data load often requires one or more data protection services, which may include generating backups and performing other operations or services for the data, further increasing the quantity of data being managed.
Snapshots are often implemented in storage systems such as storage arrays and file servers to create static versions of active or original data that can be used for backup and other operations while the active data itself remains available without interruption. Advantageously, some snapshots reduce the quantity of data that must be managed by sharing unchanged original or active data, rather than creating a complete copy of the data. For instance, a copy-on-write snapshot initially copies into snapshot storage only the metadata that points to where the active data is stored. Before a write is allowed to a block of the active data, the original block is copied to the snapshot storage. Snapshot read requests for unchanged blocks are redirected to the active data, while snapshot read requests for blocks that have been changed are directed to the “copied” blocks in the snapshot storage.
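The copy-on-write behavior described above can be sketched in a few lines of Python. This is only an illustrative toy, not an implementation taken from any particular storage system: the class name `CowVolume`, its methods, and the use of an in-memory list and dictionary for active data and snapshot storage are all assumptions made for the example.

```python
class CowVolume:
    """Toy block volume supporting one copy-on-write snapshot (hypothetical)."""

    def __init__(self, blocks):
        self.active = list(blocks)  # active data, one entry per block
        self.snapshot = None        # block index -> preserved original block

    def take_snapshot(self):
        # Only bookkeeping is created here; no data blocks are copied yet.
        self.snapshot = {}

    def write(self, index, data):
        # Before the first write to a block, copy the original block into
        # snapshot storage. Later writes to the same block copy nothing.
        if self.snapshot is not None and index not in self.snapshot:
            self.snapshot[index] = self.active[index]
        self.active[index] = data

    def read_snapshot(self, index):
        if self.snapshot is None:
            raise RuntimeError("no snapshot has been taken")
        # Changed blocks are served from snapshot storage; unchanged blocks
        # are redirected to the active data.
        if index in self.snapshot:
            return self.snapshot[index]
        return self.active[index]


vol = CowVolume([b"a", b"b", b"c"])
vol.take_snapshot()
vol.write(1, b"B")
vol.write(1, b"BB")
```

After the two writes, the snapshot storage in this sketch holds a single preserved block (the original contents of block 1), while blocks 0 and 2 remain shared with the active data.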
Another snapshot technology that minimizes stored data can be implemented in a WAFL file system. WAFL file systems utilize an inode tree structure to organize data, with a root inode at the root of the tree. A WAFL snapshot can be created by copying the root inode to a snapshot inode that initially points to the exact same data as the root inode. When a block of the original data is changed, the WAFL file system writes the change to a new storage location without overwriting the old block of data. One or more inodes beneath and/or including the root inode can be modified to point to the changed block. Other than changed blocks, however, both the root inode and the snapshot inode point to the same blocks of data, which are shared between them.
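The snapshot-by-copying-the-root idea described above can be illustrated with a deliberately simplified Python sketch. It flattens the inode tree into a single list of block pointers, which is an assumption made for brevity; an actual WAFL file system uses a multi-level inode tree. The names `WaflLikeFs`, `store`, and `root` are hypothetical.

```python
class WaflLikeFs:
    """Toy write-anywhere layout with a one-level "root inode" (hypothetical)."""

    def __init__(self, blocks):
        self.store = list(blocks)             # block storage; blocks are never overwritten
        self.root = list(range(len(blocks)))  # root inode: pointers into the store
        self.snapshots = []

    def take_snapshot(self):
        # Copy only the pointers. The snapshot inode initially points to
        # exactly the same blocks as the root inode.
        snap = list(self.root)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Write the change to a new location and repoint the root inode.
        # The old block stays in place for any snapshot that references it.
        self.store.append(data)
        self.root[index] = len(self.store) - 1

    def read(self, inode, index):
        return self.store[inode[index]]


fs = WaflLikeFs([b"a", b"b"])
snap = fs.take_snapshot()
fs.write(0, b"A")
```

In this sketch, the snapshot still reads the old contents of block 0, the root reads the new contents, and the unchanged block 1 is referenced by both through the same pointer.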
In addition to minimizing storage requirements in a storage system by sharing unchanged data between a root inode and snapshot inodes, the WAFL file system has further been extended to identify and eliminate redundant data blocks beneath the root inode within a storage system.
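As a rough illustration of eliminating redundant blocks within a single storage system, the Python sketch below collapses identical blocks so that each distinct block is stored once. The text above does not say how redundant blocks are identified; comparing SHA-256 digests of block contents is an assumption made for this example, and the function name `deduplicate` is hypothetical.

```python
import hashlib


def deduplicate(pointers, store):
    """Return (new_pointers, new_store) with identical blocks stored only once.

    `pointers` is a list of indexes into `store`, standing in for the block
    pointers beneath a root inode.
    """
    seen = {}          # content digest -> index in new_store
    new_store = []
    new_pointers = []
    for pointer in pointers:
        block = store[pointer]
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            # First time this content is seen: keep one copy of the block.
            seen[digest] = len(new_store)
            new_store.append(block)
        # Every pointer to identical content refers to the single kept copy.
        new_pointers.append(seen[digest])
    return new_pointers, new_store


ptrs, blocks = deduplicate([0, 1, 2, 3], [b"x", b"y", b"x", b"x"])
```

Here four logical blocks are reduced to two stored blocks. A production system would typically also guard against digest collisions, for example with a byte-for-byte comparison, which this sketch omits.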
Notwithstanding the data reduction obtained by implementing copy-on-write and WAFL technologies, these solutions fail to reduce redundant data stored in snapshots (and in active data in the case of conventional copy-on-write) and can only be applied to individual storage systems and not globally across storage systems.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.