1. The Field of the Invention
The present invention relates to data de-duplication. More particularly, embodiments of the invention relate to software, hardware, systems, and methods for de-duplicating redundant data in pooled storage capacity of a virtualized storage environment.
2. The Relevant Technology
Virtualization is an abstraction layer that decouples the physical computing resources in a computer environment from the systems, applications, and/or end users that interact with those resources, thereby delivering greater IT resource utilization and flexibility. For instance, server virtualization allows multiple virtual machines, with heterogeneous operating systems, to run in isolation, side-by-side on the same physical machine. Each virtual machine has its own set of virtual hardware (e.g., RAM, CPU, NIC) upon which an operating system (“OS”) and applications are loaded. The OS sees a consistent, normalized set of hardware regardless of the actual physical hardware components.
Similarly, storage virtualization is the amalgamation of multiple storage devices into what appears to be a single storage unit. Storage virtualization presents a simple object (such as a volume) upward in a stack to, e.g., a host system, hiding the physical complexity of underlying networks, storage, and other constructs. Storage virtualization can provide many benefits, including centralized storage management, easier replication, non-disruptive data migration when subsystems fail or are replaced, and cost-effective tiered storage.
Notwithstanding its many advantages, storage virtualization can result in the unnecessary storage of significant amounts of redundant data in the pooled storage capacity. For instance, in a computer environment including pooled storage capacity and a plurality of host systems (e.g., servers), each with its own OS, each host stores its OS and associated data, files, executables, and the like in a portion of the pooled storage capacity allocated to it, such that a plurality of OSes are stored in the pooled storage capacity. In some instances, two or more of the OSes may be identical and/or may include identical data, files, executables, or the like, yet each identical instance is stored separately for each host. As another example, consider an email server handling an electronic message with a large attachment sent to a plurality of intra-system users. For every user to whom the attachment is sent, the email server stores a separate copy of the attachment in a portion of the pooled storage capacity allocated to the email server. In each of the two cases just described, redundant instances of data occupy space in the pooled storage capacity that could otherwise be used for other data. Consequently, there exists a need in the art for data de-duplication solutions in virtualized storage environments.
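By way of illustration only, the email attachment example above can be sketched in a few lines of Python. The sketch contrasts storing one copy of an attachment per recipient with storing a single copy keyed by a hash of its content. It is a simplified illustration under stated assumptions, not the method of any embodiment described herein: the function names, the in-memory dictionaries standing in for pooled storage capacity, and the choice of SHA-256 as the content hash are all hypothetical.

```python
import hashlib


def store_per_recipient(attachment: bytes, recipients: list[str]) -> dict[str, bytes]:
    """Redundant layout: each recipient is given a separate copy of the attachment."""
    return {user: bytes(attachment) for user in recipients}


def store_deduplicated(
    attachment: bytes, recipients: list[str]
) -> tuple[dict[str, bytes], dict[str, str]]:
    """De-duplicated layout: one stored copy, with each recipient holding a reference.

    The reference is the SHA-256 digest of the content (an assumption made
    for this sketch; other identifiers could serve the same purpose).
    """
    digest = hashlib.sha256(attachment).hexdigest()
    blocks = {digest: attachment}                      # single stored instance
    references = {user: digest for user in recipients}  # per-user pointers
    return blocks, references


def bytes_used(stored: dict[str, bytes]) -> int:
    """Total payload bytes held in a store (reference overhead is ignored)."""
    return sum(len(data) for data in stored.values())
```

For a hypothetical 1000-byte attachment sent to five users, the redundant layout holds 5000 bytes of attachment data, whereas the de-duplicated layout holds 1000 bytes plus five short references.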
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.