1. Field of the Invention
The present invention relates in general to computing systems, and more particularly to a method, system, and computer program product for improving the efficiency of data deduplication in computing storage environments by using phrase substitution.
2. Description of the Related Art
Computer systems frequently include data or disk storage systems to process and store data. Such data or disk processing systems require a large amount of data storage, and data generated by users within these systems occupies a large portion of the available storage space. Disk drives can exist as standalone units or as part of a larger storage environment. Regardless of the size of the storage environment, duplicate data is often written.
Duplicated content consumes a large amount of storage space. Such replicated data can be deduplicated using standard deduplication techniques. Data deduplication refers to the reduction and/or elimination of redundant data. In a typical data deduplication process, duplicate copies of data are either reduced, leaving a minimal number of redundant copies, or eliminated, leaving a single copy of the data. Data that would otherwise be written multiple times within a storage system need only be written once, with each subsequent occurrence referred to by a pointer to the stored copy.
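As a hedged illustration of the write-once, refer-by-pointer approach described above, the following Python sketch implements a minimal chunk-level deduplicating store. It is not the phrase-substitution method of the present invention. The class name `DedupStore`, the fixed-size chunking, and the use of SHA-256 digests as chunk identifiers are assumptions chosen for illustration only.

```python
import hashlib


class DedupStore:
    """Hypothetical minimal chunk-level deduplicating store (illustrative sketch).

    Assumes fixed-size chunking and treats SHA-256 digest equality as content
    equality. Each unique chunk is stored once, and each file is recorded as
    a list of digests that act as pointers into the chunk store.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes (the single stored copy)
        self.files = {}   # file name -> ordered list of digests (pointers)

    def write(self, name, data):
        pointers = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if an identical chunk is not already present;
            # otherwise the existing copy is simply referenced.
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Reassemble the file by following its pointers in order.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        # Physical bytes actually held after deduplication.
        return sum(len(c) for c in self.chunks.values())
```

For example, with `chunk_size=4`, writing `b"ABCDABCDABCD"` and then `b"ABCDWXYZ"` presents 20 logical bytes but stores only the two unique chunks `ABCD` and `WXYZ` (8 bytes). A production system would typically use larger or content-defined chunks and might verify bytes on a digest match rather than relying on hash equality alone.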