Often a portion of data being backed up by a user or other entity comprises repetitive data. Consider an example where an electronic message (“e-mail”) is sent to 100 recipients. The message may be stored 100 times in a data storage system, with each copy after the first constituting repetition. In another example, multiple copies of slightly different versions of a word processing document are stored in a data storage system. A large portion of each of the documents is likely to constitute repetition of data stored in conjunction with one or more of the other instances of the word processing document.
Data de-duplication is sometimes used to reduce the amount of repetitive data stored in a data storage system. Presently, most data de-duplication is performed in software that executes on a processor within or coupled with a data storage system. For example, such de-duplication can be performed by a processor within or coupled with a data storage system to which an entity (e.g., a user, business, network) sends a backup stream in conjunction with a data backup. Backup software from an independent software vendor (ISV) is typically used to generate such a backup stream from the entity's stored data. Some examples of storage systems which can utilize data de-duplication include, but are not limited to: a storage appliance, a backup appliance, a network attached storage, a virtual tape library, and a disk array.
Data de-duplication often involves identifying duplicate data segments in a stream of data, such as a backup stream, then replacing an identified duplicate data segment with a smaller reference such as a pointer, code, dictionary count, or the like, which references a data segment, pointer, or the like stored in a de-duplication library. In the case of a backup stream, the de-duplicated backup stream is then stored. Because the de-duplicated backup stream is smaller in data size than it was prior to de-duplication, such de-duplication allows more data to be stored in a fixed size data storage system than would otherwise be possible. Because less storage space is required, in some environments this allows backed up data to be retained for a longer time before deletion. Additionally, it is appreciated that the de-duplication process can be reversed to reassemble the backup stream if access to the backed up data is desired.
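The identify, replace, and reassemble steps described above can be illustrated with a minimal sketch. It is not taken from any particular product. It assumes fixed-size segments, a SHA-256 digest serving as the smaller reference, and an in-memory dictionary standing in for the de-duplication library. The names SEGMENT_SIZE, deduplicate, and reassemble are hypothetical and chosen only for illustration.

```python
import hashlib

# Hypothetical fixed segment size; real systems may use variable-size segments.
SEGMENT_SIZE = 4096


def deduplicate(stream: bytes, library: dict, segment_size: int = SEGMENT_SIZE) -> list:
    """Replace each segment of the stream with a reference into the library.

    The library maps a SHA-256 digest (the reference) to the segment bytes.
    A segment already present in the library is not stored a second time.
    """
    references = []
    for offset in range(0, len(stream), segment_size):
        segment = stream[offset:offset + segment_size]
        digest = hashlib.sha256(segment).digest()
        # Store the segment only the first time its digest is seen.
        library.setdefault(digest, segment)
        references.append(digest)
    return references


def reassemble(references: list, library: dict) -> bytes:
    """Reverse the de-duplication by looking up each reference in order."""
    return b"".join(library[digest] for digest in references)


if __name__ == "__main__":
    # Three identical segments followed by one distinct segment.
    backup_stream = b"A" * SEGMENT_SIZE * 3 + b"B" * SEGMENT_SIZE
    library = {}
    refs = deduplicate(backup_stream, library)
    print(len(refs), "references,", len(library), "unique segments stored")
    assert reassemble(refs, library) == backup_stream
```

In this sketch the four-segment stream is reduced to four 32-byte references plus two stored segments, and looking up the references in order restores the original stream.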
The drawings referred to in this Brief Description of Drawings should be understood as not being drawn to scale unless specifically noted.