Data backup is the process of preserving data in some form so that it can be recovered and reused when a system is damaged or under other specified conditions. Backup is an important component of the storage field, and its role in a storage system should not be overlooked. Backup is likewise an indispensable task in any IT system: it not only protects against damage caused by accidental events, but also allows historical data to be saved and archived efficiently. This makes it possible to query historical data, collect statistics on it and analyze it, and to archive and preserve important information.
De-duplication technology eliminates redundant data by deleting duplicate copies in a data set and keeping only one copy of each. Because original data generally contains a large amount of duplicated content, de-duplication yields an optimized data set whose storage footprint is significantly smaller. De-duplication is now widely used in data backup and archiving systems, where it helps applications reduce the amount of data to be stored, save network bandwidth and improve storage efficiency, thereby reducing cost.
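To make the idea of keeping one copy of each piece of duplicated data concrete, here is a minimal Python sketch of chunk-level de-duplication. It assumes fixed-length chunks and SHA-256 fingerprints to detect duplicates; the function names, the 4-byte chunk size and the ratio formula (logical bytes divided by stored bytes, ignoring the metadata needed to record the chunk order) are illustrative assumptions, not a method described in this document.

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int = 4):
    """Split data into fixed-length chunks and keep one copy of each distinct chunk.

    Hypothetical helper for illustration; chunk_size is an arbitrary example value.
    """
    store = {}   # fingerprint -> chunk bytes (exactly one copy per distinct chunk)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # a duplicate chunk is not stored again
        recipe.append(fp)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique chunks and the recipe."""
    return b"".join(store[fp] for fp in recipe)

def dedup_ratio(data: bytes, store) -> float:
    """Assumed convention: logical size / physically stored size (metadata ignored)."""
    stored = sum(len(c) for c in store.values())
    return len(data) / stored if stored else 1.0

data = b"ABCDABCDABCDWXYZ"
store, recipe = dedup_store(data)
print(len(recipe), len(store))   # 4 chunks referenced, 2 chunks stored
print(dedup_ratio(data, store))  # 16 / 8 = 2.0
assert restore(store, recipe) == data
```

Here three of the four chunks are identical, so only two chunks are physically stored and the data can still be rebuilt exactly from the recipe.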
There are two main criteria for evaluating de-duplication technology: the de-duplication ratio and the performance of executing de-duplication. The de-duplication ratio is determined by the characteristics of the data itself and by the application pattern, whereas de-duplication performance depends on the specific implementation. Vendors currently offer many de-duplication methods, such as fixed-length chunking and non-fixed-length chunking, and they continue to develop new de-duplication methods and systems to improve either the de-duplication ratio or the de-duplication performance.
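The practical difference between fixed-length chunking and one common form of non-fixed-length chunking, content-defined chunking, can be illustrated with a small sketch. It assumes a simple polynomial rolling hash (Rabin-Karp style) in place of the Rabin fingerprints often used in practice, and the window, divisor and minimum/maximum chunk sizes are hypothetical values. Inserting a single byte at the front of the data shifts every fixed-length boundary, so almost no chunk is shared with the original; content-defined boundaries depend only on the bytes near them, so they realign shortly after the edit.

```python
import random

BASE = 257
MOD = (1 << 31) - 1  # prime modulus for the rolling hash (assumed choice)

def cdc_chunks(data: bytes, window=16, divisor=1024, min_size=256, max_size=4096):
    """Content-defined chunking: cut where the hash of the last `window` bytes hits a target.

    All parameter values are hypothetical examples.
    """
    pow_w = pow(BASE, window, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD                      # slide the new byte in
        if i >= window:
            h = (h - data[i - window] * pow_w) % MOD  # slide the oldest byte out
        size = i - start + 1
        if (size >= min_size and h % divisor == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks

def fixed_chunks(data: bytes, size=1024):
    """Fixed-length chunking with an assumed 1024-byte chunk size."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def shared(a, b):
    """Number of distinct chunks that appear in both chunk lists."""
    return len(set(a) & set(b))

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(100_000))
edited = b"X" + original  # one byte inserted at the front

cdc_a, cdc_b = cdc_chunks(original), cdc_chunks(edited)
fix_a, fix_b = fixed_chunks(original), fixed_chunks(edited)
print("fixed-length chunks shared:", shared(fix_a, fix_b), "of", len(fix_a))
print("content-defined chunks shared:", shared(cdc_a, cdc_b), "of", len(cdc_a))
```

The cost of this resilience is a rolling-hash update for every input byte, which is one concrete instance of the trade-off between de-duplication ratio and de-duplication performance described above.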