The present invention relates generally to computer data processing systems, and more particularly to data deduplication using a small hash table.
Data deduplication refers to the reduction and/or elimination of redundant data. The goal of a data deduplication system is often to store a single copy of duplicated data. In data deduplication, a data object, which may be a file, a data stream, or some other form of data, is broken down into one or more parts of a specific length, called chunks or blocks, and each data chunk is grouped together with other data chunks containing matching content. In a typical data deduplication process, duplicate copies of data are reduced or eliminated, leaving a minimal number of redundant copies, or a single copy of the data, respectively.
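By way of illustration only, the chunking and grouping described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the invention: the fixed chunk size, the use of SHA-256 digests to detect matching content, and the names `deduplicate`, `reconstruct`, `store`, and `recipe` are all hypothetical choices made for the example.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-length chunks and keep one copy of each distinct chunk.

    Assumption: chunks with the same SHA-256 digest are treated as having
    matching content. Returns (store, recipe), where store maps a digest to
    the single retained copy of a chunk and recipe lists the digests in the
    order the chunks appeared in the original data.
    """
    store = {}   # digest -> the single stored copy of that chunk
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # setdefault stores the chunk only the first time its digest is seen.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def reconstruct(store, recipe):
    """Rebuild the original data from the stored chunks and the ordered recipe."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    data = b"ABCDABCDEFGHABCD"
    store, recipe = deduplicate(data, chunk_size=4)
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
    assert reconstruct(store, recipe) == data
```

In this sketch the sixteen-byte input is divided into four chunks, three of which have identical content, so only two distinct chunks are retained while the ordered list of digests still permits the original data to be rebuilt.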