There are few online backup systems offering global deduplication on a file/block level. Traditionally, such backup systems either use a sharing algorithm that employs common encryption keys or do not encrypt common files/blocks. One implication of using such systems is that the sharing of common files/blocks may pose a security risk. Alternatively, where some data blocks must be stored multiple times, such systems and methods nullify the benefits of global deduplication.
Existing online backup systems often consist of one or more storage systems with numerous backup applications sending and/or retrieving data to/from the storage systems. Traditionally, such backup systems store data in a file system, which often has limited size and is not able to scale to billions of files.
There are conventional systems that store data on clustered file systems. These types of systems require very reliable, high-performance storage systems, which makes the implementation very expensive. Moreover, such systems need many data storage nodes which, on a global scale, often cause a bottleneck. The requirement for storing a large number of files increases when deduplication is needed on a block level. For example, if hundreds of millions of files were scheduled to be backed up, and each file contained an average of one thousand blocks of 4 kB, the storage system would need to be capable of handling hundreds of billions of files. Such large numbers of files are very difficult to catalog and handle. There are conventional online backup system implementations that store 500 million files. If, for example, each file had multiple versions, the number of protected files would grow dramatically. Such conventional systems are thus often impractical and inefficient. In other conventional systems, data is stored on virtual tape libraries. Such virtual tape libraries have limited capacity, even if this capacity is in the range of gigabytes, terabytes or larger. If there were a need to store more data, more virtual tape libraries would need to be added to the system. In such situations, however, the advantages of block level deduplication are lost with the addition of each new virtual tape library.
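The block-count estimate given earlier in this paragraph can be checked with a short calculation. The following is an illustrative sketch only: the function name `estimate_stored_objects`, the `versions_per_file` parameter, the choice of 100 million files as the low end of "hundreds of millions", and the reading of 4 kB as 4096 bytes are all assumptions made for the example, not part of any described system.

```python
def estimate_stored_objects(file_count, blocks_per_file, versions_per_file=1):
    """Worst-case count of block objects when each block is kept as its own file.

    Hypothetical helper: assumes every version of a file stores all of its
    blocks again, i.e. no deduplication between versions.
    """
    return file_count * blocks_per_file * versions_per_file


FILE_COUNT = 100_000_000   # assumed low end of "hundreds of millions of files"
BLOCKS_PER_FILE = 1_000    # average given in the example
BLOCK_SIZE_BYTES = 4096    # assumed meaning of "4 kB"

objects = estimate_stored_objects(FILE_COUNT, BLOCKS_PER_FILE)
raw_bytes = objects * BLOCK_SIZE_BYTES

print(f"block objects to catalog: {objects:,}")
print(f"raw data represented:     {raw_bytes:,} bytes")
```

Under these assumptions, the low end of the range already yields one hundred billion block objects, and keeping several complete versions of each file multiplies that figure by the number of versions.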
There are conventional systems that offer global block level deduplication. Such implementations have an indexing database that stores the signatures of the blocks. In global schemes, such indexing databases become a bottleneck.
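A signature index of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular conventional system: the `BlockIndex` class and `backup` function are hypothetical names, SHA-256 is assumed as the signature function, blocks are assumed to be fixed-size 4096-byte chunks, and an in-memory dictionary stands in for the indexing database.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size (4 kB)


class BlockIndex:
    """Hypothetical in-memory stand-in for a central indexing database."""

    def __init__(self):
        self.blocks = {}   # signature -> block contents, each stored once
        self.lookups = 0   # number of index operations performed

    def store(self, block):
        """Store a block if its signature is new; return the signature."""
        signature = hashlib.sha256(block).hexdigest()  # assumed signature function
        self.lookups += 1
        if signature not in self.blocks:
            self.blocks[signature] = block
        return signature


def backup(data, index):
    """Split data into fixed-size blocks and return the list of signatures."""
    return [
        index.store(data[offset:offset + BLOCK_SIZE])
        for offset in range(0, len(data), BLOCK_SIZE)
    ]


index = BlockIndex()
recipe_a = backup(b"A" * 8192 + b"B" * 4096, index)  # three blocks, two distinct
recipe_b = backup(b"B" * 4096 + b"A" * 4096, index)  # two blocks, none new

print(len(index.blocks), "unique blocks stored after", index.lookups, "index lookups")
```

In this sketch every block of every backup passes through the single index, even when nothing new is stored, which illustrates why a central indexing database tends to become the bottleneck in a global scheme.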
What is needed is an online backup system capable of storing an unlimited number of files of unlimited size. Such an online backup system must not suffer from degradation in performance due to the very large number of files being protected. This new online backup system must be able to identify existing blocks and store them only once, thus offering global block level deduplication. This new online backup system must also have built-in replication in order to offer redundancy and increased performance.