The invention relates to a device and to a method for storing data in a distributed file system having a plurality of deduplication storage devices.
Data deduplication, i.e., the reduction and elimination of redundant data within a storage or memory device, is a data reduction technique already used in many contemporary enterprise storage stacks. For example, US 2010/0161554 A1, U.S. Pat. No. 7,747,584 B1 and EP 2256934 A1 disclose deduplication-capable systems.
On the one hand, deduplication may lead to significant cost reductions, which translate directly into a competitive advantage for customers, because it enlarges the effective storage capacity. On the other hand, integrating deduplication into flash-based storage gives manufacturers the possibility to reduce write amplification, thereby substantially extending flash endurance. The latter explains why the recent growth in deduplication-capable storage solutions is closely coupled with the recent growth of flash systems. While the I/O indirection that deduplication requires is a natural property of flash storage controllers, the drastically improved access times of flash compared to traditional spinning disks may require fast index lookups for efficient inline deduplication.
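The interplay of a fingerprint index and I/O indirection described above can be sketched as follows. This is a minimal, non-authoritative illustration, assuming fixed-size blocks and SHA-256 fingerprints; the class `DedupDevice` and its methods are hypothetical names, not an API of any real product, and the sketch omits reference counting and garbage collection.

```python
import hashlib


class DedupDevice:
    """Hypothetical toy inline-deduplicating block device (illustration only)."""

    def __init__(self) -> None:
        self.physical: list[bytes] = []           # unique blocks actually stored
        self.fingerprints: dict[bytes, int] = {}  # SHA-256 digest -> physical block index
        self.mapping: dict[int, int] = {}         # logical block address -> physical index

    def write(self, lba: int, block: bytes) -> None:
        # Assumption: equal digests imply equal content (collisions ignored).
        digest = hashlib.sha256(block).digest()
        pba = self.fingerprints.get(digest)       # index lookup on the write path
        if pba is None:                           # new content: store it exactly once
            pba = len(self.physical)
            self.physical.append(block)
            self.fingerprints[digest] = pba
        # Indirection: the logical address is simply remapped. Blocks orphaned by
        # overwrites are never reclaimed here (no refcounting or garbage collection).
        self.mapping[lba] = pba

    def read(self, lba: int) -> bytes:
        return self.physical[self.mapping[lba]]


dev = DedupDevice()
dev.write(0, b"A" * 4096)
dev.write(1, b"A" * 4096)  # duplicate: only the mapping is updated
dev.write(2, b"B" * 4096)
print(len(dev.mapping), len(dev.physical))  # 3 2
```

Three logical blocks are written but only two are stored physically; the dictionary lookup stands in for the fast index lookup that inline deduplication needs on every write.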
Deduplication is typically performed at the file-system level or at the block layer, but only the latter achieves high bandwidth when executed inline, i.e., within the storage devices. In addition, network and clustered file systems (e.g., NFS, HDFS, Google FS, GPFS) are mostly agnostic of whether the underlying storage devices are deduplication-capable. In a likely near-future scenario in which most block devices participating in a network file system offer data deduplication, their deduplication services could be underutilized, because the same deduplicatable data might be spread over many deduplication-capable devices, and each device can eliminate duplicates only among the data it stores itself. Moreover, the per-device capacity utilization perceived at the file-system level might differ significantly from the actual one, with possible implications for load-balancing efforts.
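The underutilization argument can be made concrete with a small simulation. This is a sketch under stated assumptions, not the method of the invention: a hypothetical content-agnostic round-robin placement stands in for a file system that is unaware of deduplication, and each device is assumed to deduplicate only against its own contents. The function name `stored_chunks` is illustrative.

```python
import hashlib


def stored_chunks(chunks: list[bytes], num_devices: int) -> tuple[int, int]:
    """Return (chunks physically stored across all devices, unique chunks overall).

    Assumes hypothetical round-robin placement and purely device-local deduplication.
    """
    per_device: list[set[bytes]] = [set() for _ in range(num_devices)]
    for i, chunk in enumerate(chunks):
        fingerprint = hashlib.sha256(chunk).digest()
        per_device[i % num_devices].add(fingerprint)  # local dedup: a set per device
    stored = sum(len(s) for s in per_device)
    unique = len(set().union(*per_device))
    return stored, unique


chunks = [b"X" * 4096] * 8  # the same chunk written eight times
print(stored_chunks(chunks, num_devices=4))  # (4, 1)
print(stored_chunks(chunks, num_devices=1))  # (1, 1)
```

With four devices the single unique chunk is stored four times, once per device, whereas one deduplication domain would store it once. The same example shows the capacity mismatch: the file system believes each device holds two chunks, while each device actually holds one.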
Accordingly, it is an aspect of the present invention to improve deduplication when storing data in deduplication-capable storage devices.