The disclosure generally relates to the field of data management, and more particularly to data management for resource consumption efficiency.
A distributed storage system can be structured with front end storage and back end storage. The front end storage includes devices (e.g., servers, filers, etc.) and applications on those devices that are “client facing.” The front end storage elements (i.e., devices and applications) are characterized as client facing because they are exposed to clients to receive requests and provide responses. An entity considered a client is also often referred to as a host because it hosts one or more applications. An application performs operations, some of which involve reading data from storage and writing data to storage. The front end storage elements interact with back end storage elements to carry out these reads and writes. The back end storage elements can include storage arrays and the corresponding controllers. In some cases, the front end storage elements and at least some back end storage elements are within a same housing.
Managing data in distributed storage systems includes performing operations for storage efficiency. Deduplication, or block sharing, is a technique for storage efficiency with respect to storage space consumption. For deduplication, data of different write operations are compared to determine whether the data are the same. This often involves generating fingerprints based on the data and comparing the fingerprints. Because different data can produce the same fingerprint (a fingerprint collision), a fingerprint match is validated against the actual data before it is relied upon, despite the low likelihood of a collision. After match validation, metadata of the data units to be written will refer to a same storage location.
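As a minimal illustrative sketch of the deduplication flow described above (fingerprint generation, fingerprint comparison, validation against the actual data, and shared metadata references), consider the following Python example. It is not drawn from the disclosure: the class name `DedupStore`, its attributes, the use of SHA-256 as the fingerprint function, and the in-memory lists standing in for back end storage locations are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, List


def sha256_fingerprint(data: bytes) -> bytes:
    """Assumed fingerprint function; any stable hash could be substituted."""
    return hashlib.sha256(data).digest()


class DedupStore:
    """Hypothetical in-memory store in which identical data units share a location."""

    def __init__(self, fingerprint: Callable[[bytes], bytes] = sha256_fingerprint):
        self.fingerprint = fingerprint
        self.blocks: List[bytes] = []                   # index = storage location
        self.by_fingerprint: Dict[bytes, List[int]] = {}  # fingerprint -> candidate locations
        self.metadata: Dict[str, int] = {}              # data unit name -> storage location

    def write(self, name: str, data: bytes) -> int:
        fp = self.fingerprint(data)
        # A fingerprint match is only a candidate; validate it against the
        # actual stored data so that a fingerprint collision is not mistaken
        # for a duplicate.
        for loc in self.by_fingerprint.get(fp, []):
            if self.blocks[loc] == data:
                self.metadata[name] = loc  # share the existing location
                return loc
        # No validated match: store the data at a new location.
        self.blocks.append(data)
        loc = len(self.blocks) - 1
        self.by_fingerprint.setdefault(fp, []).append(loc)
        self.metadata[name] = loc
        return loc

    def read(self, name: str) -> bytes:
        return self.blocks[self.metadata[name]]
```

In this sketch, two writes of identical data leave the metadata of both data units referring to the same storage location, while two different data units whose fingerprints happen to collide are kept at separate locations because the byte-for-byte validation fails.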