The present invention relates to data storage systems, and more specifically, this invention relates to aggregating read requests for common data objects into a common read operation in a data storage system, thereby improving throughput and reducing overall latency in the data storage system.
It is common practice within the enterprise backup space to use data deduplication technologies to reduce the quantity of data stored as part of a backup or storage solution. Deduplication is often performed at either a software or a hardware layer of a data storage system to reduce storage costs for collections of data or data streams that share common data extents. For example, a fingerprinting algorithm analyzing a data stream from a backup client or source function typically splits the data into extents, or “chunks,” whose sizes fall within a particular range. A database may then be referenced to determine whether the data in an extent is already stored within the storage repository. If so, the data does not need to be stored again; rather, the existing extent can be used to provide a copy of the data upon request. Typically, a reference count is maintained for each extent, and object inventory tables enable the backup or storage solution to reconstitute the front-end backup or storage object upon request by restoring its constituent deduplicated extents. It is common for data sources in production environments to deduplicate at 50% or greater, e.g., to have fingerprinted extents of which 50% or more are common across objects or the overall stream. This technology is used and valued by users of storage technologies.
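By way of illustration only, the deduplication flow described above may be sketched as follows. The sketch is a simplification under stated assumptions and is not the implementation of any particular product: the class name `DedupStore`, the constant `CHUNK_SIZE`, and the method names are hypothetical, extents are split at a fixed size rather than within a range of sizes, SHA-256 stands in for the fingerprinting algorithm, and in-memory dictionaries stand in for the database and the object inventory tables.

```python
import hashlib

# Assumption: fixed-size extents for simplicity; the background describes
# extents whose sizes fall within a range.
CHUNK_SIZE = 4


class DedupStore:
    """Hypothetical in-memory deduplicating store (illustrative only)."""

    def __init__(self):
        self.extents = {}     # fingerprint -> extent bytes (the repository)
        self.refcounts = {}   # fingerprint -> number of references
        self.inventory = {}   # object name -> ordered list of fingerprints

    def put(self, name, data):
        """Split data into extents and store only extents not yet present."""
        fingerprints = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.extents:
                # First occurrence: the extent data is actually stored.
                self.extents[fp] = chunk
            # Every occurrence adds a reference, stored or not.
            self.refcounts[fp] = self.refcounts.get(fp, 0) + 1
            fingerprints.append(fp)
        self.inventory[name] = fingerprints

    def get(self, name):
        """Reconstitute an object from its constituent extents."""
        return b"".join(self.extents[fp] for fp in self.inventory[name])


store = DedupStore()
store.put("object_a", b"AAAABBBBCCCC")
store.put("object_b", b"AAAABBBBDDDD")
restored = store.get("object_b")
```

In this sketch the two objects share two of their three extents, so the repository holds four unique extents rather than six, and each shared extent carries a reference count of two.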