Backup storage systems have traditionally been designed and optimized to store and restore data sequentially. Modern backup technologies, however, such as VM Instant Access/Instant Restore from EMC and changed block tracking (CBT), do not access data sequentially; instead, they may access data randomly. To provide better performance for random input/output (IO) workloads (e.g., random reads and writes), backup storage systems have largely been retrofitted and tuned for random IO processing.
Furthermore, a file system that supports improved random IO performance in such backup storage systems would issue a larger number of IO requests. Sequential IO workloads, on the other hand, are processed serially to achieve good locality. As a result, random IO workloads may place a heavier IO load on a given backup storage system than sequential IO workloads do. This can lead to a performance imbalance between the different types of IO workloads and may cause a client to time out.
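To make the difference in IO load concrete, the following is a minimal sketch, assuming a hypothetical file system that merges a read into the previous request whenever it starts exactly where that request ended. The function name `count_io_requests` and the 4096-byte block size are illustrative assumptions, not part of any described system. The same four blocks read in order need one request, while the same blocks read in a scattered order need four.

```python
def count_io_requests(offsets, block_size=4096):
    """Count IO requests issued for a series of block reads (hypothetical model).

    A read that begins exactly where the previous one ended is assumed to be
    coalesced into the same request; any other read starts a new request.
    """
    requests = 0
    prev_end = None
    for off in offsets:
        if off != prev_end:
            requests += 1  # not contiguous with the previous read: new request
        prev_end = off + block_size
    return requests


sequential = [0, 4096, 8192, 12288]
random_order = [8192, 0, 12288, 4096]
print(count_io_requests(sequential))    # 1
print(count_io_requests(random_order))  # 4
```

Real storage stacks use more elaborate merging and read-ahead; the sketch only shows why scattered access tends to translate into more requests for the same amount of data.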
In a data deduplication system, a data stream may pass through several stages of execution (e.g., data segmentation, fingerprint calculation, fingerprint verification, and compression/decompression). Complicating matters, some data segments (e.g., duplicate segments) may not pass through the same stages as other data segments (e.g., non-duplicate segments) and therefore place different loads on the data deduplication system. This makes it difficult to understand the load on the system and to predict when a client will time out.
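The uneven per-stage load can be illustrated with a minimal sketch. It assumes fixed-size segmentation, SHA-256 fingerprints, an in-memory set standing in for the fingerprint index, and that only non-duplicate segments are compressed; the function name `stage_load` and these choices are hypothetical, and real systems commonly use content-defined segmentation and on-disk indexes. Every segment incurs segmentation, fingerprint calculation, and fingerprint verification, but duplicates skip compression, so the load on each stage depends on how much of the stream is duplicate.

```python
import hashlib
import zlib
from collections import Counter


def stage_load(stream: bytes, seg_size: int = 4) -> Counter:
    """Count how many segments pass through each stage (hypothetical pipeline)."""
    index = set()      # stand-in for the fingerprint index
    load = Counter()
    for off in range(0, len(stream), seg_size):
        seg = stream[off:off + seg_size]
        load["segmentation"] += 1
        fp = hashlib.sha256(seg).digest()
        load["fingerprint_calculation"] += 1
        load["fingerprint_verification"] += 1  # index lookup for every segment
        if fp in index:
            continue  # duplicate segment: skips compression entirely
        index.add(fp)
        zlib.compress(seg)  # only non-duplicate segments are compressed
        load["compression"] += 1
    return load


print(stage_load(b"AAAABBBBAAAAAAAA"))
```

For this 16-byte stream, all four segments pass through the first three stages, but only the two unique segments (`AAAA` and `BBBB`) reach compression, so the front of the pipeline sees twice the load of the compression stage.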