For most information technology systems, backing up data comprises copying and archiving computer data so that the copy may be used to restore the original after a data loss event, such as data deletion or corruption. In a computer cluster environment having a server and multiple client nodes, data recovery may not always yield a high-performance solution. A clustered environment enables a large number of parallel backups because of the enhanced bandwidth of the system. However, due to the random layout of stored data within a clustered environment, only a small number of data recovery streams can occur simultaneously. In particular, when a first backup is performed, the server writes the backup data sequentially on one or more disks in storage. A single client that restores data from this backup image may achieve high throughput due to the sequential read performance of the one or more disks. As incremental backups are taken from the same machine and data is deduplicated on the backup server, however, the backup image becomes fragmented over a large number of sectors of the one or more disks. Restore performance from such a fragmented backup image suffers, because disks exhibit poorer random Input/Output (I/O) read performance than sequential I/O read performance. For example, while an exemplary clustered environment may be able to perform 512 backups in parallel, this same system is only able to perform four data recoveries (restorations) in parallel due to the random I/O read performance.
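By way of illustration only, the fragmentation effect described above can be sketched with a simplified model. The sketch assumes a hypothetical append-only block store in which the first backup is laid out sequentially and each deduplicated incremental backup appends only the changed blocks at the end of the store. The function names `simulate_incrementals` and `count_seeks` are illustrative and do not correspond to any particular backup product.

```python
def simulate_incrementals(num_blocks, changed_per_backup):
    """Return the disk address of each logical block after incrementals.

    Assumption: the first full backup stores logical block i at disk
    address i, and every later incremental appends changed blocks at the
    end of the store while unchanged (deduplicated) blocks stay in place.
    """
    image = list(range(num_blocks))  # first backup: fully sequential
    next_free = num_blocks           # next unused disk address
    for changed in changed_per_backup:
        for logical in changed:
            image[logical] = next_free  # newest copy lives at the tail
            next_free += 1
    return image


def count_seeks(addresses):
    """Count non-contiguous jumps when reading addresses in restore order."""
    seeks = 0
    for prev, cur in zip(addresses, addresses[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks


# Restore right after the first backup: one sequential pass, no seeks.
full_image = simulate_incrementals(8, [])
print(count_seeks(full_image))  # 0

# Two incrementals that changed blocks 2 and 5, then block 3.
fragmented_image = simulate_incrementals(8, [[2, 5], [3]])
print(fragmented_image)               # [0, 1, 8, 10, 4, 9, 6, 7]
print(count_seeks(fragmented_image))  # 5
```

In this toy model, restoring the same eight logical blocks goes from zero non-contiguous jumps to five after only three changed blocks, which is the random-read pattern that limits restore throughput.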
Data recovery becomes even more complex when backups from multiple (differing) clients are performed simultaneously to the same backup server. When multiple clients send data at the same time, the stored data becomes further fragmented. In particular, the server may receive data from one client and write it to a disk. When the server receives data from another client, this data may be written to the same disk. As a result, the data from one or more of the clients gets “scattered” on the disks. Further, when multiple clients request data, the backup server thrashes (i.e., the disk head spends most of its time moving around in search of the most recent incremental backup of the data rather than reading or writing data). Consequently, the data recovery throughput goes from bad to worse when multiple restores initiated by differing clients occur simultaneously. It is within this context that the embodiments arise.
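The scattering of client data described above can likewise be sketched under simplifying assumptions. The sketch assumes a single disk that is written in arrival order and clients whose chunks arrive in strict round-robin fashion; real arrival order would depend on network and scheduling behavior. The names `interleave_writes` and `contiguous_runs` are hypothetical.

```python
from itertools import zip_longest


def interleave_writes(client_streams):
    """Append chunks to one disk in arrival order and record the layout.

    Assumption: chunks from the clients arrive round-robin, and no chunk
    value is None. Returns a dict mapping each client name to the list of
    disk addresses that hold its chunks.
    """
    layout = {name: [] for name in client_streams}
    address = 0
    for arrivals in zip_longest(*client_streams.values()):
        for name, chunk in zip(client_streams, arrivals):
            if chunk is not None:  # this client has already finished
                layout[name].append(address)
                address += 1
    return layout


def contiguous_runs(addresses):
    """Count the separate sequential runs needed to read the addresses."""
    runs = 1 if addresses else 0
    for prev, cur in zip(addresses, addresses[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


# One client backing up alone: its data is stored as a single run.
alone = interleave_writes({"A": ["a0", "a1", "a2"]})
print(alone["A"], contiguous_runs(alone["A"]))  # [0, 1, 2] 1

# Two clients backing up at the same time: each client's data is scattered.
shared = interleave_writes({"A": ["a0", "a1", "a2"], "B": ["b0", "b1", "b2"]})
print(shared["A"], contiguous_runs(shared["A"]))  # [0, 2, 4] 3
print(shared["B"], contiguous_runs(shared["B"]))  # [1, 3, 5] 3
```

In this model, a restore for either client must skip over the other client's chunks, so each additional concurrent backup increases the number of separate runs the disk head has to visit.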