Embodiments of the invention relate to block level backup and restore. In particular, embodiments relate to file system agnostic image backup on sequential devices to facilitate efficient file level retrieval.
Production servers store production data in production storage. Periodically, the production data is copied to backup storage at a backup server. The production data may be copied as an image. The terms “image”, “block level backup”, and “snapshot” refer to technology that protects live production servers, using software or hardware, without interfering with those servers. The technology captures a point-in-time representation of the production data from a production server at a lower level of the storage stack and represents the production data in storage blocks that are agnostic to upper storage layers (e.g., volume managers, file systems, etc.).
Some production servers perform file level backups of images from production servers to a backup server. Some of the production servers may have millions of files, which causes the creation of large indices in backup processes, while other production servers include applications (e.g., database applications) that physically use large files, which causes backup processes to back up those files frequently.
In addition, virtualization vendors have introduced Application Programming Interfaces (APIs) that provide capabilities to perform off-host block level incremental backups. Such off-host processes are becoming popular, and some backup vendors have created products to interact with those APIs (e.g., to allow off-host block level backups of virtual machines) and deprecate the usage of traditional file level backup in guest production servers.
With the growing popularity of hardware snapshots, and the support of incremental capacities in some hardware storage devices, it is also becoming more convenient to back up production data using hardware snapshots and to offload the production data into software based solutions in block representation, thus creating block level images in contrast to legacy file level backups.
While storage devices are becoming less expensive, there is a price gap between different backup storage technologies, such as random access devices (e.g., disk devices) and sequential devices (e.g., tape drives). With growing storage requirements, customers are applying data reduction technologies to production data (i.e., primary data) and backup data, and information is stored on sequential devices due to price, especially when both the amount of time that the backup data needs to be kept on those devices and the size of the backup data are large.
When performing file level recovery from images, backup data is typically retrieved from the image in several steps. First, the list of files is enumerated (e.g., with a Dir command). Second, the specific file or files to be retrieved are selected. Third, the data blocks of the specific file or files that need recovery are retrieved. While the data blocks of the files themselves are typically located sequentially (because of the locality of reference nature of modern file systems), the meta-data blocks needed to enumerate the files are, in most cases, not kept in sequential order and are located in different areas of the volume holding the files.
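The three retrieval steps above can be illustrated with a small model. The following is a minimal sketch only, not a description of any particular backup product: the names FileEntry, enumerate_files, retrieve_file, and count_repositions are hypothetical, the block numbers are invented for illustration, and the model assumes one meta-data block per file. It counts how often a sequential device would have to reposition when reading scattered meta-data blocks versus the contiguous data blocks of one file.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileEntry:
    """Hypothetical record of where one file lives inside a block level image."""
    name: str
    metadata_block: int     # assumed: a single block holds the file's meta-data
    data_blocks: List[int]  # blocks holding the file's contents


def count_repositions(blocks: List[int]) -> int:
    """Count reads that do not continue directly from the previously read block."""
    repositions = 0
    position = None
    for block in blocks:
        if position is None or block != position + 1:
            repositions += 1
        position = block
    return repositions


def enumerate_files(image: List[FileEntry]) -> Tuple[List[str], List[int]]:
    """Step 1: list the file names and report which meta-data blocks were read."""
    names = [entry.name for entry in image]
    metadata_reads = [entry.metadata_block for entry in image]
    return names, metadata_reads


def retrieve_file(image: List[FileEntry], name: str) -> List[int]:
    """Steps 2 and 3: select one file and return the data blocks to read."""
    for entry in image:
        if entry.name == name:
            return entry.data_blocks
    raise FileNotFoundError(name)


# Invented layout: meta-data blocks are scattered, data blocks are contiguous.
image = [
    FileEntry("a.txt", 900, [10, 11, 12, 13]),
    FileEntry("b.txt", 40, [200, 201, 202]),
    FileEntry("c.txt", 5100, [3000, 3001]),
]

names, metadata_reads = enumerate_files(image)
data_reads = retrieve_file(image, "b.txt")

metadata_repositions = count_repositions(metadata_reads)
data_repositions = count_repositions(data_reads)
print(f"meta-data reads: {metadata_repositions} repositions")
print(f"data reads: {data_repositions} repositions")
```

In this invented layout, enumerating three files requires a reposition for every meta-data block, while reading the selected file's data requires only one. On a random access device the difference is minor, but on a sequential device each reposition is costly.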