Historically, one approach to storing data is with the use of a physical tape library. A physical tape library represents a collection of physical tapes (e.g., physical magnetic tapes). Often a physical tape library may include a large number, for example thousands to hundreds of thousands, of such physical tapes. A robotic arm known as a picker may be used to select the physical tapes from slots and insert them into physical tape drives under electronic control. The physical tape library may be coupled with a network and used for network data storage. Backup software applications or other software (e.g., on network coupled computers) may be configured to use such physical tape libraries.
Virtual tape libraries (VTLs) provide an alternative to physical tape libraries. The VTLs appear to be physical tape libraries to the backup applications, or other software that uses them (e.g., the virtual tape libraries may emulate physical tape libraries). The VTLs, however, typically do not actually store the data on physical magnetic tapes, but rather store the data on one or more hard disk drive arrays, solid state storage devices, or other types of physical storage. This offers the advantage that the VTLs do not need to rely on the mechanics or robotic arms used for physical tape libraries. Moreover, the backup applications, or other software utilizing the VTL, do not need to know that physical magnetic tapes are not being used for data storage, and do not need to be changed. This helps to avoid modifying existing backup applications and technology that have been designed to use physical tape libraries. As a further advantage, the data stored by a backup application to a virtual tape may be stored as a file of a file system, e.g., the Data Domain file system (DDFS). As such, the VTLs are generally able to take advantage of advanced file system functionalities, such as improved ability to perform data deduplication, replication, redundancy, segmenting, and/or other file system based technologies.
In the case where backup data is stored to a file system, the VTL interfaces with the file system through a file system interface. A conventional file system is a deeply pipelined system. Each component of the file system has resources (caches) to buffer and stage information as writes are processed. As long as resources are available, the file system operates with acceptable latencies. Here, latency refers to the time it takes for the file system to commit unstable data to stable storage and return a write status to the VTL. If resources become unavailable in one or more phases of the file system, however, latencies can temporarily become very long until resources become available again. In the worst case, file system resources immediately become unavailable again, resulting in an extended period of poor latency.
The VTL interfaces with a file system as a black box. That is, the VTL is not aware of exactly which file system resources may be limited. The VTL is only aware that latencies may vary. For maximum performance, the VTL performs writes to a file system as uncommitted writes, and periodically issues a commit request to cause the file system to commit the unstable data, i.e., store the unstable data to stable storage. Typically, the amount of unstable data written between commit requests is 16-128 megabytes (MB). Because a commit request causes unstable data to move through the file system pipeline, it is sensitive to the available resources in the file system. In particular, in cases of constrained resources, the time to service a commit request can be very long (e.g., tens to hundreds of seconds). Typically, the VTL multiplexes multiple streams of backup data into a single file system connection. As a result, the total time to process a commit request could potentially reach several minutes. Such a long latency is problematic because it could cause the VTL to time out.
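The pattern of uncommitted writes followed by a periodic commit request can be illustrated with a minimal Python sketch. It is not an actual VTL or file system implementation: `FakeFileSystem`, `VtlWriter`, and the 64 MB `commit_threshold` are hypothetical names and values chosen only for illustration (the threshold is picked from within the 16-128 MB range mentioned above), and the sketch models only byte counts, not latency.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class FakeFileSystem:
    """Hypothetical stand-in for the file system; tracks unstable vs. stable bytes."""
    unstable: int = 0
    stable: int = 0

    def write_uncommitted(self, nbytes: int) -> None:
        # An uncommitted write only adds to the unstable (buffered) data.
        self.unstable += nbytes

    def commit(self) -> int:
        # A commit moves all unstable data to stable storage.
        committed = self.unstable
        self.stable += committed
        self.unstable = 0
        return committed


@dataclass
class VtlWriter:
    """Hypothetical writer that issues a commit once enough unstable data accumulates."""
    fs: FakeFileSystem
    commit_threshold: int = 64 * MB  # assumed value within the 16-128 MB range
    commits_issued: int = 0

    def write(self, nbytes: int) -> None:
        self.fs.write_uncommitted(nbytes)
        if self.fs.unstable >= self.commit_threshold:
            self.fs.commit()
            self.commits_issued += 1


def demo():
    """Write 200 MB in 1 MB chunks and report (commits, stable bytes, unstable bytes)."""
    fs = FakeFileSystem()
    writer = VtlWriter(fs)
    for _ in range(200):
        writer.write(1 * MB)
    return writer.commits_issued, fs.stable, fs.unstable
```

With these assumed values, 200 MB of writes trigger three commit requests (after 64, 128, and 192 MB), leaving 8 MB of unstable data awaiting the next commit.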
FIG. 1 is a block diagram illustrating a write process. FIG. 1 illustrates, by way of example, data processing system 108 writing Small Computer System Interface (SCSI) data 109 to VTL 111 to back up data. VTL 111, in response, translates SCSI data 109 into Remote Procedure Call (RPC) data and writes the RPC data to file system 114. For example, VTL 111 writes RPC data 110 and RPC commit request 115 to file system 114. In response, file system 114 returns RPC status 112 and RPC status 113 to VTL 111, indicating the status of RPC data 110 and RPC commit request 115, respectively. As illustrated in FIG. 1, there is a latency 105 from the time RPC commit request 115 is transmitted by VTL 111 to the time RPC status 113 is received by VTL 111. Latency 105 can, in some instances, be quite long as described above, which can lead to VTL 111 timing out.
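The commit latency described for FIG. 1 can be sketched as the elapsed time between sending a commit request and receiving its status. The following Python sketch is illustrative only: `timed_commit`, `send_commit`, and `timeout_s` are hypothetical names, the 30-second limit is an assumed value, and the function merely reports after the fact whether the latency exceeded the limit rather than aborting the request.

```python
import time
from typing import Callable, Tuple


def timed_commit(
    send_commit: Callable[[], str],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, float, bool]:
    """Send a commit request and measure how long its status takes to arrive.

    send_commit is a hypothetical callable that blocks until the file system
    returns a status. Returns (status, latency in seconds, exceeded_timeout).
    """
    start = clock()                 # commit request is transmitted
    status = send_commit()          # blocks until the status is received
    latency = clock() - start       # corresponds to the latency in FIG. 1
    return status, latency, latency > timeout_s


if __name__ == "__main__":
    # Simulate a 45-second commit with a fake clock instead of really waiting.
    ticks = iter([100.0, 145.0])
    result = timed_commit(lambda: "OK", timeout_s=30.0, clock=lambda: next(ticks))
    print(result)
```

Injecting the clock keeps the sketch deterministic; in this simulated case the commit succeeds, but its 45-second latency exceeds the assumed 30-second limit, which is the situation in which a VTL could time out.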