The amount of data being stored continues to increase, as does the importance of that data. The number and types of devices for storing data have therefore expanded to accommodate increasing amounts of ever more important data. Data storage choices now include disks, tapes, collections of disks (e.g., redundant arrays of independent disks (RAID)), tape libraries, network attached storage (NAS), solid state drives (SSD), and other devices. Having more devices to store data makes it more likely that a user will find an appropriate device for storing their data. However, the ability to integrate devices so that data can be seamlessly moved from device to device varies inversely with the number and types of devices involved.
As the amount of data being stored continues to increase, so do efforts to reduce redundancy in stored data. De-duplication is one method for addressing redundancy in stored data. Other data management efforts are also increasing. For example, users may want to ensure that there are secure backups of their data, both on site and off site. Users may want to ensure that certain portions of their data are replicated to various locations throughout their enterprise. Users may want some data to be available in a first, faster time frame while other data is allowed to be available in a second, slower time frame. Decisions made concerning issues like reducing redundancy, increasing security, and manipulating availability may be guided, at least in part, by costs associated with these actions.
Thus, as the amount of data being stored increases, as the options concerning what can be done with the stored data increase, and as the costs associated with those options vary, being able to move data between storage devices has become more important. However, at times, providers of devices may have only addressed moving data from their device to some other specific device or set of devices. This may have made it difficult to move data from their device to some other device for which they had not provided a driver or other mechanism. Additionally, some providers of devices may have produced proprietary systems that made it difficult, if even possible at all, to move data between devices.
One place where users store data is referred to as a network attached storage (NAS) device. An NAS device may provide file-system-based, file-level data storage for a network. The file-level data storage may facilitate providing data access to heterogeneous clients on the network. Since it is designed to interact with a network, an NAS device may operate as a file server for a network. An NAS device may facilitate file sharing between multiple computers on a network. NAS devices may be, for example, networked appliances that contain one or more hard disk drives. These disk drives may be arranged in different RAID configurations. In one example, files on NAS devices may be accessed using file sharing protocols like NFS (network file system) or CIFS (common internet file system).
NFS is a distributed file system protocol. NFS is typically a UNIX or LINUX based approach to providing a client computer access to files on a server computer. NFS attempts to make remote access across the network operate similarly to how local access to a local device would work. CIFS is typically a non-UNIX based approach that provides functionality similar to NFS. CIFS is a dialect of the SMB (server message block) protocol, and the two names are often used interchangeably; it supports a client/server based approach to providing shared access to files. There are many examples of backup appliances, servers, or services that have been configured to access files on NAS devices using the NFS and CIFS protocols.
Another place where users store data is on a physical tape. The physical tape may be read or written when placed in a physical tape drive. Some devices (e.g., a tape library) may include multiple physical tape drives and physical tapes. Physical tapes can be inserted into and removed from different physical tape drives at different times. Different physical tape manufacturers and physical tape drive manufacturers may have taken their own approaches to storage without regard to what other providers were doing. Over time, however, some standards arose that made it more likely that a tape could be used in different tape drives. The LTO (linear tape open) standard is an example of these standards.
LTO describes magnetic tape data storage technology that has evolved as a standard. The standard evolved partially as an alternative to proprietary magnetic tape formats that made it more difficult, if even possible at all, to use a tape created by one backup application in different devices with different backup applications. LTO has progressed through several revisions. As of 2011, LTO-5 described a standard for storing up to 1.5 TB on a cartridge. LTO-5 also described a partition feature that allows tapes to be split into two separately addressable and writeable areas.
The NAS devices and tape devices described above illustrate how, over time, users have gained access to more and more types of devices for storing their data. While these multiple options have provided greater flexibility, they have also created integration and migration issues. One integration and migration issue concerns how to efficiently or practically move data from a disk-based NAS device to a tape-based backup device. One approach for moving data between disk-based NAS devices and tape-based devices involved the network data management protocol (NDMP). NDMP provided a framework for one solution for moving NAS-based data to a tape device by providing two different services, a data service (disk support) and a tape service (tape support). If a suitable bridge could be found between the two different services, then data sets could be moved.
NDMP is an open standard protocol whose two different services theoretically provide the bridge. An NDMP data service produces an NDMP data stream in a specified format between a disk and the NDMP server, and an NDMP tape service produces an NDMP stream in a specified format between a tape and the NDMP server. The tape service and the data service have separate state machines and may operate independently. NDMP theoretically facilitates having heterogeneous network file servers communicate directly with both NAS devices (disk) and network attached tape devices. However, the theory may break down because disks and tapes store data in fundamentally different ways, using fundamentally different mechanisms, which makes it difficult to perform the theoretical transfer of data from one to the other using NDMP.
While NDMP theoretically facilitates moving data from an NAS device to a tape device, there is a missing link: no Rosetta stone bridges the gap between the disk-side perspectives on data and the tape-side perspective. On the disk side, data stored on a disk-based NAS device is seen from an NFS/CIFS file system perspective, or data received from an NAS device is seen from an NDMP data stream perspective. On the tape side, data stored on a tape device is seen as a linear data set. Thus, although the lines of communication between devices were opened by an NDMP data service dealing with data streams between an NAS device and a backup server and an NDMP tape service dealing with data streams between the backup server and a tape device, a disconnect between perspectives on data organization still existed.
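The difference between a file system perspective and a linear data set perspective can be illustrated with a minimal Python sketch. It is not NDMP and not any real tape format: the record layout, the `linearize`, `scan`, and `find` helpers, and the sample paths are all hypothetical, chosen only to show that a path lookup on the disk side becomes a front-to-back scan on the tape side.

```python
import struct

# Hypothetical record header: 2-byte path length, 8-byte payload length (big-endian).
HEADER = struct.Struct(">HQ")


def linearize(files):
    """Flatten a {path: bytes} mapping (file system view) into one byte stream."""
    stream = bytearray()
    for path, payload in sorted(files.items()):
        encoded = path.encode("utf-8")
        stream += HEADER.pack(len(encoded), len(payload))
        stream += encoded
        stream += payload
    return bytes(stream)


def scan(stream):
    """Read the stream from the start, yielding (path, payload) pairs in order."""
    offset = 0
    while offset < len(stream):
        path_len, data_len = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        path = stream[offset:offset + path_len].decode("utf-8")
        offset += path_len
        yield path, stream[offset:offset + data_len]
        offset += data_len


def find(stream, wanted):
    """Locate one file in the linear view by walking every record before it."""
    for records_read, (path, payload) in enumerate(scan(stream), start=1):
        if path == wanted:
            return payload, records_read
    raise FileNotFoundError(wanted)


# File system view: files are addressed directly by path.
nas_view = {
    "/vol/a.txt": b"alpha",
    "/vol/docs/b.txt": b"bravo",
    "/vol/z.bin": b"\x00\x01",
}

# Linear view: the same data as one sequential stream of records.
tape_view = linearize(nas_view)
payload, records_read = find(tape_view, "/vol/z.bin")
```

In this sketch the round trip preserves the data, but retrieving the last file requires reading past every earlier record, which is the kind of organizational mismatch the paragraph above describes.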
With the two NDMP data streams available, some apparatus and methods bridged the disconnect from disk to tape through the intermediary of a virtual tape library (VTL). A VTL presents one physical storage component (e.g., disk) as another logical storage component (e.g., tape library, tape). Thus, if a conversion from the disk-based NAS perspective to the tape-based VTL perspective could be made, and if a conversion from a VTL perspective to an actual tape library perspective could be made, then data could be copied from the NAS to the VTL and then from the VTL to an actual tape library. However, since a VTL is still presenting tape, this approach still reduces to the same disconnect present in other approaches. If a user is able to access a disk based NAS volume on one side of the VTL bridge using, for example, an NDMP data service, and is able to mount a tape volume on the other side of the VTL bridge using an NDMP tape service, then a user may be able to copy a data set from an NAS device to a tape device through the intermediary of a VTL that has access to two NDMP services. However, more efficient and more generalized approaches are sought.
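The two-hop copy through a VTL can be sketched in Python under loud assumptions. The `VirtualTape` class, its length-prefixed record format, and the two copy helpers are hypothetical stand-ins; no NDMP service is modeled, and an in-memory buffer plays the role of both the disk backing the VTL and the physical tape.

```python
import io


class VirtualTape:
    """Hypothetical disk-backed object that exposes tape-like sequential operations."""

    def __init__(self):
        self._backing = io.BytesIO()  # stands in for a file on disk

    def rewind(self):
        self._backing.seek(0)

    def write_record(self, data):
        # Write a 4-byte length prefix followed by the record at the current position.
        self._backing.write(len(data).to_bytes(4, "big"))
        self._backing.write(data)

    def read_record(self):
        # Return the next record, or None when no further record exists.
        prefix = self._backing.read(4)
        if len(prefix) < 4:
            return None
        return self._backing.read(int.from_bytes(prefix, "big"))


def copy_nas_to_vtl(nas_files, tape):
    """First hop: file view to virtual tape (a path record, then a data record)."""
    tape.rewind()
    for path in sorted(nas_files):
        tape.write_record(path.encode("utf-8"))
        tape.write_record(nas_files[path])


def copy_tape_to_tape(source, target):
    """Second hop: record-for-record copy from one tape-like object to another."""
    source.rewind()
    target.rewind()
    copied = 0
    while (record := source.read_record()) is not None:
        target.write_record(record)
        copied += 1
    return copied


nas = {"/vol/a.txt": b"alpha", "/vol/b.txt": b"bravo"}
vtl = VirtualTape()
physical = VirtualTape()  # stand-in for a real tape; assumption for illustration

copy_nas_to_vtl(nas, vtl)
records_copied = copy_tape_to_tape(vtl, physical)
```

Note that the second hop is trivial only because both sides already speak in records; the conversion from files to records still happens in the first hop, which mirrors the observation that the VTL approach reduces to the same disconnect.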