The present invention relates to data storage systems, and more particularly, this invention relates to copying fragmented files between sequential storage mediums.
As the name suggests, data is stored on a sequential storage medium in a sequential fashion. Accordingly, as data is written to a sequential storage medium, it is appended to the end of whatever data has already been written on the medium. As data stored on a sequential storage medium is updated over time, the updates are also appended to the end of whatever data has already been written on the sequential storage medium, as opposed to actually replacing (overwriting) the previous and now obsolete version of the data. As a result, files stored on the sequential storage medium become fragmented as portions of the files are updated over time. Thus, despite originally having been written sequentially and in series, the data corresponding to a given file may be spread across the sequential storage medium over time.
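The append-only behavior described above can be illustrated with a minimal sketch. The `SequentialMedium` class, its record layout, and the `count_extents` helper are hypothetical names introduced only for illustration, and the model assumes each record holds one part of one file; it is not drawn from any particular product.

```python
class SequentialMedium:
    """Toy append-only medium (hypothetical model, for illustration only)."""

    def __init__(self):
        # Each record is (file_name, part, payload); nothing is ever overwritten.
        self.records = []

    def append(self, file_name, part, payload):
        """Write a record at the end of the medium and return its position."""
        self.records.append((file_name, part, payload))
        return len(self.records) - 1

    def live_positions(self, file_name):
        """Positions of the newest version of each part, in logical part order."""
        latest = {}
        for pos, (name, part, _payload) in enumerate(self.records):
            if name == file_name:
                latest[part] = pos  # a later record supersedes an earlier one
        return [latest[part] for part in sorted(latest)]


def count_extents(positions):
    """Count runs of physically contiguous positions, read in logical order."""
    if not positions:
        return 0
    runs = 1
    for prev, cur in zip(positions, positions[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


medium = SequentialMedium()
for part in range(3):
    medium.append("A", part, f"a{part}")   # file A occupies positions 0..2
for part in range(2):
    medium.append("B", part, f"b{part}")   # file B occupies positions 3..4

before = count_extents(medium.live_positions("A"))  # one contiguous run
medium.append("A", 1, "a1-updated")                 # update lands at position 5
after = count_extents(medium.live_positions("A"))   # live parts now at 0, 5, 2
```

In this sketch a single update to the middle part of file A turns one contiguous run into three, while the obsolete copy at position 1 remains on the medium.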
While read and/or write operations may still be successfully performed on a sequential storage medium on which file data has experienced fragmentation, the process of copying fragmented data between sequential storage mediums faces significant setbacks. Data may be copied between sequential storage mediums for a number of different reasons, e.g., upgrading the quality of the sequential storage medium and/or reclaiming a particular sequential storage medium.
However, conventional products have been unable to efficiently perform such copying of fragmented data between sequential storage mediums. Specifically, conventional products read each portion of each file individually before copying the corresponding file from one sequential storage medium to another. Thus, the amount of time associated with copying data from one sequential storage medium to another is significantly lengthened as the number of fragmented files on the source sequential storage medium increases.
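The cost of reading each portion of each file individually can be sketched with a toy model. The function name `file_by_file_travel`, the layout dictionaries, and the assumption that repositioning cost is proportional to the distance moved are all hypothetical simplifications; actual locate times on a sequential storage medium are not linear in distance.

```python
def file_by_file_travel(layout):
    """Total repositioning distance when files are copied one at a time.

    `layout` maps each file name to the positions of its live portions,
    listed in logical order. The cost model is an assumption: moving the
    read head costs one unit per position skipped, forward or backward.
    """
    head = 0
    travel = 0
    for _file_name, positions in layout.items():
        for pos in positions:
            travel += abs(pos - head)  # reposition to the next portion
            head = pos + 1             # reading one portion advances the head
    return travel


# Unfragmented: every file is stored as one contiguous run.
contiguous = {"A": [0, 1, 2], "B": [3, 4, 5]}

# Fragmented: the middle portion of each file was updated and appended
# at the end of the medium (positions 6 and 7); positions 1 and 4 are obsolete.
fragmented = {"A": [0, 6, 2], "B": [3, 7, 5]}

contiguous_cost = file_by_file_travel(contiguous)  # 0 units of repositioning
fragmented_cost = file_by_file_travel(fragmented)  # 16 units of repositioning
```

Under these assumptions, two single-portion updates raise the repositioning distance from 0 to 16 units, and the figure keeps growing as more files on the source medium become fragmented.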
In an effort to avoid this added latency when copying data from one sequential storage medium to another, attempts have been made by conventional products to use a cache as a staging area for the data on a sequential storage medium before it is written to a second sequential storage medium. Although these attempts reduced latency, they also significantly increased operating costs, as cache is considerably more expensive per unit of data than sequential storage media. Moreover, as the storage capacity of sequential storage media continues to increase, these conventional attempts require a cache which has a large enough capacity to store all of the data on a corresponding sequential storage medium. For example, the storage capacity of magnetic tape is currently upwards of 15 TB. Thus, these attempts ultimately result in decreased efficiency and even degraded performance of the cache.
In sharp contrast to these shortcomings experienced by conventional products, various approaches described herein are able to reduce latency while copying data between sequential storage mediums, while also minimizing data consumption.