The invention relates to data reading and data protection. In particular, the invention relates to backing up data for protection against loss.
Currently there are two basic methods of reading data from a storage disk for supply to secondary storage devices (e.g. tape drives). The first method is the “file-by-file” method which comprises proceeding sequentially through the file directory for the disk and reading each file from the disk one by one in the order in which they are specified in the directory structure. The second method is the “image” method which comprises reading the whole of the disk sequentially from its first sector, and then using the file system data to work out the locations of the files on the resulting back-up tape.
The file-by-file method can be very slow when files are randomly stored across the disk, since the disk drive has to mechanically move its read head from one area to another for each file and, if the files are small, performance is limited by the random access speed of the disk drive. It is relatively quick to restore a disk from a file-by-file back-up, since each file occupies a single contiguous area on the tape or other kind of secondary storage device.
The creation of an image back-up is always very fast as it involves reading large sequential blocks from the disk with little repositioning of the read head being required, so that performance is limited by the sequential speed of the disk which is usually higher than the performance of the tape drive or other secondary storage device used. However, with image back-up, there are problems associated with restoring the backed-up data to the disk, as the files may not be in a single contiguous area on the tape if they were not contiguous on the disk.
In summary, the file-by-file method is slow to back up a disk and quick to restore it, whereas the image method is quick to back up a disk but slow to restore it.
EP-A-0 767431 discloses a system for backing up computer disk volumes by performing an image back-up of a primary storage medium and additionally creating a file index that is also stored on the secondary storage medium, in this case tape. The file index gives the position of individual files on the secondary storage medium, which allows files that are to be restored to be accessed from the secondary storage medium in as contiguous a manner as possible. However, the performance of the restore is still less than that of a conventional file-by-file restore because the relatively slow secondary storage medium must be accessed to read the file index before a restore may actually be performed.
An aim of the invention is to provide enhanced data reading and/or data back-up.
According to one aspect, the invention provides a method of reading data from a storage medium using a reader moveable relative to the medium, the method comprising sorting a list of files into an order which reduces the movement of the reader relative to the medium for the reading of said files and reading the files from the medium in said order.
According to another aspect, the invention provides apparatus for reading data from a primary storage medium, comprising a reader moveable relative to the primary storage medium for reading the files from the primary storage medium and a sorter for sorting files to be read into an order which reduces the movement of the reader relative to the primary storage medium.
In one embodiment, the primary storage medium which is read is a disk, particularly a magnetic hard disk. Alternatively, the storage medium is, for example, a tape.
By sorting the list of files to be read from the primary storage medium in this way, the speed of the back-up can be much greater than that of the aforementioned file-by-file method, since the files can be read from the storage medium without wasting time seeking the files in the storage medium. For example, in the case where the storage medium is a magnetic hard disk, the reading can be done with relatively few read head realignment operations. Although the reading process has a speed approaching that of the image method, it has the advantage that the files can be written into back-up storage with greater contiguity, and the position of each file in the back-up storage is known from the order into which the list was sorted for reading purposes. This is particularly advantageous as no index of files is required to be additionally stored on the back-up, or secondary storage, medium to allow the position of the files to be determined. There is therefore no loss in performance in terms of restoring files, as there would be if it were necessary to first search a file index before accessing the secondary storage medium.
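As an illustration of how the position of each file in back-up storage can follow from the sorted order alone, the sketch below accumulates file sizes in read order. It assumes, purely for illustration, that files are written back to back with no per-file headers or padding; the function name `backup_offsets` and the sample file names and sizes are hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Tuple


def backup_offsets(read_order: List[Tuple[str, int]]) -> Dict[str, int]:
    """Map each file name to its starting byte offset in the back-up stream.

    read_order holds (name, size_in_bytes) pairs in the order in which the
    files are read and written. Assumption: files are written contiguously.
    """
    offsets: Dict[str, int] = {}
    position = 0
    for name, size in read_order:
        offsets[name] = position  # file starts where the previous one ended
        position += size
    return offsets


# Hypothetical sorted read order: (file name, size in bytes).
read_order = [("b.txt", 2048), ("d.txt", 512), ("c.txt", 4096)]
offsets = backup_offsets(read_order)
```

Under these assumptions the offsets can be recomputed from the sorted list at any time, so nothing has to be fetched from the back-up medium to locate a file.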
In a preferred embodiment, the invention arrives at said order by sorting the files to be read into an order in accordance with their distance from an end location of the primary storage medium. This means that the files are read in the order in which they are to be found in the primary storage medium, thus reducing the need to reposition the reader relative to the medium.
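A minimal sketch of this ordering is given below. It assumes that each file is described by a single starting sector measured from one end of the medium; the `FileEntry` type, the `start_sector` field and the sample values are hypothetical. The helper `total_head_travel` is only a rough proxy for reader movement, not a model of a real disk drive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    path: str
    start_sector: int  # hypothetical: distance of the file start from sector 0


def sort_for_reading(files: List[FileEntry]) -> List[FileEntry]:
    """Order files by their distance from the chosen end of the medium."""
    return sorted(files, key=lambda f: f.start_sector)


def total_head_travel(files: List[FileEntry]) -> int:
    """Sum of sector distances between consecutive file starts, from sector 0."""
    position = 0
    travel = 0
    for f in files:
        travel += abs(f.start_sector - position)
        position = f.start_sector
    return travel


# Hypothetical directory listing, in directory order.
directory_order = [
    FileEntry("a.txt", 9000),
    FileEntry("b.txt", 100),
    FileEntry("c.txt", 5000),
    FileEntry("d.txt", 300),
]
read_order = sort_for_reading(directory_order)
```

With these sample values, reading in directory order moves the reader across 27500 sectors in total, whereas the sorted order needs 9000, because the reader only ever moves in one direction.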
Preferably, if a file to be read from the primary storage medium is fragmented, the file's position in said order is determined by the position in the medium of the fragment representing the beginning of said file. This helps to optimise the speed of reading the list of files since, if the position of a fragmented file in the sorted order was represented by some other fragment, then upon reaching the fragmented file in the sorted list, the read operation would immediately have to skip from the fragment represented in the order to the fragment representing the beginning of the file, thus incurring a delay.
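The treatment of fragmented files can be sketched as follows, assuming that each file carries a list of extents in logical file order, each given as a (start sector, length) pair. The `FragmentedFile` type, the function names and the sample extents are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Extent = Tuple[int, int]  # (start_sector, length_in_sectors)


@dataclass(frozen=True)
class FragmentedFile:
    path: str
    extents: Tuple[Extent, ...]  # in logical order; extents[0] begins the file


def sort_key(f: FragmentedFile) -> int:
    """Position of the fragment holding the beginning of the file."""
    return f.extents[0][0]


def read_plan(files: List[FragmentedFile]) -> List[Tuple[str, int, int]]:
    """List (path, start_sector, length) reads in the order they are issued."""
    plan = []
    for f in sorted(files, key=sort_key):
        for start, length in f.extents:
            plan.append((f.path, start, length))
    return plan


# Hypothetical data: report.doc begins at sector 700 and continues at 200.
files = [
    FragmentedFile("report.doc", ((700, 10), (200, 10))),
    FragmentedFile("notes.txt", ((400, 5),)),
]
plan = read_plan(files)
```

Here `report.doc` is placed after `notes.txt` because its first fragment lies at sector 700. Had it been keyed on its lowest fragment (sector 200), it would be reached first and the reader would have to jump straight to sector 700 to begin reading it.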
In one embodiment, it is determined whether the number of fragmented files in the primary storage medium exceeds a threshold. If the threshold is exceeded, then defragmentation of the files is attempted prior to reading them. Advantageously, this improves the speed of reading the files from the primary storage medium.
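The threshold test might look like the sketch below. It assumes a file counts as fragmented when it has more than one extent, and that the threshold is a plain file count; the specification fixes neither detail, and the function names and sample data are hypothetical. The defragmentation step itself is not shown.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (start_sector, length_in_sectors)


def count_fragmented(files: Dict[str, List[Extent]]) -> int:
    """Count files stored in more than one extent (assumed definition)."""
    return sum(1 for extents in files.values() if len(extents) > 1)


def should_defragment(files: Dict[str, List[Extent]], threshold: int) -> bool:
    """True when the number of fragmented files exceeds the threshold."""
    return count_fragmented(files) > threshold


# Hypothetical extent map for three files.
files = {
    "a.txt": [(100, 8)],
    "b.txt": [(300, 4), (900, 4)],
    "c.txt": [(500, 2), (50, 2), (700, 2)],
}
fragmented = count_fragmented(files)
```

With this sample data two files are fragmented, so defragmentation would be attempted for a threshold of 1 but not for a threshold of 2.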
The invention can be used in the backing-up of data for data protection purposes, wherein the data, after having been read in the manner described above, is then written to back-up, or “secondary”, storage. Preferably, the back-up storage is a tape drive.
Primary and secondary storage media are alternatively referred to as online and offline storage respectively.
The invention extends to a storage area network (SAN) including apparatus according to the invention for reading data from a storage medium. The invention also extends to apparatus according to the invention for reading data from a storage medium wherein the reading process is initiated by a Network Data Management Protocol (NDMP) format request.