This invention relates to storage systems, and in particular to techniques for improving the performance of copy operations between volumes in such storage systems.
Large organizations throughout the world are now involved in millions of transactions which include enormous amounts of text, video, graphical and audio information. This information is categorized, stored, accessed, and transferred every day, and its volume continues to grow. One technique for managing such massive amounts of information is the use of storage systems. Conventional storage systems include large numbers of hard disk drives operating under various control mechanisms to record, mirror, remotely back up, and reproduce this data. The rapidly growing amount of data requires most companies to manage their data carefully with their information technology systems, and to seek high performance within such systems.
One common occurrence in the management of such data is the need to copy it from a primary system to a secondary system. Such copies are often made to provide redundancy for the data, thereby enabling retrieval of the data if events at the primary storage system preclude accessing the data, or destroy the data. Maintaining copies of the data at a remote site helps assure the owner of the data that the data will be available, even if there are natural disasters or unexpected events at the primary site. By having stored the data in a remote location, protection is also provided in the event of failures in the primary storage system. Should an event occur at the primary site, the data from the secondary site can be retrieved and replicated for use by the organization, thereby preventing data loss and precluding the need to recreate the data, at commensurate cost and delay.
Typically the data is provided to the secondary (or remote) site over a communications network dedicated to the transmission of data between the primary storage system and the remote storage system, over the internet, or by some other means. One method for copying the data from the primary storage to the secondary storage is to read the data at the primary site from the server connected to the primary storage, and then send that data from the server to the secondary storage. This method requires the server to handle very heavy loads for the copy operation, and causes heavy network traffic, leading to copy performance degradation.
One known method for reducing the work load on the server and/or the network is to copy data directly from the primary storage system to the secondary (or remote) storage system. In a typical implementation, a source volume (at the primary storage) needs to be copied to the secondary storage. To achieve this, another volume, called a target volume, is prepared in the secondary storage, and all of the data in the source volume is copied to the target volume. One problem with this approach is that even if only a small part of the volume is occupied with actual data, the entire volume needs to be copied. One solution to that issue is described in U.S. published patent application 20030163553 A1. This publication discloses a method which uses meta-data from the file system, for example an i-node table, to determine the address of the actual data to be copied. Thus, this method reduces the time required to complete the copy, because it is not necessary to copy data which are not included in the meta-data. Unfortunately, however, if the actual data is fragmented on the source disk, the scattering of the data around the disk may cause delays in copy performance due to the relatively long seek time needed to position the disk read-write head at the address of each piece of data to be read.
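The trade-off described above can be illustrated with a minimal sketch. It is not the method of the cited publication; it assumes a hypothetical volume modeled as a list of blocks, a hypothetical `file_table` standing in for file system meta-data such as an i-node table, and total head travel (the sum of address jumps) as a crude proxy for seek time.

```python
def full_copy(source):
    """Copy every block of the source volume, whether used or not."""
    return list(source)


def metadata_copy(source, file_table):
    """Copy only the blocks that the file table references, in file order.

    Returns the target volume, the number of blocks read, and the total
    head travel (sum of absolute address jumps), a rough seek-time proxy.
    """
    target = [None] * len(source)
    head = 0          # assumed starting position of the read-write head
    travel = 0
    blocks_read = 0
    for addresses in file_table.values():
        for addr in addresses:
            travel += abs(addr - head)   # distance the head must move
            head = addr
            target[addr] = source[addr]
            blocks_read += 1
    return target, blocks_read, travel


# Hypothetical 16-block volume; None marks a block holding no actual data.
source = [None] * 16
# Hypothetical meta-data: each file lists its (fragmented) block addresses.
file_table = {"a.txt": [2, 9, 3], "b.txt": [14, 1]}
for name, addrs in file_table.items():
    for i, addr in enumerate(addrs):
        source[addr] = f"{name}#{i}"

target, blocks_read, travel = metadata_copy(source, file_table)
print(blocks_read, travel, len(full_copy(source)))
```

In this toy volume the meta-data-driven copy reads 5 blocks instead of 16, but the head travels 39 block positions, compared with 15 for a single sequential pass over the whole volume. The numbers are illustrative only; real seek behavior depends on the drive.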
What is needed is an improved technique for copying data which overcomes the delays of the seek time for the hard disk drive head, yet copies only actual data.