Data that is being archived or transferred from one location to another passes through various phases in which different operations, such as compression, network transfer, and storage, take place on it. There are essentially two approaches to implementing such a transfer mechanism. One is to split the archival process into sub-tasks, each of which performs a specific function (e.g., compression). This requires copying data between sub-tasks, which could prove processor intensive. The other is to minimize copies by having a monolithic program perform all of the archival functions. The downside of this approach is a loss of parallelism. A third alternative is to use threads for these tasks, coordinated by thread-signaling protocols; however, this would not be entirely practical because threads are not fully supported on many computing platforms.
Accordingly, it is highly desirable to obtain a data transfer mechanism, implemented in software, that meets the need for high-speed and reliable data transfer between computers.
It is an object of the invention to disclose the implementation of the DataPipe in accordance with CommVault Systems' Vault98 backup and recovery product. In developing the DataPipe, it is assumed that data, as it moves from the archiving source (backup client) to the archiving destination (backup server, as opposed to media), may undergo transformation or examination at various stages in between. This accommodates actions such as data compression, indexing, and object wrapping that need to be performed on the data being archived. It is further assumed that the data may be transmitted over the network to remote machines or transferred to locally attached media for archival.
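The staged movement of data described above can be illustrated with a minimal sketch. It is not the DataPipe implementation: the names `run_pipeline`, `make_index_stage`, and `compress_stage` are hypothetical, and Python generators merely stand in for whatever processes or tasks an actual implementation would use. The sketch only shows data passing from a source through an ordered series of stages, each of which examines or transforms it.

```python
import zlib
from typing import Callable, Iterable, Iterator, List, Tuple

# A stage consumes a stream of chunks and yields a (possibly transformed) stream.
Stage = Callable[[Iterator[bytes]], Iterator[bytes]]


def make_index_stage(index: List[Tuple[int, int]]) -> Stage:
    """Hypothetical examination stage: records (offset, length) of each chunk."""
    def stage(chunks: Iterator[bytes]) -> Iterator[bytes]:
        offset = 0
        for chunk in chunks:
            index.append((offset, len(chunk)))
            offset += len(chunk)
            yield chunk  # data passes through unchanged
    return stage


def compress_stage(chunks: Iterator[bytes]) -> Iterator[bytes]:
    """Hypothetical transformation stage: compresses each chunk independently."""
    for chunk in chunks:
        yield zlib.compress(chunk)


def run_pipeline(source: Iterable[bytes], stages: List[Stage]) -> List[bytes]:
    """Chain the stages in order and collect what reaches the destination."""
    stream: Iterator[bytes] = iter(source)
    for stage in stages:
        stream = stage(stream)
    return list(stream)
```

For example, `run_pipeline(chunks, [make_index_stage(index), compress_stage])` indexes each chunk before it is compressed. Reordering the list changes the order in which the stages see the data.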
Both the sending and the receiving computers execute software referred to herein as the DataPipe. Although the DataPipe transfer mechanism described herein operates as a key component of backup and recovery software products, the DataPipe is not restricted to that use. It is a general-purpose data transfer mechanism, implemented in software, that is capable of moving data over a network between a sending and a receiving computer at very high speeds and in a manner that allows full utilization of one or more network paths and of the available network bandwidth. A DataPipe can also be used to move data from one storage device to another within a single computer without the use of a network. Thus, the DataPipe concept is not confined to networked systems, but is also operable to transfer data in non-networked computers.
Further, in the case of a networked system, the DataPipe, and variations thereof, can be used to perform storage operations such as backups, snapshots, incremental backups, incremental snapshots, archiving, and migration of data over the network, whether the network comprises a local area network, a storage area network, or a wide area network. The data is read and transferred from a source information store. The blocks in which the data is stored are mapped to create a block mapping. The data and the block mapping are transmitted to a storage device, where the data is stored in the same block order as it was stored in the information store according to the block mapping, as opposed to the blocks being stored out of order at a logical level. Individual blocks that are changed or added to the information store may be copied out, transmitted, and stored as they change. These individual blocks are stored in the same order as they were stored in the information store, with each changed block replacing its older version rather than being added as an additional logical block to represent the changes in the data.
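The block-mapping behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the function names, the fixed block size, and the use of an in-memory list as the "storage device" are all hypothetical. The sketch shows the two points made in the text: the destination places blocks according to the block mapping even if they arrive out of order, and a changed block replaces its older version in place.

```python
from typing import Dict, List, Tuple


def read_blocks(store: bytes, block_size: int) -> Tuple[List[int], Dict[int, bytes]]:
    """Split the source store into fixed-size blocks and build the block mapping.

    The mapping is the list of block numbers in source order (an assumption
    made for this sketch; a real mapping would carry more information).
    """
    blocks = {
        number: store[offset:offset + block_size]
        for number, offset in enumerate(range(0, len(store), block_size))
    }
    block_map = sorted(blocks)
    return block_map, blocks


def store_blocks(block_map: List[int], received: List[Tuple[int, bytes]]) -> List[bytes]:
    """Place received (number, data) pairs in source order using the mapping."""
    by_number = dict(received)
    return [by_number[number] for number in block_map]


def apply_changes(dest: List[bytes], changed: Dict[int, bytes]) -> None:
    """Replace changed blocks in place; append blocks added at the end."""
    for number in sorted(changed):
        if number < len(dest):
            dest[number] = changed[number]  # newer version replaces the older one
        elif number == len(dest):
            dest.append(changed[number])    # block newly added to the store
        else:
            raise ValueError(f"block {number} would leave a gap in the destination")
```

Because `apply_changes` overwrites a block at its original position, the destination stays a block-for-block image of the source rather than accumulating extra logical blocks that record the changes.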