1. Field of the Invention
The present invention relates to the storage of machine-readable data. More particularly, the invention concerns a method and apparatus for more efficiently copying source data to a log structured storage target by pre-configuring the target.
2. Description of the Related Art
With the increasing popularity of computers, users are faced with more data than ever to transmit, receive, and process. Data storage is also critically important to many applications. One popular data storage configuration is "log structured storage." Log structuring is one way to manage units of storage, such as data tracks in an array of magnetic "hard" disks.
With log structured storage, a storage controller classifies storage space as "space-in-use," "uncollected free space," and "collected free space." Space-in-use describes storage space that contains valid data. Uncollected free space describes storage space that does not contain valid data, but is nevertheless unavailable to store data. For example, if data records only occupy part of a logical unit (such as a "track"), the unoccupied part of that logical unit is uncollected free space. Although this space is unused, it is unavailable to store further data because data is stored in track-size segments regardless of whether the entire track is filled. Collected free space describes storage space that is available to store data. This kind of storage space, for example, may have been formerly occupied by valid data that has been deleted or otherwise released.
Typically, storage controllers use linked lists to keep track of the various types of log structured storage. For example, separate linked lists may be used to track space-in-use, uncollected free space, and collected free space. This approach to space accounting is beneficial to many users because it does not require much management overhead. In contrast, with non-log-structured configurations the storage system must be able to receive and process users' requests to allocate storage. This type of storage system first allocates storage of sufficient size to store data, and then stores the data in the allocated storage. Log structured storage systems avoid the need to allocate storage.
Instead of allocating storage in advance, log structured storage stores the data one logical unit at a time. For each logical unit of data to be stored, the storage controller first consults the "collected free space" list to identify a unit of available storage space, and then stores the data in the free space. When there is a small amount of data to write, or a large amount of collected free space, storage is completed rapidly. In many cases, the storage controller is able to maintain a sufficient amount of collected free space in advance by running a collector subprogram to identify suitable data storage and reclassify it as collected free space. This type of collection, called "off-line collection" herein, may be performed periodically, whenever uncollected free space exceeds a certain threshold, etc.
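The write path described above can be illustrated with a minimal sketch. The class name LogStructuredStore, its attribute names, and the use of integer track identifiers are assumptions made only for illustration; an actual storage controller would keep these lists in its own internal format.

```python
from collections import deque


class LogStructuredStore:
    """Toy model: track IDs move between the lists a controller might keep."""

    def __init__(self, collected_free, uncollected_free=()):
        # Tracks that are ready to receive data.
        self.collected_free = deque(collected_free)
        # Tracks holding no valid data but not yet made reusable.
        self.uncollected_free = deque(uncollected_free)
        # Space-in-use: track id -> data stored on that track.
        self.in_use = {}

    def write_unit(self, data):
        """Store one logical unit on the next collected free track."""
        if not self.collected_free:
            raise RuntimeError("no collected free space; collection is required")
        track = self.collected_free.popleft()
        self.in_use[track] = data
        return track


store = LogStructuredStore(collected_free=[7, 8])
first = store.write_unit(b"record A")
second = store.write_unit(b"record B")
```

No allocation request precedes either write in this sketch; each unit simply consumes the next entry of the collected free space list, which is why a write is fast as long as that list is not empty.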
Despite the use of off-line collection, a situation can arise when the data to be written exceeds the collected free space. In this event, the storage controller invokes another collection procedure, referred to herein as "on-line collection." Namely, when there is no more collected free space, the storage controller performs the following steps for each storage track: (1) identifying a track of uncollected free space, (2) changing status of this track to "collected free space," (3) writing the data to the freed unit, and (4) changing the reused unit's listing to "space-in-use." Although on-line collection is beneficial from the standpoint of minimizing overhead, it incurs a significant delay, which may be too much for some users. Chiefly, users may experience excessive delays when there are many write operations to perform, but relatively little collected free space. One situation exemplifying this problem is a full volume copy, a task that copies an entire volume of data to a target storage, and therefore involves many write operations.
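Steps (1) through (4) above can be sketched as follows. The function name write_with_online_collection and the list-based bookkeeping are hypothetical. The sketch also treats step (1) as a simple list lookup, whereas an actual collector may need to search for and reorganize data before a track becomes usable.

```python
def write_with_online_collection(data_units, collected_free, uncollected_free, in_use):
    """Write each unit, invoking on-line collection when free space runs out.

    Returns the number of times on-line collection had to be invoked.
    """
    collections_run = 0
    for data in data_units:
        if not collected_free:
            if not uncollected_free:
                raise RuntimeError("no space of any kind remains")
            track = uncollected_free.pop(0)   # (1) identify an uncollected track
            collected_free.append(track)      # (2) relist it as collected free space
            collections_run += 1
        track = collected_free.pop(0)
        in_use[track] = data                  # (3) write, (4) relist as space-in-use
    return collections_run


collected = [1]
uncollected = [2, 3]
used = {}
runs = write_with_online_collection(["a", "b", "c"], collected, uncollected, used)
```

With one collected free track and three units to write, on-line collection is invoked twice in this sketch, once for each write made after the collected free space is exhausted.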
An example of this situation is illustrated in FIG. 1, which shows contents of a log structured storage during various stages of a full volume copy. At first, the log structured storage has the contents 100. The contents 100 include other data 102 (unrelated to the full volume copy), an existing version of the volume being copied 104, and some collected free space 106.
When the full volume copy operation begins, it first writes data of the new version to the free area 106, until this area is full. The device then has contents 103, including the formerly-free area 108, now filled with one part of the volume being copied. At this point, the device is full and no collected free space remains. To continue the full volume copy, the on-line collector must be used to examine and collect storage space to make more collected free space. In particular, the on-line collection process is invoked for each track of source data to be stored. This involves searching the log structured array for uncollected free space, and then consolidating, moving, and otherwise reorganizing data to convert the uncollected free space into collected free space. For example, if two tracks are each half-full (i.e., half space-in-use and half uncollected free space), the on-line collection process might relocate data from both tracks together onto a single track, and list the address of the old track as collected free space.
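The half-full track example can be sketched as below. The capacity of four record slots per track, the function name consolidate, and the choice to move the records of one track onto the other are all assumptions made for illustration, not details taken from any particular controller.

```python
TRACK_CAPACITY = 4  # hypothetical number of record slots per track


def consolidate(tracks, keep, vacate, capacity=TRACK_CAPACITY):
    """Move the valid records of `vacate` onto `keep` if they fit.

    Returns the id of the vacated track, which can then be listed as
    collected free space, or None if the records do not fit together.
    """
    if len(tracks[keep]) + len(tracks[vacate]) > capacity:
        return None
    tracks[keep].extend(tracks[vacate])
    tracks[vacate] = []
    return vacate


# Two half-full tracks: each holds two valid records out of four slots.
tracks = {10: ["r1", "r2"], 11: ["r3", "r4"]}
freed = consolidate(tracks, keep=10, vacate=11)
```

Even in this simplified form, every reclaimed track costs a read and a rewrite of valid data, which suggests why repeating the process for each track of a large copy is slow.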
This process continues until the entire volume has been copied, at which time the device has the contents 105. Specifically, the volume has been completely written, as shown by 108 and 110. The remainder 112 of the existing version 104 is then subject to eventual off-line collection, or possibly on-line collection if the storage controller writes further data before off-line collection is activated next.
Some users may find the scenario of FIG. 1 to be undesirable because of the time delay involved. The chief delay is incurred by the consolidating, moving, and reorganizing of data to convert uncollected free space into collected free space. Moreover, this process is invoked repeatedly since on-line collection is invoked for each track to be written. When the source data is sizeable and the collected free space is low, data storage efficiency is at its lowest level.
Consequently, the existing on-line collection process is not completely adequate for some applications due to certain unsolved problems, which ultimately slow the overall storage process.