1. Field of Invention
The present invention generally relates to the field of mass data storage in computing. It more specifically provides techniques for storing and retrieving data with improved speed and efficiency.
2. Discussion of Related Art
An essential feature of present-day mass storage systems is the creation of snapshot copies of essential data units, for example logical units (“LU”s) in storage area networks (“SAN”) and file systems in network-attached storage (“NAS”) systems. Several techniques are used in creating such copies, and several names are used to designate them: snapshots, instant copies, clones. The basic idea behind any such implementation is that at a given point in time a command is issued in the system and a copy of the LU is immediately created. This copy is intended to reflect the state of the LU at the time of creation. The source LU continues to respond to input-output (“IO”) activity as usual. Depending on the implementation, the copy may either remain unchanged after its creation (and thus continue to reflect the state of the source at the time of creation) or be available for use on its own, thus acting in all respects as a standard LU that can be modified at will by the user. If the copy remains unchanged, it may be used to restore the source LU to its state at the time of creation.
Typically, the idea behind implementations of snapshot copies is that when the command is issued and the copy created, very little actual activity is performed in the system. In most implementations, metadata has to be created for internal management purposes, the size of the metadata being proportional to the intended size of the copy. Thus, even though very brief, the creation time of a snapshot is proportional to the size of the intended copy. An alternative to this is to create snapshot copies that require less metadata at the time of creation. Such an implementation is described in U.S. patent application Ser. No. 11/123,993, titled “Data Storage Methods for Hierarchical Copies,” filed May 6, 2005.
The real data copying activity takes place not at the time of creation of snapshots, but rather whenever data is written to the source LU or to the copy. LUs typically comprise sequences of data blocks, the sequences being of varying lengths and the blocks being of equal size. Managing the LU within the system is typically done in terms of partitions comprising a fixed number of blocks. At the time of creation of the snapshot, the source LU (“LUS”) and the target LU (“LUT”) share all data, and no physical copy of the data is actually created. If the user wants to read data from LUT, an internal system of pointers that is transparent to the user will indicate that this data has to be read from the partition which contains the original data and is associated with LUS. If at some point in time data is written for the first time to a partition in LUS, the system will create a new physical partition where this data is written, and this modified partition is then associated with LUS, whereas the original partition remains associated with LUT. This mechanism, known as “copy on write,” is the one typically implemented to allow the correct management of snapshots with minimal creation of physical data. After this step, two partitions exist in the system: the original one remains associated with LUT and continues to reflect the state of data in LUS at the time of establishing the copy, while the new data is in the newly created partition, which is associated with LUS, and LUS continues to work as usual. Further modifications of this partition will no longer affect LUT. However, since new partitions are created only when the associated data is modified, in the typical case only a small percentage of partitions exist in both the new and the old version, whereas much of the data continues to be shared by LUS and LUT via pointers.
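The pointer sharing described above can be illustrated with a minimal Python sketch. It is an illustration only, not the implementation of any particular storage system: the names PartitionStore, LogicalUnit and create_snapshot, the reference counts, and the use of in-memory byte arrays in place of physical partitions are all assumptions introduced here.

```python
class PartitionStore:
    """Hypothetical pool of physical partitions shared by all LUs."""

    def __init__(self):
        self.data = {}      # partition id -> bytearray holding the partition
        self.refs = {}      # partition id -> number of LUs pointing at it
        self.next_id = 0

    def allocate(self, content):
        """Create a new physical partition holding a copy of content."""
        pid = self.next_id
        self.next_id += 1
        self.data[pid] = bytearray(content)
        self.refs[pid] = 1
        return pid


class LogicalUnit:
    """An LU modeled as a table of pointers into the partition store."""

    def __init__(self, store, pointers):
        self.store = store
        self.pointers = pointers    # position in the LU -> partition id

    def read(self, index):
        # Reads simply follow the pointer, whether or not it is shared.
        return bytes(self.store.data[self.pointers[index]])

    def write(self, index, offset, payload):
        pid = self.pointers[index]
        if self.store.refs[pid] > 1:
            # First write to a shared partition: copy on write.
            # The original partition stays with the other LU.
            self.store.refs[pid] -= 1
            pid = self.store.allocate(self.store.data[pid])
            self.pointers[index] = pid
        self.store.data[pid][offset:offset + len(payload)] = payload


def create_snapshot(source):
    """Create a target LU that shares every partition with the source."""
    for pid in source.pointers:
        source.store.refs[pid] += 1
    return LogicalUnit(source.store, list(source.pointers))


store = PartitionStore()
lus = LogicalUnit(store, [store.allocate(b"AAAA"), store.allocate(b"BBBB")])
lut = create_snapshot(lus)      # no partition data is copied here
lus.write(0, 1, b"xy")          # triggers copy on write for partition 0 only
```

In this sketch the snapshot copies only the pointer table, so after the write LUT still reads the original contents of partition 0, while the untouched partition 1 remains shared by both LUs.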
The step of “copy on write” is then the stage where most of the actual copy activity takes place. Whereas snapshot creation involves virtually no overhead activity, and thus the overall activity parameters of the system are virtually unaffected at the time of snapshot creation, each “copy on write” activity involves a considerable latency penalty for the individual IO request involved. Thus, for instance, let PSS be a partition associated with LUS and composed of a sequence of blocks that is to be copied, as part of a “copy on write” process, to a partition PTT associated with LUT. Assume a write request is now received from the host, involving one or more blocks BB that are to be written to PSS, so as to modify it for the first time and to create an actual copy of it. PTT is the partition that is created and that will be associated with the data that was associated with PSS before the operation. In order to perform this request the cache typically performs the following steps: (1) read from disk the entire partition PSS; (2) create a copy of the data of PSS and associate it with PTT; (3) write BB to PSS according to the request. Thus, whereas write requests are usually serviced immediately by the storage system, in a “copy on write” situation the request has to wait until read task (1) is completed before the system can complete and acknowledge the write request for block(s) BB.
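The ordering of steps (1) to (3) and the resulting wait can be sketched as follows. This is a simplified model under stated assumptions: the Storage class, its event log, and the function copy_on_write are hypothetical names introduced for illustration, partitions are modeled as lists of blocks, and the copy in step (2) is modeled as a plain in-memory assignment rather than the behavior of any real cache.

```python
class Storage:
    """Hypothetical storage model that records the order of operations."""

    def __init__(self, partitions):
        self.partitions = partitions    # partition name -> list of blocks
        self.log = []                   # ordered record of operations

    def read_partition(self, name):
        self.log.append(("read", name))
        return list(self.partitions[name])

    def store_partition(self, name, blocks):
        self.log.append(("store", name))
        self.partitions[name] = list(blocks)


def copy_on_write(storage, pss, ptt, first_block, bb):
    """Service a first write of blocks bb to the shared partition pss."""
    # (1) Read the entire partition PSS from disk.
    original = storage.read_partition(pss)
    # (2) Copy the data of PSS and associate it with PTT.
    storage.store_partition(ptt, original)
    # (3) Write BB to PSS according to the request.
    updated = list(original)
    updated[first_block:first_block + len(bb)] = bb
    storage.store_partition(pss, updated)
    # Only now can the write request be acknowledged to the host.
    storage.log.append(("ack", pss))


storage = Storage({"PSS": ["a", "b", "c", "d"]})
copy_on_write(storage, "PSS", "PTT", 1, ["X"])
```

The log makes the latency penalty visible: the acknowledgment is the last entry, and it is preceded by a read of the whole partition even though the host asked to write a single block.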
Whatever the precise merits, features, and advantages of the above-mentioned techniques, none of them achieves or fulfills the purposes of the present invention.