Snapshotting is a technique used to capture the state of a logical volume at a given point in time. Typical snapshot techniques maintain a shared copy of the data that is used by both the source (original) logical volume and the snapshot taken from that logical volume. The source logical volume and the snapshot can share the same physical addresses until the content of the logical volume (or of the snapshot, if the snapshot is writable) is updated.
When updating data of the source logical volume, there is a need to preserve the original content of the source logical volume that is still being used by the snapshot.
According to one technique, called copy-on-write, the content of the address range to be updated is copied to a new physical space before being overwritten by new data of the source volume. From then on, the new physical space that holds the original content is associated only with the snapshot and is no longer shared with the source volume.
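By way of a non-limiting illustration, copy-on-write may be sketched as follows. The class `CowVolume`, its dictionaries, and the counter-based allocator are hypothetical names introduced only for this example, and a single snapshot is assumed for brevity.

```python
class CowVolume:
    """Toy copy-on-write model (illustrative only, one snapshot at most)."""

    def __init__(self):
        self.physical = {}        # physical address -> content
        self.volume_map = {}      # logical address -> physical address (source volume)
        self.snapshot_map = None  # logical address -> physical address (snapshot)
        self._next_pa = 0         # hypothetical allocator: next free physical address

    def _allocate(self, content):
        pa = self._next_pa
        self._next_pa += 1
        self.physical[pa] = content
        return pa

    def take_snapshot(self):
        # The snapshot starts out sharing every physical address with the volume.
        self.snapshot_map = dict(self.volume_map)

    def write(self, lba, content):
        pa = self.volume_map.get(lba)
        if pa is None:
            # First write to this logical address: nothing to preserve.
            self.volume_map[lba] = self._allocate(content)
            return
        shared = self.snapshot_map is not None and self.snapshot_map.get(lba) == pa
        if shared:
            # Copy the original content to new space owned by the snapshot only.
            self.snapshot_map[lba] = self._allocate(self.physical[pa])
        # Overwrite in place: the source volume keeps its physical address.
        self.physical[pa] = content

    def read(self, lba, from_snapshot=False):
        mapping = self.snapshot_map if from_snapshot else self.volume_map
        return self.physical[mapping[lba]]


vol = CowVolume()
vol.write(0, b"old")
vol.take_snapshot()
vol.write(0, b"new")
```

In this sketch the source volume keeps its original physical address after the update, while the snapshot is remapped to the newly allocated copy of the old content.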
According to another technique, called redirect-on-write, the updated content is written to a new physical space, and the mapping of the source logical volume, with regard to the address range of the updated content, is changed so as to address the new physical space, while the snapshot continues to refer to the old address range.
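By way of a non-limiting illustration, redirect-on-write may be sketched as follows. The function `redirect_on_write`, the dictionaries, and the simple "next free address" allocator are assumptions made for this example only.

```python
def redirect_on_write(physical, volume_map, lba, content):
    """Write content to fresh physical space and remap the source volume to it."""
    # Hypothetical allocator: use the next unused physical address.
    new_pa = max(physical, default=-1) + 1
    physical[new_pa] = content
    # Only the mapping of the source volume changes; the old content is untouched.
    volume_map[lba] = new_pa
    return new_pa


physical = {0: b"old"}            # physical address -> content
volume_map = {7: 0}               # logical address 7 -> physical address 0
snapshot_map = dict(volume_map)   # the snapshot shares physical address 0

redirect_on_write(physical, volume_map, 7, b"new")
```

Unlike copy-on-write, no existing content is copied here: the old data stays where it is for the snapshot, and only the new data is written.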
In either case, after the update, the source logical volume and the snapshot no longer share a physical copy of the changed blocks, and each uses a different address mapping so as to refer to different content.
Storage systems that implement virtualization layers employ data structures for mapping logical address ranges within the logical address space (e.g., of logical volumes) into physical storage space, which may be either the actual space in the storage device or space represented by a lower layer of virtualization.
The storage and retrieval of data units requires address translation, such as virtual address to physical address translation. The mapping data structure has a granularity, that is, a smallest size of a data unit that can have its virtual address translated to a physical address. Such a data unit may be regarded as an atomic data unit.
A non-limiting example of a size of an atomic data unit is 64 Kbytes. In this case, the mapping data structure may translate addresses of contiguous ranges of data units that are multiples of 64 Kbytes from a logical address space (e.g., a logical volume) to the physical storage space.
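A minimal sketch of address translation at a 64 Kbyte granularity is given below. The dictionary-based mapping and the function name `translate` are assumptions made for illustration; an actual mapping data structure may be organized quite differently.

```python
UNIT = 64 * 1024  # size of an atomic data unit, taken from the example above


def translate(mapping, logical_addr):
    """Translate a logical byte address to a physical byte address.

    mapping: atomic-unit index -> physical base address of that unit.
    Raises KeyError if the unit is not mapped.
    """
    # Only whole units are mapped; the offset inside the unit is carried over.
    unit_index, offset = divmod(logical_addr, UNIT)
    return mapping[unit_index] + offset


# Hypothetical mapping: logical unit 0 lives at physical unit 10, unit 1 at unit 3.
mapping = {0: 10 * UNIT, 1: 3 * UNIT}

pa_start = translate(mapping, 0)
pa_inside = translate(mapping, UNIT + 512)
```

Any address inside a unit resolves through the single entry of that unit, so the structure cannot express a different physical location for part of a unit.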
In a case where snapshots were taken from a source logical volume, writing to a certain logical address range of the logical volume requires copying the content of that logical address range, so as to preserve the old (original) data for the snapshots that share this data.
In addition, mapping information must be added to the mapping data structure so as to map the certain logical address range to two different locations in the physical storage space. If, before the writing, the logical volume shared the same content of the certain logical address range with its snapshots, and the mapping indicated that the logical address, e.g., denoted as LBA1, is mapped to a physical address, e.g., denoted as PA1, then after the writing the duplicated content occupies not only double the disk space but also additional mapping space, so as to support two mappings: LBA1 to PA1 and LBA1 to a newly allocated physical address.
The duplication of mapping information occurs whether a distinct mapping data structure is used for each of the snapshots and the volume, or all the snapshots and the source logical volume use a single mapping data structure (wherein each logical address range is mapped to a plurality of physical address ranges for different snapshots).
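The growth of the mapping information may be illustrated with the following sketch, which assumes a single mapping data structure in which each logical address holds a list of (owners, physical address) entries. The names `write_shared`, `snap1` and `PA2` are hypothetical, and the sketch arbitrarily moves the volume to the new address; under copy-on-write the snapshot would move instead, with the same effect on the number of entries.

```python
def write_shared(mapping, lba, new_pa):
    """Split a shared entry so that the volume refers to new_pa."""
    updated = []
    for owners, pa in mapping[lba]:
        if "volume" in owners:
            remaining = owners - {"volume"}
            if remaining:
                # The snapshots keep the old physical address.
                updated.append((remaining, pa))
            # The volume gets its own entry for the new physical address.
            updated.append((frozenset({"volume"}), new_pa))
        else:
            updated.append((owners, pa))
    mapping[lba] = updated


# Before the write: one entry, shared by the volume and one snapshot.
mapping = {"LBA1": [(frozenset({"volume", "snap1"}), "PA1")]}
entries_before = len(mapping["LBA1"])

write_shared(mapping, "LBA1", "PA2")
entries_after = len(mapping["LBA1"])
```

Here a single shared entry becomes two entries for the same logical address, which is the additional mapping space referred to above.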
The granularity of the mapping may be much coarser than the requested writing size. For example, writing may be requested in blocks of, e.g., 512 bytes, while the mapping granularity may be 64 Kbytes. Thus, writing 512 bytes to a portion of a data unit of 64 Kbytes that is shared with snapshots causes the whole data unit to be duplicated, instead of only the written block.
Upon writing to small address ranges (smaller than the size of mapping units) of a volume having snapshots, there is a need to: (i) avoid duplicating an entire data unit; and (ii) avoid adding extra mapping information, for each snapshot, to a mapping data structure having a granularity that corresponds to the size of an atomic data unit.
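The overhead in this example can be quantified with a short calculation. The helper `duplicated_bytes` is a hypothetical name, and the sketch assumes that every atomic data unit touched by the write is shared with a snapshot and is therefore duplicated in full.

```python
WRITE_SIZE = 512    # requested write size in bytes, from the example above
UNIT = 64 * 1024    # mapping granularity in bytes, from the example above


def duplicated_bytes(write_offset, write_len, unit=UNIT):
    """Bytes duplicated when a write touches shared atomic data units."""
    first_unit = write_offset // unit
    last_unit = (write_offset + write_len - 1) // unit
    # Every touched unit is duplicated as a whole.
    return (last_unit - first_unit + 1) * unit


# A 512-byte write inside one unit duplicates the full 64 Kbyte unit.
amplification = duplicated_bytes(0, WRITE_SIZE) // WRITE_SIZE

# A 512-byte write that straddles a unit boundary duplicates two units.
straddling = duplicated_bytes(UNIT - 256, WRITE_SIZE)
```

Under these assumptions a single 512-byte write duplicates 128 times as much data as was actually written, and twice that if the write crosses a unit boundary.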