As the value and use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
In this regard, RAID, an acronym for Redundant Array of Independent Disks, is a technology that provides increased storage functions and increased reliability through redundancy, and as such may be beneficially employed in information handling systems. Redundancy in a RAID device may be achieved by combining multiple disk drive components, which may include one or more disks of different type, size, or classification, into a logical unit, where data is distributed across the drives in one of several ways called “RAID levels.” The data distribution determines the RAID type, e.g., RAID 0, RAID 5, RAID 10, etc.
RAID includes data storage schemes that can divide and replicate data among multiple physical disk drives. The physical disks are said to be in a RAID array, which is addressed by the operating system as one single disk. Many different schemes or architectures of RAID devices are known to those having ordinary skill in the art. Each different architecture or scheme may provide a different balance among various goals to be achieved in storing data, which include, but are not limited to, increased data reliability and increased input/output (hereinafter “I/O”) performance.
Some storage systems are able to support data instant replay by taking point-in-time copies (PITCs) or “snapshots” of data at a point in time, and a logical volume may point to such a point-in-time copy, such that by accessing the volume, one may access the point-in-time copy. A logical volume is used like a remote disk by an initiator, typically a computer, such as a server.
The conventional way to support a snapshot storage system is to collect the written data extents into a PITC, which is a data structure that may be referenced by a volume. A data extent is a collection of data bytes characterized by a block address and a length. The block address defines a location on a disk (e.g., a virtual disk or volume). The length is the number of bytes of the data extent. To implement a PITC, a device may write the volume extents written by the initiator during a specific period of time onto extents on a virtual disk as presented by RAID. The PITC database may include the relationship between the volume extents and the RAID extents (which may also be known as “chunks”).
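The extent and PITC database described above can be pictured with a minimal sketch. The names `Extent`, `pitc_map`, and `record_write`, and the addresses used, are hypothetical and chosen only for illustration; a real PITC database would typically be a B-Tree or similar store rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A data extent: a block address plus a length in bytes."""
    block_address: int  # location on a disk (e.g., a virtual disk or volume)
    length: int         # number of bytes in the extent


# Hypothetical PITC database: maps each volume extent written by the
# initiator to the RAID extent ("chunk") that holds its data.
pitc_map = {}


def record_write(volume_extent, raid_extent):
    """Record where a written volume extent was placed on the RAID virtual disk."""
    pitc_map[volume_extent] = raid_extent


# Two writes of the sizes mentioned as common (4 kB and 64 kB); addresses are made up.
record_write(Extent(0, 4096), Extent(1_048_576, 4096))
record_write(Extent(65_536, 65_536), Extent(2_097_152, 65_536))
```

Making `Extent` frozen keeps it hashable, so it can serve directly as a dictionary key in this sketch.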
The volume itself may be mapped to a chain of PITCs. The volume writes to a PITC at the head of the chain, which may be referred to as the active PITC. When a snapshot is taken, the active PITC is frozen and a new one is created and placed at the head of the PITC chain. This repeats each time a snapshot is taken. Reads to a volume traverse the chain in order, looking for extents that fully or partially match the requested VBA and length. When a snapshot is deleted, and assuming no volume cloning, the corresponding PITC is deleted and the extents in the deleted PITC's map are coalesced (e.g., merged) into the next newest PITC. PITCs are typically implemented as variant B-Trees or other key-value stores.
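The chain behavior described above can be sketched as follows. This is a simplified illustration, not an actual implementation: the class `Volume` and its methods are hypothetical, each PITC is a plain dictionary, and lookups match on an exact address rather than on partially overlapping extents.

```python
class Volume:
    """A volume mapped to a chain of PITCs; chain[0] is the active PITC."""

    def __init__(self):
        self.chain = [{}]  # head of the chain first, oldest PITC last

    def write(self, vba, raid_location):
        # Writes always land in the active PITC at the head of the chain.
        self.chain[0][vba] = raid_location

    def snapshot(self):
        # Freeze the active PITC by placing a new, empty one at the head.
        self.chain.insert(0, {})

    def read(self, vba):
        # Traverse the chain in order, newest first, until a match is found.
        for pitc in self.chain:
            if vba in pitc:
                return pitc[vba]
        return None

    def delete_snapshot(self, index):
        # Delete the frozen PITC at `index` (index >= 1) and coalesce its
        # entries into the next newest PITC. Entries already present in the
        # newer PITC win, because they overwrote the older data.
        if not 1 <= index < len(self.chain):
            raise IndexError("index must refer to a frozen PITC")
        deleted = self.chain.pop(index)
        newer = self.chain[index - 1]
        for vba, location in deleted.items():
            newer.setdefault(vba, location)


vol = Volume()
vol.write(0, "A")
vol.snapshot()
vol.write(0, "B")
vol.write(8, "C")
vol.snapshot()
vol.write(16, "D")
before = vol.read(0)       # found in the middle PITC
vol.delete_snapshot(2)     # delete the oldest snapshot's PITC
after = vol.read(0)        # unchanged: the newer entry took precedence
```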
Such a traditional approach to PITC management has a number of major weaknesses. First, the number of PITC lookups scales linearly with the number of snapshots taken. Therefore, performance will suffer in a system with many snapshots. The typical means to address this scaling is to use approximate membership query filters, such as Bloom filters. However, these structures may have scaling and complexity issues of their own. This lookup scaling is particularly expensive for volumes with read-only data that is frequently read. Over time, as snapshots are deleted due to aging, read-only data ends up at the end of a PITC chain. Thus, a read for such data may require a traversal of the entire chain, and thus incur maximal overhead.
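To make the linear scaling concrete, the following sketch counts how many PITCs a read must inspect when the requested data sits at the end of the chain. The helpers `build_chain` and `lookups_for_read` are hypothetical, and each PITC is modeled as a plain dictionary keyed by address.

```python
def build_chain(num_snapshots):
    """Build a chain (newest first) whose oldest PITC holds read-only data at address 0."""
    chain = [{0: "read-only"}]
    for i in range(1, num_snapshots + 1):
        # Each later PITC holds an unrelated write, so none of them shadows address 0.
        chain.insert(0, {i * 8: f"write-{i}"})
    return chain


def lookups_for_read(chain, vba):
    """Return (location, number of PITCs inspected) for a read of `vba`."""
    for inspected, pitc in enumerate(chain, start=1):
        if vba in pitc:
            return pitc[vba], inspected
    return None, len(chain)


_, cost_10 = lookups_for_read(build_chain(10), 0)
_, cost_100 = lookups_for_read(build_chain(100), 0)
```

In this model the read of address 0 inspects every PITC in the chain, so the cost grows in step with the number of snapshots taken.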
Second, coalesce operations may be so database intensive that they negatively affect performance. The PITC that corresponds with the oldest snapshot, assuming a typical delete-oldest snapshot schedule, typically becomes much larger than all the other PITCs. It typically contains an entry for each extent that was ever written to the volume. Each time the oldest snapshot is deleted, this “bottom” PITC is bound to the next oldest snapshot. Thus, when a snapshot is deleted, extents in the next oldest snapshot are coalesced into the bottom PITC. If the bottom PITC is implemented as a B-Tree and the next oldest snapshot has a roughly even distribution of entries, then each entry in the next oldest snapshot will cause a B-Tree node to be read and written. Since entries are small compared to the nodes, in many cases the coalesce could cause every leaf node of the bottom PITC to be updated. The typical means to address this is to delay the coalesce until many PITCs can be coalesced in batch into the bottom PITC. However, these batching operations may have scaling and complexity issues of their own.
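The cost of coalescing into the bottom PITC can be estimated with a rough model. The function `leaves_touched` and all of the numbers below are illustrative assumptions: the bottom PITC is treated as a sorted run of entries packed into fixed-capacity leaf nodes, and the incoming entries are assumed to be evenly spaced over the same key range and to overwrite existing keys (no node splits).

```python
def leaves_touched(bottom_entries, leaf_capacity, incoming_entries):
    """Return (leaf nodes touched by the coalesce, total leaf nodes)."""
    # Evenly spaced incoming keys over the bottom PITC's key range.
    stride = bottom_entries // incoming_entries
    touched = {(j * stride) // leaf_capacity for j in range(incoming_entries)}
    total_leaves = -(-bottom_entries // leaf_capacity)  # ceiling division
    return len(touched), total_leaves


# 1,000,000 entries in the bottom PITC, 100 entries per leaf node.
sparse = leaves_touched(1_000_000, 100, 1_000)    # each entry hits its own leaf
dense = leaves_touched(1_000_000, 100, 10_000)    # every leaf is read and written
```

Under these assumptions, an incoming PITC holding only 1% as many entries as the bottom PITC is already enough to touch every leaf node.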
Third, the traditional approach does not provide a natural way to determine which extents are referenced by the volume lookups and which are only referenced by snapshot lookups. Statistics typically require distinguishing these two types of space usage. Making this distinction requires an inspection of each extent of each PITC in the entire chain.
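A minimal sketch of this classification, assuming each PITC is a dictionary keyed by address and the chain is ordered newest first, shows why the whole chain must be walked. The function `classify_extents` is hypothetical.

```python
def classify_extents(chain):
    """Split extents into those visible to volume lookups and those only
    reachable through snapshots. Returns (volume_refs, snapshot_only, inspected)."""
    volume_refs = {}     # address -> location a volume read would return
    snapshot_only = []   # (address, location) pairs shadowed by newer writes
    inspected = 0
    for pitc in chain:   # newest PITC first
        for vba, location in pitc.items():
            inspected += 1
            if vba in volume_refs:
                # A newer PITC already holds this address, so only a
                # snapshot lookup can still reach this extent.
                snapshot_only.append((vba, location))
            else:
                volume_refs[vba] = location
    return volume_refs, snapshot_only, inspected


chain = [{16: "D"}, {0: "B", 8: "C"}, {0: "A"}]
volume_refs, snapshot_only, inspected = classify_extents(chain)
```

Every entry in every PITC is visited once, so the work grows with the total number of recorded extents rather than with the size of the volume.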
Fourth, the traditional approach may render it expensive to do tiering. Tiering involves optimal placement of a volume's extents on storage media based on expected load. Tiering requires copying written extents from one storage device to another. After the copy, the corresponding entries in the PITCs are updated to reflect the new location of the written extents. Tiering is usually done at some granularity in the volume's addressable space (LBA space). Written extents are typically recorded in the PITCs at the granularity at which user data is written, the mode sizes of which are 4 kB and 64 kB in many cases. The tiering granularity depends on the RAM resources the system requires to track and calculate usage, and is typically much larger, on the order of 1 MB or more. Therefore, when one tiering granular unit is moved, multiple written extents must be updated. If these written extents are spread over multiple PITCs, then this update requires an update to each PITC, which is expensive.
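The mismatch between the two granularities can be illustrated with a short calculation. The sizes below use binary units (1 MB taken as 1024 × 1024 bytes) as an assumption, and `pitcs_touching_unit` is a hypothetical helper that models each PITC as a dictionary keyed by byte address.

```python
KB = 1024
MB = 1024 * KB
EXTENT_SIZE = 4 * KB   # a common write granularity
TIER_UNIT = 1 * MB     # assumed tiering granularity

# Number of 4 kB written extents that one tiering unit can cover.
extents_per_unit = TIER_UNIT // EXTENT_SIZE


def pitcs_touching_unit(chain, unit_start, unit_size=TIER_UNIT):
    """Return the indices of the PITCs holding an extent inside the tiering unit."""
    unit_end = unit_start + unit_size
    return [
        index
        for index, pitc in enumerate(chain)
        if any(unit_start <= address < unit_end for address in pitc)
    ]


chain = [
    {0 * KB: "ssd:0", 4 * KB: "ssd:1"},  # active PITC
    {8 * KB: "ssd:2"},                   # frozen PITC
    {2 * MB: "ssd:3"},                   # frozen PITC, outside the first unit
]
affected = pitcs_touching_unit(chain, unit_start=0)
```

Moving the first tiering unit in this example requires updating two separate PITCs, and in the worst case as many as 256 extent entries.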
Fifth, the traditional approach may render it expensive to optimize the metadata for the user data it describes. The most-recently used or “hottest” data is usually latency sensitive, and delays caused by metadata lookups make up much of the latency. The way to minimize the metadata latency is to organize the metadata such that all the metadata associated with the hottest data is in RAM. To save RAM, this may require the metadata associated with colder data to be stored to solid-state storage devices (SSDs). In the traditional approach, this requires each PITC to be split into two: one sub PITC for the hot data, one sub PITC for the cold data. Since the data is associated with many PITCs, the metadata is as well, which creates scaling challenges in separating the PITCs as described and moving the metadata around between the sub PITCs.
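The split into hot and cold sub PITCs can be sketched as below. This is an illustrative assumption only: `split_pitc` is a hypothetical helper, hotness is tracked per address in a simple set, and each PITC is a dictionary.

```python
def split_pitc(pitc, hot_vbas):
    """Split one PITC into (hot sub PITC, cold sub PITC)."""
    hot = {vba: loc for vba, loc in pitc.items() if vba in hot_vbas}
    cold = {vba: loc for vba, loc in pitc.items() if vba not in hot_vbas}
    return hot, cold


chain = [{16: "D"}, {0: "B", 8: "C"}, {0: "A", 24: "E"}]
hot_vbas = {0, 16}  # addresses currently considered hot (assumed)

# Every PITC in the chain must be split, so the number of metadata
# structures to maintain doubles.
split_chain = [split_pitc(pitc, hot_vbas) for pitc in chain]
sub_pitc_count = 2 * len(split_chain)
```

When an address changes from hot to cold or back, its entry has to be moved between sub PITCs in every PITC that records it, which is the scaling challenge noted above.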