Solid-state drives can include non-volatile solid-state memory, such as flash memory. Flash memory can include an improved form of Electrically-Erasable Programmable Read-Only Memory (EEPROM). Traditional EEPROM devices are only capable of erasing or writing one memory location (e.g., a memory cell) at a time. In contrast, flash memory allows multiple memory locations to be erased or written in one programming operation. Flash memory can thus operate at a higher speed compared to traditional EEPROM.
Solid-state memory has a number of advantages over other storage devices. For example, it generally offers faster read access times and better shock resistance than a hard disk drive (HDD). Unlike dynamic random access memory (DRAM), solid-state memory is generally non-volatile, meaning that data stored in flash memory is not lost when power to the memory is removed. These advantages, and others, may explain the increasing popularity of flash memory for storage applications in devices such as memory cards, USB flash drives, mobile phones, digital cameras, mass storage devices, MP3 players and the like. As with any storage system, data stored in flash memory may be prone to various forms of undesired destruction. Traditional techniques for preventing data loss include file versioning or data backup to a remote server.
Data protection and recovery is an important feature of any storage system. Data stored in storage systems can be inadvertently lost through a variety of unforeseen events, such as storage hardware failures and power outages. Also, data can be prone to hacks and virus attacks from third parties, and even unintended instructions from authorized users (e.g., an unintended “delete” instruction). Therefore, it is important to protect data from such unforeseen events and to allow computing systems to return to a healthy state after such events and perform back-in-time execution (BITE) to rerun faulty, failed, and/or incomplete transactions.
Storage systems can perform data protection and recovery using snapshots. A snapshot is a collection of data maintained by a storage system at a particular time instance. A storage system that uses snapshots for data recovery can maintain two storage volumes: a source volume and a snapshot volume. Depending on the snapshot technique, as discussed below, the source volume can maintain the actual current data in use and the snapshot volume can maintain the snapshot data for recovering data in case of data loss in the source volume. In some cases, the source volume can reside at a production site (which may include a server host and a production storage device), and the snapshot volume can reside at a production site or a backup site (which may include a backup server and a backup storage device) or both. The production site and the backup site are connected to one another by a communication system such as a network.
In some cases, a storage system can protect against data loss using a full-clone snapshot. A full-clone snapshot includes a copy of the entire data maintained by a storage system at a particular time instance. While a full-clone snapshot is effective in recovering data as maintained at the particular time instance, the full-clone snapshot is time-consuming to generate and space-consuming to store. Therefore, a storage system may generate the full-clone snapshot only periodically, for example, weekly, daily or hourly, depending on reliability and performance requirements.
In some cases, a storage system can generate and maintain a copy-on-write (COW) snapshot. The COW snapshot of a particular time instance includes an image of memory blocks whose values have changed since the last COW snapshot was taken or declared. For example, when a memory controller receives a write request for a memory block for the first time after a snapshot is declared, the memory controller can cause a copy of prior data to be made in the snapshot volume. Therefore, when a memory block is overwritten for the first time after a snapshot is declared, the storage system performs two write operations and one read operation: one read operation for reading the prior data from the source volume, one write operation for writing the prior data in the snapshot volume, and one write operation for writing the new data in the source volume. The COW snapshot allows data to be recovered to snapshot time instances (e.g., time instances at which the snapshot is declared), and can save storage space and back-up time compared to full-clone snapshots.
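By way of illustration only, the copy-on-write behavior described above can be sketched in a few lines of Python. The sketch is a simplified in-memory model, not an implementation of any particular memory controller. The class name `CowVolume`, its methods, and the 4-byte block size are hypothetical and chosen only for this example, and the sketch assumes a single outstanding snapshot. The `write` method returns the number of read and write operations it performed, so that the one-read, two-write cost of the first overwrite after a snapshot can be observed directly.

```python
class CowVolume:
    """Toy copy-on-write model (hypothetical names; single snapshot assumed)."""

    def __init__(self, num_blocks, block_size=4):
        # Source volume: the current data in use, one bytes object per block.
        self.source = [bytes(block_size) for _ in range(num_blocks)]
        # Snapshot volume: block index -> data as it was at snapshot time.
        self.snapshot = {}
        self.snapshot_declared = False

    def declare_snapshot(self):
        """Declare a snapshot; nothing is copied until a block is overwritten."""
        self.snapshot = {}
        self.snapshot_declared = True

    def write(self, index, data):
        """Write a block and return (reads, writes) performed by this request."""
        reads, writes = 0, 0
        if self.snapshot_declared and index not in self.snapshot:
            # First overwrite since the snapshot: preserve the prior data.
            prior = self.source[index]       # read prior data from the source volume
            reads += 1
            self.snapshot[index] = prior     # write prior data to the snapshot volume
            writes += 1
        self.source[index] = data            # write new data to the source volume
        writes += 1
        return reads, writes

    def read_snapshot(self, index):
        """Return the block as it was when the snapshot was declared."""
        # Unchanged blocks were never copied, so they are still in the source.
        return self.snapshot.get(index, self.source[index])


# Usage sketch: the first overwrite after the snapshot costs more than later ones.
volume = CowVolume(num_blocks=4)
volume.write(0, b"AAAA")
volume.declare_snapshot()
first_cost = volume.write(0, b"BBBB")    # copies b"AAAA" to the snapshot volume first
second_cost = volume.write(0, b"CCCC")   # block already preserved; no extra copy
```

In this model, only blocks that are overwritten after the snapshot is declared occupy space in the snapshot volume, which is the source of the space savings relative to a full-clone snapshot.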
In some cases, a storage system can generate and maintain a redirect-on-write (ROW) snapshot. The ROW snapshot is similar to a COW snapshot. However, when a memory controller receives write requests for a memory block after a snapshot is declared, the memory controller redirects them to a snapshot volume, leaving the memory block in the source volume intact. Therefore, the source volume keeps data as maintained at the snapshot time instance (e.g., the time instance at which the snapshot was declared); and the snapshot volume maintains changes to the memory blocks in the source volume since the snapshot time instance. Unlike the COW scheme, under the ROW scheme, the memory controller does not need to perform a separate copy operation to copy the prior data from the source volume to the snapshot volume, thereby reducing the number of write operations performed by the memory controller. However, under the ROW scheme, the memory controller does need to merge memory blocks upon receiving a read request. For example, when a memory controller receives a read request, the memory controller should determine whether the requested data is stored in the source volume or the snapshot volume.
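By way of illustration only, the redirect-on-write behavior described above can be sketched in Python as follows. This is a simplified in-memory model under stated assumptions, not an implementation of any particular memory controller. The class name `RowVolume`, its methods, and the 4-byte block size are hypothetical, and a single outstanding snapshot is assumed. The sketch shows that a write after the snapshot touches only the snapshot volume, and that a read must check the snapshot volume before falling back to the source volume.

```python
class RowVolume:
    """Toy redirect-on-write model (hypothetical names; single snapshot assumed)."""

    def __init__(self, num_blocks, block_size=4):
        # Source volume: after a snapshot is declared, it is left intact.
        self.source = [bytes(block_size) for _ in range(num_blocks)]
        # Snapshot volume: block index -> newest data written since the snapshot.
        self.redirected = {}
        self.snapshot_declared = False

    def declare_snapshot(self):
        """Declare a snapshot; later writes are redirected to the snapshot volume."""
        self.redirected = {}
        self.snapshot_declared = True

    def write(self, index, data):
        """Write a block with a single write operation and no copy of prior data."""
        if self.snapshot_declared:
            self.redirected[index] = data    # redirect; source block stays intact
        else:
            self.source[index] = data

    def read(self, index):
        """Return the current data, merging the two volumes at read time."""
        if index in self.redirected:
            return self.redirected[index]    # block changed since the snapshot
        return self.source[index]            # block unchanged since the snapshot

    def read_snapshot(self, index):
        """Return the block as it was when the snapshot was declared."""
        return self.source[index]


# Usage sketch: the source keeps the snapshot-time data; reads see the new data.
volume = RowVolume(num_blocks=4)
volume.write(2, b"AAAA")
volume.declare_snapshot()
volume.write(2, b"BBBB")
current = volume.read(2)                 # served from the snapshot volume
at_snapshot = volume.read_snapshot(2)    # served from the untouched source volume
```

The trade-off in this model is that each write after the snapshot costs one operation, while each read pays for a lookup to decide which volume holds the requested block.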
Because COW snapshots and ROW snapshots can be generated easily, the COW and ROW snapshot operations can be performed more frequently than the full-clone snapshot operation. Furthermore, the COW snapshots and ROW snapshots often require less storage space than the full-clone snapshot because the COW snapshots and ROW snapshots store only memory blocks whose values have changed since the last snapshot time instance.
For data recovery, the snapshots can be located at either the backup site or the production site (or both). As long as the snapshots can be retrieved reliably, the actual location of the snapshots is of lesser importance because data recovery often does not require low latency. For this reason, full-clone snapshots are often stored at a backup site because full-clone snapshots tend to require a large storage space and such a large storage space may be too expensive to accommodate at the production site. Also, backup sites are often more reliable and secure than the production site. Therefore, full-clone snapshots can be stored more reliably at the backup site. Because the COW snapshots and ROW snapshots need less storage space, they can be stored at both the production site and the backup site for data recovery purposes.
For back-in-time execution (BITE) (e.g., execution of a transaction based on data that was present at the snapshot time instance), the location of the snapshot can become important because BITE often requires low latency. Therefore, to accommodate BITE, it is desirable to store the snapshots at the production site. Unfortunately, because the full-clone snapshots often require a large storage space, it is difficult to store the full-clone snapshots at the production site. However, since the COW snapshots and ROW snapshots require only a small amount of storage space, maintaining COW snapshots and ROW snapshots at the production site is feasible. Therefore, storage systems can store COW snapshots and ROW snapshots at the production site to facilitate BITE.
In some cases, a storage system could use two or more full-clone snapshots, a COW snapshot, and/or a ROW snapshot to improve the data recovery performance and the BITE performance. However, a storage system is limited by the storage capacity dedicated to the snapshots. Therefore, a storage system should be designed to improve the data recovery performance and the BITE performance under the constraint of the storage capacity available for snapshots.
Even with the use of multiple snapshots, traditional snapshot schemes do not readily guarantee data recovery when the backup site is down. This is problematic because the backup site is not fail-proof. For example, the backup site is usually located at a data center. If the data center experiences a power outage, the snapshots stored in the data center may not be available for restoring lost data at the production site.
Also, snapshot techniques for flash-based storage systems are further constrained by the fact that write operations in flash devices are expensive. Flash-based storage systems often strive to reduce the number of write operations. Therefore, there is a need for snapshot techniques for flash-based storage systems that are designed to reduce the number of write operations for storing snapshots and recovering data based on the snapshots.