Solid State Drives (SSDs) using flash memories have become a viable alternative to Hard Disc Drives (HDDs) in many applications. Such applications include storage for notebooks, tablets, servers, and network-attached storage appliances. In notebook and tablet applications, storage capacity requirements are relatively modest, and power, weight, and form factor are key metrics. In server applications, power and performance (sustained read/write, random read/write) are key metrics. In network-attached storage appliances, capacity, power, and performance are key metrics, with large capacity being achieved by employing a number of SSDs in the appliance. An SSD may be directly attached to the system via a bus such as SATA, SAS, or PCIe.
Flash memory is a block-based non-volatile memory, with each block organized into a number of pages. After a block is programmed, it must be erased before it can be programmed again. Most flash memories require sequential programming of pages within a block. Another limitation of flash memory is that a block can be erased only a limited number of times; thus, frequent erase operations reduce the lifetime of the flash memory. Flash memory does not allow in-place updates; that is, it cannot overwrite existing data with new data. New data is written to erased areas (out-of-place updates), and the old data is invalidated for future reclamation. These out-of-place updates cause invalid (i.e., outdated) and valid data to coexist in the same block.
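The programming constraints described above can be illustrated with a minimal sketch. The class name FlashBlock, its methods, and the four-page block size are hypothetical and chosen only for illustration; they do not correspond to any particular flash device or interface.

```python
class FlashBlock:
    """Toy model of one flash block (illustrative assumption, not a device API)."""

    def __init__(self, pages_per_block=4):
        self.pages = [None] * pages_per_block   # None means the page is erased
        self.valid = [False] * pages_per_block  # True while a page holds current data
        self.next_page = 0                      # pages are programmed sequentially
        self.erase_count = 0                    # erases are limited over the block's life

    def is_full(self):
        return self.next_page == len(self.pages)

    def program(self, data):
        """Program the next erased page in order and return its index."""
        if self.is_full():
            raise RuntimeError("block is full; it must be erased before reprogramming")
        index = self.next_page
        self.pages[index] = data
        self.valid[index] = True
        self.next_page += 1
        return index

    def invalidate(self, index):
        """Mark a page as outdated; the data stays in place until the block is erased."""
        self.valid[index] = False

    def erase(self):
        """Erase the whole block; individual pages cannot be erased on their own."""
        self.pages = [None] * len(self.pages)
        self.valid = [False] * len(self.pages)
        self.next_page = 0
        self.erase_count += 1


block = FlashBlock(pages_per_block=4)
first = block.program("A v1")    # initial write of some logical item A
second = block.program("A v2")   # the update goes to a new page (out-of-place)
block.invalidate(first)          # the old copy is now invalid but still occupies a page
```

After these steps the block holds one invalid page and one valid page side by side, which is the coexistence of outdated and current data described above.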
Garbage Collection (GC) is the process of reclaiming the space occupied by invalid data by moving valid data to a new block and erasing the old block. Garbage collection, however, results in significant performance overhead as well as unpredictable operational latency. As mentioned, flash memory blocks can be erased only a limited number of times. Wear leveling is the process of improving flash memory lifetime by distributing erases evenly over the entire flash memory (within a band).
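As a rough sketch of how garbage collection and wear leveling interact, the example below assumes a greedy victim policy (fewest valid pages) and a simple wear-leveling policy (lowest erase count among free blocks). Both policies, the dictionary layout, and the function names are assumptions made for illustration; actual controllers may use different policies.

```python
def pick_victim(used_blocks):
    """Assumed greedy policy: reclaim the block with the fewest valid pages."""
    return min(used_blocks, key=lambda b: sum(1 for _, valid in b["pages"] if valid))


def pick_free_block(free_blocks):
    """Assumed wear-leveling policy: take the free block erased the fewest times."""
    return min(free_blocks, key=lambda b: b["erase_count"])


def garbage_collect(used_blocks, free_blocks):
    """Move valid pages out of one victim block, erase it, and return it to the free list."""
    victim = pick_victim(used_blocks)
    target = pick_free_block(free_blocks)
    free_blocks.remove(target)
    # Copy only the valid pages; invalid pages are dropped.
    target["pages"] = [(data, True) for data, valid in victim["pages"] if valid]
    # Erase the victim and count the erase against its lifetime.
    victim["pages"] = []
    victim["erase_count"] += 1
    used_blocks.remove(victim)
    used_blocks.append(target)
    free_blocks.append(victim)
    return victim, target


used = [
    {"id": 0, "erase_count": 5, "pages": [("a", True), ("b", False), ("c", False)]},
    {"id": 1, "erase_count": 2, "pages": [("d", True), ("e", True), ("f", False)]},
]
free = [
    {"id": 2, "erase_count": 9, "pages": []},
    {"id": 3, "erase_count": 1, "pages": []},
]
victim, target = garbage_collect(used, free)
```

In this run, block 0 is reclaimed because it holds only one valid page, and block 3 receives the copied page because it has the lowest erase count. The copy step is the source of the overhead noted above: every valid page in the victim costs an extra read and program.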
The management of blocks within flash-based memory systems, including SSDs, is referred to as flash block management. It includes: logical-to-physical mapping; defect management for managing defective blocks (blocks identified as defective at manufacturing and blocks that grow defective thereafter); wear leveling to keep the program/erase cycle counts of blocks within a band; keeping track of free available blocks; and garbage collection for collecting valid pages from a number of blocks (each with a mix of valid and invalid pages) into one block, creating free blocks in the process. These are examples of the block management required to effectuate writing and programming of flash memory. Flash block management requires maintaining various tables, referred to as flash block management tables (or “flash tables”). The size of these tables is generally proportional to the capacity of the SSD.
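To make the role of the tables concrete, the following minimal sketch keeps a logical-to-physical map, a defective-block set, per-block erase counts, a free-block list, and a valid-page map, and updates them on each write. The class FlashTables and all of its field names are hypothetical; garbage collection and the handling of an exhausted free list are deliberately omitted.

```python
class FlashTables:
    """Illustrative flash block management tables (assumed layout, not a real format)."""

    def __init__(self, num_blocks, pages_per_block, defective=()):
        self.pages_per_block = pages_per_block
        self.l2p = {}                          # logical page -> (block, page)
        self.valid = {}                        # (block, page) -> logical page
        self.defective = set(defective)        # blocks that must never be used
        self.erase_counts = [0] * num_blocks   # input to wear leveling
        self.free_blocks = [b for b in range(num_blocks) if b not in self.defective]
        self.open_block = self.free_blocks.pop(0)
        self.next_page = 0

    def write(self, logical_page):
        """Record an out-of-place write and return the physical location used."""
        if self.next_page == self.pages_per_block:
            # The open block is full; take the next free block (no GC in this sketch).
            self.open_block = self.free_blocks.pop(0)
            self.next_page = 0
        old_location = self.l2p.get(logical_page)
        if old_location is not None:
            del self.valid[old_location]       # the previous copy becomes invalid
        location = (self.open_block, self.next_page)
        self.next_page += 1
        self.l2p[logical_page] = location
        self.valid[location] = logical_page
        return location


tables = FlashTables(num_blocks=4, pages_per_block=2, defective=[1])
tables.write(7)            # lands at block 0, page 0
tables.write(8)            # lands at block 0, page 1
moved = tables.write(7)    # update: block 1 is defective, so block 2 is used
```

Every entry in these structures refers to a block or page, which is why the tables grow with the capacity of the drive.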
Generally, the flash block management tables can be constructed from metadata maintained on flash pages. Metadata is non-user information written on a page. Such reconstruction is time consuming and is generally performed very infrequently, upon recovery during power-up from a failure (such as a power failure). In one prior art technique, the flash block management tables are maintained in a volatile memory and, as mentioned, are constructed from metadata maintained in flash pages during power-up. In another prior art technique, the flash block management tables are maintained in a battery-backed volatile memory, where the battery maintains the contents of the volatile memory for an extended period of time until power is restored and the tables can be saved in flash memory. In yet another prior art technique, the flash block management tables are maintained in a volatile random access memory (RAM) and are saved (copied) back to flash periodically and/or upon certain events (such as a Sleep Command). Additionally, to avoid the time-consuming reconstruction upon power-up from a power failure, a power back-up means provides enough power to save the flash block management tables in the flash in the event of a power failure. Such power back-up may comprise a battery, a rechargeable battery, or a dynamically charged super capacitor.
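The reconstruction from page metadata can be sketched as a full scan. The sketch assumes that each programmed page's metadata records the logical page it holds ("lpn") and a monotonically increasing write sequence number ("seq"); those two fields and the function name rebuild_l2p are assumptions for illustration, since the text above does not specify the metadata contents.

```python
def rebuild_l2p(scanned_pages):
    """Rebuild a logical-to-physical map from per-page metadata.

    scanned_pages yields (block, page, metadata) for every page on the drive;
    metadata is None for an erased page, else a dict with assumed keys
    "lpn" (logical page number) and "seq" (write sequence number).
    """
    l2p = {}
    newest_seq = {}
    for block, page, metadata in scanned_pages:
        if metadata is None:
            continue                       # erased page, nothing to recover
        lpn, seq = metadata["lpn"], metadata["seq"]
        # Keep only the most recently written copy of each logical page.
        if lpn not in newest_seq or seq > newest_seq[lpn]:
            newest_seq[lpn] = seq
            l2p[lpn] = (block, page)
    return l2p


scan = [
    (0, 0, {"lpn": 7, "seq": 1}),   # stale copy of logical page 7
    (0, 1, {"lpn": 8, "seq": 2}),
    (2, 0, {"lpn": 7, "seq": 3}),   # newer copy of logical page 7
    (2, 1, None),                   # erased page
]
recovered = rebuild_l2p(scan)
```

The loop touches every page once, so the work scales with the total number of pages on the drive, which is why this path is reserved for recovery.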
Flash block management is generally performed in the SSD, and the tables reside in the SSD. Alternatively, flash block management may be performed in the system by software or hardware. In that case, the commands issued to the SSD additionally include flash management commands, and the commands use physical addresses rather than logical addresses. An SSD whose commands use physical addresses is referred to as a physically-addressed SSD. With a physically-addressed SSD, the flash block management tables are maintained in the (volatile) system memory.
A storage system (also referred to as a “storage array” or “storage appliance”) is a special-purpose computer system attached to a network and dedicated to data storage and management. The storage system may be connected to an Internet Protocol (IP) network running the Network File System (NFS) protocol, the Common Internet File System (CIFS) protocol, or the Internet Small Computer System Interface (iSCSI) protocol, or to a Storage Area Network (SAN) such as Fibre Channel (FC) or Serial Attached SCSI (SAS) for block storage.
These storage systems typically provide one or two network ports, and one or more external network switches are required to connect multiple hosts to such systems. External network switches are costly and take up rack space in space-constrained data centers.
There are also substantial latencies and processing overhead associated with the above-mentioned protocols, which make the storage system slow to respond.
Consider a storage system employing a physically-addressed SSD, in which the flash block management tables are maintained in the system memory and neither the system nor the system memory has a power back-up means. Upon a power failure, the flash block management tables that reside in the system memory are lost. If copies are maintained in the flash onboard the SSD, the copies may be out of date and/or may be corrupted if the power failure occurs while a table is being saved (or updated) in the flash memory.
Hence, during initialization on a subsequent power-up, the tables have to be inspected for corruption due to the power failure and, if necessary, recovered. The recovery requires reconstructing the tables by reading metadata from flash pages, which further delays the completion of system initialization. Complete reconstruction of all tables is time consuming, as it requires the metadata on all pages of the SSD to be read and processed. This delay matters because the time to initialize the system is a key metric in many applications.
Yet another similar problem of data corruption and power-failure recovery arises in SSDs, and also in HDDs, when write data for write commands (or queued write commands when command queuing is supported) is cached in a volatile system memory and command completion is issued prior to writing to the media (flash or HDD). It is well known in the art that caching write data for write commands (or queued write commands when command queuing is supported) and issuing command completion prior to writing to the media significantly improves performance.
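The exposure created by early command completion can be shown with a small sketch. The class WriteBackCache and its methods are hypothetical and model only the ordering of events: completion is reported once data is in volatile memory, and the media is updated later.

```python
class WriteBackCache:
    """Toy write-back cache in volatile memory (illustrative assumption)."""

    def __init__(self):
        self.volatile_cache = {}   # lost on power failure
        self.media = {}            # persistent (flash or HDD)

    def write(self, lba, data):
        """Cache the data and report completion before the media is written."""
        self.volatile_cache[lba] = data
        return "complete"

    def flush(self):
        """Write all cached data to the media, then empty the cache."""
        self.media.update(self.volatile_cache)
        self.volatile_cache.clear()

    def power_fail(self):
        """Model an unplanned power loss with no back-up power."""
        self.volatile_cache.clear()


device = WriteBackCache()
device.write(10, "x")
device.flush()                    # block 10 reaches the media
status = device.write(11, "y")    # host is told the write is complete
device.power_fail()               # block 11 never reaches the media
```

The host received a completion for block 11, yet after the power failure only block 10 is on the media. That gap between acknowledged and persisted data is the corruption risk described above.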
Additionally, file systems and storage systems employ journaling or logging for error recovery; the journal or log associated with a command or commands is saved in persistent storage. In the event of a power failure, system crash, or other failure, the journal or log is played back to restore the system to a known state.
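Journal playback can be sketched as follows. The record format, with "update" records grouped by a transaction number and made effective by a "commit" record, is an assumption for illustration; the text above does not specify a journal format, and real file systems differ in detail.

```python
def replay(journal, state):
    """Apply journaled updates to state, keeping only committed transactions.

    Assumed record formats:
      ("update", txn, key, value)  - a change belonging to transaction txn
      ("commit", txn)              - transaction txn is complete
    """
    pending = {}
    for record in journal:
        kind, txn = record[0], record[1]
        if kind == "update":
            _, _, key, value = record
            pending.setdefault(txn, []).append((key, value))
        elif kind == "commit":
            # Apply the transaction's updates in the order they were logged.
            for key, value in pending.pop(txn, []):
                state[key] = value
    # Updates left in pending were never committed and are discarded.
    return state


journal = [
    ("update", 1, "a", "1"),
    ("commit", 1),
    ("update", 2, "b", "2"),   # failure occurred before transaction 2 committed
]
restored = replay(journal, {})
```

Here the restored state contains only the committed change to "a"; the incomplete transaction is ignored, which leaves the system in a known state. Playback works only if the journal itself survived the failure, hence the requirement that it be saved in persistent storage.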
As mentioned before, in some prior art techniques, a battery-backed volatile memory is utilized to maintain the contents of volatile memory for an extended period of time until power returns and tables can be saved in flash memory.
Battery backup solutions for saving system management data or cached user data during unplanned shutdowns are long-established but have certain disadvantages, including up-front costs, replacement costs, service calls, disposal costs, system space limitations, reliability concerns, and “green” content requirements.
Additionally, storage systems can become inoperable upon encountering a single point of failure. If a component within the storage system fails, the data in the storage system becomes unavailable to the servers until the system is serviced.
What is needed is a storage system that operates reliably even in the face of a single point of failure.