A storage system is a processing system adapted to store and retrieve data on storage devices (such as disks). The storage system includes a storage operating system that implements a file system to logically organize the data as a hierarchical structure of directories and files on the storage devices. Each file may be implemented as a set of blocks configured to store data (such as text), whereas each directory may be implemented as a specially formatted file in which data about other files and directories is stored. The storage operating system may assign a unique storage system address (e.g., a logical block number (LBN)) to each data block stored in the storage system.
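The file-and-block organization above can be sketched in a few lines. This is a hypothetical illustration only (the class and constant names are invented for this sketch, not taken from any actual storage operating system): a file is a set of fixed-size blocks, and each block is assigned a unique LBN as it is written.

```python
# Minimal sketch (hypothetical names): a file stored as a set of fixed-size
# blocks, each assigned a unique logical block number (LBN).
BLOCK_SIZE = 4096  # bytes per block (illustrative size)

class StorageSystem:
    def __init__(self):
        self.blocks = {}   # LBN -> block contents
        self.next_lbn = 0  # next unused storage system address

    def write_file(self, data: bytes) -> list[int]:
        """Split data into blocks and assign each block a unique LBN."""
        lbns = []
        for i in range(0, len(data), BLOCK_SIZE):
            lbn = self.next_lbn
            self.next_lbn += 1
            self.blocks[lbn] = data[i:i + BLOCK_SIZE]
            lbns.append(lbn)
        return lbns

    def read_file(self, lbns: list[int]) -> bytes:
        """Reassemble a file from the LBNs of its blocks."""
        return b"".join(self.blocks[lbn] for lbn in lbns)
```

A directory, in this model, would simply be another such file whose block contents describe other files and directories.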
The storage operating system generally refers to the computer-executable code operable on a storage system that manages data access requests (read or write requests requiring input/output operations) and may implement file system semantics. In this sense, the Data ONTAP® storage operating system, available from NetApp, Inc. of Sunnyvale, Calif., which implements a Write Anywhere File Layout (WAFL®) file system, is an example of such a storage operating system implemented as a microkernel within an overall protocol stack and associated storage. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows®, or as a general-purpose operating system with configurable functionality that is configured for storage applications as described herein.
A storage system's storage is typically implemented as one or more storage volumes that comprise physical storage devices, defining an overall logical arrangement of storage space. Available storage system implementations can serve a large number of discrete volumes. A storage volume is “loaded” in the storage system by copying the logical organization of the volume's files, data, and directories into the storage system's memory. Once a volume has been loaded in memory, the volume may be “mounted” by one or more users, applications, devices, and the like, that are permitted to access its contents and navigate its namespace.
A storage system may be configured to allow server systems to access its contents, for example, to read data from or write data to the storage system. A server system may execute an application that “connects” to the storage system over a computer network, such as a shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. The application executing on the server system may send an access request (read or write request) to the storage system for accessing particular data stored on the storage system.
The storage system may typically employ large-capacity disk devices for storing large amounts of data. In conjunction with the large-capacity disk devices, the storage system may also store data on other storage devices, such as low-latency random read memory (referred to herein as “LLRRM”). When using LLRRM devices in conjunction with disk devices to store data, the storage system may map storage system addresses (e.g., LBNs) to LLRRM addresses to access data on the LLRRM devices. As densities of LLRRM devices (e.g., flash memory) increase to provide larger storage capacities (while prices of LLRRM devices continue to decrease), LLRRM devices are being integrated into applications demanding such higher capacities.
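The LBN-to-LLRRM mapping described above can be sketched as a simple translation table. The names and address fields below are assumptions for illustration (they anticipate the bank/EU/page organization described in the following paragraphs), not the actual data structures of any particular storage operating system.

```python
# Hedged sketch: a table that translates storage system addresses (LBNs)
# into device-level LLRRM addresses. Field names are illustrative.
from typing import NamedTuple

class LLRRMAddress(NamedTuple):
    bank: int    # which bank of memory devices
    eu: int      # erase-unit number within a chip
    page: int    # page number within the erase unit
    offset: int  # offset within the page

class LLRRMMap:
    def __init__(self):
        self._map: dict[int, LLRRMAddress] = {}

    def remap(self, lbn: int, addr: LLRRMAddress) -> None:
        """Record where a given LBN's data now resides on the device."""
        self._map[lbn] = addr

    def lookup(self, lbn: int) -> LLRRMAddress:
        """Translate an LBN to its LLRRM address before an access."""
        return self._map[lbn]
```

On each write to the LLRRM device the system would update the table, and on each read it would consult the table before issuing the device-level operation.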
Typically, large-capacity LLRRM devices incorporate multiple banks of discrete memory devices, each bank being accessible in parallel. At the same time, the multiple banks are also typically concatenated or otherwise organized to operate as a single memory device of greater capacity. Each bank may also comprise a plurality of memory chips, each chip likewise accessible in parallel. Each chip may also comprise a plurality of erase units, each erase unit (EU) comprising a plurality of pages for storing data. A page may comprise the smallest data unit that can be read or written on the chip. The EU may comprise the smallest data unit that can be erased on the chip, whereby the entire EU may need to be erased before re-writing to any page in the EU.
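The bank/chip/EU/page hierarchy can be modeled with a few constants. All of the sizes below are assumptions chosen for illustration; real devices vary widely.

```python
# Illustrative model of the LLRRM hierarchy: banks contain chips, chips
# contain erase units (EUs), and EUs contain pages. Sizes are assumed.
PAGE_SIZE = 2048     # smallest unit readable/writable, in bytes
PAGES_PER_EU = 64    # the EU is the smallest erasable unit
EUS_PER_CHIP = 1024
CHIPS_PER_BANK = 4
BANKS = 8

def device_capacity_bytes() -> int:
    """Total capacity when the banks operate as one larger device."""
    return BANKS * CHIPS_PER_BANK * EUS_PER_CHIP * PAGES_PER_EU * PAGE_SIZE

def pages_affected_by_rewrite(page_in_eu: int) -> int:
    # Re-writing any single page requires erasing its entire EU first,
    # so every page of that EU is affected by the erase.
    return PAGES_PER_EU
```

With these example sizes the device totals 4 GiB, and re-writing even one page disturbs all 64 pages of its erase unit, which is why erase-unit management matters in the discussion that follows.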
To reduce latency in accessing data on the LLRRM device, received data blocks to be stored to the LLRRM device may be striped across several chips of the same bank. In data striping, a received data block is sub-divided into data sub-blocks, and the sub-blocks are stored to the multiple chips of the same bank, maximizing use of the chips' parallel accessibility to produce faster read and write times. Conventionally, data is striped using “aligned” stripes, whereby the data sub-blocks of a received data block are stored at the same EU number and the same page number on each chip in the same bank. This may simplify and reduce the mapping data needed to map storage system addresses (e.g., LBNs) to the LLRRM addresses from which the received data block may later be read. For example, to later read the data block from the LLRRM device, the mapping data may comprise only a single bank number, a single EU number, a single page number, and a single page offset number (since the EU, page, and offset numbers are the same for each chip).
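Aligned striping can be sketched as follows. This is an illustrative model with invented names: a bank is represented as a list of chips, each chip as a dictionary keyed by (EU, page, offset), and a block is split into equal sub-blocks written at identical coordinates on every chip, so a single tuple suffices as the mapping data for the whole block.

```python
# Sketch of "aligned" striping across the chips of one bank (illustrative).
CHIPS_PER_BANK = 4

def stripe_block(block: bytes) -> list[bytes]:
    """Sub-divide a block into one equal sub-block per chip."""
    sub_size = len(block) // CHIPS_PER_BANK
    return [block[i * sub_size:(i + 1) * sub_size]
            for i in range(CHIPS_PER_BANK)]

def write_aligned(bank, block, eu, page, offset):
    # Every chip receives its sub-block at the SAME (EU, page, offset),
    # so one tuple describes the location of the entire block.
    for chip, sub in zip(bank, stripe_block(block)):
        chip[(eu, page, offset)] = sub
    return (eu, page, offset)

def read_aligned(bank, eu, page, offset) -> bytes:
    # All chips are read in parallel at the same coordinates.
    return b"".join(chip[(eu, page, offset)] for chip in bank)
```

Because every chip uses identical coordinates, the map needs no per-chip entries; that economy of mapping data is exactly what the alignment constraint buys, at the cost described in the next paragraph.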
However, use of “aligned” data striping may cause a faster rate of reduction in the usable storage capacity of the LLRRM device. When a defective EU (i.e., an EU that no longer performs erase, write, or read operations) is detected in a chip of the LLRRM device, the entire row of EUs across the remaining chips may also be declared defective in order to maintain aligned data striping (the row comprising the EUs in the remaining chips having the same EU number as the defective EU). These EUs in the remaining chips may be declared defective, and no longer used to store data, even though they are in fact functional, solely to preserve the aligned stripes required by conventional data striping. Over time, as more defective EUs are detected and more rows of EUs are declared defective, the usable storage capacity of the LLRRM device may be significantly reduced.
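The capacity penalty can be made concrete with a small calculation. The sizes and function names below are assumptions for illustration: given a set of EUs that are actually worn out, aligned striping retires the entire same-numbered row across the bank, so each new defective row costs one EU per chip.

```python
# Illustrative accounting of capacity loss under aligned striping.
CHIPS_PER_BANK = 4
EUS_PER_CHIP = 1024

def usable_eus(defective: set) -> int:
    """defective: set of (chip, eu) pairs that are actually worn out.

    Aligned striping retires the whole row of same-numbered EUs across
    all chips for each defective EU, even though the other EUs in the
    row remain functional.
    """
    bad_rows = {eu for (_chip, eu) in defective}
    return CHIPS_PER_BANK * (EUS_PER_CHIP - len(bad_rows))
```

With these example sizes, a single worn-out EU removes four EUs of capacity rather than one; each additional defective EU in a new row costs another full row, which is the accelerating capacity loss the paragraph above describes.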