A data storage device is a device for recording information. A storage device may hold information, process information, or both. Electronic data storage is storage that requires electrical power to store and retrieve data. Electromagnetic data may be stored in either an analog or a digital format on a variety of media. This type of data is considered to be electronically encoded data, whether or not it is electronically stored in a semiconductor device, since a semiconductor device was used to record it on its medium. Most electronically processed data storage media (including some forms of computer data storage) are considered permanent (non-volatile) storage; that is, the data remains stored when power is removed from the device. In contrast, most information stored electronically within most types of semiconductor microcircuits (computer chips) resides in volatile memory, which loses its contents when power is removed.
A hard disk (HD), commonly referred to as a hard drive, hard disk drive, or fixed disk drive, is one example of a data storage device. An HD is a non-volatile storage device that stores digitally encoded data on rapidly rotating platters with magnetic surfaces. HDs were originally developed for use with general-purpose computers. In the 21st century, applications for HDs have expanded to include digital video recorders, digital audio players, personal digital assistants, digital cameras, video game consoles, and mobile phones. During the same period, the need for large-scale, reliable storage independent of any particular device led to the introduction of storage systems such as Redundant Array of Independent Disks (RAID) arrays, network attached storage (NAS) systems, and storage area network (SAN) systems, which provide efficient and reliable access to large volumes of data.
Random-access memory (usually known by its acronym, RAM) is a form of computer data storage that is embodied in various data storage devices. Today it takes the form of integrated circuits that allow the stored data to be accessed in any order. For example, a DIMM (dual in-line memory module) comprises a series of dynamic random access memory (DRAM) integrated circuits mounted on a printed circuit board, and is designed for use in personal computers, workstations, servers, or other equivalent electronic systems.
Flash memory is non-volatile electronic memory that can be electrically erased and reprogrammed, and it too is embodied in various data storage devices. It is used primarily in memory cards and USB flash drives for general storage and for transferring data between computers and other digital products. It is a specific type of EEPROM (Electrically Erasable Programmable Read-Only Memory) that is erased and programmed in large blocks; in early flash the entire chip had to be erased at once. Flash memory costs far less than byte-programmable EEPROM and has therefore become the dominant technology wherever a significant amount of non-volatile, solid-state storage is needed. Example applications include PDAs (personal digital assistants), laptop computers, digital audio players, digital cameras, and mobile phones. It has also gained popularity in the game console market, where it is often used instead of EEPROM or battery-powered SRAM for game save data. Flash memory is non-volatile and offers fast read access times (although not as fast as the volatile DRAM used for main memory in PCs) and better kinetic shock resistance than hard disks. These characteristics explain the popularity of flash memory in portable devices.
A host controller, disk controller, storage device manager, or similar component may connect a host system (the computer, data processing system, or other electronic device) to the storage device. Host or disk controllers may contain electronics and firmware to execute and manage transactions between the host system and the storage device. A device driver linked to the operating system may control the host or disk controller itself. Host or disk controllers may or may not be integrated into the storage device itself.
Storage devices may have trouble managing data that was previously stored, or was waiting to be stored, but has since been lost; such blocks are called Logically Bad Blocks. Managing Logically Bad Blocks is useful for storage devices such as storage adapters, spinning disk drives, and solid state disk drives. It may be of particular use for Flash-based devices, since a single bad Flash segment can lose multiple blocks of data (e.g., a Flash segment may contain 64 512-byte blocks for non-contiguous Logical Block Addresses (LBAs)).
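A small sketch can make the multi-block exposure concrete. It assumes a hypothetical segment layout in which segment `s` holds LBAs `s`, `s + N`, `s + 2N`, and so on, so that the 64 blocks in one segment are non-contiguous; the layout and the helper names `build_segment_map` and `lbas_lost_with_segment` are illustrative assumptions, not a real flash translation layer.

```python
BLOCK_SIZE = 512          # bytes per logical block
BLOCKS_PER_SEGMENT = 64   # blocks held by one Flash segment (example figure)


def build_segment_map(num_segments: int) -> dict[int, list[int]]:
    """Hypothetical layout: segment s holds LBAs s, s + N, s + 2N, ...
    so the LBAs stored in one segment are non-contiguous."""
    return {
        s: [s + i * num_segments for i in range(BLOCKS_PER_SEGMENT)]
        for s in range(num_segments)
    }


def lbas_lost_with_segment(segment_map: dict[int, list[int]], segment: int) -> list[int]:
    """Every LBA mapped to a failed segment becomes a Logically Bad Block."""
    return list(segment_map[segment])


seg_map = build_segment_map(num_segments=1024)
lost = lbas_lost_with_segment(seg_map, segment=7)
print(len(lost), lost[:3], len(lost) * BLOCK_SIZE)  # 64 [7, 1031, 2055] 32768
```

Losing a single segment in this layout exposes 64 × 512 = 32,768 bytes spread over 64 separate LBAs, each of which must then be tracked as Logically Bad.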
Logically Bad Blocks may be created (i.e., data may be “lost”) in a variety of situations. Some of these situations are described below (a non-exhaustive list):
Logically Bad Blocks may be created when individual non-volatile pages used for the Write Cache are lost, as detected by Basic Assurance Tests or Power On Self Tests at Initial Program Load (IPL) time.
Logically Bad Blocks may also be created in a compressed Write Cache when decompression errors are detected during the Write Cache destage operation, whether the destage is a normal destage or a stripe write destage.
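The destage case can be sketched as follows, assuming the cache stores each extent as a hypothetical `(start_lba, num_blocks, compressed_bytes)` tuple compressed with zlib. The sketch does not distinguish normal destages from stripe write destages; in both, an extent that fails to decompress (or decompresses to the wrong length) is not written, and every LBA it covered is recorded as Logically Bad.

```python
import zlib

BLOCK_SIZE = 512


def destage(cache_entries, disk, bad_lbas):
    """Destage compressed Write Cache extents to a disk.

    cache_entries: list of (start_lba, num_blocks, compressed_bytes) tuples
                   (a hypothetical cache format used only for this sketch).
    disk:          dict mapping LBA -> 512-byte block.
    bad_lbas:      set that collects LBAs whose cached data was lost.
    """
    for start_lba, num_blocks, compressed in cache_entries:
        lbas = range(start_lba, start_lba + num_blocks)
        try:
            data = zlib.decompress(compressed)
        except zlib.error:
            data = None
        if data is None or len(data) != num_blocks * BLOCK_SIZE:
            # The cached write data cannot be recovered, so every block the
            # extent was meant to update becomes a Logically Bad Block.
            bad_lbas.update(lbas)
            continue
        for i, lba in enumerate(lbas):
            disk[lba] = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            bad_lbas.discard(lba)  # freshly written data replaces any earlier loss
```

For example, destaging one valid two-block extent at LBA 100 and one corrupted four-block extent at LBA 200 leaves LBAs 100 and 101 written and LBAs 200 through 203 marked Logically Bad.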
Logically Bad Blocks may also be created when a Backup Cache Directory is kept in NVRAM and, after the Primary Cache Directory and its data are lost, the Backup Cache Directory is used to identify the blocks that must be marked bad.
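A minimal sketch of this recovery path, assuming each Backup Cache Directory entry records only a device, a starting LBA, and a block count for data that was held in the cache (the `DirectoryEntry` layout and `bad_blocks_from_backup` name are hypothetical): because the cached data itself is gone, every LBA the directory lists is reported as Logically Bad.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    """Hypothetical Backup Cache Directory entry describing one cached extent."""
    device_id: int
    start_lba: int
    num_blocks: int


def bad_blocks_from_backup(backup_directory):
    """Return {device_id: set of LBAs} whose cached write data was lost."""
    bad = {}
    for entry in backup_directory:
        lbas = bad.setdefault(entry.device_id, set())
        # Overlapping extents simply collapse into the same set of LBAs.
        lbas.update(range(entry.start_lba, entry.start_lba + entry.num_blocks))
    return bad


backup = [DirectoryEntry(0, 10, 3), DirectoryEntry(1, 500, 2), DirectoryEntry(0, 11, 2)]
print({dev: sorted(lbas) for dev, lbas in bad_blocks_from_backup(backup).items()})
# {0: [10, 11, 12], 1: [500, 501]}
```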
Logically Bad Blocks may also be created if a block of user data is found to be unreadable from disk (i.e., a Data Check condition exists) and the data cannot be recreated using RAID. This may occur, for example, when RAID-0 is used, when another disk in a RAID-5 array has failed, or when two other disks in a RAID-6 array have failed.
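The recoverability rule in these examples can be written as a small predicate. This sketch models only RAID-0, RAID-5, and RAID-6, and the function names `can_reconstruct` and `handle_data_check` are hypothetical; `other_failed_disks` counts failed members other than the disk that returned the Data Check.

```python
def can_reconstruct(raid_level: int, other_failed_disks: int) -> bool:
    """Can an unreadable block be recreated from the remaining array members?"""
    if raid_level == 0:
        return False                    # no redundancy at all
    if raid_level == 5:
        return other_failed_disks == 0  # single parity covers one missing member
    if raid_level == 6:
        return other_failed_disks <= 1  # dual parity covers two missing members
    raise ValueError(f"RAID level {raid_level} is not modeled in this sketch")


def handle_data_check(raid_level, other_failed_disks, lba, bad_lbas):
    """On a Data Check, mark the LBA Logically Bad unless RAID can rebuild it.

    Returns True when the block can be reconstructed instead.
    """
    if can_reconstruct(raid_level, other_failed_disks):
        return True
    bad_lbas.add(lba)
    return False
```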
Logically Bad Blocks may also be created if a parity resynchronization is required while one or more disks in the array have failed. This occurs, for example, at IPL time when an exposed RAID-5 array (one already operating with a failed disk) was abnormally powered off or reset while parity updates were in progress.
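Why a failed disk blocks parity resynchronization can be seen with plain XOR parity. The sketch assumes a stripe of three data blocks with arbitrary byte values: if a data block is rewritten but the matching parity update never completes, parity is stale, and with another member missing there is no way to recompute it. Any block rebuilt from the stale parity is wrong, so the missing member's blocks in that stripe have to be marked Logically Bad.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks (RAID-5 parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


d0, d1, d2 = b"\x11" * 4, b"\x22" * 4, b"\x33" * 4
parity = xor_blocks(d0, d1, d2)

# With consistent parity, a missing member (d1) can be recreated.
assert xor_blocks(d0, d2, parity) == d1

# Interrupted update: d0 was rewritten, but power was lost before parity was.
d0_new = b"\x44" * 4
# Resynchronizing parity would need d1, which sits on the failed disk, so
# the stale parity stays in place and a rebuild of d1 produces wrong data.
rebuilt_d1 = xor_blocks(d0_new, d2, parity)
print(rebuilt_d1 == d1)  # False
```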
Logically Bad Blocks may also be created if, while blocks of data are being rebuilt for a failed/replaced disk protected by RAID-5, an unreadable block is encountered on another disk in the array. In this scenario, both the data block to be rebuilt on the failed/replaced disk and the unreadable block on the otherwise operational disk are lost.
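The rebuild case can be sketched as a loop over RAID-5 stripes. This is an illustration under simplifying assumptions (one-byte blocks, one list per stripe with `None` standing for a Data Check or for the replaced disk, and hypothetical function names); it marks both positions lost whenever a surviving member of the stripe is unreadable.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (RAID-5 parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_disk(stripes, replaced_disk):
    """Rebuild the replaced disk's block in each RAID-5 stripe.

    stripes[s][d] is the block read from disk d in stripe s, or None if that
    read ended in a Data Check. The entry for replaced_disk is ignored.
    Returns (rebuilt, bad): rebuilt[s] is the recovered block or None, and
    bad is the set of (stripe, disk) positions to mark Logically Bad.
    """
    rebuilt, bad = [], set()
    for s, stripe in enumerate(stripes):
        others = [(d, blk) for d, blk in enumerate(stripe) if d != replaced_disk]
        unreadable = [d for d, blk in others if blk is None]
        if unreadable:
            # Single parity cannot cover two missing members of one stripe, so
            # both the block being rebuilt and the unreadable block are lost.
            bad.add((s, replaced_disk))
            bad.update((s, d) for d in unreadable)
            rebuilt.append(None)
        else:
            rebuilt.append(xor_blocks([blk for _, blk in others]))
    return rebuilt, bad


# Disk 0 was replaced, so its entries are None.
stripes = [
    [None, b"\x02", b"\x03"],  # stripe 0: survivors readable, rebuild gives 0x01
    [None, None, b"\x06"],     # stripe 1: disk 1 returns a Data Check
]
rebuilt, bad = rebuild_disk(stripes, replaced_disk=0)
```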
Logically Bad Blocks may also be created when an on-disk write cache is used. This cache provides a considerable performance gain for disk writes. However, if the electronic system fails (power loss, OS crash, etc.) or an uncorrectable memory error occurs, there is a high probability that some data in the cache was not written to the disk.
The problem of managing Logically Bad Blocks has been addressed using various methodologies, described below:
One example of a storage device methodology for managing Logically Bad Blocks is to implement a table of Logically Bad Blocks (known as a Bad Block Table) for each logical or physical storage device, and then search the table on each Read operation to determine whether any of the blocks to be read are Logically Bad. Software must perform this check on every Read operation. Additionally, software must determine on each Write operation whether any Logically Bad Blocks are to be removed from the table. While this search can be quite quick when the number of Logically Bad Blocks is small, system performance may decrease as the number of Logically Bad Blocks increases. There is also the general complexity of maintaining the Bad Block Table in non-volatile storage.
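A Bad Block Table of this kind might look like the following sketch, which keeps one sorted list of bad LBAs per device and uses binary search (Python's `bisect` module) for the Read check and the Write clear. The class and method names are hypothetical, and persisting the table to non-volatile storage is left out.

```python
import bisect


class BadBlockTable:
    """Sketch of a per-device Bad Block Table kept as a sorted list of LBAs."""

    def __init__(self):
        self._lbas = []

    def mark_bad(self, lba):
        """Record an LBA as Logically Bad (duplicates are ignored)."""
        i = bisect.bisect_left(self._lbas, lba)
        if i == len(self._lbas) or self._lbas[i] != lba:
            self._lbas.insert(i, lba)

    def bad_in_range(self, start, count):
        """Read check: Logically Bad LBAs within [start, start + count)."""
        lo = bisect.bisect_left(self._lbas, start)
        hi = bisect.bisect_left(self._lbas, start + count)
        return self._lbas[lo:hi]

    def clear_range(self, start, count):
        """Write path: new data over [start, start + count) removes those entries."""
        lo = bisect.bisect_left(self._lbas, start)
        hi = bisect.bisect_left(self._lbas, start + count)
        del self._lbas[lo:hi]
```

Binary search keeps each lookup logarithmic in the table size, but insertions and deletions in a sorted list still cost time proportional to its length, and every Read and Write pays for the check.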
Another example of a storage device methodology for managing Logically Bad Blocks is to use disk operations such as Read Long and Write Long to read the data plus its ECC (Error Correction Code) from the disk, corrupt the ECC, and write the data plus the corrupted ECC back to disk so as to make the block unreadable. This approach has the disadvantages of depending on the disk to provide the Read Long and Write Long commands and of requiring the storage adapter to understand the type and amount of ECC kept for each disk block. Another disadvantage is that this approach does not directly address the problem of a Logically Bad Block on a missing/failed disk in a RAID array (although some implementations may try to address this problem by corrupting the corresponding parity block).
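The Read Long / Write Long technique can be modeled with a toy disk that stores a check code alongside each block. In this sketch CRC-32 (`zlib.crc32`) stands in for the drive's real ECC, and `SimulatedDisk`, `read_long`, `write_long`, and `make_logically_bad` are hypothetical names, not SCSI command interfaces. Note that `make_logically_bad` has to know the check code is 32 bits wide, which mirrors the disadvantage that the adapter must understand each disk's ECC format.

```python
import zlib


class DataCheck(Exception):
    """Raised when a block's stored check code does not match its data."""


class SimulatedDisk:
    """Toy disk storing (data, check_code) per LBA; CRC-32 stands in for ECC."""

    def __init__(self):
        self._blocks = {}

    def write(self, lba, data: bytes):
        """Normal write: the 'drive' computes the check code itself."""
        self._blocks[lba] = (data, zlib.crc32(data))

    def read(self, lba) -> bytes:
        """Normal read: fails with a Data Check if the check code is wrong."""
        data, ecc = self._blocks[lba]
        if zlib.crc32(data) != ecc:
            raise DataCheck(f"LBA {lba}")
        return data

    # Analogues of Read Long / Write Long: raw access to data plus check code.
    def read_long(self, lba):
        return self._blocks[lba]

    def write_long(self, lba, data: bytes, ecc: int):
        self._blocks[lba] = (data, ecc)


def make_logically_bad(disk, lba):
    """Deliberately corrupt the check code so later reads report a Data Check."""
    data, ecc = disk.read_long(lba)
    disk.write_long(lba, data, ecc ^ 0xFFFFFFFF)  # always differs from ecc
```

A subsequent normal `write` to the same LBA recomputes the check code, which is how new host data clears the Logically Bad condition in this model.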
Another example of a storage device methodology for managing Logically Bad Blocks is to place, in the header of a data block, a unique pattern that is guaranteed never to be written by the host. This unique pattern is known as the Logically Bad pattern. This approach has been used by the Direct Attach Storage (DAS) adapters of International Business Machines (IBM) System i and System p products. A specific pattern in the 8-byte header of a data block indicates that the block is Logically Bad. While a host such as the i5/OS operating system may write the 8-byte header, the host will never use this particular pattern in normal operation, and the header is always written as zeros for AIX and Linux hosts. Thus, the storage adapter has a unique indication that it can place in any block to mark it Logically Bad. This approach requires a non-standard disk block size (520 bytes or greater: an 8-byte header, 512 bytes of data, and optional trailer bytes).
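The header-pattern approach can be sketched with 520-byte blocks (an 8-byte header plus 512 data bytes, no trailer). The actual pattern used by the adapters is not given here, so `LOGICALLY_BAD_PATTERN` below is a hypothetical placeholder value, and the helper names are illustrative.

```python
HEADER_SIZE = 8
DATA_SIZE = 512
BLOCK_SIZE = HEADER_SIZE + DATA_SIZE  # 520-byte block, no optional trailer

# Hypothetical placeholder; the real Logically Bad pattern is not specified here.
LOGICALLY_BAD_PATTERN = bytes.fromhex("DEADBEEFBADB10C5")


def host_block(data: bytes, header: bytes = bytes(HEADER_SIZE)) -> bytes:
    """Build a block as a host writes it (all-zero header for AIX/Linux hosts)."""
    if len(data) != DATA_SIZE or len(header) != HEADER_SIZE:
        raise ValueError("block must be an 8-byte header plus 512 data bytes")
    if header == LOGICALLY_BAD_PATTERN:
        raise ValueError("hosts never write the Logically Bad pattern")
    return header + data


def make_logically_bad_block() -> bytes:
    """Adapter-side: a block whose header marks it Logically Bad."""
    return LOGICALLY_BAD_PATTERN + bytes(DATA_SIZE)


def is_logically_bad(block: bytes) -> bool:
    """Read-side check: compare the 8-byte header against the pattern."""
    return block[:HEADER_SIZE] == LOGICALLY_BAD_PATTERN
```

Because the check is a single 8-byte comparison on data the adapter already reads, no table lookup is needed; the cost is the non-standard block size.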