1.1. Field of the Invention
The present invention relates to the field of electronic storage systems management.
1.2. Description and Disadvantages of Prior Art
FIG. 1 illustrates the most basic structural components of a prior art hardware and software environment used for a prior art file recovery method, and depicts a system 100 according to the prior art.
Referring to FIG. 1, a number of applications 101, denoted as APPL_1, . . . APPL_N, running on a host computer 104 typically store data in a file system 102. The host computer 104 is connected to a storage system 106 via a storage network 108 such as a Storage Area Network (SAN). The host computer 104 is further connected to the storage system 106 through an additional management network 109, which might be based on Ethernet and TCP/IP. The applications 101 are business applications reading and writing files to the file system 102. The respective files 110 are stored in the storage system 106.
A file 110 is stored on a single disk or an array of disks (RAID) 112 pertaining to the storage system 106. The file thereby occupies one or more logical block addresses (LBA) 114, 115, 116 which may reside on one or multiple disks 112.
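The file-to-block relationship described above can be sketched, purely for illustration, as follows; the extent table, the round-robin striping rule, and all names are assumptions made for this sketch and do not describe any particular prior art product.

```python
# Hypothetical sketch: a file's extent list maps it to logical block
# addresses (LBAs), and a RAID layer may spread those LBAs across
# several disks. The striping rule below is an assumed RAID-0-like
# round-robin placement, chosen only to make the mapping concrete.
FILE_EXTENTS = {"file_110": [114, 115, 116]}  # file -> occupied LBAs (assumed)
NUM_DISKS = 3

def disk_for_lba(lba: int) -> int:
    """Return the disk holding a given LBA under the assumed striping rule."""
    return lba % NUM_DISKS

# Each LBA of the file may land on a different physical disk.
placement = {lba: disk_for_lba(lba) for lba in FILE_EXTENTS["file_110"]}
print(placement)  # → {114: 0, 115: 1, 116: 2}
```

The point of the sketch is only that a single file occupies several LBAs which may reside on one or multiple disks, which matters for the error-handling discussion below.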
Storage subsystems 106 according to prior art incorporate methods and processes 118 to detect read errors proactively, i.e., before an application attempts to read that data. One of these methods is referred to as “data scrubbing”; this expression is used throughout this disclosure to also represent similar prior art processes which check for storage-related errors. The “data scrubbing” process 118 periodically reads the data and checks for read errors.
The data scrubbing method is implemented in disk systems (such as IBM DS4000 and IBM DS8000) as a background process. This process is transparent to the applications and periodically reads data addressed by logical block addresses (LBA) within a storage system. The data might be stored on a single disk or on a RAID array (Redundant Array of Independent/Inexpensive Disks). The purpose of this process is to identify data addressed by logical blocks which is erroneous, i.e., which shows errors during the read process, and, if possible, to perform corrective actions (such as copying records to a different portion of the storage media).
Data scrubbing implemented in storage systems operates in a block-oriented manner; i.e., the data stored in logical blocks is read in order to verify whether it is readable. Some implementations check the ECC (error-correcting code) pertaining to each data block in order to verify that the data is authentic, and correct the data if required.
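By way of a purely illustrative sketch (not the implementation of any prior art product), a block-oriented scrub pass with a per-block integrity check might look as follows; the block size, the use of CRC-32 as a stand-in for a real on-disk ECC, and all function names are assumptions made only for this sketch.

```python
import zlib

BLOCK_SIZE = 512  # bytes per logical block (assumed)

def make_block(payload: bytes) -> tuple[bytes, int]:
    """Store a block together with a CRC-32 checksum standing in for ECC."""
    return payload, zlib.crc32(payload)

def scrub(blocks: dict[int, tuple[bytes, int]]) -> list[int]:
    """Read every logical block address (LBA) and return those whose
    stored checksum no longer matches the data, i.e. erroneous blocks."""
    bad_lbas = []
    for lba, (data, ecc) in sorted(blocks.items()):
        if zlib.crc32(data) != ecc:
            bad_lbas.append(lba)
    return bad_lbas

# Example: LBA 115 is silently corrupted after being written.
disk = {lba: make_block(bytes([lba % 256]) * BLOCK_SIZE) for lba in (114, 115, 116)}
data, ecc = disk[115]
disk[115] = (b"\x00" + data[1:], ecc)  # flip a byte, keep the stale checksum
print(scrub(disk))  # → [115]
```

The scrub pass proceeds block by block and does not consult any file-level information, which is relevant to the limitations of the prior art discussed below.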
Prior art data scrubbing implemented in storage systems identifies erroneous data. It typically implements certain error classes and recovery actions. For critical errors, where the data is not readable without correction, the storage system automatically relocates the data to spare block addresses. Recovery mechanisms for bad blocks might include reading the erroneous block multiple times and reconstructing unreadable blocks from redundant information, which for instance is maintained in RAID levels such as RAID 1, RAID 3, RAID 5, RAID 6 and RAID 10. The original block address is thereby replaced by the spare block address used for relocation. Thus, subsequent operations to the original block address are referred to the spare block address. For non-critical errors, the data scrubbing process within the storage system keeps a list of block addresses which may become bad.
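The relocation and watch-list behavior described above can be sketched as follows; the class, its method names, and the simplified critical/non-critical error model are assumptions for illustration only, not a disclosure of any particular storage system.

```python
class ScrubController:
    """Illustrative sketch of the prior art error handling: critical
    errors cause relocation to a spare LBA; non-critical errors are
    merely recorded on a watch list of blocks that may become bad."""

    def __init__(self, spare_lbas):
        self.spares = list(spare_lbas)   # pool of free spare block addresses
        self.remap = {}                  # original LBA -> spare LBA
        self.watch_list = set()          # LBAs suspected of going bad

    def resolve(self, lba):
        """Subsequent operations on an original LBA are referred to the
        spare LBA used for relocation, if any."""
        return self.remap.get(lba, lba)

    def report_error(self, lba, critical):
        if critical:
            spare = self.spares.pop(0)   # take the next free spare block
            self.remap[lba] = spare      # redirect future accesses
            self.watch_list.discard(lba)
        else:
            self.watch_list.add(lba)     # keep a list of suspect LBAs

ctrl = ScrubController(spare_lbas=[1000, 1001])
ctrl.report_error(115, critical=True)    # unreadable without correction
ctrl.report_error(116, critical=False)   # correctable, merely suspicious
print(ctrl.resolve(115), ctrl.resolve(114), sorted(ctrl.watch_list))
# → 1000 114 [116]
```

Note that only the single failing block is remapped; the controller has no notion of which other LBAs belong to the same file, which is precisely the limitation addressed in the problem discussion below.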
Typically, a file system 102 reads and writes data to a disk subsystem. Thereby the data is usually arranged in files 110, which are written and read by applications 101. Each file uses one or more logical block addresses (LBA) 114, 115 and 116 on the storage system 106. Respective typical applications are file systems, volume managers and databases.
One problem with the prior art described above is that for critical errors only the failing block is relocated to an error-free storage location. The data scrubbing process thereby has no knowledge of which other logical blocks are associated with the failed block. Thus, there may be other logical block addresses (LBAs) related to one and the same file which may also become bad.
Another problem of the prior art is that for non-critical errors the application or the file system does not receive the information that data blocks may eventually become bad. For non-critical errors which the data scrubbing process detects, it may not even move the erroneous LBA to a new LBA but correct the problem otherwise, e.g., through other redundancy such as RAID. Non-critical errors can also be of such a nature that they are recoverable with no need to relocate an LBA.
In extreme cases the data scrubbing process might detect an error which is not recoverable and thus cannot be relocated. This can happen, for example, when data from an LBA cannot be read at all due to a hardware failure. According to the prior art, manual intervention is then disadvantageously required to restore the file from a copy, such as a backup on a backup medium or a replica of a file system.