1. Field of the Disclosure
The present disclosure relates in general to the field of data storage systems and, more particularly, to a system and method for repairing, in an automated fashion, the media of the storage system after an error is encountered in the media.
2. Background of the Related Art
As the value and the use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores and/or communicates information or data for business, personal or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems, e.g., computer, personal computer workstation, portable computer, computer server, print server, network router, network hub, network switch, storage area network disk array, redundant array of independent disks (“RAID”) system and telecommunications switch.
Computer systems often include hard media, such as IDE and/or SCSI devices. Hard media errors during read operations on SCSI drives under RAID controllers are handled gracefully in redundant RAID configurations (such as RAID levels 1, 5, or 10), but not in non-redundant configurations (such as RAID level 0, or degraded RAID levels 1, 5, or 10), where there is no recovery mechanism. The host-level software application may experience a read failure when a media error is encountered because the data associated with the software application is stored at the location of the media error and is thus inaccessible and/or corrupted.
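By way of illustration only, the difference between a redundant and a degraded configuration may be sketched as follows. The sketch assumes a hypothetical in-memory model in which each array member is a dictionary; the names `read_member`, `read_mirrored`, and `MediaError` are illustrative and do not correspond to any actual RAID controller interface.

```python
class MediaError(Exception):
    """Raised when a read hits a sector with a hard media error."""


def read_member(member, lba):
    # member is a hypothetical dict: 'data' is a list of sector contents,
    # 'bad' is the set of sector addresses with media errors.
    if lba in member["bad"]:
        raise MediaError(f"media error at sector {lba}")
    return member["data"][lba]


def read_mirrored(members, lba):
    # RAID 1 style read: on a media error, fall back to another copy.
    for member in members:
        try:
            return read_member(member, lba)
        except MediaError:
            continue
    raise MediaError(f"no readable copy of sector {lba}")


primary = {"data": [b"a", b"b"], "bad": {1}}
mirror = {"data": [b"a", b"b"], "bad": set()}

# Redundant configuration: the read is served from the mirror.
recovered = read_mirrored([primary, mirror], 1)

# Degraded configuration: only the damaged copy remains, so the read fails.
try:
    read_mirrored([primary], 1)
    degraded_failed = False
except MediaError:
    degraded_failed = True
```

In this model, the same media error is invisible to the host when a second copy exists and surfaces as a read failure when it does not.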
One problem scenario is when a user attempts to restore data from a backup. Part of the restored data may again be written to the same (bad) sector that caused the read error originally. SCSI drives do not track sectors that have caused read errors previously, and new write commands to the bad sector may be completed without any verification and thus reported as being completed successfully. Subsequent read commands from that bad sector may result in an unrecoverable error due to lack of data availability or corruption.
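The restore scenario above may be illustrated with a minimal sketch, assuming a hypothetical in-memory drive model. The class `ToyDrive` and its methods are illustrative only and are not a real SCSI interface; the model simply encodes the behavior described above, namely that writes are not verified and that the drive does not remember which sectors previously failed.

```python
class MediaError(Exception):
    """Raised when a read hits a sector with a hard media error."""


class ToyDrive:
    """Hypothetical drive model; not a real SCSI interface."""

    def __init__(self, num_sectors, bad_sectors=()):
        self.data = [b""] * num_sectors
        self.bad = set(bad_sectors)  # physically damaged sectors

    def write(self, lba, payload):
        # No read-back verification: the write is reported as successful
        # even when the target sector is bad.
        self.data[lba] = payload
        return True

    def read(self, lba):
        if lba in self.bad:
            raise MediaError(f"unrecoverable read error at sector {lba}")
        return self.data[lba]


drive = ToyDrive(num_sectors=8, bad_sectors={3})

# Restoring from backup writes to the same bad sector and appears to succeed.
restore_ok = drive.write(3, b"restored file block")

# The next read of that sector fails again.
try:
    drive.read(3)
    read_failed = False
except MediaError:
    read_failed = True
```

The restore reports success, yet the restored block remains unreadable.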
A second problem scenario is when a user performs a “verify” operation on the SCSI disk. In that case, the verify operation would detect the bad sector on the disk and reassign a good sector (from the spare sectors) in place of the bad sector. The problem with this operation is that unknown “data” (in the form of “1's and 0's”) exists on the newly assigned good sector. The software application that was using the data on the bad sector is unaware of the reassignment by the verify operation, and hence does not know that a block of data (from the bad sector) is now of unknown status or validity. Indeed, the software application could issue a read request for the data in the reassigned sector and inadvertently read the unknown data that was present in the new sector when it was reassigned during the verify operation. The software application would then be working on unknown, and potentially corrupted, data, which may cause the software application to crash or to produce inaccurate results. A user may restore the damaged file after the repair, but the verify operation may have reassigned other bad sectors that it discovered, and the files residing on those sectors would (presumably) be corrupted. Moreover, the files in question may have already been corrupted (due to a bad sector), with the corruption going unnoticed because those sectors had not undergone a read operation.
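The verify scenario above may likewise be sketched under stated assumptions. The model below is hypothetical: `ToyDrive`, its `verify` method, and the fixed byte pattern standing in for leftover spare-sector contents are illustrative choices, not the behavior of any particular drive firmware.

```python
class ToyDrive:
    """Hypothetical drive model with a pool of spare sectors."""

    def __init__(self, sectors, spares, bad_sectors=()):
        self.sectors = list(sectors)  # contents indexed by sector address
        self.spares = list(spares)    # spare sectors holding leftover bits
        self.bad = set(bad_sectors)

    def verify(self):
        # Reassign a spare in place of each bad sector. The original data
        # is not recovered, and the host application is not notified.
        for lba in sorted(self.bad):
            if not self.spares:
                break  # out of spares; remaining sectors stay bad
            self.sectors[lba] = self.spares.pop(0)
            self.bad.discard(lba)

    def read(self, lba):
        if lba in self.bad:
            raise IOError(f"media error at sector {lba}")
        return self.sectors[lba]


stale = b"\x5a\xa5\x5a\xa5"  # stand-in for unknown leftover contents
drive = ToyDrive(
    sectors=[b"app-data-0", b"app-data-1", b"app-data-2"],
    spares=[stale],
    bad_sectors={1},
)

drive.verify()
after_verify = drive.read(1)  # succeeds, but returns the unknown data
```

After the verify operation, the read completes without error, so nothing signals to the application that the returned block is no longer its own data.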
In the past, recovery from media errors on SCSI drives required a complete restore operation from backup (assuming that a backup existed). A complete recovery was warranted because it was hard to determine which files were corrupted and/or damaged due to bad sectors that were uncovered during the verify operation. There is, therefore, a need in the art for a system and/or method for avoiding bad sectors on storage media while maintaining operation of that media.