1. Field of the Invention
This invention relates generally to computer storage devices, and more particularly to methods, circuitry and software for managing defects in computer storage media, such as computer disk drive media.
2. Description of the Related Art
Disk drive media virtually always contains defective areas, where it is impossible to reliably record and read back data. Rather than take a manufacturing yield loss, systems have been created to avoid the recording of data on those defective areas. Typically, a given size media defect becomes more apparent and affects more data as data is required to be recorded more densely. Therefore, defect management schemes have been an essential facilitating component for the increase in disk drive capacity that has been occurring for many years.
Over the years, this function of avoiding defective areas has migrated from the host computer and operating system to the disk drive itself. This migration has been a logical extension of the integration of more functions on the disk drive circuitry. As a result, operating systems no longer have intimate information of such items as the geometry of the drive, the block ordering, skewing, or the defect management techniques employed. Removing this intelligence from the operating system has allowed disk drive vendors to optimize these items for the specific product. Thus, these items are now commonly implemented by a combination of firmware and hardware on the disk drive.
For background understanding, the physical address of a given block of data on a disk drive is composed of three components: cylinder, head, and sector. The cylinder represents the radius of the data, the head represents the disk surface of the data, and the sector represents the rotational position of the data. Collectively, this type of address is often referred to as a "CHS" address. The combination of cylinder and head is often referred to as a "track" address. When the host computer desires to transfer one or more data blocks, it specifies the address of the first block and a block count parameter to the disk drive. Most interfaces allow the data address to take the form of a CHS address; however, this is primarily for historical reasons. A CHS address passed by the host computer implies that the computer has knowledge of the geometry of the drive, the layout of the blocks (e.g., after transferring track X, it will know which track is next), the skewing, and the locations of the defects. As mentioned above, because these functions are now commonly embedded in the disk drive, the disk drive firmware translates the passed CHS address into an actual CHS address before accessing the data. For clarity, the host side CHS address is generally referred to as a "logical CHS" address, and the actual CHS address is referred to as a "physical CHS" address.
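The CHS-to-LBA relationship described above can be illustrated with a small sketch. The geometry constants below (4 heads, 8 sectors per track, sectors numbered from 1 within a track) are hypothetical; real drives use zoned recording with varying sectors per track, so this is only a simplified model of the translation arithmetic.

```python
HEADS = 4             # hypothetical number of recording surfaces
SECTORS_PER_TRACK = 8 # hypothetical; real drives vary this per zone

def chs_to_lba(cylinder, head, sector):
    """Convert a logical CHS address (sector numbered from 1) to an LBA."""
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

def lba_to_chs(lba):
    """Convert an LBA back to a logical CHS tuple."""
    cylinder, rem = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector = divmod(rem, SECTORS_PER_TRACK)
    return cylinder, head, sector + 1

# The two conversions are inverses of one another.
assert chs_to_lba(*lba_to_chs(100)) == 100
```

This also makes the point in the text concrete: an LBA is a byproduct of the logical CHS calculation, so passing the LBA directly simply skips one step of arithmetic.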
Modern disk drive interfaces allow the host to specify a logical block address (LBA). Some interfaces allow either LBA or logical CHS addressing; others allow LBA addressing only. An LBA is simply a block number that allows the host computer to access the data without any real or perceived information on the geometry, skew, layout, or the location of media defects. Often, a logical block address is a byproduct of the logical CHS address conversion, and therefore it is generally more efficient to simply pass a logical block address from the host computer to the disk drive, if the interface allows either. However, most modern disk drives implement defect management during the address translation process, regardless of whether the address specified by the host computer was a logical CHS or LBA address.
Two major techniques are commonly used to avoid media defects: block slipping and block relocation. It should be noted here that other techniques have also been used over the years; however, these two techniques in various forms are used in virtually all modern disk drives. Furthermore, virtually all modern disk drives commonly employ both techniques.
Block slipping is simply the avoidance of defective blocks by jumping over the defective block in the address translation process. Consider the following LBA to physical sector relationship shown in table A.
Physical sector 8 is reserved for use if a defective location is discovered, and is not normally accessible by the host computer. If physical sector 4 is determined to be defective, the address translation process can use this information to change the LBA to physical sector relationship and avoid the defect as shown in table B.
In this example, physical sector 4 is not addressable for host transfers. The address translation process avoids the defective sector while maintaining the correlation between the logical block address and the sequential nature of the data being stored. Therefore, sector 8 will now be addressable as LBA 7. The major advantage to block slipping is that it incurs minimal performance loss on multiple block transfers. In the above example, if all 8 logical blocks were being transferred, the actual transfer time is increased by only 12.5% due to the effect of the defect. However, the actual percentage of the time penalty associated with the defect is generally far less than this, because total access time also includes seek and rotational latency delays, which are unaffected by the defect.
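The slipping translation described above can be sketched in a few lines. This is an illustrative model of the table-A/table-B example, not an actual drive firmware implementation: each defect at or before the computed position pushes the physical sector up by one, so LBA 7 lands on the spare sector 8.

```python
def slip_translate(lba, defects):
    """Translate an LBA to a physical sector, slipping past defects.

    defects: sorted list of defective physical sector numbers.
    """
    physical = lba
    for d in defects:
        if d <= physical:
            physical += 1   # every defect at or before us shifts us up one
        else:
            break
    return physical

# With physical sector 4 defective, LBAs 0-3 are unchanged and LBAs 4-7
# each slip up by one, consuming spare sector 8 at the end.
assert [slip_translate(lba, [4]) for lba in range(8)] == [0, 1, 2, 3, 5, 6, 7, 8]
```

Note how the mapping remains monotonic: logically sequential blocks stay physically sequential except for the single jumped-over sector, which is why slipping costs so little on multi-block transfers.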
Block slipping does have a major disadvantage. If a physical sector is determined to be defective after customer data has been placed on the drive, slipping the block requires user data to be moved. How much data must be moved is a function of the location of the defect versus the location of the pool. However, as is well known, moving user data is very problematic. For example, power loss during the process could result in corrupted data. Also, moving data can be very time consuming, which can cause host timeout issues. Therefore, because the movement of user data is risky and slow, block slipping is generally used only for defects found in the factory, or if used for defects found in the field, reserved sector pools are placed frequently enough (every track or every cylinder or some other limited range) such that the risk and time associated with the data movement are minimized.
The other defect management technique mentioned above is block relocation. Block relocation is the avoidance of defective blocks by jumping instead to another, out of sequence block, and then returning to resume the transfer. Reference is again made to the LBA to sector relationship of table A, where the physical sector 8 is reserved for use if a defective location is discovered, and is not normally accessible by the host computer. If physical sector 4 is determined to be defective, the address translation process can use this information to change the LBA to physical sector relationship and avoid the defect as shown in table C.
In this example, the address translation process avoids the defective sector, but does not maintain the correlation between the logical block address and the sequential nature of the data being stored. Accordingly, LBA 4 is out of order. Block relocation therefore relies on one or more pools of unused sectors, into which the logical block address accesses can be redirected as needed to avoid defective sectors. Disk drive vendors commonly place these pools at the end of each track, each cylinder, each zone, or one large pool at the end of the volume. Often, these pools are also common with block slipping pools. For example, consider the case where one sector per track is reserved. If a track has two defects, one of the defects may be slipped, thus consuming the spare sector, and the other defect may be relocated to a nearby track that has no defects.
Another common implementation that shares one reserved pool for slipped and relocated blocks consumes the pool from the earliest address upward for slipped blocks and from the highest address downward for relocated blocks. The design must always consider the effect of defective sectors within the reserved pools, however, and these approaches have the potential of further complicating the defect management process. The major advantage of block relocation is that newly defective sectors can be relocated with minimal movement of user data and without the associated problems of block slipping. The major disadvantage of block relocation is that it has a negative impact on sequential performance, because the physical order of the blocks on the disk no longer correlates to the addressing order. Accordingly, transferring a relocated block which is a part of a multiple sector transfer often requires two extra seek operations and introduces rotational latency delays.
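In contrast to slipping, relocation can be sketched as a simple exception map layered over an otherwise identity translation. This mirrors the table-C example (LBA 4's home sector is defective and is redirected to spare sector 8); the dictionary here is illustrative, as real drives keep this information in ordered defect lists rather than a hash map.

```python
relocation_map = {4: 8}   # lba -> spare physical sector (example data)

def relocate_translate(lba):
    """Identity mapping except for relocated blocks."""
    return relocation_map.get(lba, lba)

# Only the relocated LBA is out of physical order, which is exactly the
# sequential-performance penalty described in the text.
assert [relocate_translate(lba) for lba in range(8)] == [0, 1, 2, 3, 8, 5, 6, 7]
```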
Regardless of the specific techniques employed, defect management is fundamentally an address translation function. A user address is translated into a physical address and in the process, the location of defective sectors is factored in for the purpose of avoiding the defects. Thus, the location of the defective sectors is kept in RAM, where the information is readily accessible during address calculations. A non-volatile storage technique is also employed, so that defect location information is not lost through power cycles. However, because disk drive vendors are under enormous pressure to reduce costs, solid state non-volatile storage of the data is typically not practical. Therefore, a vast majority of defect management implementations store the defect location information on the disk itself.
There are many variations on the storage and retrieval of the defect information. For example, some vendors choose to store one large defect list on the disk, and retrieve it on the initial spin-up. Others periodically retrieve much smaller quantities of defect location information. The specific defect management design decisions made are often interrelated to the defect location storage and retrieval strategy. One very common problem, however, is that defects can exist in the area in which the defect information is stored. Therefore, disk drive vendors often write redundant copies of the defect information to ensure the information can be successfully retrieved.
As is well known, the number of defects on a modern disk drive can be quite large. For example, a 10 gigabyte disk drive with the traditional allowance of 1 defect per megabyte can potentially have 10,000 defects. However, this traditional defect allowance is changing, due to today's downward price pressures on disk drives. As such, disk drive vendors are constantly seeking to lower their manufacturing costs, and one way to do so is by spending less time finding defects. To ensure effective defect finding in less time, manufacturers often marginalize various electrical or track following parameters that affect error rate and label every error encountered as being caused by a media defect. As a result, the traditional allowance of 1 defect per megabyte is being displaced by numbers more like 5 defects per megabyte. Therefore, the same 10 gigabyte drive that may have 10,000 defects with the traditional allowance may now need to allow 50,000 defects. Generally, 4 bytes or more are needed to fully describe the location of a given defect. So the total amount of defect sector location information can easily exceed 200K bytes on a modern disk drive.
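The sizing arithmetic above can be checked directly; the figures are the ones quoted in the text (10 GB drive, 5 defects per megabyte, 4 bytes per defect entry):

```python
DRIVE_MB = 10_000          # ~10 gigabytes, expressed in megabytes
DEFECTS_PER_MB = 5         # the newer, looser defect allowance
BYTES_PER_ENTRY = 4        # minimum bytes to describe one defect location

defects = DRIVE_MB * DEFECTS_PER_MB        # 50,000 potential defects
list_bytes = defects * BYTES_PER_ENTRY     # 200,000 bytes of location data

assert defects == 50_000
assert list_bytes == 200_000
```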
There are many possible ways to structure this defect location information. Because there can be so much data, the designer must consider the speed at which the relevant data can be found for a given address calculation. One very common technique is to order the information based on the sequential access order of the blocks. This has the advantage of requiring only a single, potentially long look-up at the front of a new read or write operation, followed by sequential access of the data as the disk transfer progresses. There are many other methods of achieving fast access, for example, hashing algorithms, but these always incur a memory size penalty, which is very critical in cost-sensitive drive designs, and therefore ordered defect lists are the most common technique.
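The benefit of an ordered defect list can be shown with a short sketch: one binary search locates the relevant position for the first block of a transfer, after which the list is walked sequentially. The defect locations below are arbitrary example values.

```python
import bisect

defect_list = [120, 455, 900, 4096, 70000]   # sorted defective-sector list

def defects_before(physical_sector):
    """Count defects at or before this sector, in O(log n) via binary search.

    This count is exactly the slip offset a drive would apply during
    address translation with a slipped-defect scheme.
    """
    return bisect.bisect_right(defect_list, physical_sector)

assert defects_before(500) == 2   # sectors 120 and 455 precede sector 500
assert defects_before(100) == 0
```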
Defects can be discovered on either read or write operations. On a read operation, typically a defect is assumed to be found when the drive firmware detects that a sector has become difficult to read. Other techniques may be employed to ensure that the difficulty is indeed being caused by a media defect; for example, the sector may be rewritten and then the read operation attempted again. Write operations are different in that a media defect located within the data sector cannot be detected while actually writing the data. However, a defect in a nearby servo field will be detected and indicate that the write may not have been successful. As is well known, servo fields are fields on the media placed periodically to provide feedback as to the position of the actuator, and the success of a write operation is typically dependent on correct actuator position.
For example, reference is now drawn to Table D below, which shows 5 data sectors and 4 servo burst fields. The servo fields provide the feedback to ensure that write operations are on track and at the correct rotational position. Since the actuator can be moving off track during the write operation, the typical requirement for a write operation to be considered good is that all servo fields around the sector indicate correct position. For example, a successful write of: (a) sectors 0 or 1 requires correct position feedback from servo bursts 0 and 1; (b) sector 2 requires correct position feedback from servo bursts 0, 1, and 2; (c) sector 3 requires correct position feedback from servo bursts 1 and 2; and (d) sector 4 requires correct position feedback from servo bursts 1, 2, and 3. Therefore, a single media defect in a servo field can affect multiple sectors.
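The table-D dependency between sectors and servo bursts can be encoded directly, which makes the propagation effect easy to see. The mapping below is the example layout from the text (including the split sectors that depend on three bursts), not a general formula:

```python
REQUIRED_BURSTS = {          # sector -> servo bursts it depends on (table D)
    0: {0, 1},
    1: {0, 1},
    2: {0, 1, 2},            # split sector: spans servo burst 1
    3: {1, 2},
    4: {1, 2, 3},            # split sector: spans servo burst 2
}

def sectors_affected_by(bad_burst):
    """All sectors whose write success depends on the defective servo burst."""
    return sorted(s for s, bursts in REQUIRED_BURSTS.items() if bad_burst in bursts)

# A single defect in servo burst 1 implicates every one of the five sectors.
assert sectors_affected_by(1) == [0, 1, 2, 3, 4]
assert sectors_affected_by(3) == [4]
```

This one-to-many relationship is the root cause of the repeated, piecemeal reallocations described later in the prior art discussion.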
Disk drive test processes typically attempt to locate all media defects and add them to the defect list(s) so that the locations will be avoided when the disk drive is in use by the customer. However, many times new media defects become apparent in the user environment anyway. Some of the factors that may contribute to this are, for example, undesirable head/media contact, an incomplete defect scan during the factory test process, or contamination, the latter being of particular concern on removable media type products.
Some disk drives are designed to automatically detect these new defects during normal user operation and automatically begin avoiding these locations. Typically, the avoidance of these locations is done by the relocating process as described earlier, although some drives will implement defect slipping and move customer data as necessary. However, automatic reallocation processes can be very time consuming, and this has a negative impact on the performance of a disk drive. Furthermore, many times host computers place timeouts on disk operations, and time consuming automatic reallocation processes can unfortunately cause host timeout conditions.
This problem is, however, particularly undesirable on write operations, due to a common performance enhancement technique known as "write caching." Write caching refers to a process by which the disk drive notifies the host of write operation completion as soon as all of the data has been accepted from the host computer, but before the data has actually been written to disk. Modern disk drives that support write caching can typically accept many write commands, potentially before the first one is even complete. The timeout condition frequently occurs when the host sends one or more write commands, has been notified that they are complete, and then issues another command that relies on the completion of the write operations, for example a read operation, or a spin-down operation. Since the disk drive must complete the write operation before responding, it is critical that the actual write operation(s) be completed in a timely manner, so that a host timeout on the new command does not occur. As a result, automatic reallocation operations during those write operations can greatly slow their completion and cause a timeout.
Some of the factors that contribute to the extra time required to perform a write operation automatic reallocation include the following. The first factor is performing retries on the blocks. Disk drive systems will typically perform several retries on a block before determining that the write operation is unsuccessful due to a media defect. This is because several things other than media defects can cause difficulty in writing, such as shock or vibration. The second factor is the amount of time required to insert the new defect locations into the defect list. As previously described, disk drive defect lists can be quite large, and often it is advantageous to order the list. So the insertion of a new location can often require moving all of the data beyond the insertion point. Even using other techniques, such as hashing algorithms or linked lists, insertion time can become an issue. The third factor is the amount of time required to move customer data. If the reallocation process implements a relocation only, this involves seeking to the relocation destination address and properly placing the data. If the reallocation process implements a block slip operation, this will involve moving all customer data that is affected by the slip, which can be quite large. The fourth factor is the amount of time required to write the new defect location information onto the disk. After determining that a new defect exists and inserting the location information into RAM, the system must write this information to the disk, and typically it must write redundant copies. This can be very time consuming. The fifth factor is the propagation of servo field defects to multiple data sectors. After exhausting the retry count and reallocating a sector, the write operation will attempt to continue. However, many times the next sector will also be unwritable due to the manner in which a single servo field defect affects multiple data sectors.
In some systems, this may require completely repeating the operations discussed with reference to factors one through four above.
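The second factor, in-order insertion into a large ordered list, can be sketched concretely. With an array-backed list, every entry past the insertion point must shift, which is the linear-time cost the text refers to; Python's `bisect.insort` performs exactly that find-and-shift. The sector numbers are arbitrary example values.

```python
import bisect

defect_list = [120, 455, 900, 4096]   # ordered defect list (example data)

def record_new_defect(sector):
    """Insert a newly found defective sector, keeping the list ordered.

    The binary search is fast, but the underlying insertion still shifts
    every trailing entry, i.e. O(n) work on a 50,000-entry list.
    """
    bisect.insort(defect_list, sector)

record_new_defect(800)
assert defect_list == [120, 455, 800, 900, 4096]
```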
It should be noted that many vendors choose to defer performing the operation described with respect to the fourth factor until after all write operations are complete. This is typically done to reduce the write reallocation time, in the case where multiple new defects are encountered. However, this adds risk, because there is more time between the reallocation process and the saving of the information on disk, which gives more opportunity for the information to be lost due to a power cycle. This can cause loss of user data because there is no record of the defect reallocation and therefore the user data cannot be recovered.
FIG. 1A shows a simplified top view of a partial magnetic disk 10, illustrating the track and sector format for a typical recording surface 13. The recording surface 13 includes a plurality of concentric tracks 12a through 12n, which extend from about the center spindle 15 out to the periphery of the magnetic disk 10. As shown, each of the tracks includes a plurality of sectors S1 through SN. The recording surface 13 is also typically divided by a plurality of servo fields 14, 16, 18, 20, etc., which extend from the center spindle 15 out to the periphery of the magnetic disk 10. For purposes of illustration, the servo fields 14, 16, and 18 of FIG. 1A are shown in FIG. 1B as servo fields SF0, SF1, and SF2, respectively. Each of the sectors, arranged in each of the particular tracks, is typically of a fixed length, such as the exemplary 512 bytes. Because the sectors are typically of a fixed length, the servo fields will commonly separate the data of a particular sector, generating what is known as a split data sector. As illustrated in FIG. 1B, sectors 26 and 32 are each divided into two sector segments by the servo fields 16 and 18, respectively.
As mentioned above, the servo fields assist a magnetic disk device in determining when off-track errors or other types of errors occur during the writing of data to the magnetic disk media 13. To illustrate how data writing errors are handled when the servo fields indicate that an off-track error has occurred, FIG. 1C pictorially illustrates the prior art reallocation scheme that is most commonly implemented. Assuming that an exemplary data transfer that is intended to be written to a track of the magnetic disk 10 includes data sectors 1 through 9, the writing of data to the track 12b would begin after the servo field zero (SF0) 14.
In this example, also assume that the magnetic disk drive has determined that the writing of data up to servo field zero (SF0) was ON track as pictorially shown in FIG. 1C. Accordingly, a write operation 50 is commenced with the data of sector 1 and then sequentially written up to a point 50′ when an off-track error is detected by the examination of the servo field 1 (SF1) 16. As illustrated, the off-track error is detected sometime shortly after the reading and examination of the servo field 16. For this example, it will be assumed that the off-track error is detected somewhere during the time when sector 6 is being written to the track 12b. When the off-track error is detected at point 50′, the writing operation 50 will cease and then will be commenced anew by retrying the write back at the start of sector 1. This, of course, will not occur until the disk has spun around once again in order to allow correct positioning for commencing the writing of sector 1 and its consecutive sectors that are part of the transfer.
Once again, an off-track error will be detected at point 50′, where the writing will again stop and commence with sector 1. The write operation 50 is therefore retried a number of times (e.g., between 30 and 100 times) until a counter expires, indicating that the error is actually an off-track error and that the writing of sector 1 would cause an improperly written sector, which could be misaligned along the track 12b. Once the counter has expired on the number of retries for the write operation 50, sector 1 will be reallocated.
The reallocation of sector 1 will generally require that the address of sector 1 be inserted into the defect list and that the destination address of that sector data also be properly identified in the defect list. As is well known, where in the defect list the entry for the sector will actually be written depends upon the sector address. Because the defect list will generally already include many defect addresses, it will be necessary to identify the proper location where the defect address for sector 1 will be inserted.
After the location where the address for sector 1 is to be inserted is identified, the addresses below that address must be shifted to preserve the proper ordering of the defect list. Once the defect address has been inserted into the defect list, the new defect list is written to the disk in a particular location, such as a reserved pool. Once the new defect list has been written to the disk media, the sector 1 data is reallocated to a reserved pool location which is identified in the defect list.
After sector 1 has been reallocated, the prior art technique reverts back to attempt writing of the data beginning with sector 2 by performing a write operation 52. The write operation 52 will also stop at a point 52′ after the off-track error is detected at some point after the servo field 16 has been read and analyzed by the disk hardware/software. At this point, the write operation 52 will again be retried beginning at sector 2.
This will thus continue until the retry counter expires, indicating that the servo field 16 does, in fact, have some type of off-track error. At that point, sector 2 will be reallocated in the same manner as sector 1 was reallocated. Once sector 2 has been reallocated, the write operation will then begin again to try to write the data beginning with sector 3. The write operation 54 will therefore commence with sector 3 and then stop at a point 54′ some point after the servo field 16. At point 54′, write operation 54 will stop and then will be retried beginning again with sector 3. The retrying operations will therefore continue until a counter has expired. Once the counter has expired, the servo field 16 off-track error will be verified and sector 3 will be reallocated in the same manner that sector 1 and sector 2 were reallocated.
Still further, the prior art method will go back to try to write the data beginning with sector 4 during a write operation 56. The write operation 56 will therefore continue up to a point 56′. At point 56′, it is determined that the servo field 16 has some type of error. Due to this error, a retry operation will commence and the write operation 56 will continue again until a counter expires. Once the counter expires for write operation 56, sector 4 will be reallocated in the same manner that sector 1, sector 2, and sector 3 were reallocated. Once sector 4 is reallocated, the method will again attempt a write operation 58 which commences with sector 5. This write operation will therefore continue up to a point 58′ where it is determined that a servo error has occurred due to servo field 16. At that point, the write operation 58 will stop and a retry will be performed back beginning with sector 5.
The retries will therefore continue until a counter expires, indicating that the servo field does, in fact, have an off-track error. At this point, sector 5 will be reallocated in the same manner that sector 1, sector 2, sector 3 and sector 4 were reallocated. Due to the servo field off-track error detected with servo field 16, sectors 6, 7, 8 and 9, which form part of the exemplary data transfer, will also be reallocated. As can be appreciated, when a single servo error occurs due to the detection of an off-track error, the piecemeal and continual reallocation of sectors "one-by-one" is a very laborious process which has the downside of significantly impacting the performance of a disk drive system, whether the media is magnetic or optical, and whether the media is fixed or removable.
In view of the foregoing, there is a need for disk drive systems which will enable more efficient handling of detected servo field errors. A need also exists for a technique that will eliminate the continual piecemeal reallocation of sector data when a single servo field error is detected. The need is therefore clear for a technique that will improve a write operation implementing a reallocation process, such that less time is required to complete the writing operations, without introducing a risk of data loss.
Broadly speaking, the present invention fills these needs by providing a method, system, and hardware/software implementation to enable the automatic reallocation of multiple sectors at once when a new defect is discovered during a write operation performed to a storage media. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium. Several inventive embodiments of the present invention are described below.
In one embodiment, a storage media defect management method for use in a storage device that is to be coupled to a computer system is disclosed. The method includes writing data to a storage media of the storage device, the storage media having a plurality of servo fields. The writing is stopped upon detecting an off-track or other servo field error with a particular servo field of the plurality of servo fields, and a plurality of suspect sectors that may be affected by the off-track or other servo field error detected in the particular servo field are reallocated. The method then proceeds to resume the writing beginning with a next sector following the plurality of suspect sectors that were reallocated. In a preferred embodiment, a number of preliminary checks are performed on the plurality of suspect sectors to ascertain which ones may not be part of the reallocation transfer. However, the plurality of suspect sectors that are reallocated will preferably not be more than a maximum look-ahead number, which in this embodiment is determined by ascertaining which zone of the storage media contains the track currently being written. In this manner, the maximum look-ahead reallocation number can be custom set depending on which track in the media contains the servo error.
In another embodiment, a sector reallocation system for performing defect management of data being written from a host computer to a storage media is disclosed. The system includes transferring data to be written on the storage media from the host to a data buffer of a storage device. Then, the system writes data from the data buffer to a track on the storage media, the storage media having a plurality of servo fields. Upon detecting an error in a recent servo field of the plurality of servo fields, the system discontinues the writing to the track on the storage media, and retries writing until a retry counter expires due to a continued detection of the error in the recent servo field. The system includes setting a maximum look-ahead reallocation that defines a maximum number of sectors that can be affected by the error detected in the recent servo field of the storage media. The system is then configured to perform a number of checks to ascertain a number of planned sectors of the maximum number of sectors that are to be reallocated. In this manner, the number of planned sectors will be less than or equal to the maximum number of sectors (i.e., the max look-ahead reallocation number). After the checks are performed, the system reallocates the number of planned sectors, and resumes the writing of the data from the data buffer to the track on the storage media beginning with a next sector after the reallocated number of planned sectors.
In yet another embodiment, a computer readable media containing program instructions for performing sector reallocation defect management is disclosed. The computer readable media includes: (a) program instructions for initiating a writing of data to a storage media having a plurality of servo fields; (b) program instructions for causing a discontinuation of the writing upon detecting an off-track or other servo field error with a particular servo field of the plurality of servo fields; (c) program instructions for detecting when a re-try counter has expired; (d) program instructions for causing a batch reallocation of a plurality of suspect sectors that may be affected by the off-track or other servo field error detected in the particular servo field; and (e) program instructions for causing a continuation of the writing beginning with a next sector following the plurality of suspect sectors that were reallocated.
In still another embodiment, a computer controlled reallocation defect management method implemented on hardware and directed by program instructions is disclosed. The method includes: (a) initiating a writing of data to a storage media having a plurality of servo fields; (b) causing a discontinuation of the writing upon detecting an off-track or other servo field error with a particular servo field of the plurality of servo fields; (c) detecting when a re-try counter has expired; (d) causing a batch reallocation of a plurality of suspect sectors that may be affected by the off-track or other servo field error detected in the particular servo field; and (e) causing a continuation of the writing beginning with a next sector following the plurality of suspect sectors that were reallocated.
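The batch-reallocation flow common to the embodiments above can be sketched as follows. This is only an illustrative model under stated assumptions: the function name, the per-zone look-ahead table, and the suspect-sector predicate are all hypothetical placeholders, not values from the disclosure.

```python
ZONE_MAX_LOOKAHEAD = {0: 4, 1: 6, 2: 8}   # hypothetical per-zone limits

def batch_reallocate(transfer_sectors, failed_index, zone, is_suspect):
    """Return (sectors to reallocate in one batch, index at which writing resumes).

    transfer_sectors: list of sector numbers in the write transfer
    failed_index: index of the sector where the servo error was verified
    zone: recording zone of the current track (selects the look-ahead limit)
    is_suspect: check deciding whether a candidate sector may be affected
    """
    limit = ZONE_MAX_LOOKAHEAD.get(zone, 4)
    candidates = transfer_sectors[failed_index:failed_index + limit]
    planned = [s for s in candidates if is_suspect(s)]   # preliminary checks
    resume_index = failed_index + len(candidates)        # skip past the batch
    return planned, resume_index

# Example: a 9-sector transfer fails at its first sector in zone 1; all
# six look-ahead sectors are deemed suspect, so they are reallocated in a
# single batch and writing resumes with the seventh sector, rather than
# repeating the full retry/insert/relocate cycle once per sector.
planned, resume = batch_reallocate(list(range(1, 10)), 0, 1, lambda s: True)
assert planned == [1, 2, 3, 4, 5, 6]
assert resume == 6
```

The contrast with the prior art of FIG. 1C is that the defect list insertion and on-disk defect list update then happen once for the whole batch, instead of once per sector.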
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.