1. Field of the Invention
The present invention relates to a method and apparatus for transferring data between a computer and its peripheral apparatus in general and to a method and apparatus for detecting and correcting errors in data transfers between a floppy and/or hard disk drive and a computer in particular.
2. Description of Prior Art
Computer systems of all kinds require storage devices in which are stored computer programs, data and the like. The storage devices may comprise an assemblage of electronic and/or magnetic devices, e.g. flip-flops or latches in the form of registers, magnetic core memories, bubble memories, rotating magnetic drums, or a variety of other types of memory devices such as, for example, a floppy disk memory or a hard disk memory, also known as a Winchester.
Regardless of the type of memory used, signals, referred to generally as data signals irrespective of their informational content, e.g. instructions, numerics, etc., must be transferred to and from the memory device quickly, reliably and preferably with a minimum of computer time involvement.
A floppy disk memory comprises a floppy disk and a floppy disk drive. The floppy disk is a flexible member coated with a magnetic recording medium. The member is permanently housed in a jacket provided with holes for providing access to the recording medium. The floppy disk drive comprises a motor driven spindle, one or more magnetic head members and associated electronics mounted in a housing. In use, the disk is removably inserted in the housing. After the disk is inserted in the housing, it is engaged by the spindle which rotates the disk in its jacket. The head members are located in registration with one or more of the holes. Control signals move the head member(s) in the holes from track to track in proximity with the disk and data signals from a computer are recorded on or reproduced from the spinning disk for use by the computer.
A hard or Winchester disk memory is quite similar to a floppy disk memory in some respects and quite different from a floppy disk memory in other respects. It is similar to a floppy disk memory in that it records and reproduces data signals on and from a rotating magnetic disk member. It is dissimilar from a floppy disk memory in that the memory typically comprises a plurality of stacked rigid magnetically coated disk members which are permanently housed in the disk drive. In use, magnetic head members, which are also located in the drive, are moved from track to track in proximity with the opposite surfaces of each disk for recording data signals on and reproducing signals from the disk.
The requirement that data transfers between a computer and a memory device be quick, reliable, and preferably involve a minimum of computer time applies equally to both floppy and hard disk memories.
To a certain extent, the above requirements are interrelated. However, for convenience, they may be discussed with respect to two functionally separate aspects of disk memories, namely, head control and data transfers.
Head control involves the apparatus and control signals necessary to move the head members from one track to another in response to addresses received from the computer. For example, the addresses received from the computer must be decoded and corresponding control signals generated to move the heads to the track selected. Movement of the heads to the track selected in turn requires that there be generated control signals corresponding to the location of the heads as they are moved. If, for any reason, a head is not moved to a selected track or is displaced from a selected track due to a malfunction in the apparatus, such as by an equipment failure or an erroneous control signal, a corresponding control signal must be generated to alert the user, abort the operation in progress and/or correct the failure.
Due to the fact that head control involves the movement of electro-mechanical devices, the speed at which track selection occurs and track selection errors are corrected is relatively slow compared to the speed of a computer. Thus, if the computer were required to be directly involved in head control beyond that amount of involvement which is absolutely essential, most of the available performance of the computer would be lost. To avoid such a loss, the majority of head control functions are handled in a separate assembly which eliminates from the computer functions the task of controlling head movement in the disk drive. This assembly is commonly called a disk controller.
Once a head has been moved to a selected track on a disk, data transfers to and from the track may take place. The data transfers to and from the track are serial in nature. If the data received from or to be sent to the computer is parallel, it must be converted. This task may be done in the disk controller.
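The parallel-to-serial conversion described above may be illustrated by the following minimal sketch. The function names `bytes_to_bits` and `bits_to_bytes` and the choice of sending the most significant bit first are assumptions made for illustration only; an actual disk controller may order the bits differently and would perform the conversion in hardware.

```python
def bytes_to_bits(data):
    """Serialize parallel bytes into a list of bits, most significant bit first."""
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


def bits_to_bytes(bits):
    """Reassemble a serial bit list (most significant bit first) into bytes."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

Converting a record to bits and back with these two functions returns the original bytes, which is the property a controller must preserve between the parallel computer side and the serial disk side.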
Generally, disk drives use a particular data format for storing data on the disk. A properly programmed computer in the disk controller may be used to perform the task of arranging the data in that format.
Among other data handling tasks which may be performed by the disk controller is data error detection and correction. As used in digital communications, a data error occurs when a logical bit is reversed. For example, a logical "1" becomes a logical "0" or vice versa. A data error in the data stored on a disk may occur as a result of a voltage or current surge in the equipment as the data is being stored. It may occur due to an extraneous magnetic field which affects the data after it is stored on the disk. Or it may occur as a result of normal wear and tear of the equipment.
In quality disk memories, e.g. Winchester disk drives, the probability of an error occurring for reasons other than normal wear and tear is one in 10^10 bits transferred. Such errors are called soft errors. The probability of an error occurring as a result of normal wear and tear is one in 10^12 bits transferred. The latter errors are called hard errors. Errors which occur due to a defect in the disk are mapped out, i.e. eliminated, by making the offending disk sector unavailable for the storage of data.
The task of a data error detection and correction method and apparatus is to detect an error in data which has been received from a computer or other data source and stored in a storage medium such as a disk, to identify the location of the error and to correct it.
In general, data error detection and correction methods and apparatus require the generation of a check sum. As each segment, e.g. byte, of a data stream is transferred from the computer to the disk, a partial check sum, a number corresponding to the data being transferred, is generated using one of a wide variety of well known error detection and correction codes, e.g. Hamming code, Reed-Solomon code, etc. At the end of the data transfer, i.e. after all segments have been transferred, the resultant final check sum is added to the end of and stored with the data on the disk.
When, subsequently, the data and the stored check sum are read from the disk, another check sum is generated as each segment of the data stream is being read. When all of the data and the stored check sum have been read, the newly generated final check sum is compared with the stored check sum. A correspondence between the two indicates the data read is error free. A lack of correspondence indicates that the data read contains an error.
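The write and read sequence described above may be sketched as follows. For brevity the sketch substitutes a 16 bit cyclic redundancy check (polynomial 0x1021, initial value 0xFFFF) for the Hamming or Reed-Solomon codes named in the text, so it can only detect an error and cannot locate or correct it. The names `crc16_update`, `write_record` and `read_record` are hypothetical.

```python
POLY = 0x1021   # assumed generator polynomial, chosen for illustration only
INIT = 0xFFFF   # assumed starting value of the partial check sum


def crc16_update(crc, byte):
    """Fold one byte of the data stream into the partial check sum."""
    crc ^= byte << 8
    for _ in range(8):
        if crc & 0x8000:
            crc = ((crc << 1) ^ POLY) & 0xFFFF
        else:
            crc = (crc << 1) & 0xFFFF
    return crc


def write_record(data):
    """Generate the check sum segment by segment and append it to the data."""
    crc = INIT
    for byte in data:
        crc = crc16_update(crc, byte)
    return bytes(data) + crc.to_bytes(2, "big")


def read_record(stored):
    """Regenerate the check sum while reading and compare it with the stored one."""
    data, stored_sum = stored[:-2], stored[-2:]
    crc = INIT
    for byte in data:
        crc = crc16_update(crc, byte)
    return data, crc.to_bytes(2, "big") == stored_sum
```

A record returned by `write_record` reads back with the comparison succeeding, whereas inverting any single bit of the stored data causes `read_record` to report a lack of correspondence.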
In the case of an error, the lack of correspondence described above results in a number which will identify the location of the error, provided the error detected is within the performance capability of the error detection and correction code being used. If it is, all that is required to correct the error is to invert the bits affected. Because the check sum created during reading contains information identifying an error, if one exists, it is descriptively called a syndrome.
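The manner in which a syndrome identifies the location of an error may be illustrated with a Hamming code, one of the codes named above, in its smallest common form of seven bits carrying four data bits. This is a sketch chosen for its simplicity and is not the Reed-Solomon arrangement discussed later; the function names and the placement of the parity bits at positions 1, 2 and 4 are assumptions made for illustration.

```python
def syndrome(word):
    """XOR of the 1-based positions holding a 1; zero for a valid codeword."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s


def encode(data_bits):
    """Place 4 data bits at positions 3, 5, 6, 7 and set parity bits 1, 2, 4."""
    word = [0] * 7
    for pos, bit in zip((3, 5, 6, 7), data_bits):
        word[pos - 1] = bit
    s = syndrome(word)
    for p in (1, 2, 4):
        # Setting parity bit p whenever s contains p drives the syndrome to zero.
        word[p - 1] = 1 if s & p else 0
    return word


def correct(word):
    """Return the corrected word and the syndrome that located the error."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1  # a non-zero syndrome is the position of the bad bit
    return fixed, s
```

If the bit at position 5 of an encoded word is inverted, `correct` returns a syndrome of 5 and restores the original word by inverting that bit. Such a code handles only a single inverted bit per word; two or more inversions exceed its performance capability.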
The performance capability of a particular error detection and correction code is determined by the size of the burst in a bit stream of a given length that can be detected and corrected. A burst is defined as a series of contiguous bits and may be one bit long or as long as the entire data record plus the check sum stored therewith. For example, a code which can detect and correct a 10 bit burst in a 256 byte bit stream is usually more powerful than one which can detect only a 5 bit burst in a bit stream of the same length. Similarly, a code which can detect and correct multiple bursts in a bit stream of a given length is more powerful than one which can detect and correct only a single burst in a bit stream of the same length.
A burst may also be defined as a series of bits of a predetermined length in which one or more bit inversions are located. Of the known codes which are available for error detection and correction in general, certain ones are more easily implemented in hardware and software for use in particular cases. For example, for use in detecting and correcting errors in a computer system using a disk memory, the family of Reed-Solomon codes is a preferred code.
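Under the second definition of a burst given above, the length of a burst can be measured by comparing the bit stream as written with the bit stream as read. The following sketch assumes both streams are available as equal-length lists of bits; the function name `burst_span` is hypothetical.

```python
def burst_span(written, read):
    """Length in bits of the span from the first to the last inverted bit."""
    if len(written) != len(read):
        raise ValueError("bit streams must be the same length")
    inverted = [i for i, (a, b) in enumerate(zip(written, read)) if a != b]
    if not inverted:
        return 0  # no error, hence no burst
    return inverted[-1] - inverted[0] + 1
```

Two inverted bits located four positions apart therefore constitute a single five bit burst, even though the three bits between them were read correctly.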
Data records are typically 128, 256 or 512 bytes long, or some multiple or submultiple thereof. Currently, data records on floppy and hard disks are typically 512 bytes long.
When an error detection and correction code, such as a Reed-Solomon code, is used for detecting and correcting errors in a data record, the resulting check sum which is stored on the disk at the end of the data record must also be checked for errors during the generation of the syndrome when the data record is read from the disk. Consequently, the check sum/syndrome must be large enough to identify the location of errors in both the data record and the check sum. For example, in a Reed-Solomon code, the symbol size is typically one byte long. Since one byte comprising 8 bits is capable of identifying only 2^8 or 256 byte locations, it is necessary to use a multiple byte check sum/syndrome to detect and correct errors in a data stream comprising a 256 byte data record and its accompanying check sum.
In practice, the requirement to correct errors in data records larger than the natural addressing ability of the error correction code frequently leads to the use of interleaving. Interleaving comprises generating a check sum using certain of the bytes in a data stream and generating other check sum/syndromes using other ones of the bytes in the data stream. For example, if two interleaves are used, it is the practice to generate one check sum/syndrome using bytes 0, 2, 4, 6 . . . and to generate the other check sum/syndrome using bytes 1, 3, 5, 7 . . . . In this manner, for example, a 256 byte data record can be handled for purposes of error detection and correction as if it comprised two 128 byte records. Since a single byte syndrome is capable of identifying up to 256 error locations, a two byte syndrome is clearly sufficient for identifying error locations in a 256 byte data record and its accompanying check sum.
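The division of a data record into interleaves may be sketched as follows. Only the routing of bytes to interleaves is shown; the generation of the check sum/syndrome for each interleave is omitted. The function names and the `depth` parameter (the number of interleaves) are assumptions made for illustration.

```python
def split_interleaves(record, depth=2):
    """Route byte i of the record to interleave i modulo depth."""
    return [record[i::depth] for i in range(depth)]


def merge_interleaves(parts):
    """Rebuild the original record from its interleaves."""
    depth = len(parts)
    out = bytearray(sum(len(part) for part in parts))
    for i, part in enumerate(parts):
        out[i::depth] = part
    return bytes(out)


def record_index(interleave, offset, depth=2):
    """Map an error location within one interleave back to the full record."""
    return offset * depth + interleave
```

With two interleaves a 256 byte record yields two 128 byte streams, the first holding bytes 0, 2, 4, 6 . . . and the second holding bytes 1, 3, 5, 7 . . . . An error located at offset 3 of the second interleave corresponds to byte 7 of the record.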
In addition to increasing the power of the code to detect and correct errors in larger data streams, the practice of interleaving segments also increases the power of a code to detect large sized single burst errors. For example, the use of interleaving permits the detection of bursts which extend over two or more bytes much more readily than can be done without interleaving. This fact is shown in practice and by a mathematical analysis of the performance of any given error detection and correction code with which interleaving may be used.
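The effect of interleaving on a burst which extends over more than one byte can be seen by counting how many byte symbols of each interleave the burst touches. The following sketch assumes one byte symbols and the modulo routing of bytes to interleaves used in the example of two interleaves above; the function name `symbols_hit_per_interleave` is hypothetical.

```python
def symbols_hit_per_interleave(start_bit, burst_len, depth=2):
    """Count the byte symbols touched by a burst in each interleave."""
    first_byte = start_bit // 8
    last_byte = (start_bit + burst_len - 1) // 8
    counts = [0] * depth
    for byte_index in range(first_byte, last_byte + 1):
        counts[byte_index % depth] += 1
    return counts
```

A 10 bit burst beginning at bit 12 of the record covers bits 12 through 21 and therefore falls in bytes 1 and 2. Without interleaving (`depth=1`) the single code must deal with two erroneous symbols, whereas with two interleaves each code sees only one.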
While conventional error detection and correction codes and their implementation and use are well known, one of the principal disadvantages of prior known Reed-Solomon error detection and correction methods and apparatus has been the fact that the generation of check sums and syndromes has required the use of separate apparatus. That is, one apparatus was used for generating the check sum and a separate apparatus was used for generating the syndrome.
Another disadvantage of prior known Reed-Solomon error detection and correction methods and apparatus has been the fact that there was no easy way to change the performance characteristics of apparatus to handle a more powerful code without extensively modifying the existing apparatus. For example, to change from a single burst correction capability to a double burst correction capability required that an entirely different apparatus had to be used.