1. Related Application
Patent application entitled "Method and Apparatus for Guaranteeing Average Case Disk Transfer Bandwidth and Seek Time For a Video Server" (P755), U.S. patent application Ser. No. 08/446,144 by Jim Hanko and Steve Kleiman, assigned to Sun Microsystems, Inc., Mountain View, Calif., assignee of the present application, is incorporated by reference herein. The above-identified patent application is incorporated for the purpose of setting the stage for a discussion of the present invention and hence is not to be considered prior art nor an admission of prior art.
2. Field of the Invention
The present invention relates to the field of data storage and retrieval. More particularly, the invention relates to error detection and correction for mass storage systems.
3. Description of the Related Art
Conventional mass storage systems, e.g., banks of disk drives, have individual disk drive capacities approaching 2.0 giga-bytes of storage space and mean time between failures (MTBF) approaching thirty years. Unfortunately, the space demands of many consumer applications, e.g., video servers, have also increased at a rapid rate and require an increasing number of disk drives per storage system. For example, even with data compression technology such as MPEG, a typical 120-minute full-length movie requires 3.6 giga-bytes of storage space. As a result, a video server with 15 full-length movies will require a bank of thirty 2-giga-byte disk drives or its equivalent.
Statistically, a bank of thirty disk drives, assuming an individual drive MTBF of thirty years, will have a system MTBF of about one year, i.e., one or more drives of the bank can be expected to fail after merely one year of continuous operation. As such, error detection/correction schemes have been developed to combat and compensate for the reduced system MTBF due to the large number of disk drives in the bank. One such conventional scheme is the redundant array of inexpensive disks ("RAID") system.
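The one-year figure is consistent with a common reliability approximation: if drives fail independently at a constant rate, the bank's failure rate is the sum of the individual rates, so the system MTBF is the drive MTBF divided by the number of drives. The following is a minimal sketch under that assumption; the function name `system_mtbf_years` is hypothetical and does not come from the application.

```python
def system_mtbf_years(drive_mtbf_years: float, num_drives: int) -> float:
    """Approximate MTBF of a bank of identical drives.

    Assumes independent drives with constant (exponential) failure rates,
    so the failure rates add and the system MTBF is drive MTBF / N.
    """
    if num_drives <= 0:
        raise ValueError("num_drives must be positive")
    return drive_mtbf_years / num_drives


# Thirty drives, each with a thirty-year MTBF, as in the example above.
bank_mtbf = system_mtbf_years(30.0, 30)
```

Under the same assumption, doubling the number of drives halves the expected time to the first failure.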
FIG. 1 is a block diagram of a "pseudo RAID-4" video server 100 in which data is spread across (striped) data disk drives 111, 112, . . . 119, with a parity disk drive 120 for storing error correcting information. As is well known in the art, a conventional high capacity disk drive, e.g., drive 111, has multiple platters 111a, 111b, 111c, . . . 111z and a corresponding number of read/write heads (not shown). These heads are operatively coupled so that they move together as part of a head assembly (not shown).
Historically, RAID was developed for "non-time-critical" user applications, e.g., as a network file server, where a second or two of time delay between a file request and the completion of a file retrieval is not critical to the application. Accordingly, data is distributed among cylinders of the disk drive in order to minimize movement of the disk drive's head assembly. A cylinder of a drive includes tracks of the platters accessible to the read/write heads of a head assembly without having to relocate the assembly. One such cylinder of disk drive 111 is represented by tracks 111a4, 111b4, 111c4, . . . 111z4 of platters 111a, 111b, 111c, . . . 111z, respectively. Data organization by cylinders maximizes the amount of data that can be retrieved before having to relocate the head assembly of a disk drive.
Naturally, in a typical RAID bank, parity is computed for corresponding cylinders of the bank of disk drives, i.e., data is stored in similar physical address locations (cylinders) of data disk drives while parity information is stored in the parity drive at a similar physical address location (cylinder). In this example, the data cylinder which includes tracks 111a4, 111b4, 111c4, . . . 111z4, the data cylinder which includes tracks 112a4, 112b4, 112c4, . . . 112z4, and the data cylinder which includes tracks 119a4, 119b4, 119c4, . . . 119z4, correspond to the parity cylinder which includes tracks 120a4, 120b4, 120c4, . . . 120z4.
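The relationship between corresponding data cylinders and the parity cylinder can be sketched as follows. This is an illustration only: it assumes byte-wise XOR parity, which is the usual choice for RAID-4 style arrays but is not spelled out in the text above, and it models each cylinder as a short byte string. The function names `xor_blocks`, `compute_parity`, and `reconstruct` are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def compute_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Parity block stored at the corresponding location of the parity drive."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks: Sequence[bytes], parity: bytes) -> bytes:
    """Rebuild the block of a single failed drive from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data drives, each contributing one (tiny) block at the same address.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(data)
# Suppose the second drive fails: its block is recovered from the others.
recovered = reconstruct([data[0], data[2]], parity)
```

Note that reconstruction needs the block at the same address from every surviving drive, which is why the size of the unit over which parity is computed matters for the buffer requirements discussed later in this section.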
Referring again to FIG. 1, RAID-based video server 100 is adapted for a conventional simplistic cable-TV system where each movie can be broadcast by a network broadcaster 130 to multiple viewers 141, 142, . . . 149, at a preset start and end time in accordance with a schedule determined in advance by the cable operator. For efficiency, each movie is stored in large blocks of contiguous data, with a single copy of each movie being shared among viewers 141, 142, . . . 149. As a result, an individual viewer cannot select start and end times, and is forced to reschedule his/her other activities in order to be able to watch a movie.
However, as viewers become more sophisticated and demanding, a viewer-friendly video-on-demand type system will become the norm. A video-on-demand system permits a video service subscriber, i.e., viewer, to select a movie at the viewer's own convenient start time. Unfortunately, attempting to adapt a RAID-based bank of disk drives for a "time-critical" application such as a video-on-demand server results in an inefficient system for the following reasons.
A video server based on the conventional RAID storage scheme tends to be inflexible with respect to start times and selection of movies since the RAID optimization is based on storing large data blocks in cylinders of contiguous space. Further, since a minimal number of copies of each movie is stored (typically, one copy of each movie), only a limited number of viewers are able to simultaneously access the same movie.
Another disadvantage of the traditional RAID scheme is that multiple blocks of data are stored on the same cylinder of the disk drive. In the case of the innermost cylinder with the slowest access time, the viewer capacity of the video server is diminished during normal operation when there is no disk failure because viewer capacity is constrained by the worst case zone access time of the data drives.
Conversely, if data of a RAID-based server is distributed among cylinders of different zones and a disk failure occurs, a substantial amount of unnecessary data has to be read before data reconstruction can occur. This is because in addition to the normal read of the "good" drives, a read of large (logically unrelated but physically related) data blocks stored in the corresponding cylinders of the disk drives is also required. As a result, the server has to manage an excessive number of large data blocks which require large memory buffers for the data reconstruction. Consequently, when a disk failure occurs, either some viewers are dropped in order to service other viewers, or the number of viewers that can be serviced during normal operation is limited to the maximum number of viewers that can be serviced when there is a disk failure.
Hence, adapting the RAID error correction scheme to a video-on-demand system results in excessive reservation of processor and/or memory buffer capacity for real-time reconstruction of erroneous data. This is due primarily to the use of large inefficient data blocks instead of the small efficient data slices ideal for a video-on-demand type system. This need for reserving processor and/or memory buffer capacity for error recovery becomes more acute as the number of movie selections and/or viewers increases.
Hence, there is a need for an error correction scheme for handling the small data slices optimized for a video-on-demand type server system, and capable of delivering "time-critical" data and real-time reconstruction of erroneous data without reserving a substantial amount of processor capability or substantially increasing memory buffer requirements. Such a flexible video-on-demand server should have data distributed in small efficient slices of contiguous video information, striped across the bank of drives, so as to multiplex access to the same movie by interleaving access to the small data slices. Such a system should provide an illusion of simultaneous access to the same movie by a large number of viewers, each at that viewer's preferred start time.
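The idea of striping small slices across the bank so that accesses to the same movie interleave can be illustrated with a simple round-robin placement. This is only a sketch under stated assumptions: the round-robin layout, the nine-drive bank, and the names `slice_location` and `drives_for_viewers` are hypothetical and are not taken from the application.

```python
from typing import Dict, Tuple


def slice_location(slice_index: int, num_data_drives: int) -> Tuple[int, int]:
    """Map a movie slice to (drive index, slot on that drive), round-robin."""
    if slice_index < 0 or num_data_drives <= 0:
        raise ValueError("slice_index must be >= 0 and num_data_drives > 0")
    return slice_index % num_data_drives, slice_index // num_data_drives


def drives_for_viewers(viewer_positions: Dict[str, int],
                       num_data_drives: int) -> Dict[str, int]:
    """Return the drive each viewer reads next, given each viewer's current slice."""
    return {
        viewer: slice_location(position, num_data_drives)[0]
        for viewer, position in viewer_positions.items()
    }


# Three viewers watching the same movie from different start times,
# so each is currently at a different slice of the single stored copy.
next_reads = drives_for_viewers({"viewer_a": 0, "viewer_b": 4, "viewer_c": 13}, 9)
```

In this layout, viewers at different positions in the movie tend to land on different drives, and viewers who do collide on a drive (here `viewer_b` and `viewer_c`) contend only for the duration of one small slice rather than one large block.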