The demand for networked digital audiovisual systems is expected to grow considerably over the next few years as businesses, government and other institutions increasingly turn to digital networks to distribute audiovisual information for education, presentations and reference applications. These customers expect systems that will allow a number of people to be able to view audiovisual information from a server simultaneously, while fully retaining their other network functions.
The characteristics of files, file access and network traffic in digital video applications differ substantially from those encountered in data applications. With data applications, whenever a user makes a file access request to a server, or requests that data be transmitted on a network, the user expects a fast response--fast compared to the time it takes to place the next request. As a result, the capacity of a server and the overall network bandwidth must both be large compared to the average demand placed by a single user. Accordingly, the design of a file server aimed at supporting data applications and the design of a network to support data traffic have been based on the principle of bandwidth sharing and statistical time multiplexing. File servers have furthermore taken advantage of the property of locality in file access, and incorporated appropriate caching mechanisms. In all cases, as the overall load on the shared resources increases, the average response time experienced by all users also increases.
Consider now digital video. A video signal is analog in nature and continuous over time. It is digitized by first sampling it at regular intervals, and then by quantizing each sample. This digitization process results in a data stream which is of relatively constant and very high rate (NTSC signals result in data rates in the neighborhood of 100 Mb/s and an HDTV signal, 600 Mb/s.) However, given that the sampled data exhibits a great deal of redundancy, compression is applied, thus significantly reducing the stream's rate. Depending on the bandwidth of the original analog signal, the sampling rate, the quantization step size, the encoding method, and the desired image quality, the resulting data rate for a digital video signal ranges from 64 Kb/s to tens of Mb/s. For example, CCITT Recommendation H.261 specifies video coding and decoding methods for audiovisual services at the rate of px64 Kb/s, where p is in the range of 1 to 30 (i.e., 64 Kb/s to 2 Mb/s). The MPEG standard specifies a coded representation that can be used for compressing video sequences to bit rates around 1.5 Mb/s. Advances in compression techniques and their VLSI implementations are among the important reasons why video services over LANs and WANs are becoming practical.
Two important observations can be made. The first is that the volume of bits corresponding to a digitized video segment of useful duration (even compressed) is large. A ten minute MPEG video segment requires over 100 Mbytes of storage; ten hours requires over 5 Gbytes. Thus video servers where such video information is to be stored must have relatively large storage capacity.
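The storage figures above follow directly from the stream rate; a minimal sketch of the arithmetic, assuming the 1.5 Mb/s MPEG rate cited in the text (Mb = 10.sup.6 bits):

```python
# Storage-requirement arithmetic for compressed video, using the
# 1.5 Mb/s MPEG rate cited above (1 Mb = 10^6 bits).

MPEG_RATE_BPS = 1.5e6  # bits per second

def storage_bytes(rate_bps: float, seconds: float) -> float:
    """Bytes needed to store a stream of the given rate and duration."""
    return rate_bps * seconds / 8

ten_minutes = storage_bytes(MPEG_RATE_BPS, 10 * 60)    # 112,500,000 bytes, i.e. over 100 Mbytes
ten_hours = storage_bytes(MPEG_RATE_BPS, 10 * 3600)    # 6,750,000,000 bytes, i.e. over 5 Gbytes
```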
The second observation is that the communication of digital video data between two nodes on a local area network (e.g., a server and a desktop station) requires that data be transmitted in a stream fashion. This means that video data packets must be delivered to the destination on time; failure to deliver video data packets on time results in video quality degradation. This has two main implications: (i) from a network's point of view, one requires the availability, on a continuous basis, of a bandwidth at least equal to the signal's data rate; (ii) from a file and storage system point of view, one requires streaming capabilities which guarantee the continuity of each stream being retrieved or stored. Thus, in order to support multiple independent video signals, the network must have the necessary aggregate bandwidth as well as means to guarantee the bandwidth required for each video stream, and the file storage system must be of the streaming type and must have a capacity sufficient to handle all video streams. By the same token, there is a maximum number of video streams of a given data rate that a network and a server can support, and means must be provided to prevent additional requests from overloading the system.
It is thus clear that the characteristics of video traffic differ substantially from those of traditional data traffic to the point that servers and local area networks designed primarily to support data applications are not appropriate to effectively support video services. New capabilities in servers and networks must be offered.
The focus in this application is on the design and operation of a storage system suitable for a video server, noting again that the storage requirements for video data are different from the storage requirements for typical LAN data in two respects:
(i) the size of video files is an order of magnitude greater or more; even with compression techniques, the physical storage needs are large.

(ii) when serving a video stream, be it for recording or playback, it is desirable to maintain the continuity of the stream. In the case of playback, data must be retrieved from the storage medium, transmitted over the network, and made available to the decoder no later than the time at which it is needed so as to avoid letting the decoder run dry. Similarly, when a stream is being recorded, the writing of data on the storage medium must keep up with the rate at which the data is produced so as to avoid buffer overflow and data loss.

A server's storage system preferably satisfies the following requirements:

(i) provide random access capability for both recording and playback;

(ii) have the storage capacity required by the application;

(iii) have the I/O throughput required to simultaneously support a target maximum number of users (streams); and

(iv) guarantee a start-up latency for new streams within a specified maximum tolerable threshold.

Segments of a file may be allocated to an array of N.sub.d disks (or a sub-array of N.sub.a disks) according to one of the following patterns:

1) Each group of N.sub.d (or N.sub.a) consecutive segments from a file is allocated to the disks in the array (or sub-array) in a regular cyclic pattern, i.e., disk 1, disk 2, . . . , disk N.sub.d (or N.sub.a), then disk 1, disk 2, etc. All of the segments in a stripe (or sub-stripe) have the same physical address in their respective disks.

2) Each group of N.sub.d (or N.sub.a) consecutive segments from a file is allocated to the disks in the array (or sub-array) in a regular cyclic pattern, but the segments in a stripe (or sub-stripe) do not all have the same physical address in their respective disks.

3) Each group of N.sub.d (or N.sub.a) segments from a file is placed in the array of N.sub.d disks (or sub-array of N.sub.a disks) so that there is one segment on each disk, but the order varies (e.g., randomly) from one stripe (or sub-stripe) to another.
Due to their random access read and write capability, wide availability, and low cost, magnetic disk drives emerge as very appropriate storage components for this purpose. Multiple drives can be used in order to provide the necessary storage capacity and/or to achieve an aggregate throughput larger than that achieved with a single drive.
A magnetic disk storage system 20 is illustrated in FIG. 1. The disk storage system 20 comprises a plurality of disk drives 200. Each disk drive 200 comprises a disk 21 and a controller 210. The disk drive 200 is shown in greater detail in FIG. 2. As shown in FIG. 2, the disk 21 of the disk drive 200 comprises a plurality of platters 22. Each platter 22 has one or two magnetic surfaces, a bottom surface 23 and/or a top surface 24, for recording. Associated with each recording surface 23 or 24 is a read/write head 26. In the disk 21 of FIG. 2, let h denote the number of heads, and thus, usable surfaces. The heads 26 are moved to particular locations on the platter surfaces 23, 24 by the actuator 28, which is controlled by a controller 210. Thus, the controller 210 controls the proper positioning of the read/write heads 26 and the transfer of data in both directions between the magnetic surfaces and a local buffer 30 which forms part of the controller 210. The controller 210 also manages the transfer of data across the SCSI bus 220 (see FIG. 1) into and out of a buffer internal to the adapter 230. The adapter 230 is then in charge of the transfer of data, via the system bus 250, into and out of the server computer system 26 which includes the memory 260, CPU 270, and network interface 280. In the case of a stand-alone system, the computer system 26 may not be a server and may not include a network interface.
As shown in FIG. 3, each recording surface 23, 24 is divided into a number of concentric tracks. Tracks on all surfaces which are located at the same radius form a cylinder. The number of tracks in a cylinder is thus equal to h. Let c denote the number of tracks per surface (and thus also the number of cylinders), and consider the tracks (and thus cylinders) to be numbered sequentially 1, . . . , c, starting with the outer track (cylinder). Each track is divided into a number of fixed size sectors. Due to the circular geometry of the surface, the number of sectors in a track is not the same for all tracks; there being more sectors in outer tracks than in inner tracks.
As shown in FIG. 4, the cylinders in the disk are divided into subsets of contiguous cylinders called zones, such that the number of sectors per track in a zone is the same for all tracks in the zone. We let Z denote the number of zones, and consider the zones to be numbered sequentially from 0 to Z-1 starting with the outer zone on the disk. In FIG. 4, the number of sectors in a track of zone i is designated .sigma..sub.i, and the number of cylinders in zone i is designated k.sub.i. Note that not all disks are organized into zones.
The disk rotates continuously at a constant speed of R rotations per minute, and the read/write heads are moved all together from one cylinder to another, as needed. All I/O transactions are for an integral number of sectors, the specific number of which depends on the application. To limit the overhead caused by head movement when writing or reading a block of data, the sectors on the disk are used consecutively and sequentially, going from sector to sector on a given track, from track to track in a given cylinder, and from cylinder to cylinder.
An example of a disk drive is the HP C2240 drive which has h=13 read/write heads, a total of c=2051 cylinders, and a rotational speed of R=5400 rotations/minute. The 2051 cylinders comprise 1981 data cylinders, 69 spares, and one cylinder for logs and maintenance information. They are organized into eight zones.
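The zoned geometry described above can be sketched in code. The zone table below is hypothetical (chosen so the data cylinders total 1981, as for the HP C2240); it is not the actual C2240 zone layout, whose per-zone parameters are not given here.

```python
# Sketch of zoned-disk geometry: capacity and the cylinder holding a given
# logical sector. ZONES lists (k_i cylinders, sigma_i sectors/track) per
# zone, outer to inner. These zone numbers are ASSUMED for illustration.

HEADS = 13  # h: read/write heads (usable surfaces), as for the HP C2240

ZONES = [(250, 96), (250, 88), (248, 80), (248, 76),
         (247, 72), (247, 68), (246, 64), (245, 60)]

def total_sectors(zones, heads):
    """Capacity in sectors: sum over zones of k_i * h * sigma_i."""
    return sum(k * heads * sigma for k, sigma in zones)

def cylinder_of(logical_sector, zones, heads):
    """Map a logical sector number to its cylinder, filling zones
    sequentially from the outer zone inward (sectors are used
    consecutively, track by track, cylinder by cylinder)."""
    cyl = 0
    for k, sigma in zones:
        zone_sectors = k * heads * sigma
        if logical_sector < zone_sectors:
            return cyl + logical_sector // (heads * sigma)
        logical_sector -= zone_sectors
        cyl += k
    raise ValueError("sector beyond disk capacity")
```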
When a request for an I/O operation is placed on the disk storage system (say a read or write operation for some number of consecutive sectors), the heads are first moved under the control of the controller 210 to the cylinder where the first sector is located; the delay incurred in this operation is referred to as the seek time (X.sub.seek). The head corresponding to the appropriate track then waits until the first sector appears under it, incurring a delay referred to as the rotational latency (X.sub.ro). Once the first sector is located, the head begins reading or writing the sectors consecutively at a rate determined by the rotational speed; the time to read or write all sectors constituting the block is referred to as the transfer time (X.sub.transfer). Note that if the block of data spans sectors located on more than one track in a given cylinder, then a switch from one head to the next is made at the appropriate time, thus incurring a so-called head switch time. If the block spans sectors located on multiple cylinders, then a head movement from one cylinder to the next takes place, thus incurring a track-to-track seek time each time this occurs. Accordingly, in performing an I/O operation, a certain amount of time is required. To assess the performance of a disk supporting an application, an analysis of the time required in each transaction must be undertaken.
The total time required in performing a read or write operation for a block T.sub.I/O (block), is the sum of seek time, rotational latency, and transfer time. EQU T.sub.I/O (block)=X.sub.seek +X.sub.ro +X.sub.trans
FIG. 5 shows how the total time T.sub.I/O for a block is divided into seek time, rotational latency, and transfer time. As shown in FIG. 5, the transfer time includes some head switch times and/or track-to-track seek times. It should be noted that seek times, rotational delays and transfer times may be random and not known a priori.
Note that the total time required to get the data transferred into the system's memory should also account for any additional time incurred in contending for the SCSI bus and in transferring the data from the controller's buffer to the system's memory. However, as these operations take place to a large degree simultaneously with the transfer of data off the disk into the controller's buffer, such additional delay is negligible and may be ignored.
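The timing model above can be sketched numerically. The seek time and track capacity used below are assumptions for illustration, not measured drive parameters; only the 5400 rpm rotational speed comes from the text.

```python
# Sketch of the per-transaction timing model: T_I/O = X_seek + X_ro + X_transfer.
# The seek time and bytes-per-track values are ASSUMED for illustration.

R_RPM = 5400                 # rotational speed, as for the HP C2240
REV_TIME = 60.0 / R_RPM      # one revolution: ~11.1 ms

def t_io(seek_s, rot_latency_s, bytes_to_move, bytes_per_track):
    """Total time of one read/write: seek + rotational latency + transfer.
    The transfer proceeds at the media rate set by the rotation (one
    track's worth of bytes per revolution)."""
    transfer_s = (bytes_to_move / bytes_per_track) * REV_TIME
    return seek_s + rot_latency_s + transfer_s

# e.g. average rotational latency is half a revolution
avg_rot = REV_TIME / 2
t = t_io(0.010, avg_rot, 40 * 1024, 48 * 1024)  # ~25 ms for a 40-Kbyte block
```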
In the patent application entitled METHOD OF OPERATING A DISK STORAGE SYSTEM, filed for Fouad A. Tobagi, Joseph M. Gang, Jr., Randall B. Baird, Joseph W. M. Pang, and Martin J. McFadden on Nov. 17, 1992, bearing Ser. No. 07/977,493, now U.S. Pat. No. 5,581,784, a method for operating a disk storage system comprising one or more disks is described. The disk storage system is operated so as to simultaneously maintain the continuity of a plurality of data streams. The disk storage system may be located in a network such as a local area network and maintain the continuity of a plurality of streams in the network. In this case, a network server which includes the video storage system and a plurality of user stations are connected via a shared transmission medium, and a plurality of video streams may be transmitted simultaneously between various users and the server via the shared transmission medium. Alternatively, the disk storage system may be part of a stand-alone system in which a plurality of video streams are retrieved from storage and displayed locally on a monitor, or received from an external source and locally stored in the disk storage system. (Thus, the inventive method is described below in connection with a network such as a LAN, but it should be understood that the method of operating a disk storage system is not restricted to use in such a network. In addition, it should be noted that the inventive method is described below in connection with an array of disks. However, the invention is equally applicable to arrays of other storage units; for example, each storage unit may itself be a server with an associated array of disks.) The disks may be optical disks, magnetic disks, or other storage media; instead of disks, random access storage media such as semiconductor memories may be used. The storage units may also be tape drives.
In accordance with the method described in U.S. patent application Ser. No. 07/977,493, now U.S. Pat. No. 5,581,784, I/O transactions take place in I/O cycles. For streams produced by (i.e., read out of) the disk storage system, the data is consumed by the network in consumption cycles which follow one another without gaps. For streams produced by the network to be written into the disk storage system, the data is produced in production cycles which follow one another without gaps.
Illustratively, the disk storage system comprises one disk. Each data stream is either produced by the network (e.g., produced by a video coder in the network) at a constant base rate of W.sub.base bits per second and consumed by the disk, or produced by the disk and consumed by the network (e.g., consumed by a video decoder in the network) at a constant base rate of W.sub.base bits/sec. One I/O transaction is performed for each stream in each of a plurality of successive I/O cycles of duration S/W.sub.base =T.sub.play. In each I/O transaction, a segment of S bits is stored in or retrieved from the disk. Illustratively, W.sub.base =1.2 Mbits/sec and S=40 Kbytes. It should be noted that T.sub.I/O, the total time for each I/O transaction (T.sub.I/O =X.sub.seek +X.sub.ro +X.sub.trans), is much shorter than T.sub.play which is the time it takes the network to produce or consume a segment of S bits of a stream.
The number of streams whose continuity can be simultaneously maintained is limited by the number of I/O transactions which can be performed in an I/O cycle of duration T.sub.play. This depends on the locations of the retrieved and stored segments in the disk (as T.sub.I/O for each transaction depends on the location) as well as the order in which the I/O transactions are scheduled.
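With the illustrative numbers given above (W.sub.base =1.2 Mb/s, S=40 Kbytes), the relationship between T.sub.play and T.sub.I/O can be made concrete. The average per-transaction time below is an assumption for illustration:

```python
# Sketch: cycle duration T_play = S / W_base, and an upper bound on the
# number of streams one disk can sustain, given an ASSUMED average
# per-transaction I/O time. W_base and S are the text's illustrative values.

W_BASE = 1.2e6            # bits/sec
S_BITS = 40 * 1024 * 8    # segment size S in bits

T_PLAY = S_BITS / W_BASE  # ~0.273 sec: time to consume/produce one segment

T_IO_AVG = 0.025          # ASSUMED average T_I/O per transaction, sec

# continuity requires all transactions of a cycle to fit within T_play
max_streams = int(T_PLAY // T_IO_AVG)
```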
Two modes of operation can be considered. The first mode of operation, known as the synchronous mode, is such that the I/O transactions for the active streams are scheduled in a particular predetermined order in each I/O cycle, but the production or consumption cycles of duration T.sub.play in which the data segments of the active streams are produced or consumed by the network are not necessarily aligned with each other.
The second mode of operation is known as the gated operation and is the preferred mode. The order of I/O transactions for the active streams may vary from one I/O cycle to the next. The advantage of allowing the order of the I/O transactions to vary from cycle to cycle is that the I/O transactions to be performed in a cycle may be sorted according to their locations on the disk so as to minimize the total I/O overhead. To let the order of I/O's vary from cycle to cycle, it is important not to allow the consumption of segments fetched in a given I/O cycle to take place earlier than the end of the given I/O cycle, nor to allow the production of segments to be stored in a given I/O cycle to take place later than the beginning of the given I/O cycle. If consumed (produced) at the earliest (latest) time, then the consumption (production) cycle for a segment fetched (stored) in a given I/O cycle for a given stream would coincide with the following (preceding) I/O cycle. If this is the case for all streams, then the consumption and production cycles for all streams are aligned with each other, as well as with the I/O cycles.
One way of implementing the gated operation described above is as follows: a sequence of I/O cycles of duration T.sub.play is defined, i.e., T.sup.1.sub.play, T.sup.2.sub.play, . . . In each cycle T.sup.i.sub.play, each stream has one I/O transaction. In addition, in each cycle T.sup.i.sub.play, for each stream, one data segment S is consumed or produced by the network. Segments which are consumed by the disk in cycle T.sup.i.sub.play are produced by the network in I/O cycle T.sup.i-1.sub.play. Segments which are produced by the disk in cycle T.sup.i.sub.play are consumed by the network in cycle T.sup.i+1.sub.play. FIG. 6 illustrates the I/O cycles of three streams (stream 1, stream 2, and stream 3) and the corresponding consumption and production cycles.
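The pipeline just described can be sketched as follows; stream names and the dictionary representation are illustrative, not part of the described method:

```python
# Sketch of the gated pipeline: segments fetched for play-back streams in
# I/O cycle i are consumed by the network in cycle i+1, so no consumption
# occurs from the very first cycle's fetches until that cycle ends.

def gated_schedule(streams, n_cycles):
    """Return, per I/O cycle, which streams have an I/O transaction and
    which streams consume data (that fetched in the previous cycle)."""
    schedule = []
    for i in range(n_cycles):
        schedule.append({
            "cycle": i,
            "fetch_for": list(streams),                 # one I/O per stream
            "consume": list(streams) if i > 0 else [],  # fetched in cycle i-1
        })
    return schedule
```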
Recall that the advantage of allowing the order of the I/O transactions to vary from cycle to cycle is that the I/O transactions to be performed in a cycle may be sorted according to their locations on the disk so as to minimize the total I/O overhead. Such sorting enables the number of I/O transactions performed in an I/O cycle of a given length to be optimized for a given segment size S.
To achieve a performance which is not dependent on the particular selection of files being played back or recorded (as pertaining to the location of the files on the disk), the locations of the segments belonging to each file can be randomized over the entire disk. The advantage of this is that the total I/O time (T.sub.I/O) for any segments fetched in a cycle of duration T.sub.play is a random variable which is independent of the I/O time of all other segments to be fetched in the same, as well as in other, I/O cycles of duration T.sub.play. As a result, the sum of the I/O times for all segments is the sum of independently and identically distributed random variables.
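The effect of randomization can be illustrated by simulation: the total I/O time of a cycle, being a sum of i.i.d. per-segment times, concentrates around its mean. The per-segment distribution below is an assumption purely for illustration:

```python
import random

# Sketch: with segment locations randomized over the disk, per-segment I/O
# times are i.i.d., so the cycle's total I/O time concentrates near
# n * mean. The uniform 15-35 ms per-segment distribution is ASSUMED.

def cycle_io_time(n_segments, rng):
    """Total I/O time of one cycle: sum of n i.i.d. per-segment times."""
    return sum(rng.uniform(0.015, 0.035) for _ in range(n_segments))

rng = random.Random(1)
samples = [cycle_io_time(10, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)   # concentrates near 10 * 25 ms = 0.25 s
```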
In order to increase the number of streams beyond that which is possible with a single disk, an array of multiple disks may be utilized. The number of disks in the array may be designated by N.sub.d.
The array of N.sub.d disks (or other storage units) is operated according to the following criteria (in accordance with U.S. patent application Ser. No. 07/977,493, now U.S. Pat. No. 5,581,784 which is incorporated herein by reference).
1. Video data is stored and retrieved from each disk in the array in the form of fixed size segments. The fixed segment size of S bits simplifies the tasks of storing and retrieving data from the disks, and the tasks of managing the system memory where video data is buffered (e.g., in memory 260 of FIG. 1) before it is stored on the disks or after it is retrieved from the disks.
2. Segments belonging to a video stream are striped across the disk array (e.g., consecutive segments are allocated to the disks forming the array in a cyclic fashion). Thus, when a video stream is being recorded or played back, all disks in the array are accessed equally and thus incur the same load. (Note that this assumes that all disks have the same storage capacity; otherwise, the disk with the smallest storage capacity dictates the overall storage of the array). FIG. 7 is an example of a horizontal stripe.
3. Segments within a disk (and thus horizontal stripes within the array) are randomly laid out on the disk (the array). Thus, when the disks are of the zoned type, the disk transfer time for a segment belonging to a stream being recorded or played-back is independent of the particular stream in question or the particular segment within that stream. Thus, the performance of a disk (and thus the array) in terms of the number of streams that it can serve is independent of the particular streams being served. It is not necessary that all the segments which form a stripe across the array have the same physical address in the corresponding disks. Rather, segments in each disk can be placed in segment bins independently of the other disks.
4. I/O transactions for each disk are scheduled in cycles, whereby the cycle time is equal to the time it takes a stream to consume or generate as many segments as there are disks in the array, assuming, for example, that all streams are of equal constant data rates. For each stream being served, one I/O transaction is scheduled for each disk in each cycle. Assuming all streams have a production or consumption bit rate W.sub.base, then the cycle time is N.sub.d S/W.sub.base, where N.sub.d is the number of disks in the array and S is the segment size. This corresponds to the fact that the rate at which data is generated must be equal to the rate at which data is consumed. The maximum number of streams N.sub.max that can be served (maintaining continuity at all times) is equal to the number of I/O transactions that can be performed in a disk in a cycle. This in turn depends on the segment size S and how the data is laid out on the disks (e.g., random or not). (Note that if the I/O throughput for all disks is not the same, then the disk with the smallest I/O throughput dictates the serving capacity of all other disks and thus of the disk array. Note also that if the data rate for a given stream is different from the nominal rate, then the number of I/O transactions performed in a cycle for that stream may be different from one; if the data rate for a stream is variable over time, then the number of I/O transactions for that stream may vary from cycle to cycle. In all cases, to maintain continuity for all streams being served, the number of I/O transactions required in each cycle should be smaller than the total number of I/O transactions that can be performed in a cycle).
5. By restricting the consumption of segments fetched in a cycle for a play-back stream to begin no earlier than the start of the following cycle, and equivalently, by guaranteeing that the production of data constituting a segment for a stream to be stored is complete prior to the beginning of the cycle during which the corresponding I/O transaction is to take place, it is possible to allow the order of I/O transactions corresponding to the various streams being served to vary from cycle to cycle. The advantage of this is to be able to sort the I/O transactions in each cycle according to the location of the corresponding segments on the disk(s), and thus to decrease the overall seek overhead time incurred and thus maximize the number of I/O transactions that can be performed in an I/O cycle.
Consider an example shown in FIG. 8, wherein the disk array comprises N.sub.d =2 disks. Assume also there are three streams (stream 1, stream 2 and stream 3). Each stream has a bit rate W.sub.base, where S is the size of a segment retrieved from a disk, and thus the cycle time is 2S/W.sub.base. Three cycles, cycle i, cycle i+1, and cycle i+2, are shown in FIG. 8. In each cycle, there is one I/O transaction for each stream from each disk, for a total of six I/O transactions per cycle. The order of the I/O transactions for the two disks is different in cycle i and cycle i+1. For example, in cycle i, in disk 1, the order of the I/O transactions is stream 1, stream 3, stream 2 (as indicated by the different shadings used for the different streams). Note that the data for stream 1 fetched in cycle i is consumed in cycle i+1 and the data fetched for stream 1 in cycle i+1 is consumed in cycle i+2. Similarly, the data for stream 2 fetched in cycle i is consumed in cycle i+1, etc.
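The per-cycle, per-disk sorting that produces orders like those of FIG. 8 can be sketched as follows; the segment addresses below are illustrative:

```python
# Sketch of the per-cycle sorting permitted by the gated mode: within each
# cycle, each disk's transactions are ordered by segment address to reduce
# seek overhead, so the stream order may differ from cycle to cycle and
# from disk to disk. Addresses are ASSUMED for illustration.

def order_transactions(cycle_segments):
    """cycle_segments: {disk: [(stream, address), ...]} for one cycle.
    Returns {disk: [stream, ...]} sorted by address on each disk."""
    return {disk: [s for s, _ in sorted(txns, key=lambda t: t[1])]
            for disk, txns in cycle_segments.items()}
```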
6. The start-up latency incurred by a play-back stream is a function of the I/O cycle length. Indeed, in order to take advantage of sorting, a play-back stream may not begin consumption of the fetched segments before the beginning of the cycle following that in which the segments were fetched. Thus, the longer the cycle length, the longer the latency.
8. The segment size is an important design variable which affects the system performance, and thus must be selected appropriately. The larger the segment size, the larger the number of streams of a given data rate that the disk array may serve (as access to the disks becomes more efficient); on the other hand, with a larger segment size and a larger number of streams, the cycle length grows, the start-up delay increases, and the memory required to buffer all streams increases as well.
9. The memory requirement is a function of the number of disks in the array N.sub.d, the number of users to be served, say N.sub.max, and the segment size S. With complete partitioning of the memory, the memory requirement is given by 2(N.sub.d) (N.sub.max)S. With full sharing, the memory requirement is (N.sub.d +1) (N.sub.max)S. If the number of streams to be served is N.sub.a1 <N.sub.max, then the above equations hold with N.sub.max replaced by N.sub.a1.
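The two buffer-sizing formulas above can be written directly as code; the sample parameters in the usage below (4 disks, 10 streams, 40-Kbyte segments) are illustrative:

```python
# The two memory-requirement formulas from the text.
# n_d = disks in the array, n_streams = streams served, s_bytes = segment size.

def mem_partitioned(n_d, n_streams, s_bytes):
    """Complete partitioning of memory: 2 * N_d * N_streams * S."""
    return 2 * n_d * n_streams * s_bytes

def mem_shared(n_d, n_streams, s_bytes):
    """Full sharing of memory: (N_d + 1) * N_streams * S."""
    return (n_d + 1) * n_streams * s_bytes

# e.g. 4 disks, 10 streams, 40-Kbyte segments: sharing needs less memory
partitioned = mem_partitioned(4, 10, 40 * 1024)
shared = mem_shared(4, 10, 40 * 1024)
```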
As is evident from the above discussion, a disk is characterized by its storage capacity (i.e., the number of segments that can be stored on the disk), and its I/O capacity (i.e., the number of segments that can be stored or fetched in a given period of time). Typically disks used in an array such as that described above are identical, in that they all have the same storage capacity, and the same I/O capacity.
An alternate scheduling technique disclosed in U.S. patent application Ser. No. 07/977,493, now U.S. Pat. No. 5,581,784, involves dividing the streams into G groups. The size of each group may be, for example, N.sub.a1 /G (N.sub.a1 is the number of allowed streams, where N.sub.a1.ltoreq.N.sub.max). The consumption and production cycles of the streams in each group are time aligned with each other, but the consumption and production cycles of each successive group are offset by (1/G).sup.th of an I/O cycle of duration N.sub.d S/W.sub.base. FIG. 9 shows the production and consumption cycles for G=8 groups of streams, for the case N.sub.d =4 disks.
As shown in FIG. 9, the I/O cycle is divided into G subcycles. In each subcycle of length N.sub.d S/(GW.sub.base), the I/O transactions are performed for the group of streams whose production/consumption cycles begin at the end of the particular I/O subcycle. The group of I/O transactions performed in each subcycle is sorted separately, for example, alternating between increasing and decreasing order of cylinders so as to minimize seek overhead. As N.sub.d =4, the number of I/O transactions for a stream served in each subcycle is four. For example, as shown in FIG. 9, for a stream in Group 1, which is served in I/O subcycle #1, there is one segment retrieved from each of the four disks. It should be noted that G can be larger than or smaller than N.sub.d. This scheduling technique reduces the amount of buffering required. It should also be noted that in the operation described above, all disks experience the same load.
In yet another technique (also disclosed in U.S. patent application Ser. No. 07/977,493, now U.S. Pat. No. 5,581,784) for scheduling I/O transactions in each I/O cycle of duration N.sub.d S/W.sub.base, the streams are again divided into G groups and the size of each group is again, for example, N.sub.a1 /G. I/O transactions are performed in subcycles of length N.sub.d S/(GW.sub.base). However, instead of retrieving or storing N.sub.d segments for each stream in an entire group of size N.sub.a1 /G, there is retrieved or stored one segment for each stream in N.sub.d groups. The segments retrieved or stored are those which begin consumption immediately following the subcycle or which were produced immediately preceding the subcycle. This scheduling technique leads to further reduction in the size of the buffer (e.g., the size of memory 260). In order to keep the load on all disks balanced during each I/O subcycle, it is desirable to guarantee that the segments to be retrieved or stored for each of the N.sub.d groups served in a subcycle come from separate disks.
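The grouped scheduling above can be sketched as follows; the round-robin group assignment is one illustrative policy, not the only one the method permits:

```python
# Sketch of the grouped schedule: the I/O cycle of duration N_d*S/W_base is
# split into G subcycles, each serving one group, so successive groups'
# production/consumption cycles are offset by 1/G of a cycle. Round-robin
# group membership below is ASSUMED for illustration.

def assign_groups(streams, g):
    """Distribute the streams round-robin into G groups."""
    groups = [[] for _ in range(g)]
    for i, s in enumerate(streams):
        groups[i % g].append(s)
    return groups

def subcycle_length(n_d, s_bits, w_base, g):
    """Each subcycle lasts (N_d * S / W_base) / G seconds."""
    return n_d * s_bits / (w_base * g)
```

For example, with 16 streams, G=8 groups, N.sub.d =4 disks, S=40 Kbytes and W.sub.base =1.2 Mb/s, each group holds two streams and each subcycle lasts about 137 ms.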
Consideration is now given to how the disk storage system is able to support streams with different production/consumption rates. While dividing the I/O cycle into subcycles can lower buffer requirements, grouping multiple consecutive I/O cycles into a supercycle can accommodate streams with diverse bandwidth requirements. Consider a disk storage system comprising N.sub.d disks or other storage units which can support N.sub.a1 streams, each of nominal rate W.sub.base bits/sec. The basic scheduling algorithm of fetching one stripe per stream per I/O cycle of duration N.sub.d S/W.sub.base is now modified to allow fetching multiple stripes per stream per supercycle, where a supercycle comprises N.sub.super consecutive I/O cycles. The number of stripes to be fetched in a supercycle depends on the stream rates. This scheduling algorithm can support streams of different rates ranging from a minimum of W.sub.base /N.sub.super bits/sec to a maximum of N.sub.a1 W.sub.base bits/sec in increments of W.sub.base /N.sub.super bits/sec. A stream that requires a rate that does not fall on one of these increments is assigned the next higher allowable rate, in which case a certain amount of I/O bandwidth is wasted. For example, let N.sub.super =16; then a stream with rate 1.25 W.sub.base bits/sec is allowed to fetch 20 stripes per supercycle, a stream with rate 0.125 W.sub.base bits/sec is allowed to fetch 2 stripes per supercycle, and a stream with rate 0.3 W.sub.base bits/sec is allowed to fetch 5 stripes per supercycle (although this stream may need to fetch fewer than 5 stripes in some supercycles).
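The rate quantization just described reduces to rounding a stream's rate up to the next multiple of W.sub.base /N.sub.super; a minimal sketch:

```python
import math

# Sketch of the supercycle rate granularity: a stream of rate w is granted
# ceil(w / (W_base / N_super)) stripe-fetches per supercycle, i.e. rates
# are quantized UPWARD in steps of W_base / N_super.

def stripes_per_supercycle(rate, w_base, n_super):
    """Stripes fetched per supercycle for a stream of the given rate."""
    return math.ceil(rate / (w_base / n_super))
```

With N.sub.super =16 this reproduces the example above: 1.25, 0.125 and 0.3 times W.sub.base map to 20, 2 and 5 stripes per supercycle, respectively.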
To define the scheduling in more precise terms, let C.sub.i be the i.sup.th I/O cycle of a typical supercycle where i=1, 2, . . . , N.sub.super. For a particular stream, numbered j, L.sub.i,j stripes are allowed to be fetched in the i.sup.th I/O cycle. Clearly, L.sub.i,j must be an integer between 0 and N.sub.a1. To satisfy the disk I/O bandwidth constraint for C.sub.i within a supercycle, the sum of L.sub.i,j over all j must not exceed N.sub.a1. To satisfy the continuity constraint for stream j, the sum of L.sub.i,j over i=1, 2, . . . , N.sub.super must be greater than or equal to the amount of data, normalized by the stripe size, produced/consumed by stream j in a supercycle. To minimize latency and buffer requirements, it is desirable, for each stream j, to minimize the variance among the set of numbers {L.sub.i,j }, as well as to minimize the variance of .SIGMA..sub.j L.sub.i,j over all cycles i. The above criteria serve as the basis for admitting and scheduling the data retrieval of a stream.
The choice of N.sub.super depends on a number of considerations. A large value of N.sub.super provides fine resolution of the stream rates and hence efficient use of I/O bandwidth. However, it increases start-up latency as well as the complexity of the scheduler. Thus, N.sub.super must be carefully selected for the anticipated mixture of stream rates and tolerable start-up latency.
It is an object of the invention to provide an alternative method for operating an array of disks or other storage units. In particular, it is an object of the invention to provide a method for operating an array of disks which has greater flexibility with respect to storage capacity, I/O capacity, and latency of new streams, is robust with respect to the addition of new disks and which minimizes buffer capacity. It is also an object of the invention to provide a method for operating an array of storage units which has advantageous properties with respect to reliability, scalability and multiple bit rates.