The present invention relates generally to storing data in an array of storage media or servers, and more particularly to methods of striping the data for storage in the array of storage media or servers.
In our information age, it is common to use an array of storage media to hold information. The information can be represented as data objects, whose sizes can range from kilobytes to gigabytes. If a data object is stored in one storage medium, and if the demand on that data object is much higher than the demand on other objects, the load on that storage medium becomes much higher than the load on the others. This creates load imbalance. One typical way to balance the load is to stripe the data object into data units, and to store the units in the storage media in a round-robin fashion. This is the storage approach behind typical disk arrays, and the approach has also been extended to server arrays. An extensive discussion of the extension can be found, for example, in an article entitled "A Server Array Approach for Video-on-demand Service on Local Area Networks," published in INFOCOM '96, in March 1996, by Lee and Wong. The article describes an array of servers accessed by a number of clients.
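The round-robin placement described above can be sketched as follows. This is a minimal illustration only: the function name `stripe_round_robin` and its parameters are hypothetical, and redundant units are ignored here.

```python
def stripe_round_robin(data: bytes, unit_size: int, num_media: int) -> list[list[bytes]]:
    """Cut data into fixed-size units and deal them out to the media in turn."""
    if unit_size <= 0 or num_media <= 0:
        raise ValueError("unit_size and num_media must be positive")
    media: list[list[bytes]] = [[] for _ in range(num_media)]
    for index, offset in enumerate(range(0, len(data), unit_size)):
        # Unit 0 goes to medium 0, unit 1 to medium 1, ..., then wrap around.
        media[index % num_media].append(data[offset:offset + unit_size])
    return media


# Ten bytes, two-byte units, three media (all values chosen for illustration).
layout = stripe_round_robin(b"ABCDEFGHIJ", unit_size=2, num_media=3)
for medium_number, units in enumerate(layout):
    print(medium_number, units)
```

Because consecutive units land on different media, requests for a popular object are spread over the whole array instead of concentrating on one medium.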
A typical disk array system has a few different configurations. When a user formats a disk array, one configuration is selected. From then onwards, the organization and the size of the data units are fixed, with the storage spaces at each disk drive divided into units of the fixed striping size. Fixing the organization implies fixing the storage location of the data units of the data objects. Not only are the locations for the storage of the data objects fixed, the locations for the storage of redundant units or symbols are also fixed.
Redundant symbols or units are stored for fault tolerance reasons. In case one or more of the storage media fail, with the right level of redundancy based on the redundant symbols, the lost data can be restored. In a disk array system, the locations or sectors holding the redundant units are also fixed.
This organization is applied to all types of data objects. After a data object is striped into the fixed striping size, typically the data units are stored in the media in a round-robin fashion, skipping locations for redundant units wherever they exist.
A commonly used term in disk array systems is a stripe, which is defined as a group of data units and one or more redundant units that are generated from those data units. A stripe unit can be a data unit or a redundant unit.
After the data units are stored in the media, values for the redundant units are calculated. This is done stripe by stripe across the storage media. The above-described process of storing data into storage media is known as storage striping.
FIG. 1 shows one way to stripe a data object using storage striping. Each box in FIG. 1 denotes a stripe unit, such as 150, which denotes a number of bytes of data, such as 64 kilobytes. Each stripe unit may store one or more symbols. Each vertical column of stripe units denotes the units stored within one storage medium or one server; for example, the units 152 and 154 are in the medium 156. With five columns shown in FIG. 1, there are five storage media or servers. Each row of stripe units represents a stripe; for example, the units 158 and 160 are two units of the stripe 162. In this example, each stripe also includes a redundant unit, such as 161. The locations for the redundant units are fixed, such as the locations for the units 161 and 164. The data units are stored around the redundant units. The redundant unit within each stripe can be generated by performing the exclusive-or operation on all of the data units within the stripe. All data objects are striped the same way.
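As an illustration of the exclusive-or step, the sketch below computes the redundant unit for a single stripe and then rebuilds one lost data unit from the survivors. The helper name `xor_units` and the two-byte unit contents are assumptions made for the example; a stripe of four data units plus one redundant unit is assumed to match the five columns described for FIG. 1.

```python
from functools import reduce


def xor_units(units: list[bytes]) -> bytes:
    """Return the byte-wise exclusive-or of equally sized stripe units."""
    if not units:
        raise ValueError("at least one unit is required")
    if any(len(unit) != len(units[0]) for unit in units):
        raise ValueError("all units must have the same size")
    # zip(*units) walks the units column by column, one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


# One stripe: four data units (illustrative contents) and their redundant unit.
data_units = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
redundant_unit = xor_units(data_units)

# Simulate losing the medium that held data_units[2], then rebuild that unit
# from the three surviving data units and the redundant unit.
survivors = data_units[:2] + data_units[3:] + [redundant_unit]
rebuilt = xor_units(survivors)
print(rebuilt == data_units[2])
```

The rebuild works because exclusive-or is its own inverse: combining the redundant unit with every surviving data unit cancels them out and leaves the missing one.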
Typically, data objects are of different sizes. If the size of a data object is not an integral multiple of the size of a stripe unit, storage space will be wasted in the last stripe unit. This is known as internal fragmentation, which increases overhead. One way to represent the overhead is the normalized storage overhead (NSO):

$$\mathrm{NSO} = \frac{\text{storage overhead}}{\text{storage size of the data object}}$$
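A small sketch of the normalized storage overhead for a single object follows. It assumes that the storage overhead is the allocated space minus the object size, and that each stripe optionally contributes one redundant unit; the function and parameter names are hypothetical, and the calculation in Appendix A may differ.

```python
def normalized_storage_overhead(object_size: int, unit_size: int,
                                data_units_per_stripe: int = 0) -> float:
    """NSO = (allocated bytes - object bytes) / object bytes for one object.

    data_units_per_stripe = 0 means no redundancy; otherwise one redundant
    unit per stripe is assumed.
    """
    if object_size <= 0 or unit_size <= 0:
        raise ValueError("sizes must be positive")
    data_units = -(-object_size // unit_size)  # ceiling division
    redundant_units = 0
    if data_units_per_stripe > 0:
        redundant_units = -(-data_units // data_units_per_stripe)
    allocated = (data_units + redundant_units) * unit_size
    return (allocated - object_size) / object_size


# A 10,000-byte object with 4,096-byte stripe units (illustrative numbers).
print(normalized_storage_overhead(10_000, 4_096))     # fragmentation only
print(normalized_storage_overhead(10_000, 4_096, 4))  # plus one redundant unit
```

With these numbers the object needs three data units, so 2,288 of the 12,288 allocated bytes are lost to internal fragmentation before any redundancy is counted.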
FIG. 2 shows a graph of normalized storage overhead versus the size of a stripe unit in storage striping. One calculation to generate FIG. 2 is shown in Appendix A. As the size of a stripe unit increases, the normalized storage overhead increases correspondingly, and the relationship is approximately linear. Note that storage overhead becomes impractical if the size of a stripe unit is larger than a few tens of kilobytes. The graph also shows the effect of having redundancy in the system for fault tolerance. It directly increases storage overhead, as shown in FIG. 2, because redundant units need additional storage.
Another issue to consider in a disk array system is the retrieval of a data object from the storage media. A retrieval request for a data object can generate multiple retrieval operations in the storage media. For example, if the data object occupies one megabyte and the size of a stripe unit is one kilobyte, then it takes about one thousand operations to retrieve the data object.
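The arithmetic in this example is a ceiling division of the object size by the stripe unit size. The sketch below assumes one retrieval operation per stripe unit and binary units (1 kilobyte = 1,024 bytes), which gives 1,024 operations, in line with the round figure of one thousand quoted in the text. The function name is hypothetical.

```python
def retrieval_operations(object_size: int, unit_size: int) -> int:
    """Number of stripe-unit reads needed to fetch a whole object."""
    if object_size < 0 or unit_size <= 0:
        raise ValueError("object_size must be >= 0 and unit_size > 0")
    return -(-object_size // unit_size)  # ceiling division


KILOBYTE = 1024            # binary units are an assumption of this sketch
MEGABYTE = 1024 * KILOBYTE

ops_small_units = retrieval_operations(MEGABYTE, KILOBYTE)
ops_large_units = retrieval_operations(MEGABYTE, 64 * KILOBYTE)
print(ops_small_units, ops_large_units)
```

Raising the stripe unit from 1 kilobyte to 64 kilobytes cuts the operation count by a factor of 64, which is why retrieval favors large stripe units.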
To consider the retrieval issue, the stripe units of a data object are assumed to be packed into the minimum number of stripes. In other words, the stripe units of the data object are not spread among many stripes, but are concentrated in as few stripes as possible. Under this assumption, an average number of retrievals per byte (ANR) is calculated against the size of a stripe unit. The ANR is defined as the average number of retrievals required to get a byte for a given striping unit size, assuming that objects of every size are equally likely to be retrieved. In practice, objects of different sizes are retrieved with different probabilities.
In one approximation, the probabilities of retrieving objects of different sizes are assumed to follow the WebStone retrieval distribution. FIG. 3 shows the WebStone retrieval distribution for web servers. It plots the probability of retrieving an object having a certain size; for example, the probability of retrieving an object having the size of 500 bytes in a typical web server is 0.5. Based on the WebStone approximation, a weighted average number of retrievals per byte (WANR) is calculated against the size of the striping units. FIG. 4 shows a graph of WANR against striping sizes for the normal mode of operation, and for the failure mode, where there is a media or server failure. The plots are in the form of steps because the WebStone approximation has only five samples. One calculation to generate FIG. 4 is shown in Appendix B.
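One plausible reading of the weighted average for the normal mode of operation is sketched below: for each object size, count the stripe-unit retrievals, divide by the object size, and weight by the retrieval probability. This is an assumption, not the calculation of Appendix B. The five-point distribution is a placeholder rather than the actual data of FIG. 3; only the 500-byte entry with probability 0.5 comes from the text, and all names are hypothetical.

```python
def retrievals_per_byte(object_size: int, unit_size: int) -> float:
    """Retrieval operations needed per byte of one object (normal mode)."""
    retrievals = -(-object_size // unit_size)  # ceiling division
    return retrievals / object_size


def weighted_anr(distribution: dict[int, float], unit_size: int) -> float:
    """Probability-weighted average of retrievals per byte over object sizes."""
    return sum(probability * retrievals_per_byte(size, unit_size)
               for size, probability in distribution.items())


# Placeholder distribution: object size in bytes -> retrieval probability.
# Only the 500-byte / 0.5 entry is taken from the text; the rest is made up.
PLACEHOLDER_DISTRIBUTION = {
    500: 0.5,
    5_000: 0.35,
    50_000: 0.14,
    500_000: 0.009,
    5_000_000: 0.001,
}

for unit_kilobytes in (1, 4, 16, 64):
    unit_size = unit_kilobytes * 1024
    print(unit_kilobytes, weighted_anr(PLACEHOLDER_DISTRIBUTION, unit_size))
```

Under this reading the weighted value falls as the stripe unit grows, because every object larger than one unit needs fewer retrievals.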
Referring back to FIG. 4, one can see that WANR increases rapidly at small stripe unit sizes (10 to 15 kilobytes). In order to have good retrieval performance, that is, a low WANR, stripe unit sizes should be large, such as larger than 20 kilobytes. The figure also shows that for large stripe unit sizes (larger than 20 kilobytes), the failure mode requires significantly more retrievals. This is because using large striping sizes results in fewer stripes per data object, which in turn increases the significance of retrieving extra stripe units at the last stripe when there is a failure.
For low storage overhead, a small stripe unit size is preferred. For efficient retrieval, however, a large stripe unit size is preferred. FIG. 5 shows the tradeoff in storage striping between the two factors: weighted average number of retrievals per byte versus normalized storage overhead. The figure shows the difficulty of achieving low storage overhead and high retrieval efficiency simultaneously with storage striping.
The difficulty is not diminished when storage striping is applied to multimedia data objects, which can be of very different sizes: an HTML page of text may occupy only a few kilobytes, while a picture requires a few megabytes, and a compressed movie a gigabyte. If the size of the striping units is one kilobyte, accessing a movie will take a tremendous number of retrievals. However, if the size of the striping units is one megabyte, a lot of space will be wasted to hold a page of text due to internal fragmentation.
It should be apparent that there is a need for a new method to stripe a data object to be stored in an array of storage media. The method should provide low overhead storage and high retrieval efficiency. Also, the method should be equally applicable to both an array of storage media and an array of servers.