1. Field of the Invention
This invention relates to a method of declustering data among a number of data storage units or disks.
2. Discussion of the Known Art
As the density and capacity of information storage media increase, the need of modern scientific and business applications for mass data storage also grows. For example, data sensed by earth orbiting satellites is typically collected and stored in remote-sensing databases. In the year 2000, United States satellites alone are expected to transmit about one terabyte of data to earth daily for storage.
While media for accommodating terabytes of data storage are now available, efficient retrieval of the stored data often remains a problem. Specifically, current disk storage technology has inherent mechanical limitations that tend to limit the speed and the efficiency with which data can be retrieved from a number of storage disks, each of which has a minimum seek time. One approach to improve input/output storage speed (bandwidth) is to distribute data blocks among multiple disks, so that the blocks can be retrieved in parallel, i.e., during overlapping disk seek periods.
If data blocks of files are partitioned and stored among a number of disk storage units, then an efficient declustering scheme is important to ensure that the data blocks can be retrieved quickly from all units in which blocks of a requested file are stored. Satisfactory declustering schemes ensure that data blocks requested by a given query will in fact have been initially distributed to different disks.
Consider multidimensional data that is organized as a uniform data grid, for example, geospatial data that can be mapped in a two-dimensional raster format. In this format, a dataset is divided into columns and rows of xe2x80x9ctilesxe2x80x9d, each of which is assigned to and stored as a separate data block on one of a number of disks. Spatial range queries against the dataset are expressed as rectangular clips on the map. Thus, all tiles within the boundary of a rectangle corresponding to a given query must be retrieved. In a multi-disk system, the response time for the query is therefore influenced mostly by the access time of that disk which stores the most tiles within the rectangle.
Various declustering schemes have been proposed to reduce response time for multidimensional queries, including some proposed specifically for uniform data grids. Other known schemes are more suitable for non-uniform data grids.
Consider a dataset organized as a grid of Nxxc3x97Ny tiles (i.e., Nx columns and Ny rows). Given M disks, a declustering scheme, s: N1xc3x97N2xe2x86x92M, assigns tile (x,y) to the disk numbered s(x,y). A xe2x80x9crange queryxe2x80x9d is a query that retrieves a rectangular set of tiles contained within the grid. Let tilei (s,Q), i=0, 1, . . . ,Mxe2x88x921, represent the number of tiles in Q that get assigned to disk i under declustering scheme s (parameter s is omitted when clear from the context). Define the (nominal) response time of query Q under scheme s to be RT(s,Q)=max0xe2x89xa6i less than M tilei(s,Q) . A unit of response time may be considered the average disk access time (including seek, rotational, and transfer time) to retrieve a data block. Thus, the notion of response time indicates an expected I/O delay for answering a given query. The problem, therefore, is how to devise a declustering scheme that minimizes the query response time.
An ideal declustering scheme would achieve, for each query Q, an optimal response time ORT(Q)=┌|Q|/M┐, where |Q| is the number of tiles in Q. It has been shown that such a scheme, known as the Strictly Optimal (SO) scheme, does not exist in general, except for a few stringent cases. The SO scheme thus serves as a lower bound against which performance of other schemes may be measured.
An early scheme proposed for data declustering is the Disk Modulo (DM) scheme, which assigns tile (x,y) to disk number (x+y) mod M. A known Field Exclusive-Or (FX) scheme assigns tile (x,y) to disk number dec(bi(x) ⊕ bi(y))mod M. Here, bi(x) is the binary representation of x, and dec(z) is the decimal number corresponding to the binary representation z.
A so-called Hilert Curve Allocation Method (HCAM) imposes a linear ordering on the tiles in the grid, and assigns disk numbers 0,1, . . . Mxe2x88x921 to the tiles in a round robin fashion according to the linear ordering. HCAM has been shown to yield better performance than previously known schemes, including DM and FX.
Recently, it was shown that a family of Cyclic Declustering (CD) schemes, a simple generalization of the Disk Modulo scheme, outperforms HCAM. Given M disks, the CD schemes assign tile (x,y) to disk numbered (x+Hxc2x7y) mod M, where H, called the skip value, is a predetermined parameter. To find a good skip value (or good skip values for multidimensional data), two heuristics-based schemes were proposed, both of which outperformed HCAM. One scheme (GFIB) is based on Fibonacci numbers, and the other (EXH) is based on an exhaustive search of the best skip values. When M=Fk is the k-th Fibonacci number, the GFIB scheme uses the skip value H=Fkxe2x88x921. The EXH scheme evaluates performance for each possible value of H, and chooses the one with the best performance. The performance is evaluated in terms of xe2x80x9caverage response timexe2x80x9d, which is taken against all queries within an Mxc3x97M grid.
Extensions of GFIB and EXH for multidimensional data have been given. For d-dimensional data, the CD schemes map tile (x0,x1, . . . , xdxe2x88x921) to the disk numbered (x0+H1x1+ . . . Hdxe2x88x921xdxe2x88x92)mod M. When M=Fk is the k-th Fibonacci number, GFIB uses skip values: H1=Fixe2x88x921, H2=Fixe2x88x922, . . . Hdxe2x88x921=Fixe2x88x92d+1. The EXH scheme performs a greedy exhaustive search. First, the best value for H1 is determined by considering all queries in a Mxc3x97M grid. Then, fixing H1, EXH determines the best value for H2 (among the candidates 1,2, . . . ,M) by considering all 3D queries in a grid of size Mxc3x97Mxc3x97M. The process proceeds toward higher dimensions until all skip values are found.
It has been shown that EXH has average performance close to the Strictly Optimal scheme, and that it outperforms GFIB. EXH requires substantial computational overhead with respect to GFIB in determining the skip values, however. This renders EXH impractical for multidimensional data, and feasible only for small values of M or for small dimension data.
Golden Ratio Sequences (GRS) are highly regular sequences in which elements of certain sets are almost uniformly distributed. The sequences themselves were developed in A. Itai, et al., A Golden Ratio Control Policy For a Multiple-Access Channel, IEEE Trans. on Automatic Control, AC-29:712-718 (1984), and are based on an open addressing hashing method described in D. E. Knuth, The Art of Computer Programming, v.3, Addison-Wesley (1973). The golden ratio sequences have been applied in packet routing, see H. Hofri, et al., Packet Delay Under the Golden Ratio Weighted TDM Policy in a Multiple-Access Channel, IEEE Trans. on Information Theory, 11-33:341-349 (1987), and were recently used in A. Bar-Noy, et al, Minimizing Service and Operations Cost of Periodic Scheduling, 9th Annual ACM-SIAM Symp. on Discrete Algorithms (1998) to develop almost optimal procedures for hard problems such as minimizing mean response time in Broadcast Disks, Teletext Systems, and Maintenance Scheduling problems. As far as is known, however, GRS has not been applied to solving the problem of obtaining an efficient data storage declustering scheme.
According to the invention, a method of declustering data among a number of storage units to minimize response time for range queries, includes identifying data to be stored in the form of a dataset, wherein data is divided among uniform data blocks (x, y), and each data block is to be stored in one of a number (M) of storage units. M pairs (i, keyi) are defined wherein i=0,1, . . . ,Mxe2x88x921; and keyi is the fractional part of 2i/(1+{square root over ( )}5). First components of the pairs are ordered according to an ascending order of the keyi values associated with the pairs, thus defining a permutation array A of M. Another permutation array B is defined such that, for i=0,1, . . . ,Mxe2x88x921; B[A(i)]=i. Data block (x, y) is then assigned to storage unit (xxe2x88x92B[y mod M]) mod M, wherein mod is the remainder function.
For a better understanding of the invention, reference is made to the following description taken in conjunction with the accompanying drawing and the appended claims.