Many computing environments include file systems, which enable application programs to store data on, and retrieve data from, storage devices. In particular, a file system allows application programs to create files and give them names (a file is a named data object of arbitrary size), to store (write) data into files, to read data from files, to delete files, and to perform other operations on files.
A file structure is the organization of data on the storage devices. In addition to the file data itself, the file structure contains meta data, which includes, for instance, the following: a directory that maps file names to the corresponding files; file meta data that contains information about the file, including the location of the file data on the storage device (i.e., which device blocks hold the file data); an allocation map that records which device blocks are currently in use to store meta data and file data; and a superblock that includes overall information about the file structure (e.g., the locations of the directory, allocation map, and other meta data structures).
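The meta data structures listed above can be pictured with a small in-memory model. The following Python sketch is illustrative only. The class names `FileMeta`, `AllocationMap`, and `Superblock` are hypothetical, and, as a simplification, the superblock holds the other structures directly rather than recording their on-disk locations.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Hypothetical per-file meta data: file size plus the location of each data block."""
    size: int = 0
    # Each entry is (device_id, block_number) for a device block holding file data.
    blocks: list[tuple[int, int]] = field(default_factory=list)


@dataclass
class AllocationMap:
    """Records, per device, which blocks are in use (True) or free (False)."""
    in_use: dict[int, list[bool]]

    def allocate(self, device: int) -> int:
        """Mark the first free block on `device` as in use and return its number."""
        bits = self.in_use[device]
        for i, used in enumerate(bits):
            if not used:
                bits[i] = True
                return i
        raise OSError(f"device {device} is full")


@dataclass
class Superblock:
    """Overall information about the file structure (simplified: holds the structures directly)."""
    directory: dict[str, FileMeta]  # maps file names to their file meta data
    alloc_map: AllocationMap
```

For example, creating a file and giving it one block on device 0 records the block in both the allocation map and the file's meta data:

```python
sb = Superblock(directory={}, alloc_map=AllocationMap({0: [False] * 4, 1: [False] * 4}))
meta = sb.directory.setdefault("a.txt", FileMeta())
meta.blocks.append((0, sb.alloc_map.allocate(0)))
```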
To store successive data blocks of a file on distinct devices, such as disks or other storage devices, a technique known as striping is used. Striping may also be used to store the file system's meta data. The advantages of striping include high performance, because successive blocks can be transferred to or from several devices in parallel, and load balancing, because the I/O load is spread across the devices. In striping, the file system writes successive blocks of a file, or of the file's meta data, to distinct devices in a defined order. For example, the file system may use a round-robin allocation, in which successive blocks are placed according to a cyclic permutation of the devices. This permutation is called the stripe order. The stripe order defines the order and frequency of allocations (and thus of writes) to each device in the file system. A system with four disks using a simple round-robin allocation scheme, for instance, would allocate space on the disks in the consecutive order 1, 2, 3, 4, 1, 2, 3, 4, and so on.
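Under round-robin allocation, block `i` of a file goes to device `stripe_order[i % k]`, where `k` is the number of devices in the stripe order. The following is a minimal sketch, assuming devices are identified by small integers and the stripe order is supplied as a list; the function name `round_robin_allocation` is hypothetical.

```python
from itertools import cycle, islice


def round_robin_allocation(stripe_order: list[int], n_blocks: int) -> list[int]:
    """Return the device chosen for each of n_blocks successive blocks.

    The stripe order is repeated cyclically, so block i lands on
    stripe_order[i % len(stripe_order)].
    """
    return list(islice(cycle(stripe_order), n_blocks))
```

With the four-disk example, `round_robin_allocation([1, 2, 3, 4], 10)` returns `[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]`. Any cyclic permutation can serve as the stripe order; `[3, 1, 4, 2]` yields `[3, 1, 4, 2, 3, 1]` for six blocks. In every case each device receives an equal share of the allocations, up to one block.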
This simple round-robin allocation is used by most striped file systems. Although round-robin allocation may be sufficient in some circumstances for a system of homogeneous devices, it is inadequate for a system with heterogeneous devices, as well as in various circumstances in which homogeneous devices are used.
As one example, round-robin allocation is inadequate for devices of different storage capacities or throughput. Under round-robin allocation, all devices are allocated equally, so subsequent access to the data is typically spread equally across the devices as well. In systems that include devices with different storage capacities, the smaller devices fill before the larger ones and must then be excluded from the stripe order, which reduces the parallelism and performance of all subsequent writes. Furthermore, data striped across the reduced set of devices suffers reduced performance on every subsequent access.
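The shrinking stripe can be illustrated with a small simulation. This is a sketch under stated assumptions: capacities are given in blocks, the function name `round_robin_fill` is hypothetical, and a device that fills is simply dropped from the stripe order, as described above.

```python
def round_robin_fill(capacities: list[int], n_blocks: int) -> tuple[list[int], list[int]]:
    """Allocate n_blocks round-robin across devices of the given capacities.

    Each round gives one block to every device that still has free space,
    so full devices drop out of the stripe order. Returns the number of
    blocks used on each device and, for each allocated block, the stripe
    width (number of active devices) in effect when it was placed.
    """
    used = [0] * len(capacities)
    widths: list[int] = []
    allocated = 0
    while allocated < n_blocks:
        # Devices with free space form the current stripe order.
        active = [d for d, cap in enumerate(capacities) if used[d] < cap]
        if not active:
            raise OSError("all devices are full")
        for d in active:
            if allocated == n_blocks:
                break
            used[d] += 1
            widths.append(len(active))
            allocated += 1
    return used, widths
```

With two small devices (2 blocks each) and two large ones (8 blocks each), `round_robin_fill([2, 2, 8, 8], 12)` uses `[2, 2, 4, 4]` blocks: the first eight blocks are striped across four devices, but the last four are striped across only two, halving the parallelism available when those blocks are later accessed.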
Likewise, in systems that include devices with different throughput, round-robin allocation fails to maximize throughput, both for allocation and for all subsequent accesses to the data: because every device receives an equal share of the blocks, the slowest device limits the overall transfer rate. Additionally, round-robin allocation provides no means of rebalancing a system that is in an unbalanced state. A system can become unbalanced for a variety of reasons, for instance, when devices are partitioned between files or operating systems, when empty devices are added to an existing file system, or when the allocation policy changes. To rebalance such a system, the user must take extraordinary measures, such as restriping all of the data in the file system.
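The throughput penalty can be quantified with simple arithmetic. The sketch below assumes, for illustration only, that each device writes its share independently and in parallel at a fixed rate in blocks per second, with no other bottleneck; the function names are hypothetical. It compares the round-robin completion time with the lower bound obtained if every device were kept busy for the whole transfer.

```python
def round_robin_write_time(throughputs: list[float], n_blocks: int) -> float:
    """Seconds to write n_blocks striped round-robin, devices working in parallel.

    Block i goes to device i % k, so device d receives
    ceil((n_blocks - d) / k) blocks; the slowest finisher sets the total time.
    """
    k = len(throughputs)
    shares = [(n_blocks - d + k - 1) // k for d in range(k)]
    return max(s / t for s, t in zip(shares, throughputs))


def ideal_write_time(throughputs: list[float], n_blocks: int) -> float:
    """Lower bound if all devices stay busy: total blocks over aggregate rate."""
    return n_blocks / sum(throughputs)
```

For two devices at 100 blocks/s and two at 50 blocks/s, writing 600 blocks round-robin gives each device 150 blocks, so `round_robin_write_time([100, 100, 50, 50], 600)` is 3.0 seconds, set by the slower pair, whereas the aggregate rate of 300 blocks/s would permit 2.0 seconds.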
Striping can be performed by a single file system or by a plurality of file systems in a shared device file environment (e.g., a parallel environment). In a shared device file environment, a file structure residing on one or more storage devices is accessed by multiple file systems running on multiple computing nodes. A shared device file environment allows an application (or job) that uses the file structure to be divided into multiple pieces that run in parallel on multiple nodes, bringing the processing power of these nodes to bear on the application.
The above-described problems associated with striping are exacerbated in a parallel environment. Thus, a need still exists for a parallel allocation technique that is general enough to be used in a wide variety of circumstances. Further, a need exists for a capability that enables rebalancing of the allocations to better match the current conditions and requirements of the system and/or devices.