The creation and storage of digitized data have proliferated in recent years. Accordingly, techniques and mechanisms that facilitate efficient and cost-effective storage of large amounts of digital data are common today. For example, a cluster network environment of nodes may be implemented as a data storage system to facilitate the creation, storage, retrieval, and/or processing of digital data. Such a data storage system may be implemented using a variety of storage architectures, such as a network-attached storage (NAS) environment, a storage area network (SAN), a direct-attached storage environment, and combinations thereof. The foregoing data storage systems may comprise one or more data storage devices configured to store digital data within data volumes.
A data storage system includes one or more storage devices. The storage devices may be disk drives organized as a disk array. Although the term “disk” often refers to a magnetic storage device, in this context a disk may, for example, be a hard disk drive (HDD), a solid-state drive (SSD), or any other media similarly adapted to store data.
In a data storage system, information is stored on physical disks as storage objects referred to as volumes, which define a logical arrangement of disk space. The disks in a volume may be operated as a Redundant Array of Independent Disks (RAID). A volume may have its disks in one or more RAID groups. The RAID configuration enhances the reliability of data storage by the redundant writing of data stripes across a given number of physical disks in a RAID group and the storing of redundant information (parity) of the data stripes. The physical disks in a RAID group may include data disks and parity disks. When a disk fails, the parity may be retrieved and combined with the data on the surviving disks to recover the lost data.
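By way of illustration only, the following minimal Python sketch shows how parity may be used to recover a lost block. It assumes single-parity protection computed as a byte-wise XOR across one stripe; the function name `xor_blocks` and the sample stripe contents are hypothetical and are not taken from any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread across three hypothetical data disks.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity disk stores the XOR of the data blocks in the stripe.
parity = xor_blocks(stripe)

# Simulate the failure of the second data disk: its block is rebuilt
# from the surviving data blocks together with the parity block.
survivors = [stripe[0], stripe[2], parity]
rebuilt = xor_blocks(survivors)
```

Under these assumptions, `rebuilt` equals the block that was held by the failed disk, because XOR-ing the parity with the surviving blocks cancels their contribution.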
Information on disks is typically organized in a file system, which is a hierarchical structure of directories, files, and data blocks. A file may be implemented as a set of data blocks configured to store the actual data. The data blocks are organized within a volume block number (VBN) space maintained by the file system. The file system may also assign each data block in the file a corresponding file block number (FBN). The file system assigns sequences of FBNs on a per-file basis, while VBNs are assigned over a large volume address space. The VBN space generally comprises contiguous VBNs from zero to n−1, for a file system of size n blocks.
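The relationship between FBNs and VBNs may be sketched as follows. This is a simplified illustration, not an actual file system structure: the `Volume` and `File` classes, the `allocate` and `append_block` methods, and the first-free allocation policy are all assumptions made for the example.

```python
class Volume:
    """A hypothetical volume whose VBNs run from 0 to n_blocks - 1."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.next_free = 0  # simplistic policy: hand out VBNs in order

    def allocate(self):
        if self.next_free >= self.n_blocks:
            raise RuntimeError("volume full")
        vbn = self.next_free
        self.next_free += 1
        return vbn


class File:
    """A hypothetical file: list index is the FBN, list value is the VBN."""

    def __init__(self, volume):
        self.volume = volume
        self.fbn_to_vbn = []

    def append_block(self):
        self.fbn_to_vbn.append(self.volume.allocate())
        return len(self.fbn_to_vbn) - 1  # FBN of the new block


vol = Volume(n_blocks=8)
file_a, file_b = File(vol), File(vol)
file_a.append_block()  # file A, FBN 0
file_b.append_block()  # file B, FBN 0
file_a.append_block()  # file A, FBN 1
```

In this sketch each file numbers its own blocks starting at FBN 0, while both files draw VBNs from the single address space of the volume, so the same FBN in two files maps to different VBNs.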
An example of a file system is a write-anywhere file layout (WAFL) file system, which does not overwrite data on disks when that data is updated. Instead, an empty data block is retrieved from a disk into a memory and is updated or modified (i.e., dirtied) with new data, and the data block is thereafter written to a new location on the disk. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks, which results in efficient read operations. When accessing a block of a file in response to a request, the file system specifies a VBN that is translated into a disk block number (DBN) location on a particular disk within a RAID group. Since each block in the VBN space and in the DBN space is typically fixed in size (e.g., 4 KB), there is typically a one-to-one mapping between the information stored on the disks in the DBN space and the information organized by the file system in the VBN space. The requested data block is then retrieved from the disk and stored in a buffer cache of the memory as part of a buffer tree of the file. The buffer tree is an internal representation of blocks for a file stored in the buffer cache and maintained by the file system.
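The translation of a VBN into a DBN on a particular disk may be illustrated with the sketch below. The layout is an assumption made only for the example: each data disk in a single RAID group is taken to own one contiguous range of `blocks_per_disk` VBNs. The names `vbn_to_dbn` and `dbn_to_byte_offset` are hypothetical, and an actual file system may use a different mapping.

```python
BLOCK_SIZE = 4096  # fixed block size in bytes, following the 4 KB example


def vbn_to_dbn(vbn, data_disks, blocks_per_disk):
    """Translate a VBN into (disk index, DBN) under the assumed layout."""
    if not 0 <= vbn < data_disks * blocks_per_disk:
        raise ValueError("VBN outside the volume address space")
    # Assumed layout: disk 0 holds VBNs 0..blocks_per_disk-1, disk 1 the next range, etc.
    return divmod(vbn, blocks_per_disk)


def dbn_to_byte_offset(dbn):
    """Byte offset of a block on its disk, given fixed-size blocks."""
    return dbn * BLOCK_SIZE


disk, dbn = vbn_to_dbn(9, data_disks=3, blocks_per_disk=4)
offset = dbn_to_byte_offset(dbn)
```

Because every block has the same size, the mapping is one-to-one: each VBN corresponds to exactly one (disk, DBN) pair, and blocks with the same DBN on different disks belong to the same stripe in this assumed layout.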
If a data block is updated or modified by a central processing unit (CPU) or processor, the dirty data remains in the buffer cache for a period of time. Multiple modifying operations by the CPU are cached before the dirty data is stored on the disk (i.e., the buffer is cleaned). The delayed sending of dirty data to the disk provides benefits such as amortized overhead of allocation and improved on-disk layout by grouping related data blocks together. In the write-anywhere file system, the point in time when a collection of changes to the data blocks is sent to the disk is known as a consistency point (CP). A CP may conceptually be considered a point-in-time image of the updates to the file system since the previous CP. The process of emptying the buffer cache by sending the dirty data to the disk is accomplished by collecting a list of inodes that have been modified since the last CP and then cleaning the inodes by flushing the inodes to the disk. An inode is a data structure used to store information, such as metadata, about a file, whereas data blocks are data structures used to store the actual data for the file. The information in an inode may include ownership of the file, access permission for the file, size of the file, file type, and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers which may reference the data blocks.
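The caching of modifications between consistency points may be sketched as follows. This is a minimal illustration under stated assumptions: the `BufferCache` class, its `modify` and `consistency_point` methods, and the use of a Python list to stand in for the disk are hypothetical, and block allocation is omitted.

```python
class BufferCache:
    """A hypothetical buffer cache that holds dirty data until a CP."""

    def __init__(self):
        self.dirty = {}        # inode number -> {FBN: latest data}
        self.disk_writes = []  # (inode number, FBN, data) tuples sent to disk

    def modify(self, inode, fbn, data):
        # Repeated modifications of the same block only replace the cached copy.
        self.dirty.setdefault(inode, {})[fbn] = data

    def consistency_point(self):
        # Collect the inodes modified since the last CP, then clean each one.
        modified_inodes = sorted(self.dirty)
        for inode in modified_inodes:
            for fbn, data in sorted(self.dirty[inode].items()):
                self.disk_writes.append((inode, fbn, data))
        self.dirty.clear()
        return modified_inodes


cache = BufferCache()
cache.modify(7, 0, b"draft")
cache.modify(7, 0, b"final")  # same block modified again before the CP
cache.modify(9, 3, b"log")
flushed = cache.consistency_point()
```

In this sketch three modifying operations result in only two blocks being sent to disk at the CP, which illustrates how delaying the write amortizes the cost of allocation over several modifications.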
Initially, a CPU issues a cleaner message indicating that the dirty buffers of one or more inodes need to be allocated on disks. In response, a block allocator in the file system selects free blocks on disks to which to write the dirty data and then queues the dirty buffers to a RAID group for storage. The block allocator examines a block allocation bitmap to select free blocks within the VBN space of a logical volume. The selected blocks are generally at consecutive locations on the disks in a RAID group for a plurality of blocks of a particular file. When allocating blocks, the file system traverses a few blocks of each disk to lay down a plurality of stripes per RAID group. In particular, the file system chooses VBNs that are on the same stripe per RAID group to avoid parity reads from disks.
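The selection of free blocks from an allocation bitmap may be sketched as follows. The sketch is illustrative only and rests on several assumptions: a single RAID group, a layout in which each data disk owns a contiguous range of `blocks_per_disk` VBNs, and a hypothetical `depth` parameter giving the number of consecutive blocks visited on each disk before moving to the next disk. The function name `allocate_blocks` is likewise hypothetical.

```python
def allocate_blocks(bitmap, data_disks, blocks_per_disk, count, depth=2):
    """Pick `count` free VBNs, visiting `depth` consecutive DBNs per disk.

    bitmap[vbn] is True when the block is in use. Blocks are chosen within
    a window of `depth` stripes across all disks before moving to the next
    window, so the chosen VBNs share stripes where possible.
    """
    chosen = []
    for window in range(0, blocks_per_disk, depth):
        for disk in range(data_disks):
            for dbn in range(window, min(window + depth, blocks_per_disk)):
                vbn = disk * blocks_per_disk + dbn  # assumed VBN layout
                if not bitmap[vbn] and len(chosen) < count:
                    chosen.append(vbn)
    if len(chosen) < count:
        raise RuntimeError("not enough free blocks")
    for vbn in chosen:
        bitmap[vbn] = True  # mark as allocated only once the request can be met
    return chosen


# Three data disks of four blocks each; VBN 4 (disk 1, DBN 0) is already in use.
bitmap = [False] * 12
bitmap[4] = True
first = allocate_blocks(bitmap, data_disks=3, blocks_per_disk=4, count=4)
second = allocate_blocks(bitmap, data_disks=3, blocks_per_disk=4, count=2)
```

In this example the first request is satisfied entirely from the first two stripes, with consecutive blocks taken on disk 0, and the second request continues in the same stripes before moving on. Filling stripes in this manner is one way to reduce the need to read existing data or parity when computing new parity.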
For efficient utilization of storage resources, it is desirable to balance block allocation across storage devices in RAID groups. Improvements that allow balanced block allocation are therefore desired.