Network-based storage, or simply “network storage”, is commonly used for backing up data, making large amounts of data accessible to multiple users, and other purposes. In a network storage environment, a storage server makes data available to client (host) systems by presenting or exporting to the clients one or more logical containers of data. There are various forms of network storage, including network attached storage (NAS) and storage area networks (SANs). In a NAS context, a storage server services file-level requests from clients, whereas in a SAN context a storage server services block-level requests. Some storage servers are capable of servicing both file-level requests and block-level requests.
A storage server typically manages one or more file systems. A “file system” is a structured (e.g., hierarchical) set of stored logical containers of data (e.g., volumes, logical units (LUNs), directories, files). A file system does not have to include or be based on “files” per se, however; it can be based on a different type of storage unit, such as LUNs, for example. In a typical file system, data is stored in blocks, where a “block” is the smallest unit of contiguous user data that the file system stores. However, a storage server may provide one or more levels of storage virtualization, in which case the physical data blocks (i.e., the blocks stored on the physical storage media) are represented by corresponding logical blocks in the storage server. Clients of the storage server may be aware of the logical blocks, but are generally not aware that the logical blocks are virtualizations of physical blocks.
A problem that affects certain file systems, particularly those used by or with virtual machines, is misalignment between logical blocks and physical blocks, i.e., “block misalignment”. In most file systems that exist today, the size of the logical data blocks in the file system is a multiple of a common disk sector size, 512 bytes. For example, common sizes of logical blocks in a file system are 4 KB (eight sectors) and 8 KB (16 sectors). A virtual machine file system is generally implemented as one or more virtual hard disks. A virtual hard disk is typically implemented as a file in a real (non-virtualized) file system. A virtual machine file system is often called a “guest file system”, whereas the non-virtualized (conventional) file system on which it is implemented is called the “host file system”.
To understand the misalignment problem, consider the following. Disks typically store 63 sectors of system metadata, such as a master boot record (MBR), partition table, etc., which occupy sector numbers 0 through 62, as shown in FIG. 1. This system metadata is sometimes called the “boot partition layout”. In FIG. 1, the top horizontal band represents the logical blocks in a file system, while the bottom horizontal band represents the corresponding physical (on-disk) blocks. User data storage begins at sector 63 on disk. Accordingly, the physical storage space allocated for the first logical block starts at sector 63 and ends with sector 70.
A virtual disk, which is a file representing a disk, also stores the above-mentioned system metadata. If the physical disks have 512-byte sectors, then this system metadata is (63 sectors)*(512 bytes-per-sector)=32,256 bytes long. Since a virtual disk is stored on the host file system, this metadata occupies 32,256/4,096=7.875 host file system blocks, assuming the host file system uses 4 KB blocks. In other words, the system metadata of the virtual disk occupies seven full host file system blocks (56 sectors*512 bytes/4,096 bytes=7) and, for the remaining seven sectors, the first 7*512=3,584 bytes of the eighth host file system block.
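The arithmetic in the preceding paragraph can be checked with a short sketch. The constant names below are illustrative assumptions, not identifiers from any particular storage system; the values (63 metadata sectors, 512-byte sectors, 4 KB host blocks) are the ones used in the text.

```python
# Illustrative constants taken from the example in the text.
SECTOR_SIZE = 512        # bytes per disk sector
METADATA_SECTORS = 63    # sectors 0 through 62 hold the MBR, partition table, etc.
HOST_BLOCK_SIZE = 4096   # assumed 4 KB host file system block

# Total size of the system metadata stored at the start of the virtual disk.
metadata_bytes = METADATA_SECTORS * SECTOR_SIZE

# Number of full host blocks the metadata fills, and the bytes that spill
# into the next (eighth) host block.
full_blocks, spill_bytes = divmod(metadata_bytes, HOST_BLOCK_SIZE)

print(metadata_bytes)                    # 32256
print(metadata_bytes / HOST_BLOCK_SIZE)  # 7.875
print(full_blocks, spill_bytes)          # 7 3584
```

Because the spill is nonzero, the first byte of guest user data cannot fall on a host block boundary.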
Further, because the MBR, partition table, and other system metadata fill disk sectors 0 through 62, block 0 of the guest file system in the virtual disk starts at an offset of 63*512=32,256 bytes in the virtual disk file. That offset, however, falls at byte 3,584 within the eighth host file system block, as explained above. Consequently, when the guest file system reads a 4 KB chunk of data (it will read multiples of 4 KB if its block size is 4 KB), the 4 KB being read comes from the last 512 bytes of the eighth host file system block and the first 3,584 bytes of the ninth host file system block, as illustrated in FIG. 1. Thus, even though the host and guest file systems have the same block size, their starting offsets are not aligned. This situation is block misalignment.
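The mapping from a guest block to the host blocks it straddles can be sketched as follows. This is a minimal illustration under the assumptions of the example (512-byte sectors, 63 metadata sectors, 4 KB blocks in both file systems); the function name `host_extents` and its parameters are hypothetical, and host block indices are zero-based, so index 7 is the eighth host block.

```python
SECTOR_SIZE = 512
METADATA_SECTORS = 63
BLOCK_SIZE = 4096  # assumed identical for guest and host file systems


def host_extents(guest_block, data_start=METADATA_SECTORS * SECTOR_SIZE,
                 block_size=BLOCK_SIZE):
    """Return (host_block_index, offset_in_block, length) tuples that
    together cover one guest block inside the virtual disk file."""
    start = data_start + guest_block * block_size
    end = start + block_size
    extents = []
    pos = start
    while pos < end:
        host_block, offset = divmod(pos, block_size)
        # Take bytes up to the end of this host block or the end of the
        # guest block, whichever comes first.
        length = min(block_size - offset, end - pos)
        extents.append((host_block, offset, length))
        pos += length
    return extents


# Guest block 0 with the 63-sector metadata region: two host blocks.
print(host_extents(0))  # [(7, 3584, 512), (8, 0, 3584)]

# Hypothetical aligned layout (guest data starting at sector 64): one host block.
print(host_extents(0, data_start=64 * SECTOR_SIZE))  # [(8, 0, 4096)]
```

The second call is included only for contrast: if the guest data happened to start on a 4 KB boundary, each guest block would map to exactly one host block.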
A result of block misalignment is that two reads from disk are required to read a single guest file system block, which degrades performance. Similarly, writing a guest file system block would require two reads from disk and two writes to disk, because each of the two affected physical blocks is only partially overwritten and must be read before it can be modified and written back. In FIG. 1, a guest file system read of a particular logical block 10 actually requires two separate reads at the disk level: the first reads the last 0.5 KB of the eighth contiguous block on disk (i.e., sector 63), and the second reads the first 3.5 KB of the ninth contiguous block on disk (i.e., sectors 64-70). The shaded band in FIG. 1 shows how, for a given file, a logical block 10 actually corresponds to two different physical blocks, due to block misalignment.
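The I/O counts described above can be reproduced with a simplified cost model. The sketch assumes that the disk is accessed only in whole 4 KB physical blocks, that a partially overwritten block must be read before it is written back, and that no caching intervenes; real systems may behave differently. The function name `io_cost` and the dictionary keys are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed physical (host) block size in bytes


def io_cost(start_byte, length=BLOCK_SIZE, block_size=BLOCK_SIZE):
    """Count disk operations for one guest-block access under a
    whole-block, read-modify-write model (an assumption of this sketch)."""
    end_byte = start_byte + length
    first = start_byte // block_size
    last = (end_byte - 1) // block_size
    touched = 0   # physical blocks overlapped by the access
    partial = 0   # physical blocks only partly covered by the access
    for block in range(first, last + 1):
        block_start = block * block_size
        block_end = block_start + block_size
        covered = min(end_byte, block_end) - max(start_byte, block_start)
        touched += 1
        if covered < block_size:
            partial += 1
    return {
        "reads_for_read": touched,    # every overlapped block is read
        "reads_for_write": partial,   # partial blocks are read first
        "writes_for_write": touched,  # every overlapped block is written
    }


# Misaligned: guest block starting at sector 63 (byte 32,256).
print(io_cost(63 * 512))
# {'reads_for_read': 2, 'reads_for_write': 2, 'writes_for_write': 2}

# Hypothetical aligned start at sector 64 (byte 32,768), for contrast.
print(io_cost(64 * 512))
# {'reads_for_read': 1, 'reads_for_write': 0, 'writes_for_write': 1}
```

Under this model the misaligned case doubles the reads for a guest read and turns a single write into two reads plus two writes, matching the counts given in the text.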
Existing storage systems deal with block misalignment, if at all, by temporarily taking the file system offline, realigning the misaligned blocks by using a special software tool, and then bringing the storage system back online. However, taking a storage system offline for any length of time and for any reason may be undesirable, particularly in large-scale, enterprise-class network storage systems that handle mission-critical data and backup functions and are relied upon by hundreds or thousands of users.