A storage server is a special-purpose processing system used to store and retrieve data on behalf of one or more clients on a network. It stores and manages that data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. In conventional storage systems, the mass storage devices may be organized into one or more groups of drives (e.g., a redundant array of inexpensive disks (RAID)). These drives, in turn, define an overall logical arrangement of storage space, including one or more storage volumes. A storage volume is any logical data set that is an abstraction of physical storage, combining one or more physical storage devices (e.g., drives), or parts thereof, into a logical storage object.
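As an illustration only, the notion of a storage volume as a logical object built from physical drives can be sketched as follows. The sketch assumes the simplest possible arrangement, plain concatenation of drives, which is not how a RAID group lays out data; the names Drive, ConcatVolume, and locate are hypothetical and do not come from any particular storage operating system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Drive:
    """Hypothetical physical drive exposing a fixed number of blocks."""
    name: str
    num_blocks: int


class ConcatVolume:
    """Hypothetical volume that presents several drives as one block space.

    Assumption: drives are simply concatenated; real systems typically
    stripe data and add parity or mirroring.
    """

    def __init__(self, drives: List[Drive]) -> None:
        self.drives = list(drives)

    @property
    def num_blocks(self) -> int:
        # The logical size is the sum of the underlying drive sizes.
        return sum(d.num_blocks for d in self.drives)

    def locate(self, logical_block: int) -> Tuple[str, int]:
        """Map a logical block number to (drive name, block on that drive)."""
        if not 0 <= logical_block < self.num_blocks:
            raise IndexError(f"logical block {logical_block} out of range")
        remaining = logical_block
        for drive in self.drives:
            if remaining < drive.num_blocks:
                return drive.name, remaining
            remaining -= drive.num_blocks
        raise AssertionError("unreachable: range was checked above")


if __name__ == "__main__":
    vol = ConcatVolume([Drive("d0", 100), Drive("d1", 50)])
    print(vol.num_blocks)    # total logical blocks across both drives
    print(vol.locate(100))   # first block that falls on the second drive
```

A client of the volume addresses only logical block numbers; which drive actually holds a given block is hidden behind the mapping.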
A conventional storage server includes a storage operating system, which may implement a file system to logically organize data on the drives. A file system is a structured (e.g., hierarchical) set of stored data, such as directories and files, blocks and/or any other type(s) of logical data containers (hence, the term “file system”, as used herein, does not necessarily include “files” in a strict sense). Data stored by a storage server may be stored in the form of multiple blocks that each contain data. A block is the basic unit used by a file system in a storage server to manipulate and transfer data and/or metadata. In many systems, a block size of 4 KBytes is used, although other block sizes can also be used.
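By way of example, the relationship between a quantity of data and the number of fixed-size blocks needed to hold it can be sketched as follows. The sketch assumes the 4 KByte block size mentioned above and zero-padding of the final partial block; the function names blocks_needed and split_into_blocks are hypothetical.

```python
from typing import List

BLOCK_SIZE = 4 * 1024  # 4 KBytes, as in the example block size above


def blocks_needed(n_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks required to hold n_bytes (rounds up)."""
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    return (n_bytes + block_size - 1) // block_size


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split data into fixed-size blocks.

    Assumption: the last block is padded with zero bytes to full size.
    """
    return [
        data[i:i + block_size].ljust(block_size, b"\0")
        for i in range(0, len(data), block_size)
    ]


if __name__ == "__main__":
    payload = b"x" * 5000
    print(blocks_needed(len(payload)))       # 5000 bytes span two 4 KByte blocks
    print(len(split_into_blocks(payload)))   # same count from the actual split
```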
A storage server may implement various features and functions, such as the generation of certain kinds of data storage images. Image generation may, for example, include mirroring (a technique in which a mirror copy of certain data at one location is maintained at another location), creation of snapshots and/or clones of storage volumes, etc. Mirroring of data may be done for various purposes. For instance, mirroring provides a mechanism for ensuring data availability and minimizing down time, and may be used to provide disaster recovery. In addition, snapshots provide point-in-time images of data, and clones generally provide a writeable image of data, which may be used for various purposes in data operations.
Conventional storage servers must boot up (initialize) after the power is turned on, before they can be used. To accomplish boot up, the storage server first requires various metadata, which is retrieved from specific blocks of storage. In the prior art, these blocks were typically retrieved in a serial and/or sequential manner, because the blocks are generally interdependent (e.g., they reference each other in a hierarchical manner). For example, a first block may be referenced by a second block, etc. Thus, during boot up, the second block would have to be retrieved before the first, etc. Such a sequential input/output (I/O) access pattern (i.e., the need to access the blocks in a particular order) constrains the speed with which the blocks can be retrieved and boot up completed. Yet in many applications, particularly enterprise-scale storage systems, fast boot up time is essential for meeting users' expectations. Boot-up latency is further exacerbated by the larger number of blocks that are typically required during boot up in more advanced (complex) storage servers, i.e., with the incorporation of more features and functions.
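The ordering constraint described above can be illustrated with a minimal sketch. It assumes a hypothetical chain of four blocks in which each block's location is learned only by reading the block that references it, and a hypothetical fixed latency per read; the names BLOCKS, dependent_read_order, and serial_latency_ms are illustrative only.

```python
from typing import Dict, List

# Hypothetical layout: block id -> ids of the blocks it references.
# block_2 references block_1, so block_2 must be read before block_1.
BLOCKS: Dict[str, List[str]] = {
    "block_4": ["block_3"],
    "block_3": ["block_2"],
    "block_2": ["block_1"],
    "block_1": [],
}


def dependent_read_order(store: Dict[str, List[str]], start: str) -> List[str]:
    """Order in which blocks are read when each address is only
    discovered by first reading the referencing block."""
    order: List[str] = []
    pending = [start]
    seen = set()
    while pending:
        block_id = pending.pop(0)
        if block_id in seen:
            continue
        seen.add(block_id)
        refs = store[block_id]   # simulated read of the block's contents
        order.append(block_id)
        pending.extend(refs)     # references become known only now
    return order


def serial_latency_ms(order: List[str], per_read_ms: float) -> float:
    """Total time when every read must wait for the previous one."""
    return len(order) * per_read_ms


if __name__ == "__main__":
    order = dependent_read_order(BLOCKS, "block_4")
    print(order)
    print(serial_latency_ms(order, per_read_ms=5.0))
```

Because no read in the chain can be issued until the previous one completes, the total time in this sketch grows linearly with the number of interdependent blocks.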
A similar latency problem can occur in the context of other types of initialization processes of a storage server. For example, latency associated with retrieving needed metadata blocks can also be problematic in the context of failover/giveback. “Failover” occurs when one storage server takes over the responsibilities of another storage server that has experienced a failure. “Giveback” is the process by which a failed storage server resumes its responsibilities after recovering from the failure. As with boot up, failover/giveback processes often require the retrieval of an initial set of interdependent blocks.
Failover/giveback techniques can be used both to mask server failures from clients and to provide nondestructive upgrades, in which individual servers in a cluster are taken off-line, upgraded, and brought back online with minimum perceptible impact on the clients. For some applications, performing these operations quickly is critical to correct and successful functioning. For example, if the delay for failover or giveback is longer than some fixed interval, client requests will time out, resulting in application failures. Therefore, it is important to perform these operations as quickly as possible.
Latency associated with retrieving needed metadata blocks can also be problematic when mounting a storage volume (or simply, “volume”). “Mounting” a volume is the process of making the volume accessible to the file system (and, hence, the user). Mounting a volume involves the attachment of the file system to the file/directory hierarchy. Mounting a volume, like boot up and failover/giveback processes, often requires the loading of an initial set of interdependent blocks. In situations where multiple storage volumes are to be mounted, the latency associated with mounting is compounded.
Other types of initialization processes may also be subject to similar latency concerns.