A Logical Volume Manager (LVM) is a software-implemented manager for disk drives and other mass-storage devices. An LVM allocates space on mass-storage devices so that disk partitions can be combined into larger virtual ‘partitions’ that can be resized or moved. The resulting logical volumes can include several physical volumes. An LVM often includes a striping or dithering feature that spreads saved data over different disks for improved access speeds. Mirroring or parity can also be used to improve fault tolerance. The combinations and mappings of partitions, coupled with striping and/or dithering, can result in a complex relationship between logical storage space and physical storage space, which is managed by the LVM.
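To make the logical-to-physical relationship concrete, the following is a minimal sketch of one common striping scheme (round-robin, RAID-0-style striping). It is not the mapping used by any particular LVM. The names map_striped and PhysicalLocation and the stripe_blocks parameter are hypothetical, and mirroring, parity, and dithering are not modeled.

```python
from typing import NamedTuple


class PhysicalLocation(NamedTuple):
    disk: int   # index of the physical volume (disk) in the stripe set
    block: int  # block offset on that physical volume


def map_striped(logical_block: int, num_disks: int, stripe_blocks: int) -> PhysicalLocation:
    """Map a logical block number to a physical location under round-robin striping.

    Assumption: the logical volume is divided into stripe units of
    `stripe_blocks` blocks, and consecutive stripe units are placed on
    consecutive disks, wrapping around after the last disk.
    """
    if logical_block < 0 or num_disks <= 0 or stripe_blocks <= 0:
        raise ValueError("logical_block must be non-negative; counts must be positive")
    stripe_unit, offset_in_unit = divmod(logical_block, stripe_blocks)
    disk = stripe_unit % num_disks   # round-robin choice of disk
    row = stripe_unit // num_disks   # number of full stripes preceding this unit
    return PhysicalLocation(disk, row * stripe_blocks + offset_in_unit)


print(map_striped(13, num_disks=3, stripe_blocks=4))  # PhysicalLocation(disk=0, block=5)
```

With three disks and a stripe unit of four blocks, logical block 13 falls in stripe unit 3, which wraps around to disk 0 at physical block 5. Even this simple scheme shows why the LVM, rather than the application, is left to track where each logical block physically resides.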
Commonly, an LVM resides in the kernel of a computer's operating system. The kernel resides in memory allocated for the kernel, as opposed to memory allocated for the user space, as shown in FIG. 1. Such prior art LVMs include those provided in HP-UX and Linux, as well as third-party solutions such as Veritas VxVM. A kernel-based LVM typically receives instructions from a software application to read data from and write data to a logical volume, and the kernel-based LVM handles the corresponding reads from and writes to the physical volumes.
With reference to FIG. 1, a node 102, which can include a server, workstation, cell phone, or other computing device, may or may not be connected to a network. Node 102 can have memory reserved for kernel 106, sometimes called kernel space, and memory reserved for applications and data, user memory 104. Kernel-based LVM 112 intercepts read/write instructions from database application 110 and manages the proper input/output (I/O) between the application and disks 108. This is sometimes referred to as “in-band” I/O.
The arrows in the figures generally show references or mappings as well as the general direction of commands and messages. The directions of the arrows are not intended to limit communications to one direction. For example, and as is understood in the art, reading and writing can occur in both directions of each arrowed connection in FIG. 1, such as between application 110, LVM 112, and disks 108.
Disks 108 can represent any type of persistent storage, such as storage which is directly attached to a node as a peripheral device (direct-attached storage), storage which is attached to another node on a network and accessible through the network (network-attached storage), or a storage area network. A storage area network commonly refers to a set of specially networked persistent storage devices in which a node on a network addresses disks at the block level. The node, and its LVM, are left to manage the file system of the storage devices. Shared storage is storage which is shared between multiple nodes on a network.
Application 110 can be a small-scale application on a single-user device, or it can be a large-scale application which services thousands of users. Large-scale applications are common for Web-based storefronts, such as those for well-known online retailers. Large-scale applications can include massive database applications which manage terabytes of data. Databases ranging from one hundred terabytes to several petabytes are not uncommon for modern applications of large companies. To run with the minimal latencies that most online consumers and vendors expect, such databases typically require caching of actively used tables, queries, and other data in a “database buffer cache.”
In line with customer and vendor expectations of low latencies is the expectation of high reliability and minimal downtime. Such downtime is often measured in seconds or fractions of a second. At extremely busy retail Web sites, a few seconds of downtime can coincide with several, if not hundreds, of database requests. When downtime occurs at the same time as a database request, the request can end up being delayed, denied, or even dropped. Thus, a Database Administrator (DBA) typically strives to avoid unnecessary downtime and to schedule necessary downtime sparingly and for periods when the database load is light, for example, at 2:00 AM on a Sunday morning.
Upgrading an LVM often requires downtime. The upgrade may be necessary because of a newly found bug or security hole in the LVM which needs to be patched. Alternatively, the upgrade may be elective, in order to improve performance or maintainability of the LVM.
Besides scheduled downtime, there are cases of unscheduled downtime. An LVM can stop or fault on its own due to a variety of reasons, including traffic saturation, hardware failures, or bugs in the LVM code. In such a case, rapid re-initialization or rebooting of the LVM is required to minimize downtime. Whether due to a fault or a scheduled upgrade, stopping the LVM can result in a substantial amount of unwanted downtime.
With kernel-based LVM 112 in FIG. 1, stopping the LVM is often associated with stopping large portions of or the entire kernel of a node. This is because other parts of the kernel may hold references to the kernel-based LVM, and broken references in the kernel may lead to a complete computer crash if the kernel is not deactivated. Thus, it is generally difficult to hot-swap a kernel-based LVM without deactivating the entire kernel. An architecture and method for enabling hot-swapping of a kernel-based LVM is disclosed in Baumann, et al., “Module Hot-Swapping for Dynamic Update and Reconfiguration in K42,” Proceedings of Linux.conf.au, Canberra, Australia, April 2005, 8 pages. However, the paper does not address the problems encountered when multiple nodes access shared storage, nor those of LVMs residing in user memory.
A user-memory-based LVM avoids the situation where parts of the kernel hold critical references to the LVM which would crash the system if broken. Examples of user-memory-based LVMs include such software products as Oracle Automatic Storage Management (ASM), which has been implemented in Oracle Database 10g/11g.
A user-memory-based LVM such as ASM commonly uses an “extent map,” sometimes called an Xmap. An Xmap is typically created for each file accessed by an application which uses the LVM. The Xmap includes a listing or table of physical drives or partitions, as well as offsets from the beginning of each physical drive or partition, thereby ‘mapping’ the file to where it is read from or written to persistent storage. An Xmap can have many rows, often hundreds or thousands.
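The following minimal sketch illustrates how an extent map can translate a byte offset within a file into a physical drive and an offset on that drive. For simplicity it assumes that row i of the Xmap describes the i-th extent of the file and that all extents have the same size. XmapRow, resolve, and the disk names are hypothetical and do not reflect ASM's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class XmapRow:
    disk: str         # physical drive or partition holding this extent (hypothetical name)
    disk_offset: int  # byte offset of the extent from the start of that drive or partition


def resolve(xmap: List[XmapRow], extent_size: int, file_offset: int) -> Tuple[str, int]:
    """Translate a byte offset within a file into (disk, byte offset on disk).

    Assumption: row i of the Xmap describes the i-th extent of the file,
    and every extent is extent_size bytes long.
    """
    if extent_size <= 0:
        raise ValueError("extent_size must be positive")
    if file_offset < 0:
        raise ValueError("file_offset must be non-negative")
    index, within = divmod(file_offset, extent_size)  # which row, and where inside that extent
    if index >= len(xmap):
        raise ValueError("offset lies beyond the file's mapped extents")
    row = xmap[index]
    return row.disk, row.disk_offset + within


MIB = 1024 * 1024
xmap = [XmapRow("disk0", 0), XmapRow("disk1", MIB), XmapRow("disk0", MIB)]
print(resolve(xmap, MIB, MIB + 100))  # ('disk1', 1048676)
```

Because the lookup is a single division and a table index, it stays cheap even when the Xmap has thousands of rows. A real extent map may also need to handle variable-size extents or mirrored copies, which this sketch omits.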
Because an application using ASM generally does not direct I/O through the kernel in order to map a file to persistent storage, this type of I/O is sometimes referred to as “out-of-band” I/O.
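In an out-of-band arrangement, the application (or a library it links) consults the extent map in user memory and then issues I/O directly against the physical devices. System calls still enter the operating system, but no kernel-resident LVM performs the logical-to-physical translation. The sketch below assumes fixed-size extents and represents each ‘disk’ with an ordinary file. The function out_of_band_read and its parameters are hypothetical, and os.pread is available only on Unix-like systems.

```python
import os
import tempfile
from typing import Dict, List, Tuple


def out_of_band_read(fds: Dict[str, int], xmap: List[Tuple[str, int]],
                     extent_size: int, file_offset: int, length: int) -> bytes:
    """Read `length` bytes of a logical file starting at `file_offset`.

    Hypothetical layout: each xmap entry (disk, disk_offset) describes one
    fixed-size extent, and `fds` maps disk names to open file descriptors.
    The translation happens here, in user space; os.pread then reads the
    physical location directly.
    """
    out = bytearray()
    while length > 0:
        index, within = divmod(file_offset, extent_size)
        disk, disk_offset = xmap[index]
        chunk = min(length, extent_size - within)  # stop at the extent boundary
        data = os.pread(fds[disk], chunk, disk_offset + within)
        if len(data) != chunk:
            raise IOError("short read from " + disk)
        out += data
        file_offset += chunk
        length -= chunk
    return bytes(out)


if __name__ == "__main__":
    # Two temporary files stand in for physical disks.
    with tempfile.TemporaryDirectory() as tmp:
        fds = {}
        for name, contents in (("A", b"....ABCD"), ("B", b"EFGH")):
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(contents)
            fds[name] = os.open(path, os.O_RDONLY)
        xmap = [("A", 4), ("B", 0)]  # extent 0 -> A at byte 4, extent 1 -> B at byte 0
        print(out_of_band_read(fds, xmap, 4, 2, 4))  # b'CDEF'
        for fd in fds.values():
            os.close(fd)
```

A read that spans an extent boundary is split into one positioned read per extent, so a logically contiguous range can come from different devices without any kernel-side mapping.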