Storage virtualization inserts a logical abstraction layer or facade between one or more computer systems and one or more physical storage devices. With storage virtualization, a computer can address storage through a virtual disk (VDisk), which responds to the computer as if it were a physical disk (PDisk). A VDisk may be configured using a plurality of physical storage devices and techniques to provide redundancy and improve performance.
Virtualization is often performed within a storage area network (SAN), allowing a pool of storage devices with a storage system to be shared by a number of host computers. Hosts are computers running application software, such as software that performs input or output (IO) operations using a database. Connectivity of devices within many modern SANs is implemented using Fibre Channel technology. Ideally, virtualization is implemented in a way that minimizes manual configuration of the relationship between the logical representation of the storage as one or more VDisks, and the implementation of the storage using physical devices. Tasks such as backing up, adding a new physical storage device, and handling failover in the case of an error condition should be handled as automatically as possible.
In effect, a VDisk is a facade that allows a set of PDisks, or more generally a set of portions of PDisks, to imitate a single PDisk. Virtualization techniques for configuring the PDisks behind the VDisk facade can improve performance and reliability compared to the more traditional approach of a disk drive directly connected to a single computer system. Standard virtualization techniques include mirroring, striping, concatenation, and writing parity information.
Mirroring involves maintaining two or more separate copies of data. Strictly speaking, mirroring involves maintaining copies of the contents of an extent, either a real extent or a virtual extent. An extent is a set of consecutively addressed units of storage (e.g., bytes or words). The copies are maintained on an ongoing basis over a period of time. During that time, the data within the mirrored extent might change, but the mirroring relationship will be maintained such that the copies change correspondingly. When we say herein that data is being mirrored, it should be understood to mean that an extent containing data is being mirrored.
Typically, the copies are located on distinct disks that, for purposes of security or disaster recovery, are sometimes remote from each other, in different areas of a building, different buildings, or different cities. Mirroring provides redundancy. If a device containing one copy, or a portion of a copy, suffers a failure of functionality (e.g., a mechanical or electrical problem), then that device can be serviced or removed while one or more of the other copies is used to provide storage and access to existing data. Mirroring can also be used to improve read performance. Given copies of data on drives A and B, a read request can be satisfied by reading, in parallel, a portion of the data from A and a different portion of the data from B. Alternatively, a read request can be sent to both A and B. The request is satisfied from either A or B, whichever returns the required data first. If A returns the data first, then the request to B can either be cancelled, or be allowed to proceed with its results ignored. Mirroring can be performed synchronously or asynchronously.
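The redundancy property of mirroring can be illustrated with a minimal sketch. The class name MirroredExtent, its methods, and the use of in-memory byte arrays in place of physical disks are all assumptions made for illustration; the sketch models only synchronous mirroring and is not a description of any particular implementation.

```python
class MirroredExtent:
    """Toy mirror: each write goes to every healthy copy; reads use any healthy copy."""

    def __init__(self, size, copies=2):
        # Each copy is an in-memory bytearray standing in for an extent on a disk.
        self.copies = [bytearray(size) for _ in range(copies)]
        self.failed = set()

    def write(self, offset, data):
        # Synchronous mirroring: update every healthy copy before returning.
        if offset < 0 or offset + len(data) > len(self.copies[0]):
            raise ValueError("write falls outside the extent")
        for index, copy in enumerate(self.copies):
            if index not in self.failed:
                copy[offset:offset + len(data)] = data

    def fail(self, index):
        # Mark one copy as unavailable, e.g. after a device failure.
        self.failed.add(index)

    def read(self, offset, length):
        # Serve the read from the first copy that is still healthy.
        for index, copy in enumerate(self.copies):
            if index not in self.failed:
                return bytes(copy[offset:offset + length])
        raise IOError("all mirror copies have failed")


mirror = MirroredExtent(size=16, copies=2)
mirror.write(0, b"hello")
mirror.fail(0)                  # simulate loss of the first device
recovered = mirror.read(0, 5)   # still served from the second copy
```

In this sketch the data remains readable after the first copy is marked as failed, because the second copy received the same write.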
Striping involves splitting data into smaller pieces, called “stripes.” Logically sequential data stripes are written to separate storage devices, in a round-robin fashion. For example, suppose a file or dataset were regarded as consisting of 6 contiguous parts of equal size, numbered 1 to 6. Striping these across 3 drives would typically be implemented with parts 1 and 4 as stripes on the first drive; parts 2 and 5 as stripes on the second drive; and parts 3 and 6 as stripes on the third drive. Striping improves performance on conventional hard disks because data does not need to be written sequentially by a single drive, but instead can be written in parallel by several drives. In the example just described, stripes 1, 2, and 3 could be written in parallel. Striping can reduce reliability, however, because failure of any one of the devices holding stripes will render the data in that entire copy unrecoverable. To avoid this, striping and mirroring are often combined.
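The round-robin placement in the six-part example above can be sketched as follows. The function name stripe_round_robin is hypothetical, and the sketch tracks only which part lands on which drive, not the data itself.

```python
def stripe_round_robin(parts, num_drives):
    """Assign logically sequential parts to drives in round-robin order."""
    if num_drives < 1:
        raise ValueError("at least one drive is required")
    drives = [[] for _ in range(num_drives)]
    for index, part in enumerate(parts):
        # Part i goes to drive (i mod num_drives).
        drives[index % num_drives].append(part)
    return drives


# Six contiguous parts, numbered 1 to 6, striped across three drives.
layout = stripe_round_robin([1, 2, 3, 4, 5, 6], 3)
for drive_number, stripes in enumerate(layout, start=1):
    print(f"drive {drive_number}: parts {stripes}")
```

The resulting layout places parts 1 and 4 on the first drive, parts 2 and 5 on the second, and parts 3 and 6 on the third, matching the example in the text.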
Writing of parity information is an alternative to mirroring for recovery of data upon failure. In parity redundancy, redundant data is typically calculated from several areas (e.g., 2, 4, or 8 different areas) of the storage system and then stored in one area of the storage system. The size of the redundant storage area is less than the remaining storage area used to store the original data.
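One common way to calculate such redundant data is a bytewise exclusive-or (XOR) across the data areas. The text above does not name a specific calculation, so the use of XOR here, the function name xor_blocks, and the sample byte values are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data areas of equal size (illustrative values only).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]

# One parity area protects all four data areas.
parity = xor_blocks(data)

# Suppose the third data area is lost; rebuild it from the survivors and the parity.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
```

Under these assumptions a single parity area the size of one data area suffices to rebuild any one lost area, which is why the redundant storage is smaller than the storage holding the original data.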
A Redundant Array of Independent (or Inexpensive) Disks (RAID) describes several levels of storage architectures that employ the above techniques. For example, a RAID 0 architecture is a striped disk array that is configured without any redundancy. Since RAID 0 is not a redundant architecture, it is often omitted from a discussion of RAID systems. A RAID 1 architecture involves storage disks configured according to mirror redundancy. Original data is stored on one set of disks and duplicate copies of the data are maintained on separate disks. Conventionally, a RAID 1 configuration has an extent that fills all the disks involved in the mirroring. In practice, mirroring sometimes only utilizes a fraction of a disk, such as a single partition, with the remainder being used for other purposes. The RAID 2 through RAID 5 architectures each involve parity-type redundant storage. RAID 10 is simply a combination of RAID 0 (striping) and RAID 1 (mirroring). This RAID type allows a single array to be striped over two or more PDisks with the stripes also mirrored over two or more PDisks.
Concatenation involves combining two or more disks, or disk partitions, so that the combination behaves as if it were a single disk. Not explicitly part of the RAID levels, concatenation is a virtualization technique to increase storage capacity behind the VDisk facade.
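The address translation behind concatenation can be sketched as a lookup from a virtual block address to a disk and a block on that disk. The function name locate and the disk sizes used are hypothetical.

```python
def locate(block, disk_sizes):
    """Map a virtual block address to (disk index, block on that disk)."""
    if block < 0:
        raise IndexError("block address must be non-negative")
    for disk_index, size in enumerate(disk_sizes):
        if block < size:
            return disk_index, block
        # Skip past this disk and continue with the remaining offset.
        block -= size
    raise IndexError("block address beyond end of concatenated volume")


# Three disks of 100, 50, and 200 blocks behave as one 350-block disk.
sizes = [100, 50, 200]
first = locate(0, sizes)      # first block of the first disk
middle = locate(120, sizes)   # falls within the second disk
last = locate(349, sizes)     # last block of the third disk
```

In this sketch the virtual address space is simply the disks laid end to end, so capacity grows by the size of each disk added.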
Virtualization can be implemented in any of three storage system levels—in the hosts, in the storage devices, or in a network device operating as an intermediary between hosts and storage devices. Each of these approaches has pros and cons that are well known to practitioners of the art.
Various types of storage devices are used in current data processing systems. A typical system may include one or more large capacity tape units and/or disk drives (magnetic, optical, or semiconductor) connected to the systems through respective control units for storing data. High-speed, reliable data storage and file serving is a must for any large computing system. Virtualization, implemented in whole or in part as one or more RAIDs, is a preferred method of providing high-speed, reliable data storage and file serving.
A VDisk is usually represented to the host by the storage system as a logical unit number (LUN) or as a mass storage device. Typically, a VDisk is simply the logical combination of one or more RAIDs.
Because a VDisk emulates the behavior of a PDisk, virtualization can be done hierarchically. As an example of this principle, VDisk mirroring is a critical component of virtualized storage systems. The concept is to create a separate RAID that is used to duplicate an existing RAID. As already described, mirroring allows data recovery and access via the mirrored system when a serious event disables the entire primary system, or even just the primary VDisk (or RAID) if all of the RAIDs are in the same system. In VDisk mirroring, the copies will have the same size but can otherwise have very different virtual configurations, such as different types of RAIDs. For example, a VDisk containing two 200 gigabyte (200G) RAID 5 arrays may be mirrored to a VDisk that contains one 400G RAID 10 array.
Solid state drives (SSDs), sometimes called solid state disks, are a major advance in storage system technology. An SSD is a data storage device that uses non-volatile memory, such as flash, or volatile memory, such as SDRAM, to store data. The SSD can replace a conventional rotational media hard drive (RMD), which has spinning platters. There are a number of advantages of SSDs in comparison to traditional RMDs, including much faster read and write times, better mechanical reliability, much greater IO capacity, extremely low latency, and zero seek time. A typical RMD may have an IO capacity of 200 random IO operations per second, while a typical DRAM SSD may have an IO capacity of 20,000 random IO operations per second. This speed improvement of nominally two orders of magnitude is offset, however, by a cost of SSD storage that, at today's prices, is roughly two orders of magnitude higher than RMD storage.
The invention encompasses any situation in which a device with a fast write speed, or input/output capacity for writes, is mirrored to a plurality of devices with lower write speeds. For example, a DRAM SSD is about 20% faster for reading than a Flash SSD, but may be 10 to 20 times faster for writing. Currently, the cost per gigabyte of DRAM SSD is roughly 16 times that of Flash SSD. Some embodiments of the invention include a DRAM SSD that is mirrored to a plurality of Flash SSDs, across which the data is striped. Even with technologies that do not exist today, the approach of the invention will allow discrepancies in write speeds between types of mirroring devices to be compensated for by striping across a plurality of slower devices.
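The compensation described above can be illustrated with simple arithmetic. The sketch below assumes that aggregate write throughput scales linearly with the number of devices in the stripe, which is an idealization; the function name slow_devices_needed is hypothetical, and the rates are expressed relative to a single Flash SSD using the 10 to 20 times range given above.

```python
import math


def slow_devices_needed(fast_write_rate, slow_write_rate):
    """Number of slower devices to stripe across so that their combined
    write rate is at least that of the fast device (ideal linear scaling)."""
    if fast_write_rate <= 0 or slow_write_rate <= 0:
        raise ValueError("write rates must be positive")
    return math.ceil(fast_write_rate / slow_write_rate)


# Write rates relative to one Flash SSD (rate 1).
low_estimate = slow_devices_needed(10, 1)    # DRAM SSD 10 times faster
high_estimate = slow_devices_needed(20, 1)   # DRAM SSD 20 times faster
print(f"stripe across {low_estimate} to {high_estimate} Flash SSDs")
```

Under this assumption, striping the mirror copy across roughly 10 to 20 Flash SSDs would allow the mirror to absorb writes at about the rate of the DRAM SSD.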