1. Technical Field
The present invention relates in general to data processing systems and in particular to disk storage management in data processing systems. Still more particularly, the present invention relates to volume group management of reserves in disk storage systems.
2. Description of the Related Art
Availability of a data processing system has become a significant issue for many companies. Computerized applications have become more critical to operations and the need for reliability of the system has never been greater. Data processing systems have a direct effect on revenue, staff productivity, service level and costs associated with alternatives to the data processing system in case of failure.
Continuous availability is the highest level of availability. The data processing system never fails to deliver its service and attempts to provide 100% availability to the end user by providing redundancy in components and the ability to change processes on-line. Planned outages may occur and a machine within the data processing system may fail, but neither should be apparent to the end user.
A data processing system using a UNIX operating system may utilize a logical volume management system to manage access to storage devices, in particular disk drives. The logical volume management application ("LVM") is part of the kernel in the operating system and starts up when the system does. The objective of an LVM is to manage storage media as opposed to management of the storage device. An LVM uses a hierarchy of structures to manage fixed disk storage. Each individual fixed-disk drive, called a physical volume ("PV"), has a name, such as /dev/hdisk0. Every PV in use belongs to a volume group ("VG"). All of the physical volumes (hard disk drives) in a volume group are divided into physical partitions ("PP") of the same size. The number of physical partitions on each disk varies, depending on the total capacity of the disk drive.
Within each volume group, one or more logical volumes are defined. Logical volumes ("LV") are groups of information located on physical volumes. Data on logical volumes appears to be contiguous to the user but may be non-contiguous on the physical volume. This allows file systems, paging space and other data logical volumes to be re-sized or relocated. Additionally, the logical volumes may span multiple physical volumes, and have their contents replicated for greater flexibility and availability in the storage of data.
Each logical volume consists of one or more logical partitions ("LP"). Each LP corresponds to at least one physical partition. Of all the components used in a computer system, physical disk drives are usually the most susceptible to failure. Because of this, mirroring is a frequently used technique for increasing system availability. A filesystem can easily be mirrored when using the logical volume manager by mirroring the logical volume on which the filesystem is created. If mirroring is specified for the logical volume, additional physical partitions are allocated to store the additional copies of each logical partition. Although the logical partitions are numbered consecutively, the underlying physical partitions are not necessarily consecutive or contiguous. Logical volumes may serve a number of system purposes such as paging, storing raw data or holding a single filesystem.
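The hierarchy described above can be illustrated with a minimal sketch. The class names, the partition size and the zero-based numbering of logical partitions are assumptions made for illustration only and are not taken from any actual LVM implementation. The sketch shows how consecutively numbered logical partitions may be backed by non-contiguous physical partitions, with more than one backing partition when the logical volume is mirrored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalPartition:
    pv_name: str  # physical volume holding the partition, e.g. "/dev/hdisk0"
    index: int    # partition number on that physical volume


@dataclass
class LogicalVolume:
    name: str
    # One entry per logical partition; each entry lists the physical
    # partitions (one per mirror copy) that back it.
    partitions: list = field(default_factory=list)

    def add_logical_partition(self, *copies):
        if not copies:
            raise ValueError("a logical partition needs at least one physical partition")
        self.partitions.append(list(copies))

    def resolve(self, byte_offset, pp_size):
        """Map a logical byte offset to (LP number, backing copies, offset in partition)."""
        lp_number, offset_in_pp = divmod(byte_offset, pp_size)
        return lp_number, self.partitions[lp_number], offset_in_pp


# Hypothetical partition size; every PV in a volume group uses the same size.
PP_SIZE = 4 * 1024 * 1024

lv = LogicalVolume("lv00")
# Logical partition 0 is mirrored across two disks.
lv.add_logical_partition(PhysicalPartition("/dev/hdisk0", 17),
                         PhysicalPartition("/dev/hdisk1", 3))
# Logical partition 1 has a single copy, on a different disk.
lv.add_logical_partition(PhysicalPartition("/dev/hdisk1", 42))

lp, copies, offset = lv.resolve(PP_SIZE + 100, PP_SIZE)
print(lp, copies, offset)
```

The logical address space is contiguous from the caller's point of view, while the lookup table is free to point anywhere in the volume group, which is what permits re-sizing and relocation.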
For LVs having mirrored copies, each partition of the mirror can have one of two states: available or stale. Data may be read from any available mirrored partition. Data must be written to all available mirrored partitions before the write returns. Only partitions that are marked as available will be read and written. To bring a stale partition back into use, a command must be run that copies information from an available mirror to the stale mirror and then changes the partition marked as stale to available.
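The available and stale states may be sketched as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, each mirror copy is represented as an in-memory byte string, and reads are served from the first available copy even though any available copy would do.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    STALE = "stale"


class MirroredPartition:
    """One logical partition with several mirror copies (hypothetical model)."""

    def __init__(self, copies):
        self.data = [b""] * copies
        self.state = [State.AVAILABLE] * copies

    def _available(self):
        return [i for i, s in enumerate(self.state) if s is State.AVAILABLE]

    def write(self, payload):
        # A write must reach every available copy; stale copies are skipped.
        for i in self._available():
            self.data[i] = payload

    def read(self):
        # Any available copy may serve the read; the first one is used here.
        available = self._available()
        if not available:
            raise IOError("no available mirror copy")
        return self.data[available[0]]

    def mark_stale(self, copy_index):
        self.state[copy_index] = State.STALE

    def sync(self):
        # Copy from an available mirror to each stale mirror, then mark it available.
        source = self.read()
        for i, s in enumerate(self.state):
            if s is State.STALE:
                self.data[i] = source
                self.state[i] = State.AVAILABLE


part = MirroredPartition(copies=2)
part.write(b"v1")
part.mark_stale(1)   # e.g. the second disk missed a write
part.write(b"v2")    # only the available copy is updated
part.sync()          # the stale copy is refreshed and made available again
```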
Character Input/Output ("I/O") requests are performed by issuing a read or write request on a /dev/rlv[N] character special file for a logical volume. The read or write is processed by the file system Supervisor Call (SVC) handler which calls a logical volume device driver ("LVDD") ddread or ddwrite entry point. The LVDD is a pseudo-device driver that operates on logical volumes through a special file, such as /dev/lv[n]. Similar to a physical disk device driver, this pseudo-device driver provides character and block entry points with compatible arguments. Each volume has an entry in the kernel device switch table. Each entry contains entry points for the device driver and a pointer to the volume group data structure.
The read or write (expressed "ddread" and "ddwrite") entry point transforms the character request into a block request by building a buffer for the request and calling the LVDD ddstrategy entry point. Block I/O requests are performed by issuing a read or write on a block special file /dev/lv[n] for a logical volume. The LVDD ddstrategy entry point translates the logical address to a physical address and calls the appropriate physical disk device driver.
On completion of the I/O, the physical device driver calls the iodone kernel service on the device interrupt level. This service then calls the LVDD I/O completion-handling routine. The LVDD then calls the iodone service to notify the requester that the I/O is complete.
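The order of calls in the I/O path described above may be traced with a short simulation. The Python functions below are hypothetical stand-ins named after the entry points in the text; they do not reproduce kernel signatures, and the address translation is supplied by the caller as an arbitrary function.

```python
trace = []  # records the order in which each layer is entered


def physical_disk_driver(request, on_done):
    trace.append("physical disk driver")
    # Stands in for the physical driver calling iodone at interrupt level.
    on_done(request)


def lvdd_strategy(request, translate, notify_requester):
    trace.append("LVDD ddstrategy")
    # Translate the logical address to a physical address.
    request["physical_block"] = translate(request["logical_block"])

    def lvdd_completion(completed):
        trace.append("LVDD completion handler")
        # Stands in for the LVDD calling iodone to notify the requester.
        notify_requester(completed)

    physical_disk_driver(request, lvdd_completion)


def lvdd_read(logical_block, translate, notify_requester):
    trace.append("LVDD ddread")
    # The character request becomes a block request by building a buffer.
    request = {"op": "read", "logical_block": logical_block}
    lvdd_strategy(request, translate, notify_requester)


finished = []
# Hypothetical translation: the logical volume starts at physical block 1000.
lvdd_read(5, lambda block: block + 1000, finished.append)
print(trace)
print(finished)
```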
A single initiator system enables only one machine at a time to access the storage device. The system may have multiple machines on the same bus (in this instance a bus) that may access the storage device. The LVM sets up a reserve when the system boots up. The reserve blocks access to a specified volume in the storage device. The LVM protects the disks from improper access when the reserve is activated.
A system utilizing more than one machine has a designated primary machine and the rest are secondary. If a primary machine fails, secondary machine(s) activate and take over the data processing function of the primary machine. The reserves have to be made available to the secondary machines on a periodic basis to maintain currency so switching machines is not inordinately delayed in case of failure.
An example of a system having multiple machines sharing one storage source follows. A data processing system with two machines, node_A and node_B, on a bus is serially sharing a storage device. Node_A is up and has ownership of the volume group. This means that node_A has a reserve on the volume group and machine node_B is unable to access the reserve until node_A gives up its reserve. If node_A crashes (failover), node_B takes the volume group, starts it up (vary on) and, if node_B is current, begins processing the data so the application can continue.
If meta-data in the volume group has changed while the volume group was reserved to node_A, node_B must export and re-import the volume group before beginning processing, because the data has changed with respect to the record of meta-data that node_B held before being locked out by the LVM reserve. If there has been no scheduled failover to update node_B, the time to update can be extensive. Also, when node_A recovers and takes control of the volume group, node_A must determine the changes made while node_A was down, a repeat of the previously discussed process.
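The two-node example may be modeled with a minimal sketch. The classes, the version counter standing in for meta-data, and the drop_reserve method (which models the reserve being released after a crash) are all assumptions made for illustration. The sketch shows that a node locked out by the reserve falls behind and must refresh its meta-data when it next takes the volume group.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.seen_version = 0  # this node's own record of the meta-data


class VolumeGroup:
    def __init__(self):
        self.reserved_by = None
        self.metadata_version = 0

    def vary_on(self, node):
        """Take the reserve; return True if the node had to refresh its meta-data."""
        if self.reserved_by not in (None, node.name):
            raise PermissionError(f"volume group is reserved by {self.reserved_by}")
        self.reserved_by = node.name
        stale = node.seen_version != self.metadata_version
        if stale:
            # Stands in for exporting and re-importing the volume group.
            node.seen_version = self.metadata_version
        return stale

    def change_metadata(self, node):
        if self.reserved_by != node.name:
            raise PermissionError("only the reserve holder may change meta-data")
        self.metadata_version += 1
        node.seen_version = self.metadata_version

    def drop_reserve(self, node):
        # Models the owner giving up the reserve, or losing it by crashing.
        if self.reserved_by == node.name:
            self.reserved_by = None


vg = VolumeGroup()
node_a, node_b = Node("node_A"), Node("node_B")

vg.vary_on(node_a)                        # node_A owns the volume group
vg.change_metadata(node_a)                # node_B cannot see this change
vg.drop_reserve(node_a)                   # node_A fails
b_had_to_refresh = vg.vary_on(node_b)     # node_B takes over and must refresh
vg.change_metadata(node_b)
vg.drop_reserve(node_b)
a_had_to_refresh = vg.vary_on(node_a)     # node_A returns and must refresh as well
print(b_had_to_refresh, a_had_to_refresh)
```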
In FIG. 3, a flow diagram of a current method of volume management in multiple-node, non-concurrent-use storage devices is depicted. The process begins in step 302, which depicts an occurrence causing the data processing system (not shown) to change nodes. This could be a scheduled downtime or node_A could have failed. The process proceeds to step 304, which illustrates the logical volume manager shutting down the volume group owned by node_A. At this point, node_A is no longer accessing the logical volume and is prevented from doing so as shown in step 306, which depicts the logical volume manager locking node_A out of the volume group. The process next proceeds to step 308, which illustrates the logical volume manager opening a volume group on the back-up machine, node_B. The process then passes to step 310, which depicts node_B gaining access to the storage device and beginning to refresh the meta-data in its volume group.
The process next proceeds to step 312, which illustrates the completion of the refreshment of the meta-data of node_B and the logical volume manager closing the volume group to node_B. The process continues to step 314, which depicts the logical volume manager cycling node_A back up, re-opening the storage device and opening the volume group to node_A. The process next passes to step 316, which illustrates node_A refreshing its meta-data, which is out of date due to the downtime while node_B had the volume group.
An alternate method for maintaining currency is to eliminate the reserve and leave the storage device available at all times to all machines on the bus. No machines are locked out, so meta-data changes by the primary machine, node_A in this case, are available for all machines to access. However, the operator is in a less secure position because any machine could lock the disks and block access to the storage device, after which data access is lost.
It would be desirable, therefore, to provide a method for allowing logical volume managers to give up or gain the reserve without cycling the volume group and user application up and down, while maintaining security of the volume group.
It is therefore one object of the present invention to provide a process that will allow multi-initiator data processing systems to access meta-data that is reserved without deactivating the storage device.
It is another object of the present invention to provide a process that will allow multiple nodes access to a reserved volume group without shutting down the original owner of the volume group.
It is yet another object of the present invention to provide, in a data processing system with primary and secondary nodes, a process that allows all the nodes to refresh each node's version of meta-data without shutting down the primary node.
The foregoing objects are achieved as is now described. In a multiple machine data processing system, a volume group may be accessed by a machine other than the original owner of the volume group while the integrity of the volume group is maintained. A logical volume manager on the primary machine holds all incoming I/O requests to the logical volumes in the volume group and waits for all the I/O requests already sent down to the disks in the volume group to complete. Once all the outstanding I/O requests to the disks have been completed, the disks in the volume group are closed and reopened without reserve. The logical volume manager on a secondary machine opens the disks in the volume group without taking a reserve to allow meta-data to be refreshed on the secondary machine. When the secondary machine is finished, the disks are closed. The logical volume manager on the primary machine again holds all incoming I/O requests to the logical volumes in the volume group and waits for all the I/O requests already sent down to the disks in the volume group to complete. Once all the outstanding I/O requests to the disks have been completed, the disks in the volume group are closed and reopened with reserve. The application on the primary machine does not see any of the operations and is unaffected.
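The sequence summarized above may be sketched as a single-threaded simulation. The class names, the queues and the arriving parameter (which models requests that reach the primary while its disks are being reopened) are assumptions for illustration. The release of held requests after the reopen is likewise an assumption, inferred from the statement that the application is unaffected.

```python
from collections import deque


class PrimaryLVM:
    def __init__(self):
        self.reserved = True
        self.holding = False
        self.held = deque()       # requests held back during a reopen
        self.in_flight = deque()  # requests already sent down to the disks
        self.completed = []

    def submit(self, request):
        (self.held if self.holding else self.in_flight).append(request)

    def reopen(self, with_reserve, arriving=()):
        self.holding = True                  # hold all incoming I/O requests
        for request in arriving:             # requests arriving during the reopen
            self.submit(request)
        while self.in_flight:                # wait for outstanding I/O to complete
            self.completed.append(self.in_flight.popleft())
        self.reserved = with_reserve         # close the disks, reopen with or without reserve
        self.holding = False
        while self.held:                     # assumed: held requests are then released
            self.in_flight.append(self.held.popleft())


class SecondaryLVM:
    def __init__(self):
        self.metadata_version = 0

    def refresh(self, primary, disk_metadata_version):
        if primary.reserved:
            raise PermissionError("disks are reserved by the primary machine")
        # Open the disks without reserve, read the meta-data, close the disks.
        self.metadata_version = disk_metadata_version


primary, secondary = PrimaryLVM(), SecondaryLVM()
primary.submit("write-1")
primary.reopen(with_reserve=False, arriving=["write-2"])  # give up the reserve
secondary.refresh(primary, disk_metadata_version=7)       # secondary becomes current
primary.reopen(with_reserve=True, arriving=["write-3"])   # take the reserve back
print(primary.completed, list(primary.in_flight), secondary.metadata_version)
```

In the simulation, no request is rejected or lost across either reopen; requests are only delayed, which corresponds to the application on the primary machine not seeing the operations.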
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.