The present disclosure relates in general to the field of computer networks and, more particularly, to a system and method for the backup and recovery of data in a multi-computer environment.
Computer networking environments such as Local Area Networks (LANs) and Wide Area Networks (WANs) permit many users, often at remote locations, to share communication, data, and resources. A storage area network (SAN) may be used to provide centralized data sharing, data backup, and storage management in these networked computer environments. This combination of a LAN or WAN with a SAN may be referred to as a shared storage network. A storage area network is a high-speed subnetwork of shared storage devices. A storage device is any device that principally contains a single disk or multiple disks for storing data for a computer system or computer network. The collection of storage devices is sometimes referred to as a storage pool. The storage devices in a SAN can be collocated, which allows for easier maintenance and easier expandability of the storage pool. The network architecture of most SANs is such that all of the storage devices in the storage pool are available to all the servers on the LAN or WAN that is coupled to the SAN. Additional storage devices can be easily added to the storage pool, and these new storage devices will also be accessible from any server in the larger network.
In a computer network that includes a SAN, the server can act as a pathway or transfer agent between the end user and the stored data. Because much of the stored data of the computer network resides in the SAN, rather than in the servers of the network, the processing power of the servers can be used for applications. Network servers can access a SAN using the Fibre Channel protocol, taking advantage of the ability of a Fibre Channel fabric to serve as a common physical layer for the transport of multiple upper layer protocols, such as SCSI, IP, and HIPPI, among other examples.
The storage devices in a SAN may be structured in a RAID configuration. When a system administrator configures a shared data storage pool into a SAN, the storage devices may be grouped together into one or more RAID volumes, and each volume is assigned a SCSI logical unit number (LUN) address. If the storage devices are not grouped into RAID volumes, each storage device will typically be assigned its own LUN. The system administrator or the operating system for the network will assign a volume or storage device and its corresponding LUN to each server of the computer network. Each server will then have, from a memory management standpoint, logical ownership of a particular LUN and will store the data generated from that server in the volume or storage device corresponding to the LUN owned by the server.
When a server is initialized, the operating system assigns all visible storage devices to the server. For example, if a particular server detects several LUNs upon initialization, the operating system of that server will assume that each LUN is available for use by the server. Thus, if multiple servers are attached to a shared data storage pool, each server can detect each LUN on the entire shared storage pool and will assume that it owns for storage purposes each LUN and the associated volume or storage device. Each server can then store the user data associated with that server in any volume or storage device in the shared data storage pool. Difficulties occur, however, when two or more servers attempt to write to the same LUN at the same time. If two or more servers access the same LUN at the same time, the data stored in the volume or storage device associated with that LUN will be corrupted. The disk drivers and file system drivers of each server write a data storage signature on the storage device accessed by the server to record information about how data is stored on the storage system. A server must be able to read this signature in order to access the previously written data on the storage device. If multiple servers attempt to write signatures to the same storage device, the data storage signatures will conflict with each other. As a result, none of the servers will be able to access the data stored in the storage device because the storage device no longer has a valid data storage signature. The data on the storage device is now corrupted and unusable.
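The access conflict described above can be illustrated with a toy model. The following is a minimal sketch, not part of the disclosed system: the server names, the LUN numbers, and the helper functions `claim_all_visible` and `conflicted_luns` are hypothetical and exist only to show that, without masking, every server claims every LUN it detects.

```python
from collections import defaultdict


def claim_all_visible(servers, visible_luns):
    """Model the default behavior: each server claims every LUN it detects.

    Returns a mapping of LUN -> set of servers that assume ownership.
    (Hypothetical helper for illustration only.)
    """
    claims = defaultdict(set)
    for server in servers:
        for lun in visible_luns:
            claims[lun].add(server)
    return claims


def conflicted_luns(claims):
    """Return, in sorted order, the LUNs claimed by more than one server."""
    return sorted(lun for lun, owners in claims.items() if len(owners) > 1)


# Two servers attached to the same shared storage pool of three LUNs.
claims = claim_all_visible(["server_a", "server_b"], [0, 1, 2])
print(conflicted_luns(claims))  # every LUN is claimed by both servers
```

In this model each LUN with more than one claimant is a candidate for the signature conflict described above; a single server attached to the pool produces no conflicts.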
To avoid the problem of data corruption that results from access conflicts, conventional storage consolidation software employs LUN masking software. LUN masking software runs on each server and masks the LUNs in order to prevent the operating system from automatically assigning the LUNs. In effect, LUN masking software masks or hides a device from a server. The system administrator may then use the storage consolidation software to assign LUNs to each server as needed. Because a server can access only those devices that it sees on the network, no access conflicts can arise if each LUN is masked to all but one server.
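The effect of LUN masking can be sketched in the same spirit. This is an illustrative assumption about how an assignment table might be represented, not the interface of any particular storage consolidation product: the `assignments` dictionary and the functions `visible_luns` and `has_conflict` are hypothetical names.

```python
def visible_luns(server, all_luns, assignments):
    """Return the LUNs left unmasked for a server.

    `assignments` maps each LUN to the single server allowed to see it.
    Any LUN not assigned to `server` (including unassigned LUNs) is masked.
    """
    return [lun for lun in all_luns if assignments.get(lun) == server]


def has_conflict(servers, all_luns, assignments):
    """Return True if any LUN is visible to more than one server."""
    seen = set()
    for server in servers:
        for lun in visible_luns(server, all_luns, assignments):
            if lun in seen:
                return True
            seen.add(lun)
    return False


all_luns = [0, 1, 2, 3]
# Hypothetical assignment made by the administrator; LUN 3 is unassigned.
assignments = {0: "server_a", 1: "server_b", 2: "server_a"}

print(visible_luns("server_a", all_luns, assignments))
print(visible_luns("server_b", all_luns, assignments))
print(has_conflict(["server_a", "server_b"], all_luns, assignments))
```

Because each LUN maps to at most one server in this representation, no LUN can be visible to two servers, which mirrors the reasoning that no access conflict can arise when each LUN is masked to all but one server.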
As storage available to a computer network increases, the need for adequate backup storage also increases. Often a computer network employs dedicated backup storage devices, such as tape storage devices. Storing data on tapes is considerably cheaper than storing data on disks. Tapes also have large storage capacities, ranging from a few hundred kilobytes to several gigabytes. Because tapes are sequential-access media, accessing data on tapes is much slower than accessing data on disks. As a result, tape storage devices are more appropriate for long-term storage and backup, while disk drives are more appropriate for storing data to be used on a regular basis (such as a storage device for a SAN).
During backup operations, some or all of the storage devices available to the network transmit all or a portion of stored data to the dedicated backup storage devices. Backup operations are implemented to safeguard computer systems against disasters or other events that result in data loss. In the event of a disaster, data may be recovered from the dedicated backup storage devices. Examples of disasters that are caused by hardware failures include memory errors, system timing problems, resource conflicts, and power loss. Disasters may also be caused by software failure, file system corruption, accidental deletion, computer virus infection, theft, sabotage, or even natural disasters. One of the most common disasters occurs when a server on the LAN or WAN experiences a software failure or crash or suffers some other serious failure that causes the server to stop working or abort an application unexpectedly. Regardless of the cause of the disaster, user data may be lost. To restore the affected server to its previous state, the system administrator or user must copy the backup data to the affected server.
During the recovery process, backup data must be read from the dedicated backup storage devices on the storage network. As discussed above, a server normally runs LUN masking software to prevent the server from seeing and interfering with storage devices on the SAN that the server does not have the right to use because such interference can cause data corruption. But after a disaster, an affected server may no longer be running LUN masking software. Unfortunately, this creates a “catch-22” situation in the recovery of backup data. The LUN masking software must be recovered from the dedicated backup storage device on the storage network, yet the LUN masking software must already be running on the affected server in order for the affected server to safely interact with the storage network.
To prevent the affected server from accessing storage devices that are already claimed by another server and subsequently corrupting the data stored on those storage devices, system administrators frequently disconnect the affected server from the fabric and connect it directly to its associated dedicated backup storage device. Only then can the system administrator initiate the recovery process and restore the affected server. This process presents several disadvantages. First, due to the operating environments of SANs and computer networks, the server and the dedicated backup storage device are often located a significant distance from each other. Depending on the network, this distance may range from a few feet to several kilometers. The system administrator must make arrangements for physically moving one component to the other, connecting them for the backup recovery procedure, and then physically moving them back to their respective locations. A second disadvantage of this process occurs when the dedicated backup storage device is disconnected from the storage network. In this state, the backup storage device is unavailable to provide backup services for the other servers on the storage network during the disaster recovery process. If a second disaster occurs, the disconnected dedicated backup storage device will not be able to completely restore the other servers. Alternatively, the system administrator or user may leave the affected server and its associated dedicated backup storage device connected to the storage network and attempt to use the host bus adapter (HBA) driver to manually set the LUN access. However, this process does not provide any tolerance for operator error, and data corruption may result if LUN access is improperly granted to the affected server.
In accordance with teachings of the present disclosure, a system and method for recovering backup data from dedicated backup storage devices in a multi-computer environment are disclosed that provide significant advantages over prior developed systems.
The system and method described herein include a LUN masking driver. The LUN masking driver is preferably contained on an emergency diskette that is to be used during the recovery process for loading vital device drivers onto the affected server so that the affected server may boot and connect to the SAN. During the recovery process, the LUN masking driver will load when the operating system boots up, after the SAN HBA driver loads and before the normal file systems load. The LUN masking driver scans all devices visible on the SAN and uses SCSI inquiry commands to determine which devices are dedicated backup storage devices. The LUN masking driver then masks all devices that are not dedicated backup storage devices. Thus, only dedicated backup storage devices are visible to software that boots up after the LUN masking driver completes its function. Consequently, the operating system's file systems never see the storage devices that are not dedicated backup storage devices. As a result, the affected server cannot access the storage devices and cause data corruption.
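The scan-and-mask step can be sketched as follows. This is a simplified user-space model, not driver code, and it rests on a stated assumption: the disclosure says only that SCSI inquiry commands are used to identify dedicated backup storage devices, so classifying a device as a backup device when its standard INQUIRY data reports peripheral device type 0x01 (sequential-access, i.e. tape) is one plausible criterion chosen here for illustration. The `Device` class and the function names are hypothetical.

```python
from dataclasses import dataclass

# Peripheral device type codes from the SCSI standard INQUIRY data
# (low five bits of byte 0).
SCSI_TYPE_DIRECT_ACCESS = 0x00      # disk
SCSI_TYPE_SEQUENTIAL_ACCESS = 0x01  # tape

# Assumption for this sketch: only tape devices count as dedicated
# backup storage devices.
BACKUP_DEVICE_TYPES = {SCSI_TYPE_SEQUENTIAL_ACCESS}


@dataclass(frozen=True)
class Device:
    """Hypothetical record for a device found during the SAN scan."""
    lun: int
    inquiry_data: bytes  # response to a SCSI INQUIRY command


def peripheral_device_type(inquiry_data: bytes) -> int:
    """Extract the peripheral device type from standard INQUIRY data."""
    if not inquiry_data:
        raise ValueError("empty INQUIRY response")
    return inquiry_data[0] & 0x1F


def unmasked_devices(devices):
    """Return only the devices that remain visible after masking.

    Every device that is not classified as a backup device is masked,
    so later-loading software (such as file systems) never sees it.
    """
    return [
        d for d in devices
        if peripheral_device_type(d.inquiry_data) in BACKUP_DEVICE_TYPES
    ]


def fake_inquiry(device_type: int) -> bytes:
    """Build a 36-byte stand-in INQUIRY response for the example."""
    return bytes([device_type]) + bytes(35)


scanned = [
    Device(lun=0, inquiry_data=fake_inquiry(SCSI_TYPE_DIRECT_ACCESS)),
    Device(lun=1, inquiry_data=fake_inquiry(SCSI_TYPE_SEQUENTIAL_ACCESS)),
    Device(lun=2, inquiry_data=fake_inquiry(SCSI_TYPE_DIRECT_ACCESS)),
]
print([d.lun for d in unmasked_devices(scanned)])
```

In this example only the tape device at LUN 1 stays visible; the two disk LUNs are hidden from everything that loads afterward. A real driver would issue the INQUIRY commands through the HBA driver rather than receive prepared byte strings.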
The present disclosure also describes a method for restoring backup data from a dedicated backup storage device to a server on a computer network. The method includes the step of loading the LUN masking driver of the present disclosure during the recovery process. The LUN masking driver loads as the operating system boots up, before the file systems load, and masks from the operating system the storage devices that are not dedicated backup storage devices. As a result, data may be recovered from the dedicated backup storage devices without the risk of the server accessing other storage devices and corrupting the data stored therein.
The disclosed system and method provide several technical advantages over conventional approaches for recovering backup data in a storage network environment. One advantage provided by the disclosed system and method is that an affected server may recover backup data from a dedicated backup storage device without the need for disconnecting the dedicated backup storage device and the affected server from the network. As a result, the dedicated backup storage device may remain connected to the network and continue its backup operations. The disclosed system and method are also advantageous in that they reduce the time and resources necessary to perform recovery operations because recovery does not require separating the affected server and the dedicated backup storage device from the network. Other technical advantages should be apparent to one of ordinary skill in the art in view of the specification, claims, and drawings.