The present invention generally relates to data processing in the field of networking. The invention relates more specifically to an approach for backup and restore of a data storage device that is carried out without involvement of a server that uses the data storage device.
Computer data storage devices are widely used to store valuable data that is expensive to compile and essential to have on-line for the operation of business processes. However, because data storage devices can fail, periodic data backup is an essential operation to ensure that data can be recovered from a backup storage device in the event of failure of a primary storage device.
In one past approach to conventional data backup, a server that uses the storage device for data storage periodically executes a backup service or program. The server implements a file system to organize data on the storage device. The backup service queries the file system of the server to determine what files are located on the storage device, and copies the files to a backup data storage device. In one related approach, the backup service executes on a second host that queries the file system of the first server and copies data to a backup storage device that is managed by the second host. In yet another approach, the backup service performs a track-by-track backup of the data storage device, without querying the file system. However, these approaches require the backup service to know what file system or format is used to record information on the data storage device, which typically requires knowledge of the operating system that was used to control the recording of data on the data storage device. In a third approach, the server that contains the file system performs the backup itself. A fourth approach uses an agent on the server to transport the data to the backup server.
Although these past approaches are workable in some contexts, in other contexts they are impractical. One specific context in which these past approaches are inadequate is the instant data center or extensible computer system. Instant data centers are constructed using methods and systems that provide a flexible, extensible way to rapidly create and deploy complex computer systems and data centers that include a plurality of servers, one or more load balancers, firewalls, and other network elements. One method for creating such a system is described in co-pending application Ser. No. 09/502,170, filed Feb. 11, 2000, entitled “Extensible Computing System,” naming Ashar Aziz et al. as inventors, the entire disclosure of which is hereby incorporated by reference as if fully set forth herein (referred to herein as “Extensible Computing System Description”).
The Extensible Computing System Description discloses a method and apparatus for selecting, from within a large, extensible computing framework, elements for configuring a particular computer system. Accordingly, upon demand, a virtual server farm or other data center may be created, configured and brought on-line to carry out useful work, all over a global computer network, virtually instantaneously.
A characteristic of the approach for instantiating, using, and releasing virtual server farms disclosed in the Extensible Computing System Description is that a particular storage device may be used, at one particular time, with a first operating system or file system, and later used with a completely different second operating system and file system. Thus, a backup service that provides backup for a particular storage device cannot assume that any particular operating system, file system, file format or recording format is in use at the time of a backup. Moreover, in the context of an instant data center, one storage device may potentially be used to successively store private, confidential data of two unrelated enterprises. As a result, the backup service cannot assume that a particular storage device is storing any particular kind of data.
Based on the foregoing, there is a clear need in this field for a backup approach that does not require knowledge of the contents of the storage device, the kind of data that is stored, the file system that has been used to record data on the storage device, or the operating system that was used to control the storage device.
Another characteristic of the instant data center approach is that the servers in the data center that use the data storage devices needing backup typically have no regularly scheduled downtime, or period of known inactivity or reduced activity, in which a backup service can properly query the server or its file system.
Still another characteristic is that a storage device may be associated with different kinds of servers from time to time. For example, a particular data storage device could be associated with a set of UNIX servers over a first period of time, and then be reallocated and assigned to a set of Windows 2000 servers at a second period of time. There is a need for a backup approach that is compatible with storage devices that are re-assigned in this manner. Further, the overall configuration or topology of a particular instant data center may change from time to time in terms of number of servers, number of storage devices, and their arrangement.
Thus, there is a need for a data backup approach that does not require use of a server associated with a storage device in order to carry out backup. More specifically, there is a need for a data backup approach that is transparent or invisible from the perspective of the server that is using the data storage device that is backed up. However, there is still a need to provide notification to the server that it is about to be backed up.
Still another characteristic of the instant data center is that a fabric of network switching devices, such as VLAN switches and SAN switches, is used to logically and physically interconnect various servers and storage devices into instant data centers. Routing network traffic associated with data backup through the switching fabric may over-burden the switching fabric. Thus, there is a need for a backup approach that is carried out without communicating data that is backed up through the switching fabric.
A data restoration approach that addresses the foregoing problems is also needed. In particular, there is a need for a way to carry out data restore operations without knowledge of the structure or content of the data that is restored and without knowledge of the nature, structure or organization of the storage device that is a target of data restoration.
The foregoing needs, and other needs that will become apparent from the following description, are achieved by the present invention, which comprises, in one aspect, a method of storing a backup copy of computer data. One or more datasets of a computer data storage device that participates in a dynamically changing virtual server farm are backed up without involving or affecting operation of servers in the virtual server farm that use the data storage device, and without receiving information about the structure or content of data in the datasets, the topology of the virtual server farm, or the type of server, file system, or operating system in use by the servers. A restore operation provides restored data at an address that is linearly related to and separated from a backup address. Data can be restored to a storage device without interfering with operation of the servers that use the data and without regard to structure or content of the data. Data can be backed up and restored in tracks, volumes, or other physical or logical units.
In one specific embodiment, a method of storing a backup copy of computer data involves first receiving a request to back up data associated with a dynamically changing networked computer system that comprises a data storage device and one or more servers. The computer system managing the server is told to quiesce (that is, make no more changes). It returns the current configuration to the backup system. Each server in the computer system in that configuration is requested to quiesce. A backup of the one or more tracks of the data storage device is then initiated, without involvement of the servers in the computer system that use the data storage device and without regard to structure or content of data on the tracks.
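The sequence of steps in this embodiment can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the names Server, FarmManager, and backup_tracks are hypothetical, and the in-memory dictionary of tracks merely stands in for a data storage device. The sketch shows only the ordering (quiesce the managing system and obtain its configuration, quiesce each server in that configuration, then copy the tracks as opaque bytes).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    """Hypothetical stand-in for a server that uses the storage device."""
    name: str
    quiesced: bool = False

    def quiesce(self) -> None:
        # The server is only notified; it takes no part in copying data.
        self.quiesced = True


@dataclass
class FarmManager:
    """Hypothetical stand-in for the system that manages the servers."""
    servers: List[Server]
    frozen: bool = False

    def quiesce(self) -> List[Server]:
        # Stop making configuration changes and return the current configuration.
        self.frozen = True
        return list(self.servers)


def backup_tracks(manager: FarmManager, tracks: Dict[int, bytes]) -> Dict[int, bytes]:
    """Copy every track as opaque bytes after quiescing manager and servers."""
    configuration = manager.quiesce()
    for server in configuration:
        server.quiesce()
    # The copy never inspects track contents, so no file system or
    # operating system knowledge is needed (tracks are assumed to be bytes).
    return {number: bytes(data) for number, data in tracks.items()}
```

In this sketch the copy is taken directly from the track store rather than through any Server object, which mirrors the requirement that the backup proceed without involvement of the servers.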
In another aspect, a method of restoring data is provided. A request to restore data associated with a host in the virtual server farm is received. The request identifies the host and a first address. Data associated with the host in a backup mass storage device is located. The data associated with the host is made available from the mass storage device at a second address that is linearly related to the first address. In one feature, the second address is determined by the relation: ((total address space of storage system)/2)+1. In another feature, in the context of SCSI storage systems, the first address is a first SCSI address, and the second address is a second SCSI address having a value equal to a sum of the first SCSI address and the integer value 8.
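The two address relations stated above can be illustrated with a short sketch. The function names are hypothetical, and the use of integer division in the first relation is an assumption, since the text does not say how an odd address space would be handled. The text also does not state how the first relation combines with the first address, so the sketch simply evaluates the expression as written.

```python
SCSI_RESTORE_OFFSET = 8  # the integer value stated in the text


def half_space_relation(total_address_space: int) -> int:
    """Evaluate ((total address space of storage system)/2)+1 as written.

    Integer division is an assumption made for this sketch.
    """
    if total_address_space <= 0:
        raise ValueError("total address space must be positive")
    return (total_address_space // 2) + 1


def restored_scsi_address(first_scsi_address: int) -> int:
    """Return the second SCSI address: the first SCSI address plus 8."""
    if first_scsi_address < 0:
        raise ValueError("SCSI address must be non-negative")
    return first_scsi_address + SCSI_RESTORE_OFFSET
```

For example, under the SCSI feature a request that identifies first address 3 would have its restored data made available at address 11.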
Other aspects encompass an apparatus and a computer-readable medium that are configured to carry out the foregoing steps.