1. Field of the Invention
The present invention relates to storage operations and management in a computer system. More particularly, the present invention relates to a method and system for relocating files that are partially stored in remote storage.
2. Brief Description of Related Developments
Since its advent, the model of a standalone personal computer with removable storage media has had a great effect on the computer industry and has influenced the design of many computer system architectures and infrastructures. However, storage solutions and complex computer systems have advanced rapidly since the time of the first standalone computers. For example, the continued development of ever smaller integrated circuits and semiconductor chips capable of storing ever increasing quantities of data, the increased bandwidth and data transfer rates possible with today's computer networks, and the concomitant increased utilization of server computers in connection with databases and storage components of all types all illustrate the increased functionality that networked computer environments have come to possess.
As a consequence, traditional computing and storage techniques and models have been challenged. The widespread use of removable storage media, for example, has been challenged by the ability to remotely store files efficiently and inexpensively. Furthermore, as computer systems have evolved, so has the availability and configuration of data storage devices, such as magnetic or optical disks. For example, these storage devices can be connected to the computer system via a bus, or they can be connected to the computer system via a wired or wireless network. In addition, the storage devices can be separate or co-located in a single cabinet.
As background, a storage volume is a software abstraction of the underlying storage devices and is the smallest self-contained unit of storage mounted by an operating system and administered by the file system. Storage volumes abstract the physical topology of their associated storage devices and may be a fraction of a disk, a whole disk or even multiple disks that are bound into a virtually contiguous range of logical blocks. This binding may increase the fault tolerance, performance, or capacity characteristics of the underlying devices. In short, in today's complex computer system environments, storage volumes can be a diverse set of elements for which efficient and effective management is desirable.
Volumes are constructed from one or more extents that are contiguous storage address spaces presented by the underlying storage devices. An extent is typically characterized by the size of the address space and a starting offset for the address space from a base of the media. Volume mapping is the process of mapping contiguous address space presented by the volume onto the non-contiguous storage address spaces of the underlying extents. Volume mappings are either implemented on a specialized hardware controller, referred to as a hardware volume provider, or in software by a software volume provider. By way of further background, a technique for common administration and management of volume providers is provided in commonly assigned application Ser. No. 09/449,577, entitled “Administration of RAID Storage Volumes,” now U.S. Pat. No. 6,081,310.
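By way of illustration only, the volume mapping described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular volume provider: the names `Extent`, `Volume` and `map_block`, the device names, and the use of block units are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical extent: a contiguous address space on one device."""
    device: str   # assumed device identifier
    offset: int   # starting offset from the base of the media, in blocks
    size: int     # size of the address space, in blocks


class Volume:
    """Presents one contiguous block range over non-contiguous extents."""

    def __init__(self, extents):
        self.extents = list(extents)
        self.starts = []          # starts[i]: first volume block of extents[i]
        total = 0
        for extent in self.extents:
            if extent.size <= 0:
                raise ValueError("extent size must be positive")
            self.starts.append(total)
            total += extent.size
        self.size = total

    def map_block(self, block):
        """Map a volume block number to (device, block on that device)."""
        if not 0 <= block < self.size:
            raise IndexError("block outside volume address space")
        # Find the last extent whose first volume block is <= block.
        i = bisect_right(self.starts, block) - 1
        extent = self.extents[i]
        return extent.device, extent.offset + (block - self.starts[i])


volume = Volume([Extent("disk0", 100, 50), Extent("disk1", 0, 30)])
```

In this sketch, volume blocks 0 through 49 fall on the first extent and blocks 50 through 79 on the second, so a caller sees a single 80-block address space regardless of where the extents reside.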
Advances in storage techniques are changing the ways in which data can be stored, thereby placing a strain on the traditional management of files within and between volumes. Thus, advances in networks and computer system models have greater ramifications than simply a change in the types of storage components being utilized and in the connections being used between the storage components. Techniques traditionally used to manage file transfers, for example, were not originally designed to support all of the increased functionality of today's complex networked environments. Operating systems, system infrastructure and core file management functions with which many computers operate have thus been affected. As a consequence, current file systems have lingering inefficiencies and are not equipped to handle all of the different types of storage operations with maximum efficiency.
One such inefficiency exists in connection with the hierarchical storage management (HSM) system, the system that oversees the storage of files and operations incident thereto. With the proliferation of various storage elements and techniques as described above, sometimes it becomes desirable to store portion(s) of a file in remote storage while retaining portion(s) in local storage. This may be desirable, for example, to free up more valuable local storage when portions of a file are known to be static, or to stow away certain data that is infrequently utilized. For another example, an append-only file has the characteristic that data writes occur only at the end of the file. Consequently, an efficient use of local storage may dictate that the immutable portions of the file, to which new writes are appended, be migrated to remote storage. For yet another example, migration of data to remote storage might be an effective way of providing on-line disk/memory allocation limits. Thus, there are a variety of reasons why a file may have some data that is suited to migration to remote storage.
The case for partial migration of files is not generally supported in current HSM systems for relocation operations and the like. For example, a file copied from one storage location to another storage location, such as from on-line storage to remote storage, generally involves copying or moving the entire file. Current HSM systems perform a file move by recalling the entire file, copying it either to a second server managed by a second HSM system or to a second volume managed by the same HSM system, and registering the target volume for administration by either the second or same HSM system, respectively. Thus, current HSM systems do not perform move operations without changing on-disk allocations.
Commonly assigned copending U.S. patent application Ser. No. 09/644,667 entitled “Partial Migration of an Object to Another Storage Location in a Computer System,” filed on the same day as the present application, relates to a HSM system that does support partial migration of data streams/files. In that system, metadata is generated for the description of a file having at least one portion migrated. Via the metadata, the HSM system can recall the file data since the metadata contains information relating to where each portion of the file is stored. A need still exists, however, for efficient relocation techniques in connection with such a system. A system could be implemented for a file or other data stream, stored partially in a base storage location and stored partially in another storage location as a result of partial migration techniques, such that the HSM system, used incident to the file migration, would cause relocation of the entire file, i.e., both the portion remotely stored and the portion stored on the base volume. For example, the HSM system could cause the remotely stored or migrated portion to be read back from remote storage and then the entire file once re-assembled in the base volume could be relocated according to standard relocation techniques.
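By way of illustration only, the use of such metadata to recall a partially migrated file can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the metadata format of the referenced application: the names `Region` and `recall`, the store identifier `tape1`, and the dictionary-based stand-ins for the base volume and remote storage are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical metadata entry for one portion of a file."""
    start: int                    # byte offset of the portion within the file
    length: int                   # length of the portion in bytes
    store: Optional[str] = None   # None: on the base volume; else remote store id


def recall(regions, local, remote):
    """Re-assemble the whole file by fetching every portion from where the
    metadata says it lives. `local` maps start offset to bytes held on the
    base volume; `remote` maps store id to a similar mapping (both assumed)."""
    out = bytearray()
    for region in sorted(regions, key=lambda r: r.start):
        source = local if region.store is None else remote[region.store]
        chunk = source[region.start]
        if len(chunk) != region.length:
            raise ValueError("stored data does not match metadata")
        out += chunk
    return bytes(out)


# An append-only file: the static head was migrated, the tail is still local.
regions = [Region(0, 5, "tape1"), Region(5, 3)]
local = {5: b"new"}
remote = {"tape1": {0: b"hello"}}
whole_file = recall(regions, local, remote)
```

A relocation built on this full recall reads every migrated portion back from remote storage before anything is moved, which is the cost the approach described above would incur.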
Thus, the current state of the art of hierarchical storage management of files inadequately or inefficiently covers the case where files, to be copied, moved or re-located in some respect, have been partially migrated to another storage location. Further, the state of the art provides inadequate and/or inefficient support for systems administering the migration of predetermined part(s) of files/streams from one storage location to another while retaining other part(s) of files/streams. The invention of U.S. application Ser. No. 09/644,667 referenced above describes a mechanism for specifying those regions of a data stream suited to writes and updates and those immutable or other regions of a data stream suited to off-line or remote storage. In so doing, a method of generating metadata for describing a stream's storage relationships is provided. However, a need still exists for updating the metadata in the event of a relocation operation. Consequently, issues arising in connection with the partial migration of files are becoming the subject of much current research and development. Thus, in current systems where the file server performs re-location operation(s) in connection with a file that is partially stored in remote storage, a common approach does not exist to relocate, move or copy files from one volume to another.
Thus, assuming a file management system that can describe, define or specify when a file has been partially stored in remote storage, it would be advantageous to update such description, definition or specification to reflect efficient relocation operations. Alternatively stated, assuming the existence of a file server for a computer system capable of identifying and specifying via metadata when a file has portion(s) that have been migrated to remote storage, it would be advantageous to perform efficient relocation operations and to update the metadata in accordance with the same. It would be further advantageous to be able to move a stream/file independent of its on-disk allocations. It would be further advantageous to move metadata used to manage a partially migrated file to a new location. It would be still further advantageous to allow efficient access to migrated data of a partially migrated file relocated to a new storage location. It would be still further advantageous to provide a HSM system capable of efficiently moving, copying or relocating files that have been partially migrated to remote storage from one volume to another, even where multiple back end servers are involved.
The present invention relates to hierarchical storage management (HSM) systems used in connection with computer systems. A technique is provided whereby a file having portion(s) migrated to remote storage location(s) may be efficiently relocated, and metadata for the file is updated according to its relocated storage relationships. Thus, when a source file having portions migrated to remote storage is to be re-located or copied by the HSM system to a target file, instead of copying the entire file across all of its associated storage locations, the minimum or efficient set of data is relocated. The metadata describing the source file's migration storage characteristics is updated to reflect its new use in connection with the target file.
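By way of illustration only, one possible reading of the technique summarized above can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed method: the names `Region`, `FileMeta` and `relocate`, the volume names, and the dictionary-based volume model are hypothetical, and the sketch assumes that the migrated portions can remain in remote storage while only the locally resident portions are moved.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical metadata entry for one portion of a file."""
    start: int
    length: int
    store: Optional[str] = None   # None: resident on the file's volume


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata kept by the HSM system."""
    name: str
    volume: str
    regions: Tuple[Region, ...]


def relocate(meta, volumes, target_volume):
    """Move only the locally resident portions to the target volume and
    return (updated metadata, bytes moved). Migrated portions are not
    recalled; their metadata entries are carried over unchanged."""
    source = volumes[meta.volume]
    target = volumes.setdefault(target_volume, {})
    moved = 0
    for region in meta.regions:
        if region.store is None:
            data = source.pop((meta.name, region.start))
            target[(meta.name, region.start)] = data
            moved += len(data)
    return replace(meta, volume=target_volume), moved


# Volumes are modeled as dicts keyed by (file name, start offset).
volumes = {"C": {("log", 5): b"new"}, "D": {}}
meta = FileMeta("log", "C", (Region(0, 5, "tape1"), Region(5, 3)))
new_meta, moved = relocate(meta, volumes, "D")
```

In this sketch only the three resident bytes are moved, while the five-byte migrated portion stays in remote storage and remains reachable through the updated metadata of the target file.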
Other features of the present invention are described below.