A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes an operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system generally provides its storage services through the execution of software modules, such as processes. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access information stored on the system. In this model, the storage system may be embodied as a file server executing an operating system, such as the Microsoft® Windows™ operating system (hereinafter “Windows operating system”). Furthermore, the client may comprise an application executing on an operating system of a computer that “connects” to the server over a computer network, such as a point-to-point link, shared local area network, wide area network, or virtual private network implemented over a public network, such as the Internet. Each client may request the services of the server by issuing storage access protocol messages (in the form of packets) to the server over the network. By supporting a plurality of storage (e.g., file-based) access protocols, such as the conventional Common Internet File System (CIFS) and the Network File System (NFS) protocols, the utility of the server is enhanced.
To facilitate client access to the information stored on the server, the Windows operating system typically exports units of storage, e.g., (CIFS) shares. As used herein, a share is equivalent to a mount point or shared storage resource, such as a folder or directory that stores information about files or other directories served by the file server. A Windows client may access information in the directory by mounting the share and issuing a CIFS protocol access request that specifies a uniform naming convention (UNC) path to the share. The UNC path or pathname is an aspect of a Windows networking environment that defines a way for a client to refer to a unit of storage on a server. The UNC pathname is prefixed with the string \\ to indicate resource names on a network. For example, a UNC pathname may comprise a server name, a share (directory) name and a path descriptor that collectively reference a unit of storage or share. Thus, in order to access the share, the client typically requires knowledge of the specific physical location (i.e., the identity) of the server exporting the share.
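The structure of a UNC pathname described above can be illustrated with a minimal Python sketch. The function name `parse_unc`, the `UncPath` type, and the server and share names in the example are hypothetical and chosen only for illustration; the sketch merely splits a pathname into the server name, share name and path descriptor, and is not part of any Windows API.

```python
from typing import NamedTuple


class UncPath(NamedTuple):
    """The three components of a UNC pathname (illustrative type)."""
    server: str  # identity of the file server exporting the share
    share: str   # name of the exported share (directory)
    path: str    # path descriptor within the share; may be empty


def parse_unc(unc: str) -> UncPath:
    """Split a UNC pathname of the form \\\\server\\share\\path into its parts."""
    if not unc.startswith("\\\\"):
        raise ValueError("a UNC pathname must begin with two backslashes")
    # Drop the leading prefix, then split into at most three components.
    parts = unc[2:].split("\\", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("a UNC pathname needs a server name and a share name")
    rest = parts[2] if len(parts) == 3 else ""
    return UncPath(parts[0], parts[1], rest)


# Hypothetical example: the server name is embedded in the pathname itself.
example = parse_unc(r"\\server1\projects\docs\readme.txt")
```

Because the server name is the first component of the pathname, a client that uses such a pathname is bound to the identity of the server exporting the share.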
Instead of requiring the client to provide the specific identity of the file server exporting the share, it is desirable to only require a logical pathname to the share. That is, it is desirable to provide the client with a globally unique pathname to the share (location) without reference to the file server. The conventional Distributed File System (DFS) namespace service provides such a solution in a Windows environment through the creation of a namespace that removes the specificity of server identity. DFS is well-known and described in DCE 1.2.2 DFS Administration Guide and Reference, 1997, which is hereby incorporated by reference. As used herein, a namespace is a view of shared storage resources (such as shares) from the perspective of a client. The DFS namespace service is generally implemented using one or more DFS servers and distributed components in a network.
Using the DFS service, it is possible to create a unique pathname (in the form of a UNC pathname) for a storage resource that a DFS server translates to an actual location of the resource (share) in the network. However, in addition to the DFS namespace provided by the Windows operating system, there are many other namespace services provided by various operating system platforms, including the NFS namespace provided by the conventional Unix® operating system. Each service constructs a namespace to facilitate management of information using a layer of indirection between a file server and a client accessing a shared storage resource (share) on the server. For example, a share may be connected or “linked” to a link point (a link in DFS terminology or a mount point in NFS terminology) to hide the machine-specific reference to the share. By referencing the link point, the client can automatically access information on the storage resource of the specific machine. This allows an administrator to store the information on any server in the network by merely providing a reference to the information (or share). However, these namespaces are typically services created on heterogeneous server platforms, which leads to incompatibility and non-interoperability with respect to management of the namespaces by the user. For example, the DFS namespace service is generally limited to Windows-based operating system platforms, whereas the NFS namespace service is generally limited to Unix-based operating system platforms.
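The layer of indirection provided by a link point can be sketched in Python as a simple lookup table. This is a minimal illustration under stated assumptions, not the DFS or NFS implementation: the `Namespace` class, its methods, and all server, share and namespace names below are hypothetical.

```python
class Namespace:
    """Toy namespace: maps logical link points to physical share locations."""

    def __init__(self):
        # logical link point (lower-cased) -> physical UNC location of the share
        self._links = {}

    def add_link(self, link_point, target):
        """Create or re-point a link; clients keep using the same logical path."""
        self._links[link_point.lower().rstrip("\\")] = target.rstrip("\\")

    def resolve(self, logical_path):
        """Translate a logical pathname into the actual location of the resource."""
        trimmed = logical_path.rstrip("\\")
        key = trimmed.lower()
        best = None
        # Choose the longest link point that is a prefix of the logical path.
        for link in self._links:
            if key == link or key.startswith(link + "\\"):
                if best is None or len(link) > len(best):
                    best = link
        if best is None:
            raise KeyError("no link point covers " + logical_path)
        # Append whatever follows the link point to the physical location.
        return self._links[best] + trimmed[len(best):]


ns = Namespace()
ns.add_link(r"\\corp\public\eng", r"\\filer1\eng_share")
before = ns.resolve(r"\\corp\public\eng\specs\a.txt")

# An administrator relocates the share; the logical pathname is unchanged.
ns.add_link(r"\\corp\public\eng", r"\\filer2\eng_share")
after = ns.resolve(r"\\corp\public\eng\specs\a.txt")
```

In this sketch the client only ever references the logical pathname; re-pointing the link is sufficient to redirect access to a different server.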
The Virtual File Manager (VFM™) developed by NuView, Inc. and available from Network Appliance, Inc. (“NetApp”) provides a namespace service that supports various protocols operating on various file server platforms, such as NetApp filers and DFS servers. The VFM namespace service is well-known and described in VFM™ (Virtual File Manager) Reference Guide, Version 4.0, 2001-2003, and VFM™ (Virtual File Manager) Getting Started Guide, Version 4.0, 2001-2003.
Movement or “migration” of data is an essential capability of any data management solution. Data migration may be employed for a number of reasons, including (i) load balancing to reduce the load on a particular machine, (ii) reducing access latency by moving data to a machine that is closer to a consumer of the data, or (iii) archiving to move data that has not been used for some time (“stale” data) on a machine of high grade to a machine of lower grade. Data migration thus facilitates improved distribution of storage in a hierarchical manner, as well as relocation of unwanted or stale data automatically.
Broadly stated, previous namespace services (such as the VFM namespace service) facilitate migration (movement) of data from a machine (computer) at a source location to a machine at a destination location using a migration agent in connection with a data migration process. As used herein, the migration agent is software code configured to perform data migration between the machines at the source and destination locations in a network. The migration agent used by these services is generally not pre-installed on the machine involved in the data migration process; rather, these services “push” installation of the migration agent to the machine in the network as and when required.
The migration agent may fail as a result of, e.g., a system crash. The previous services may utilize platform-specific tools to convey the cause of the failure or error to a user; such tools are generally complex and not useful in an environment wherein the machines have different (“heterogeneous”) operating system platforms, e.g., in a heterogeneous storage system environment. As used herein, a heterogeneous storage system environment may include storage systems having different operating systems, different variants of operating systems and/or different file systems implemented by different operating systems. The present invention is directed, in part, to conveying the cause of migration agent failure in a format that is user-friendly and compatible in such a heterogeneous environment.
Furthermore, in response to the migration agent failure, it is possible that the resulting data stored at the destination location may be inconsistent (corrupted) with respect to the original data transferred from the source location. That is, the data stored at the destination location might include a mixture of the original data and additional erroneous data. The present invention is further directed, in part, to reducing the probability of a migration agent failure corrupting data during the migration process.
When the data migration process includes moving data (e.g., a file) between heterogeneous machines at the source and destination locations, there is a further issue of possible loss of data format of the file, as opposed to loss of the actual data content of the file. In this context, data loss denotes loss of file metadata, such as attributes (including security attributes such as access control lists, ACLs), type of file and other information associated with the file, such as alternate data streams (ADS). Here, the type of file includes (i) sparseness of the file and/or (ii) encryption of the file. Often there is a requirement to exactly (strictly) preserve the attribute, type and associated information of the file transferred from the source location to the destination location during the migration process. The present invention is further directed, in part, to a technique for strictly preserving file attributes, type and associated information during data migration.