1. Field of the Invention
Embodiments of the present invention relate to file access within a managed computer environment having diverse networks, computing devices and storage devices.
2. Related Art
In a managed computing environment, there are many issues and trade-offs to consider when setting up a file processing application (FPA) for offloaded client file access. The trade-offs between synchronized file system access and snapshot access, along with the dependencies introduced by proxy-based snapshot access methods, usually result in highly specialized configurations that unnecessarily tie file processing applications to storage infrastructure elements.
In order for a centralized file processing application to remotely access a file system of a disk set, a data path between the application and the disk set must be established, and a valid storage stack must be constructed along that path in order to expose the file system. The stack is usually constructed by logical volume managers (LVMs) and file system drivers that understand the formats used by the file system and the volume it is mapped onto.
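The layering just described can be sketched as a minimal model in which each layer of the storage stack decodes one on-disk format, and the file system is exposed only if the host has a matching LVM and file system driver. The class names, function name, and format labels below are hypothetical and serve only to illustrate the dependency; they are not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class DiskSet:
    volume_format: str  # hypothetical label for the volume layout on the disks
    fs_format: str      # hypothetical label for the file system on that volume


@dataclass
class Host:
    # Volume formats this host's logical volume managers can decode.
    lvm_formats: set = field(default_factory=set)
    # File system formats this host's file system drivers can decode.
    fs_drivers: set = field(default_factory=set)


def build_storage_stack(host: Host, disks: DiskSet) -> list:
    """Return the stack layers from disk set to file system.

    Raises RuntimeError when the host lacks the LVM or the file system
    driver needed for one of the layers, i.e. no valid stack exists.
    """
    if disks.volume_format not in host.lvm_formats:
        raise RuntimeError(f"no LVM for volume format {disks.volume_format!r}")
    if disks.fs_format not in host.fs_drivers:
        raise RuntimeError(f"no file system driver for {disks.fs_format!r}")
    return ["disk set", f"LVM:{disks.volume_format}", f"FS:{disks.fs_format}"]
```

For example, `build_storage_stack(Host({"volA"}, {"fsA"}), DiskSet("volA", "fsA"))` yields a three-layer stack, while a host without the `"fsA"` driver raises an error even though the data path to the disks exists.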
There are many difficulties with the offloaded file access methods of the prior art, as illustrated in FIG. 1. For instance, when a file processing application 14 wishes to access the file system of a disk set belonging to a client computer, it must decide up-front whether or not it requires access to the most current version of the file system. A network file system connection like the one between the FPA's computer 12 and the client 20 is the most common way for the two computers to safely share a file system residing on disk set 26. The network file system driver 16 and the network file server 22 ensure that local access by programs on the client to the file system exposed by storage stack 24 is synchronized with remote access by programs running on computer 12.
However, synchronized network access to a client's file systems suffers from several limitations. The client computer must be active; a computer cannot export a network file system if it is powered off. If, for instance, FPA 14 wishes to access the most current version of disk set 36 of client 28 using a network file system, it cannot do so if the client is inactive. Also, a connection cannot be made if the computer hosting the FPA and the client computer have incompatible network file system components, that is, a server component and a driver component that do not use the same file-sharing protocol. In addition, transferring data over a general-purpose network can increase network congestion and impact the responsiveness of other computers using the network.
Frequent access to a client's network file system can also degrade the client's performance, since the client has to consume disk and processor (CPU) resources to service file requests. Moreover, the requester may be unable to access a file locked by a program running on the client, a limitation known as the “busy files” problem. This is an issue if the requester is a backup application, since locked files will be skipped by the backup process.
Accessing a snapshot of the disk set, instead of the disk set itself, solves the busy files problem. For instance, to access disk set 34 of client 30, the FPA can request snapshot manager 54 (which could be a systems or storage management application) to create a snapshot 46 from disk set 34 and to mount the snapshot on proxy computer 40, so that the FPA can access the snapshot through a network file system exported by the proxy's network file system server 42. Since the client is not accessing the snapshot, the FPA has exclusive access to the snapshot's file systems. Snapshot access can also improve performance by bypassing the general-purpose IP network. It is possible, for example, to install an FPA 58 directly on a proxy 56 to access snapshot 64; this would give the FPA local disk access via the high-speed storage bus, eliminating the need to use the general-purpose IP network and the network file system server 60.
Unfortunately, using snapshots introduces new problems. First, for the above example to be valid, the proxy must be able to construct a storage stack 44 from the snapshot 46; that is, it must have LVMs, file system drivers, and programs capable of decoding the formats used by the client's volumes, file systems, and files. Because of the proprietary nature of some volume, file system, and file formats, commercial-quality LVMs, file system drivers, and programs tend to be available on (and bundled with) a limited set of operating system types. In practice, this means that the proxy computer used to mount the snapshot of a client computer generally has to run an operating system and software stack that is similar or identical to the one used by the client. This constraint creates rigid dependencies between clients, the proxies compatible with those clients, snapshot managers, and file processing applications.
To illustrate those dependencies, assume that clients 30 and 48 use different operating systems, volume formats, and file system formats. Proxy 40 is compatible with the OS and formats of client 30 and can thus be used to process its disk set snapshots; however, it is incapable of processing snapshots of client 48's disk set. Similarly, if proxy 56 is compatible with client 48, it is likely to be incompatible with client 30.
Providing access to a variety of snapshots, each using different volume and file system formats, therefore generally involves setting up multiple proxy computers, each with a different operating system and a different set of LVMs and file system drivers. Not only does this configuration increase hardware costs, it also increases setup and management complexity. The incompatibilities above also complicate the design and configuration of file processing applications. In order to access the snapshot of a particular client, an FPA needs to know beforehand which compatible proxy computers to mount a network file system from. The incompatibilities thus create a rigid dependence between FPAs and proxies. This dependence can become even more rigid when an FPA is configured exclusively for local file system access to improve performance. Recall that FPA 58 can be installed directly on a proxy 56 to bypass the network and gain local access to snapshot 64. Such a setup can reduce flexibility even more, since the FPA cannot locally access the file systems of clients dissimilar to client 48.
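The client/proxy dependency in the example above can be illustrated with a small compatibility check: a proxy can mount a client's snapshot only if it can decode both the client's volume format and its file system format. The format labels and function names below are hypothetical placeholders (the text does not name concrete formats); only the reference numerals come from the example.

```python
# Hypothetical formats used by each client's disk set: (volume, file system).
CLIENTS = {
    "client 30": ("volA", "fsA"),
    "client 48": ("volB", "fsB"),
}

# Hypothetical formats each proxy's LVMs and file system drivers can decode.
PROXIES = {
    "proxy 40": {"volA", "fsA"},
    "proxy 56": {"volB", "fsB"},
}


def compatible_proxies(client):
    """Proxies able to build a storage stack for the client's snapshot."""
    needed = set(CLIENTS[client])
    return sorted(p for p, supported in PROXIES.items() if needed <= supported)


def proxies_covering_all():
    """Proxies that could serve snapshots of every client on their own."""
    return sorted(
        p for p in PROXIES
        if all(p in compatible_proxies(c) for c in CLIENTS)
    )
```

Under these assumptions each client maps to exactly one proxy and `proxies_covering_all()` is empty, which is why an FPA serving both clients must know in advance which proxy to use for which snapshot.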
When an FPA relies on a particular proxy for accessing snapshots, the proxy becomes a potential performance bottleneck that limits scalability. During file access, the proxy has to provide CPU cycles for running the network file system server used by the FPA if the FPA is remote, or for running the FPA itself if it is locally installed. This limits the number of snapshots and file systems that an FPA can process in parallel at a given time. The FPA can be configured to access one file system at a time to avoid concurrency issues, but this would increase the total amount of time needed to process a potentially large number of file systems belonging to multiple clients. The situation is aggravated when the proxy is used by multiple FPAs.
A final drawback of snapshot access is caused by the definition of snapshots themselves. If an FPA accesses a client's file systems through a snapshot of the client's disks, it will be incapable of making any modifications to the current version of the client's file systems, since the snapshot represents a past version. Any changes that the FPA makes to the snapshot will be neither visible nor available to the running client.
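The limitations of the two access methods discussed so far can be summarized as a small checklist. The sketch below only restates the conditions named in the preceding paragraphs (inactive client, incompatible file-sharing protocol, busy files, missing compatible proxy, snapshot representing a past version); the `Scenario` fields and function names are hypothetical and do not describe any existing product.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    client_powered_on: bool
    same_file_sharing_protocol: bool   # FPA host's driver and client's server agree
    needs_locked_files: bool           # e.g. a backup that must not skip busy files
    must_modify_current_files: bool    # changes must be visible to the running client
    compatible_proxy_available: bool   # a proxy with the client's LVMs and FS drivers


def network_fs_blockers(s: Scenario) -> list:
    """Limitations of synchronized network file system access."""
    blockers = []
    if not s.client_powered_on:
        blockers.append("client inactive")
    if not s.same_file_sharing_protocol:
        blockers.append("incompatible file sharing protocol")
    if s.needs_locked_files:
        blockers.append("busy files")
    return blockers


def snapshot_blockers(s: Scenario) -> list:
    """Limitations of proxy-based snapshot access."""
    blockers = []
    if not s.compatible_proxy_available:
        blockers.append("no compatible proxy")
    if s.must_modify_current_files:
        blockers.append("snapshot is a past version")
    return blockers
```

For a powered-off client whose backup must include locked files, `network_fs_blockers` reports both "client inactive" and "busy files", whereas snapshot access is blocked only if no compatible proxy exists or the FPA must change the live file systems.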
Virtual machines whose virtual disks are mapped to files instead of physical disks introduce additional issues for offloaded file processing applications. Synchronized network file system access to an active virtual client is not a problem, since the software stack of a running virtual machine operates identically to that of a physical machine, and can thus provide a storage stack and network file system server for the virtual disks' file systems. Accessing the file systems of a powered-off virtual machine or a virtual machine snapshot is more difficult, since an additional storage stack is needed to access the file systems that contain the virtual disks. For this reason, existing systems and storage management software generally offer no or limited file access support for inactive, file-based virtual machines.
This problem is compounded by the fact that computing environments that include virtual machines tend to have large numbers of powered-off virtual machines. There are two reasons for this. First, unlike real machines, file-based VMs are easy to duplicate because they are just files, which can lead to large numbers of VMs. Second, running a VM consumes a host's CPU and memory resources, potentially impacting other VMs running on the same host, so VMs that are not immediately needed in daily business operations tend to be left powered off.
In summary, it is difficult for offloaded file processing applications that provide protection and prevention, such as file backup and virus scanning applications, to access the files of powered-off VMs. Since they cannot be checked and protected around the clock, inactive VMs can pose a security threat or data protection problem to the IT administrator.
A given file processing application is often bound to a specific access method, e.g., synchronized network file system access versus snapshot access. Once the access method is selected, the FPA has to operate within the limitations of that method, such as the busy files problem and the inability to access powered-off clients in the case of synchronized network access.
When an FPA is configured to access snapshots, it is usually bound to a specific proxy computer through a network file system connection or by running directly on the proxy. A proxy computer's limited ability to decode diverse volume and file system formats restricts the FPA's ability to process the files of client disk sets not compatible with the proxy.
Despite virtual machines becoming an increasingly important component of modern computing environments, existing systems and storage management infrastructures provide limited offloaded file access ability to virtual disks. In particular, it is difficult or impossible for centralized FPAs to access the file systems of suspended and powered-off VMs.
Therefore, the prior art lacks a general and scalable mechanism for a file processing application to access the file systems of diverse computers in diverse states. Offloaded file processing can simplify the management of file processing applications. In some cases, it can also improve overall client and network performance. The two most commonly used offloading methods are synchronized network file system access, and unsynchronized snapshot access using a proxy.
There are many trade-offs between the two methods. For this reason, offloaded FPAs tend to be configured for one access method or the other, but rarely both. This results in highly specialized and inflexible setups that can be time-consuming and costly to establish. In particular, when snapshots and proxies are used in a heterogeneous environment, the diverse LVM and file system driver requirements imposed on proxy computers can increase hardware costs and complicate software configurations. For instance, to fully take advantage of the performance benefits of snapshots, centralized FPAs need to be configured to run directly on proxy computers. This means that FPAs and proxies generally have to be aware of each other and be configured together, thereby increasing setup complexity.
Finally, there currently exists no general framework allowing centralized FPAs to access the file systems of file-based virtual machines, especially powered-off virtual machines and virtual disk snapshots.