Storage networking is the practice of connecting storage devices to computing devices (e.g., clients, servers, and the like) by using Fibre Channel networks instead of traditional point-to-point small computer system interface (SCSI) channels. A network used to connect servers to storage devices is referred to as a storage area network (SAN). Within a SAN environment, all computing devices have access to the available storage devices. This presents a wide variety of benefits, including server platform fail-over wherein a failed storage device is automatically recovered by another operational server platform without requiring any recabling of the storage devices themselves. As will be apparent to one of ordinary skill in the art, connectivity among the computing devices and the underlying storage devices within the SAN environment is shared.
Prior to the development of SAN technology, local and wide area networks provided connectivity between computing devices that did not include storage devices. Connections were established with network protocols such as Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and others. Reliable protocols such as TCP ensure that message ordering is preserved and that messages are not lost. Distributed file systems such as network file system (NFS) and Common Internet file system (CIFS) are layered on top of network protocols. Distributed file systems organize access to files, and correspondingly to data storage elements, across a network consisting of heterogeneous computing devices. The computing devices are typically organized as clients and servers, in a client-server architecture. Access to files or data storage elements is transparent to any particular computing device, such that access is consistent across the distributed file system without the need to have any private information about the physical locations or details associated with any particular file or data element.
The convenience of distributed file systems comes at a cost, since every byte of data exported by a file server managing a distributed file system must pass through the file server's memory, through the communications stack, and through a network interface controller before it reaches the application. Accordingly, the low performance and low throughput associated with distributed file systems prohibit the implementation of many high performance data-sharing applications such as large scale distributed database applications, backup applications and the like. SAN environments present the opportunity to alleviate this issue by elevating storage devices within the network to peers of the clients and the servers, thereby in theory providing the opportunity for improving throughput of operation.
Yet SAN technology has not produced the advances in throughput of operations that one might anticipate. This is due to the fact that shared access to data among several compute platforms must be mediated by distributed file systems. Consequently, while the speed of connections between platforms has scaled upward with the introduction of SAN, the basic method of using distributed file systems to share data has not changed. Distributed file systems are innately restricted in the level of performance that can be achieved due to the computing overhead introduced by the communication protocol. Application writers are therefore motivated to find strategies other than distributed file systems in order to share data at speeds that are consistent with SAN technology. These strategies typically employ sharing information about files and volumes with remote application components. Using this information, an application can know everything about a file without having access to the file through a distributed file system. Additionally, the application can use this information to reference data directly on the SAN-connected storage device.
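By way of illustration only, the strategy of referencing data directly on a SAN-connected storage device can be sketched as follows. The sketch assumes that the shared information about a file takes the form of a list of extents (byte offset and length on the device); the names Extent and read_via_extents are hypothetical, and an in-memory byte stream stands in for the shared storage device.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterable


@dataclass(frozen=True)
class Extent:
    """Hypothetical description of one contiguous run of a file on a device."""
    device_offset: int  # byte offset on the shared storage device
    length: int         # number of bytes in this run


def read_via_extents(device: BinaryIO, extents: Iterable[Extent]) -> bytes:
    """Reassemble file contents by reading each extent directly from the device.

    No file system or distributed file system is consulted; the caller is
    assumed to have obtained the extent list from a remote component.
    """
    chunks = []
    for ext in extents:
        device.seek(ext.device_offset)
        chunk = device.read(ext.length)
        if len(chunk) != ext.length:
            raise IOError("short read at offset %d" % ext.device_offset)
        chunks.append(chunk)
    return b"".join(chunks)


# An in-memory stream stands in for the SAN-connected device (assumption).
shared_device = io.BytesIO(b"....HELLO......WORLD...")
file_extents = [Extent(device_offset=4, length=5), Extent(device_offset=15, length=5)]
file_bytes = read_via_extents(shared_device, file_extents)
```

In this sketch the extent list is exactly the kind of information that a file system would normally keep to itself, which is why obtaining it is the difficult part.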
For these strategies to succeed, applications need to be able to discover sufficient information about files and volumes so that a component on another platform can access the data associated with the file or volume. Customarily, this type of information is not externalized by either file systems or distributed file systems. As used herein, this is referred to as private information. Private information differs between one file system operating on one computing device within the SAN and another file system operating on another computing device within the SAN. Correspondingly, data storage element portability is difficult to achieve within the confines of present SAN technology, since the existing software techniques do not take advantage of the SAN's shared connectivity and architecture.
Furthermore, the very purpose of file system and volume manager function within an operating system is to hide the private information related to data storage elements located on one or more data storage devices. Accordingly, operating system vendors, file system vendors or volume manager vendors do not reveal or provide any useful interfaces that provide access to private information. Moreover, storage environment software stacks are complex and trying to extract the private information from existing stacks is not readily achievable without intervention from the file system software or volume manager software itself.
Processing latency within a SAN environment is particularly conspicuous during the operation of a common and necessary data backup. During a data backup operation, often-voluminous quantities of data bits are transferred from target storage devices to destination storage devices. As the data bits are transferred, the processing throughput experienced by the server that owns the data and executes the backup operation is noticeably degraded. Before any data backup within the storage environment can occur, the data being backed up must be stabilized and temporarily locked until a consistent transactional version of the data is acquired from the SAN environment and successfully written to the target storage devices.
Stability requires flushing pending operations, which can alter the transactional consistency of the backup operation, from volatile cache memory to the appropriate target storage devices before copying the data from the target storage devices to the destination storage devices. In a typical SAN environment, a number of write operations, which can alter the data to be backed up, can be in various stages of completion when a request to perform a backup operation is received. Accordingly, transactional consistency and temporal stability of the data must be acquired by flushing and completing the pending write operations for the data.
The stabilized data, which resides on the target storage devices, is referred to as a “frozen image.” The frozen image will include one or more storage data elements within the SAN environment. Further, the frozen image is created using snapshot and/or mirroring techniques, which are well known to one of ordinary skill in the art. Typically, the same computing device requesting a data backup operation within the SAN environment also initiates and performs the transfer of data bits from the target storage devices to the destination storage devices, resulting in unusually high data volume and traffic on the computing device and within the SAN environment as a whole. Moreover, the data backup operation is performed as a series of customized operations designed to handle a variety of errors that can occur during read operations, write operations, copy operations (e.g., combined read and write operations), or move operations (e.g., combined read, write, and delete operations).
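For illustration only, the sequence described above (flush pending writes from volatile cache, acquire a frozen image, then copy from the frozen image to the destination) can be sketched as follows. The Volume class, its methods, and the backup function are hypothetical names introduced for this sketch; dictionaries stand in for the target and destination storage devices, and a simple copy stands in for a snapshot or mirroring technique.

```python
import threading
from typing import Dict


class Volume:
    """Hypothetical volume with a volatile write cache in front of target storage."""

    def __init__(self, blocks: Dict[int, bytes]):
        self._blocks = dict(blocks)          # data already on the target storage device
        self._pending: Dict[int, bytes] = {}  # writes still held in volatile cache
        self._lock = threading.Lock()

    def write(self, block_no: int, data: bytes) -> None:
        """Queue a write; it is not stable until flushed."""
        with self._lock:
            self._pending[block_no] = data

    def freeze(self) -> Dict[int, bytes]:
        """Flush pending writes while locked, then return a frozen image.

        The dict copy stands in for a snapshot or mirror split (assumption).
        """
        with self._lock:
            self._blocks.update(self._pending)
            self._pending.clear()
            return dict(self._blocks)


def backup(volume: Volume, destination: Dict[int, bytes]) -> None:
    """Copy from the frozen image, not the live volume, to the destination."""
    frozen_image = volume.freeze()
    destination.update(frozen_image)


volume = Volume({0: b"alpha"})
volume.write(1, b"beta")        # pending write at the time backup is requested
destination: Dict[int, bytes] = {}
backup(volume, destination)
volume.write(0, b"changed")     # later write does not affect the completed backup
```

Because the lock is held only while the image is frozen, later writes proceed against the live volume while the copy works from the stable image.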
Moreover, private information about individual data storage elements is in a continual state of flux in modern data storage architectures, since at any moment in time data storage elements are moved during storage device failure, devices are reorganized to reduce fragmentation, and the like. Therefore, any ability to acquire private information must also entail notification that the private information has changed. Otherwise, the acquired private information may be stale or inaccurate. This is particularly significant in performing data backup operations, since during the intervening period after a data backup operation is initiated but before the backup operation is actually processed, the data storage elements can be modified, resulting in the incorrect transfer of some data to the destination storage devices.
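As a minimal sketch of the need described above, the following assumes one possible arrangement in which each published mapping carries a generation number and interested components register a callback to be notified of changes. The MappingRegistry class and all of its methods are hypothetical and do not correspond to any existing file system or volume manager interface.

```python
from typing import Callable, Dict, List, Tuple

Extent = Tuple[int, int]  # (device offset, length); hypothetical representation


class MappingRegistry:
    """Hypothetical registry that versions private mapping information."""

    def __init__(self) -> None:
        self._extents: Dict[str, List[Extent]] = {}
        self._generation: Dict[str, int] = {}
        self._listeners: List[Callable[[str, int], None]] = []

    def subscribe(self, listener: Callable[[str, int], None]) -> None:
        """Register a callback invoked with (name, generation) on every change."""
        self._listeners.append(listener)

    def publish(self, name: str, extents: List[Extent]) -> None:
        """Record a new mapping, bump its generation, and notify listeners."""
        self._extents[name] = list(extents)
        generation = self._generation.get(name, 0) + 1
        self._generation[name] = generation
        for listener in self._listeners:
            listener(name, generation)

    def lookup(self, name: str) -> Tuple[int, List[Extent]]:
        """Return the current generation together with a copy of the mapping."""
        return self._generation[name], list(self._extents[name])

    def is_current(self, name: str, generation: int) -> bool:
        """Report whether a previously acquired mapping is still valid."""
        return self._generation.get(name) == generation


registry = MappingRegistry()
notifications: List[Tuple[str, int]] = []
registry.subscribe(lambda name, gen: notifications.append((name, gen)))

registry.publish("payroll.db", [(4096, 8192)])
acquired_generation, acquired_extents = registry.lookup("payroll.db")

# The element is relocated after the backup acquired its mapping.
registry.publish("payroll.db", [(65536, 8192)])
mapping_is_stale = not registry.is_current("payroll.db", acquired_generation)
```

A backup component holding the earlier mapping would, under this assumption, either react to the notification or check the generation before transferring data.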
In an effort to address some of these problems, some industry associations have been formed in an attempt to standardize data storage device communication. For example, the Storage Networking Industry Association (SNIA) and the National Committee for Information Technology Standards (NCITS) technical committee T11 have been established. Yet these associations are attempting to gain voluntary standardization for storage device manufacturers to adhere to when developing storage devices, and therefore the work of these associations will not assist with existing storage devices, existing operating systems, existing file systems or existing volume managers. Accordingly, the success of these associations is limited in scope and has yet to be demonstrated even with newly released products and services.
Therefore, what is needed are methods and systems for improved data backup within a SAN environment, such that the shared connectivity of computing devices and storage devices is more fully utilized to improve the processing throughput associated with data backup operations within the SAN environment. As one of ordinary skill in the art will understand upon reading the present disclosure, this will result in reducing the computing overhead associated with backup on the backup server and allow the backup to be directed to a frozen image of the source data, thus reducing the impact of backup processing on the backup server owning the data.