1. Field of the Invention
This invention relates to computer systems and, more particularly, to data transfer in computer systems.
2. Description of the Related Art
Many business organizations and governmental entities rely upon mission-critical applications that access large amounts of data, often exceeding many terabytes. A variety of different storage devices, potentially from multiple storage vendors, with varying functionality, performance and availability characteristics, may be employed in such environments. Numerous data producers (i.e., sources of new data and updates to existing data) may need to transfer large amounts of data to consumers with different sets of storage access requirements. In some enterprise environments, hundreds or thousands of data producers and data consumers may be operating at any given time. Sustained update rates on the order of tens to hundreds of gigabytes per hour may need to be supported in large enterprise data centers, with spikes of even higher levels of I/O activity. In some environments, furthermore, access patterns may be skewed towards the most recently updated data: that is, instead of being uniformly spread over an entire data set, a relatively large proportion of write and read requests may be directed at a "working set" of recently modified data.
As the heterogeneity and complexity of storage environments increase, and as the size of the data being managed within such environments grows, providing a consistent quality of service for data transfer operations may become a challenge. Quality of service requirements may include the ability to predictably sustain data transfers at desired rates between data producers and data consumers, data integrity requirements, and the ability to recover rapidly from application, host and/or device failures. At the same time, advanced storage features, such as replication and archival capabilities, may also be a requirement for enterprise-level storage environments.
Some traditional data sharing mechanisms, such as applications that may utilize NFS (Network File System) or FTP (File Transfer Protocol), may rely on data transmission over networking protocols such as one or more underlying protocols of the Transmission Control Protocol/Internet Protocol (TCP/IP) family. Such traditional mechanisms may sometimes provide unpredictable levels of performance, especially in response to error conditions and/or network congestion. In addition, many traditional networking protocols may originally have been designed for short messages, and may not scale well for large data transfers. Networking software stacks within operating systems and/or service-specific daemons or service processes may add to dispatch latency and processing load at data producers and data consumers. Simultaneous transfer of data between a single data producer and multiple data consumers may require the use of relatively inefficient broadcast or multicast protocols. Some data sharing mechanisms that rely on file systems may also incur additional overhead for file management.
Other traditional data sharing mechanisms, such as various types of clustered systems, may require data producers and data consumers to be tightly coupled (e.g., configured within a single cluster), even if application requirements may demand other configurations of producers and consumers, such as producers and consumers in different clusters or within un-clustered servers. Some clustering solutions may also require extensive and cumbersome configuration management. In addition, traditional data transfer mechanisms may provide inadequate support for efficiently re-using resources (such as disk space or memory) dedicated to data transfer. For example, insufficient support may be provided for re-using the storage that holds data already consumed while a long data transfer continues. The requirements for sustained high performance, predictability, and improved data integrity during data transfers may place a high burden on system managers.