1. Field of the Invention
This invention relates generally to network computer systems and, more particularly, to efficient transfer of data from client computers to server computers.
2. Description of the Related Art
Most computer networks include one or more server computers, which are a source and repository for large blocks of data, and multiple client computers, which communicate with the servers, operate on smaller blocks of data, and transfer the edited data back to the servers. The server computers typically are capable of storing large amounts of data, on the order of 100 gigabytes (GB) or more at each server. Such storage can be achieved with a variety of data storage systems, including large magnetic and magneto-optical disk libraries and magnetic tape libraries. In most computer databases, the data is organized into tables. A master copy of all the tables is distributed over one or more of the servers, while designated rows and columns, or subtables, of the database are copied to clients and modified before being returned to their respective servers. This scheme permits the servers to exercise version control over the subtables extracted from the database and to manage the sequence and priority of data modifications by the clients.
The size of the data blocks handled by the client computers may be many megabytes (MB) of data. While a user at a client computer modifies and updates a block of data, the change commands themselves or the data block changes may be sent back and forth many times between the client and the server. Many database systems comprise a relational database management system (RDBMS) in which the data is organized according to table relations and application programs manage document version control, user access, and other network and document management issues. An example of such a system is the “ADSTAR Distributed Storage Manager” system (ADSM) by International Business Machines Corporation (IBM Corporation).
An RDBMS typically incorporates periodic backup operations in which database tables are copied from the client computers to the servers. Backup operations are important for ensuring reliable data recovery of critical data, should that need arise. A backup operation may transfer not only updated database tables, but also information that largely duplicates a sequence of data changes or transactions that were executed at a client computer. This type of backup system implements an insulated server and can involve the transfer of very large blocks of data. During an insulated-server backup session, for example, 100 GB of data may be transferred from a client to a server. Thus, there may be a need to reliably transfer large blocks of data between the servers and the clients. Even at network data transfer rates of 10 MB/sec or more, it is not unusual for a backup session to require several hours for completion, due to the volume of network traffic that must be accommodated.
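The figures quoted above can be checked with a short calculation. The following is a minimal sketch only, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and an idealized link with no protocol overhead or competing traffic; the function name `transfer_hours` is hypothetical and used purely for illustration.

```python
def transfer_hours(size_bytes: float, rate_bytes_per_sec: float) -> float:
    """Idealized transfer time in hours, ignoring overhead and contention."""
    return size_bytes / rate_bytes_per_sec / 3600


GB = 10**9  # assumption: decimal gigabyte
MB = 10**6  # assumption: decimal megabyte

# 100 GB backup session over a 10 MB/sec link
hours = transfer_hours(100 * GB, 10 * MB)
print(f"{hours:.2f} hours")  # prints "2.78 hours"
```

Even under these ideal conditions the transfer occupies the link for close to three hours, which is consistent with the statement that a backup session may require several hours once other network traffic is taken into account.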
Because a client computer cannot reliably perform other operations while the backup operation is proceeding, a client is usually taken off-line during backup and therefore is not available to a user for normal or typical data operations. As a result, most backup operations are performed during evening or late night hours when most users have no need for on-line access and are not engaged in any data operations. This scheduling minimizes the impact of backup operations on the computer system and causes the least disruption to continuing data operations. Increasing workloads, however, have left less off-peak time available for backup operations, and database complexity has increased the amount of data that must be transferred during backup operations. Moreover, many computer systems have a need for 24-hour availability of on-line data operations. For example, airline reservation systems, financial institutions, and municipal services may need virtually 24-hour availability. These circumstances can severely tax the capability of the computer network to accommodate the volume of data traffic that is occurring. Such operational needs place ever greater demands on the communications infrastructure of a computer network.
One way of providing the necessary infrastructure to accommodate the transfer of large data blocks during backup operations is to upgrade all computer network communications links between client computers and server computers for all traffic conditions. Thus, all the communications links will be able to accommodate data traffic during normal operations and also during backup operations. Unfortunately, this upgrade approach may be costly and time consuming. It may not be practical if the existing network infrastructure cannot be upgraded without extensive delays for cabling and resource improvements due to distance, technological, and physical constraints. In addition, the extensive network-wide upgrades may be triggered by only a few isolated bottlenecks for large network data transfers, requiring expensive efforts and expenditures to rectify isolated problems.
Another way of handling transfers of large data blocks for backup is to off-load the transfers to disk or tape archives. When data from the archives is needed, the appropriate disks or tapes can be manually loaded onto storage drives and the data can be accessed. This off-load procedure may be useful, but does not permit immediate and 24-hour access to the data. In addition, the logistics of keeping a library of archive media, maintaining an archive log, replicating the data, and transporting the archives may be prohibitive.
Yet another solution is to provide distributed data holding areas on the network for temporary holding of bulk data transfers. The holding areas can be used to cache data blocks near the client computers and hold the data blocks until they can be sent through the network to the server computers when other network traffic is reduced. Under this scheme, however, the data again would not be immediately available to the server, preventing 24-hour operation. In addition, the cost of providing the holding areas could be prohibitive.
From the discussion above, it should be apparent that there is a need for efficient transfer of large data blocks from client computers to server computers over a network, without disruption to normal data operations and without a requirement for extensive and expensive resource upgrades or cumbersome and inconvenient archive methodologies. The present invention fulfills this need.
The present invention provides a system and method that transfer data between a client computer and a server computer over a network. Communications are first established over a first data link between the client and the server to provide the server with identification of the data to be transferred. Communications are then established over a second data link between the client and the server for data transfer, the second data link having a faster data transfer rate than the first data link. The identified data is then transferred from the client to the server, and finally the client computer is provided with status information relating to the transfer of the identified data. The first data link is sufficient to support normal data operations utilizing existing network resources. The second data link can connect the clients to one or more of the servers, or may connect only particular clients and servers with special needs for large data transfer. The system provides efficient transfer of large data blocks from clients to servers over the network, without disruption to normal data operations and without a requirement for extensive and expensive resource upgrades or cumbersome and inconvenient archive methodologies.
In one aspect of the invention, both client and server have the capability of communicating over first and second data links, the first data link being used for normal network traffic and setting communications parameters, and the second data link being used for high speed, bulk data transfers like those involved in backup operations. Selecting between the two links is the responsibility of the client computer. Any two client and server computers that will perform data transfer in accordance with the invention must share the capability of selecting alternate communication links. The server may schedule bulk data transfers, but the actual transfer of backup data occurs under supervision of the client. This minimizes the disruption to normal client workload due to the backup operations.
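The sequence of operations summarized above (identification over the first data link, bulk transfer over the faster second data link, then status returned to the client) can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the implementation of the invention: the names `Link`, `Server`, and `client_backup`, the message strings, and the size check used to derive the status are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Hypothetical data link that records the messages sent over it."""
    name: str
    rate_mb_per_sec: float
    log: list = field(default_factory=list)

    def send(self, message: str) -> None:
        self.log.append(message)


@dataclass
class Server:
    """Hypothetical server that tracks announced and stored data blocks."""
    expected: dict = field(default_factory=dict)
    stored: dict = field(default_factory=dict)

    def announce(self, data_id: str, size: int) -> None:
        self.expected[data_id] = size

    def receive(self, data_id: str, payload: bytes) -> str:
        # Assumption: status is derived from a simple size check.
        if self.expected.get(data_id) != len(payload):
            return "failed"
        self.stored[data_id] = payload
        return "complete"


def client_backup(server: Server, first: Link, second: Link,
                  data_id: str, payload: bytes) -> str:
    """Client-supervised transfer: identify, switch links, send, get status."""
    if second.rate_mb_per_sec <= first.rate_mb_per_sec:
        raise ValueError("second data link must be faster than the first")
    # Step 1: identify the data to the server over the first data link.
    first.send(f"identify {data_id} size={len(payload)}")
    server.announce(data_id, len(payload))
    # Step 2: the client selects the second data link for the bulk transfer.
    second.send(f"data {data_id} ({len(payload)} bytes)")
    # Step 3: the client is provided with the transfer status.
    return server.receive(data_id, payload)


first_link = Link("first", rate_mb_per_sec=10.0)     # illustrative rates
second_link = Link("second", rate_mb_per_sec=100.0)
server = Server()
status = client_backup(server, first_link, second_link, "subtable-1", b"rows")
print(status)
```

In this sketch the link selection and the transfer itself are driven by the client function, mirroring the statement that the transfer occurs under supervision of the client even if the server schedules it.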
In another aspect of the invention, before transfer of data begins, the client provides the server with metadata that specifies the data and provides associated client identity and authorization, destination file information, and data size. The metadata also permits optional performance enhancements, such as optional data compression processing prior to data transfer.
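As an illustration of the metadata described above, the following sketch models one possible record and an optional compression step applied before transfer. The field names, the `prepare_transfer` helper, and the use of zlib are assumptions made for this example; the description does not specify a record layout or a compression algorithm.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferMetadata:
    """Hypothetical metadata record sent to the server before data transfer."""
    client_id: str         # client identity
    auth_token: str        # authorization (format assumed)
    destination_file: str  # destination file information
    size_bytes: int        # size of the uncompressed data
    compressed: bool = False


def prepare_transfer(client_id: str, auth_token: str, destination_file: str,
                     data: bytes, compress: bool = False):
    """Build the metadata and the payload, optionally compressing the data."""
    payload = zlib.compress(data) if compress else data
    meta = TransferMetadata(client_id, auth_token, destination_file,
                            len(data), compress)
    return meta, payload


meta, payload = prepare_transfer("client-7", "token", "backup/table.db",
                                 b"a" * 1000, compress=True)
print(meta.size_bytes, meta.compressed, len(payload) < meta.size_bytes)
```

Recording the uncompressed size alongside a compression flag is one way to let the server verify the data after decompression; other layouts would serve equally well.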
Other features and advantages of the present invention should be apparent from the following description of the preferred embodiment, which illustrates, by way of example, the principles of the invention.