1. Technical Field
This invention relates to the field of distributed data services and more particularly to a method and apparatus for bandwidth optimizing adaptive file distribution.
2. Description of the Related Art
A fundamental operation executed by file servers in a distributed data network is the transmission of a file from a first storage location to a specified destination or set of destinations. For example, in a typical distributed data network, a file server can service requests from network client nodes for file data stored on a secondary storage device such as a hard disk drive. Special cases of distributed data networks can include, for example, multimedia server systems for providing on-demand multimedia, real-time financial data systems, and retail point-of-sale (POS) systems which can support retail purchase transactions.
Present retail POS systems can capitalize on a distributed software platform for sharing centrally stored and maintained data pertaining to the retail store environment. Typical shared data can include retail good information, pricing structure, employee data, and translation tables for use with bar code scanning systems. For instance, although retail good pricing tables may be maintained and stored on a central file server in a distributed POS system, each POS client node, typically residing in a retail store check-out aisle, can require local use of the pricing tables in order to properly record the sale of a retail good. Thus, the central file server must update each POS client node by distributing the pricing tables from the central file server to each POS client node as necessary.
The IBM StorePlace Distributed Data Services (Distributed Data Services) software platform, manufactured by International Business Machines Corporation of Armonk, N.Y., is an example of a distributed software platform designed to be used as a base for the development of distributed applications for the retail store environment. The main functions of the Distributed Data Services software platform include: interprocess communication, data access, directory services, and data distribution for redundancy, performance, and availability. The data distribution component of the Distributed Data Services software platform provides a file distribution service that can replicate data to multiple client nodes, forming an image of the centrally stored data on each client node. The data distribution component can ensure the synchronization of each image of the centrally stored data during normal system operations. Furthermore, the data distribution component can perform file reconciliation when failed client nodes are brought back into service.
In a typical installation of a data distribution component, a primary distributor can centrally store prime copies of distributed files. The prime copy of a distributed file is the only copy of the file that can be changed or updated. As the prime copy of a distributed file is updated, renamed, or deleted, the changes are distributed to all client nodes having an image of that file. In addition to file distribution, the data distribution component can provide a reconciliation service. The reconciliation service can ensure that, if a client node misses updates for any reason, each distributed file on the node is re-synchronized with the prime copy of the file when communication is reestablished with the primary distributor.
Existing methods of data distribution utilize the User Datagram Protocol (UDP) in order to broadcast segments of a distributed file to client nodes in a distributed data network. UDP is a connectionless transmission protocol defined in Request for Comments (RFC) 768, User Datagram Protocol. UDP is an application interface to the Internet Protocol (IP) which serves as a multiplexer/demultiplexer for sending and receiving datagrams, using ports to direct the datagrams. The application interface offered by UDP provides for the creation of new receive ports, a receive operation that returns the data bytes and an indication of the source port and the source IP address, and a send operation that has as parameters the data and the source and destination ports and addresses.
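By way of illustration only, the three UDP interface operations described above can be mapped onto an ordinary datagram socket interface. The following is a minimal sketch using Python's standard socket module. The function name udp_loopback_demo, the loopback address, and the two-second timeout are assumptions made for the example and are not part of any existing distribution component.

```python
import socket
from typing import Tuple


def udp_loopback_demo(payload: bytes = b"price-table-slice") -> Tuple[bytes, Tuple[str, int]]:
    """Send one datagram to ourselves over loopback and return what arrived."""
    # Creation of a new receive port: bind a datagram socket.
    # Port 0 asks the operating system to choose a free port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)  # assumption: give up rather than block forever

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Send operation: the data plus the destination address and port.
        # The source port is assigned implicitly by the operating system.
        sender.sendto(payload, receiver.getsockname())

        # Receive operation: returns the data bytes together with the
        # source IP address and source port of the datagram.
        data, source = receiver.recvfrom(2048)
        return data, source
    finally:
        sender.close()
        receiver.close()
```

No connection is established before the datagram is sent, which is the property that makes UDP suitable for reaching many client nodes with one transmission.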
The UDP distribution technique, primarily because of its connectionless orientation, can be an efficient file distribution technique for a data distribution network having many client nodes. However, depending upon the underlying operating system, the UDP frame size can limit the size of transmitted file segments to 1 kilobyte (K). The 1 K limitation can compel the fragmentation of a 32 K file segment into individual 1 K segments. As a result, broadcasting a 32 K file segment from a primary distributor to the client nodes using UDP can consume 32 transmission cycles. This limitation can prove especially inefficient where a large file is broadcast to a single workstation.
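The fragmentation arithmetic can be checked with a short calculation. This sketch counts payload slices only and ignores any protocol header overhead. The function name slices_needed is hypothetical.

```python
def slices_needed(segment_bytes: int, max_payload_bytes: int = 1024) -> int:
    """Number of slices required to carry a segment at a given payload limit."""
    if segment_bytes < 0 or max_payload_bytes <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    # Integer ceiling division: a partial final slice still costs a transmission.
    return (segment_bytes + max_payload_bytes - 1) // max_payload_bytes


print(slices_needed(32 * 1024))      # 32
print(slices_needed(32 * 1024 + 1))  # 33
```

A 32 K segment therefore requires 32 separate 1 K transmissions under the 1 K limit, and a segment even one byte larger requires 33.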
For example, in a retail store setting during a re-synching operation between a primary distributor and a single client node, the distributed file can be a 50 megabyte (MB) bar code translation table containing store inventory. In that instance, a communications protocol capable of accommodating a larger frame size, thus avoiding the fragmentation of each file segment, would prove substantially more efficient. One such connection-oriented communications protocol, the Transmission Control Protocol (TCP), is defined in Request for Comments (RFC) 793, Transmission Control Protocol. Hence, what is needed is an adaptive file distribution method for choosing a transmission protocol in order to optimize network and processor bandwidth according to the number of client nodes active on the distributed data network.
The present invention is an adaptive file distribution method for intelligently choosing a transmission protocol in order to optimize network and processor bandwidth according to the number of client nodes active on the data distribution network. Specifically, a method for adaptively selecting a transport protocol for transmitting data segments across a distributed data network can include the steps of: determining the number and identity of subordinate nodes on the network that will receive the data segment; selecting a data transport protocol according to the number of subordinate nodes; and, transmitting the data segment to the subordinate nodes using the selected transport protocol. Significantly, the transport protocol preferably is selected from the group consisting of a connection-oriented protocol and a connectionless protocol.
In the preferred embodiment of the present invention, the determining step can comprise the steps of: initializing a counter in a primary distributor on the distributed data network; receiving in the primary distributor a request for data from subordinate nodes on the network; and, responsive to the request, incrementing the counter. In particular, each subordinate node periodically can request updated data from the primary distributor. As a result, the counter reflects the number of subordinate nodes on the network which require, at the time of a file update, the corresponding updated data. Alternatively, the determining step can comprise the steps of: initializing a counter in the primary distributor; identifying each active subordinate node on the network; and, for each identified active subordinate node, incrementing the counter.
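The two forms of the determining step can be sketched as follows. This is an illustrative outline only. The class name PrimaryDistributor, its method names, and the choice to ignore repeated requests from the same node (so that the counter reflects distinct nodes) are assumptions that the description above does not specify.

```python
from typing import Callable, Iterable, List


class PrimaryDistributor:
    """Hypothetical holder of the counter used by the determining step."""

    def __init__(self) -> None:
        # Initializing the counter in the primary distributor.
        self.counter = 0
        self.nodes: List[str] = []

    def receive_request(self, node_id: str) -> None:
        """First form: increment the counter in response to a data request."""
        if node_id in self.nodes:
            return  # assumption: a repeated periodic request is not recounted
        self.nodes.append(node_id)
        self.counter += 1

    def count_active(self, candidates: Iterable[str],
                     is_active: Callable[[str], bool]) -> None:
        """Alternative form: increment the counter once per active node."""
        self.counter = 0
        self.nodes = []
        for node_id in candidates:
            if is_active(node_id):
                self.nodes.append(node_id)
                self.counter += 1
```

In either form the result is both the number and the identity of the subordinate nodes that are to receive the data segment.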
Advantageously, the selecting step can comprise selecting a connectionless transport protocol if the number of subordinate nodes exceeds four nodes. Otherwise, a connection-oriented transport protocol can be selected. In the preferred embodiment, the connectionless transport protocol is the User Datagram Protocol. Furthermore, in the preferred embodiment, the connection-oriented transport protocol is the Transmission Control Protocol.
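The selecting step reduces to a single comparison against the threshold. A minimal sketch follows. The names Transport, NODE_THRESHOLD, and select_transport are hypothetical and are introduced only for illustration.

```python
from enum import Enum


class Transport(Enum):
    CONNECTIONLESS = "UDP"
    CONNECTION_ORIENTED = "TCP"


NODE_THRESHOLD = 4  # threshold of the preferred embodiment


def select_transport(num_nodes: int, threshold: int = NODE_THRESHOLD) -> Transport:
    """Choose connectionless transport only when the node count exceeds the threshold."""
    if num_nodes > threshold:
        return Transport.CONNECTIONLESS
    return Transport.CONNECTION_ORIENTED
```

Note that exactly four nodes does not exceed the threshold, so select_transport(4) yields the connection-oriented protocol, while select_transport(5) yields the connectionless protocol.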
In the preferred embodiment, the transmitting step can vary depending upon the selected transport protocol. If a connectionless transport protocol is selected, the transmitting step can include broadcasting the data segment to the subordinate nodes using the connectionless transport protocol. Otherwise, if a connection-oriented transport protocol is selected, the transmitting step can include establishing a point-to-point connection with each subordinate node and subsequently writing the data segment to each subordinate node. If a connectionless transport protocol is selected, the broadcasting step can comprise the steps of: fragmenting the data segment into one kilobyte (1 K) slices of data; and, repeatedly transmitting the 1 K slices of data onto the network until all of the 1 K slices have been transmitted.
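The two variants of the transmitting step can be sketched as shown below. This is an outline under stated assumptions rather than a definitive implementation. The function names fragment, broadcast_segment, and send_point_to_point are hypothetical, the caller is assumed to supply a datagram socket already configured for broadcast (for example with the SO_BROADCAST option set), and the connect parameter exists only so that the connection step can be substituted.

```python
import socket
from typing import Callable, Iterable, Iterator, Tuple

SLICE_SIZE = 1024  # one kilobyte (1 K) slices

Address = Tuple[str, int]


def fragment(segment: bytes, slice_size: int = SLICE_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of the segment; the last slice may be shorter."""
    for offset in range(0, len(segment), slice_size):
        yield segment[offset:offset + slice_size]


def broadcast_segment(sock, segment: bytes, address: Address) -> int:
    """Connectionless variant: transmit every 1 K slice until none remain.

    `sock` is assumed to be a datagram socket prepared by the caller.
    Returns the number of slices transmitted.
    """
    sent = 0
    for piece in fragment(segment):
        sock.sendto(piece, address)
        sent += 1
    return sent


def send_point_to_point(segment: bytes, nodes: Iterable[Address],
                        connect: Callable = socket.create_connection) -> int:
    """Connection-oriented variant: connect to each node, then write the segment.

    Returns the number of nodes written to.
    """
    written = 0
    for address in nodes:
        with connect(address) as conn:
            conn.sendall(segment)  # whole segment, no 1 K fragmentation
        written += 1
    return written
```

The connectionless path costs one transmission per slice regardless of how many nodes are listening, whereas the connection-oriented path costs one connection and one write per node.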
In the preferred embodiment, both connection-oriented and connectionless transport techniques can be used for distributing file updates and performing file reconciliation across a distributed data network. As such, a method for adaptively distributing file updates and performing file reconciliation in a distributed data network can comprise the steps of: updating a prime copy of a file residing on a primary distributor; and responsive to the update, identifying at least one subordinate node requesting the updated file.
If the number of identified subordinate nodes exceeds a threshold number, the updated file can be broadcast to the identified subordinate nodes. Otherwise, a point-to-point connection with each identified subordinate node can be established. Furthermore, the updated file can be transmitted to each identified subordinate node using the point-to-point connection.
Experimentally, it has been determined that if a file is to be transmitted to four or fewer subordinate client nodes, a connection-oriented protocol is the optimal choice in view of the larger frame size capabilities of TCP. Consequently, in the preferred embodiment, the broadcasting step can include the steps of: selecting a connectionless transport protocol for transmitting the updated file to the identified subordinate nodes if the number of identified subordinate nodes exceeds four subordinate nodes; and, transmitting the updated file to the identified subordinate nodes using the connectionless transport protocol. Advantageously, the connectionless transport protocol can be the User Datagram Protocol. Conversely, the establishing step can include: selecting a connection-oriented transport protocol for transmitting the updated file to the identified subordinate nodes if the number of identified subordinate nodes does not exceed four subordinate nodes; establishing a point-to-point connection with each identified subordinate node; and, transmitting the updated file to each identified subordinate node using the connection-oriented transport protocol. Advantageously, the connection-oriented transport protocol can be the Transmission Control Protocol. Thus, by adaptively choosing an appropriate transport protocol, the inventive adaptive file distribution algorithm can optimize network and processor bandwidth in a distributed data network.
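The combined flow of file update and file reconciliation can be illustrated with a small in-memory model. The sketch below does not perform any network transmission. Client node images are represented by a dictionary, and the class name Distributor, its method names, and the byte-for-byte comparison used to detect a stale image are all assumptions made for the example.

```python
from typing import Dict, List, Optional


class Distributor:
    """Hypothetical in-memory model of a primary distributor."""

    def __init__(self, prime_copy: bytes, threshold: int = 4) -> None:
        self.prime_copy = prime_copy
        self.threshold = threshold

    def _distribute(self, images: Dict[str, bytes], node_ids: List[str]) -> str:
        # Broadcast only when the node count exceeds the threshold.
        mode = "broadcast" if len(node_ids) > self.threshold else "point-to-point"
        for node_id in node_ids:
            images[node_id] = self.prime_copy  # stands in for the real transfer
        return mode

    def update(self, new_contents: bytes, images: Dict[str, bytes],
               requesting: List[str]) -> str:
        """Update the prime copy, then distribute it to the requesting nodes."""
        self.prime_copy = new_contents
        return self._distribute(images, requesting)

    def reconcile(self, images: Dict[str, bytes], node_id: str) -> Optional[str]:
        """Re-synchronize one returning node if its image is stale."""
        if images.get(node_id) == self.prime_copy:
            return None  # already synchronized, nothing to send
        return self._distribute(images, [node_id])


images = {f"pos-{i}": b"v1" for i in range(1, 7)}
distributor = Distributor(b"v1")
online = [n for n in images if n != "pos-6"]            # pos-6 is out of service
print(distributor.update(b"v2", images, online))         # broadcast
print(distributor.reconcile(images, "pos-6"))            # point-to-point
```

In this model an update reaching five nodes is broadcast, while the later reconciliation of the single returning node falls below the threshold and uses a point-to-point connection.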