1. Technical Field
The present invention generally relates to scheduling for distributed data collection processes and in particular to scheduling for a distributed data collection process involving a large number of data generation nodes. Still more particularly, the present invention relates to scheduling for distributed data collection through localized scheduling and bifurcation of the input and output scheduling processes.
2. Description of the Related Art
Distributed applications which operate across a plurality of systems frequently require collection of data from the member systems. A distributed inventory management application, for example, must periodically collect inventory data for compilation from constituent systems tracking local inventory in order to accurately serve inventory requests.
Large deployments of distributed applications may include very large numbers of systems (e.g., more than 10,000) generating data. Even if the amount of data collected from each system is relatively small, this may result in large return data flows, consuming substantial bandwidth and time. Keeping all data generation nodes available for data collection throughout the distributed collection mechanism would be extremely wasteful of resources. However, scheduling collection from each data generation node presents a daunting problem with a large number of nodes.
The scheduling problem for distributed data collection among large numbers of nodes is further complicated when nodes are not always available, but have only intermittent or irregular periods of availability, which is likely to occur in data collection for certain information types such as inventory or retail customer point-of-sale data. Nodes from which data must be collected may be mobile systems or systems which may be shut down by the user. As a result, certain nodes may not be accessible in a deterministic manner.
It would be desirable, therefore, to provide a scheduler for a distributed data collection process which is capable of handling large numbers of data generation nodes while accommodating nondeterministic node availability.
It is therefore one object of the present invention to provide improved scheduling for distributed data collection processes.
It is another object of the present invention to provide scheduling for a distributed data collection process involving a large number of data generation nodes.
It is yet another object of the present invention to provide scheduling for distributed data collection through localized scheduling and bifurcation of the input and output scheduling processes.
The foregoing objects are achieved as is now described. Scheduling in a distributed data collection process is performed locally, within collectors. Scheduling of data transfers from endpoints or downstream collectors or to upstream collectors is based on local queues, without global management. Additionally, scheduling for the collector input queue, which manages data collection from endpoints or downstream collectors, is bifurcated from scheduling for the output queue, which manages notifications to upstream collector(s) regarding the availability of collection data for pickup. Such bifurcation permits simpler scheduling logic and different functional responses to similar events, and further localizes scheduling. Scheduling of collection data transfer is controlled, within parameters specified by the output scheduler for the endpoint or downstream collector, by the input queue for the upstream collector. Scheduling is thus based primarily on the portion of the data transfer mechanism most likely to comprise a bottleneck, the upstream collector, but accommodates large numbers of fully parallel data generation endpoints as well as nondeterministic endpoint availability.
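By way of illustration only, the bifurcated local scheduling summarized above may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names Collector, Notification, run_input_scheduler, run_output_scheduler, the max_transfers limit, the (earliest, latest) transfer window used to stand in for the "parameters specified by the output scheduler", and the 10-unit window offered to the upstream collector are all hypothetical and do not appear in the description. Discarding a notification whose window has passed is likewise an assumption of the sketch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Notification:
    """Hypothetical message: 'collection data is ready for pickup'."""
    source: str      # endpoint or downstream collector name
    size: int        # amount of data advertised (illustrative only)
    window: tuple    # (earliest, latest) times the source permits transfer


class Collector:
    """Collector with separately scheduled input and output queues."""

    def __init__(self, name, upstream=None, max_transfers=2):
        self.name = name
        self.upstream = upstream            # next collector toward the root
        self.max_transfers = max_transfers  # assumed local capacity limit
        self.input_queue = deque()          # pickups pending from downstream
        self.output_queue = deque()         # notifications pending to upstream

    def notify(self, note):
        """Called by a downstream output scheduler; only enqueues locally."""
        self.input_queue.append(note)

    def run_input_scheduler(self, now, fetch):
        """Pull data from downstream sources, honoring each source's window.

        `fetch(source)` returns the data, or None if the source is
        currently unavailable (nondeterministic availability).
        """
        collected, deferred, started = [], [], 0
        while self.input_queue and started < self.max_transfers:
            note = self.input_queue.popleft()
            earliest, latest = note.window
            if now > latest:
                continue                    # assumption: source must re-notify
            data = fetch(note.source) if now >= earliest else None
            if data is None:
                deferred.append(note)       # retry on a later pass
                continue
            collected.append((note.source, data))
            # Hand off to the output side; the two queues share no logic.
            self.output_queue.append(
                Notification(self.name, len(data), (now, now + 10)))
            started += 1
        # Put deferred notes back at the front in their original order.
        self.input_queue.extendleft(reversed(deferred))
        return collected

    def run_output_scheduler(self):
        """Notify the upstream collector; upstream decides when to pull."""
        sent = 0
        while self.output_queue and self.upstream is not None:
            self.upstream.notify(self.output_queue.popleft())
            sent += 1
        return sent
```

In this sketch no component holds a global schedule: an endpoint or downstream collector merely announces availability, and the actual transfer is initiated by the upstream collector's input queue when its own capacity permits. An unavailable endpoint simply remains queued for a later pass rather than blocking the collection of data from other endpoints.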
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.