This invention relates to data processing, and more particularly to a system, method, and computer program for continuous flow data processing.
With the huge popularity of the Internet for data access and electronic commerce comes a need for high-performance, fault tolerant xe2x80x9cback-officexe2x80x9d processing capabilities that allow large volumes of data to be processed essentially continuously and in near real-time (i.e., responding to a user""s input within a few seconds to a few minutes). Such processing capabilities should be robust (i e., fault tolerant) to allow processing to continue where it left off after a failure. While such capabilities are useful for large-scale Internet-based data processing, they are often also applicable to conventional types of data processing over private networks and communication systems (e.g., airline reservation systems, internal corporate xe2x80x9cintranetsxe2x80x9d, etc.).
Achieving high performance for a particular volume of data often means using a parallel processing system to process the data within a reasonable response time. Numerous examples of parallel processing systems are known. For example, FIG. 1 is a block diagram of a typical prior art multi-process data processing system 100. Data from an ultimate source 101 (e.g., a web server) is communicated to at least one data queue 102. Data is read, or xe2x80x9cconsumedxe2x80x9d, from time to time by an initial process 104, which outputs processed data to one or more data queues 106, 106xe2x80x2. The process 104 typically is a single process that uses a two-phase commit protocol to coordinate consumption of input data and propagation of output data, in known fashion. Subsequent processes 108, 108xe2x80x2 may be linked (shown as being in parallel) to provide additional processing and output to subsequent data queues 110, 110xe2x80x2. The data is finally output to an ultimate consumer 112, such as a relational database management system (RDBMS). In practice, such a system may have many processes, and more parallelism than is shown. Further, each process may consume data from multiple data queues, and output data to multiple data queues.
To obtain fault tolerance, such systems have used xe2x80x9ccheckpointingxe2x80x9d techniques that allow a computational system to be xe2x80x9crolled backxe2x80x9d to a known, good set of data and machine state. In particular, checkpointing allows the application to be continued from a checkpoint that captures an intermediate state of the computation, rather than re-running the entire application from the beginning. Examples of checkpointing systems are described in U.S. Pat. No. 5,819,021, entitled xe2x80x9cOverpositioning System and Method for Increasing Checkpoints in Component-Based Parallel Applicationsxe2x80x9d, and U.S. Pat. No. 5,712,971, entitled xe2x80x9cMethods and Systems for Reconstructing the State of a Computationxe2x80x9d, both assigned to the assignee of the present invention.
A problem with using traditional checkpointing techniques with data where essentially continuous data processing is desired (e.g., Internet data processing) is that checkpoints may only be created when the system is quiescent, i.e., when no processes are executing. Thus, every process would have to suspend execution for the time required by the process that requires the longest time to save its state. Such suspension may adversely impact continuous processing of data.
Accordingly, the inventors have determined that there is a need to provide for a data processing system and method that provides checkpointing and permits a continuous flow of data processing by allowing each process to return to operation after checkpointing, independently of the time required by other processes to checkpoint their state. The inventors have also determined that, in the context of continuous flow data processing, that there is a need for a method and system for intermittently inducing execution of computational processes and output of data without having to checkpoint the system, thus saving time while enabling execution of processes that necessarily operate on a quantity of data records (e.g., sorting or certain statistical processes). The present invention provides a method, system, and computer program that provides these and other benefits.
The invention includes a data processing system and method that provides two processes, checkpointing and compute point propagation, and permits a continuous flow of data processing by allowing each process to (1) return to normal operation after checkpointing or (2) respond to receipt of a compute point indicator, independently of the time required by other processes for similar responsive actions.
In particular, checkpointing in accordance with the invention makes use of a command message from a checkpoint processor that sequentially propagates through a process stage from data sources through processes to data sinks, triggering each process to checkpoint its state and then pass on a checkpointing message to connected xe2x80x9cdownstreamxe2x80x9d processes. This approach provides checkpointing and permits a continuous flow of data processing by allowing each process to return to normal operation after checkpointing, independently of the time required by other processes to checkpoint their state. This approach reduces xe2x80x9cend-to-end latencyxe2x80x9d for each process stage (i.e., the total processing time for data from a data source to a data sink in a process stage), which in turn reduces end-to-end latency for the entire data processing system. Importantly, once checkpointing has been initiated, it propagates through each process without external control in a self-synchronizing manner.
More particularly, this aspect of the invention includes a method, system, and computer program for continuous flow checkpointing in a data processing system having at least one process stage comprising a data flow and at least two processes linked by the data flow, including propagating at least one command message through the process stage as part of the data flow, and checkpointing each process within the process stage in response to receipt by each process of at least one command message.
In another aspect of the invention, a compute point indicator is triggered and sequentially propagates through a process stage from data sources through processes to data sinks. The trigger event may be an external or internal event, and is usually directly detected by a data source. Optionally, a detected trigger event may be routed through an external processor which then initiates a compute point process. Importantly, once compute point indicator propagation has been initiated, the indicator propagates through each process without external control. Compute point indicators are used to mark blocks of records that should be processed as a group within each process. When there are multiple data flows going into a process, compute point indicators also effectively self-synchronize the data flows without external control. Compute point indicators mark the boundaries between blocks of data records simply by existing. When a process receives a compute point indicator, it waits for the corresponding compute indicator to be received on all of its input flows (if it has more than one input), then it does whatever blockwise computation function is appropriate for the component (e.g., sum or sort the data). The process then outputs the results of the computation. If the process is not a data sink, it also outputs the compute point indicator to any coupled downstream process. When a data sink receives a compute point indicator, it outputs whatever data it has available. Use of compute point indicators rather than checkpoints avoids the time delay that saving state imposes, while permitting a continuous flow of data processing, including outputting results.
More particularly, this aspect of the invention includes a method, system, and computer program for initiating processing of blocks of data within at least one flow of data input to a data processing system having a plurality of process stages, including at least one subscriber process stage, at least one publisher process stage, and optionally at least one intermediate process stage, the method including generating a compute point indicator in response to a trigger event; propagating the compute point indicator from each subscriber process stage through any intermediate process stages to each publisher process stage as part of the flow of data; and, for each process stage, processing a current block of data associated with the process stage in response to receipt by the process stage of at least one compute point indicator from an immediately previous process stage associated with the process stage.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.