1. Field of the Invention
This invention relates to data processing, and more particularly to complex dataflow computations.
2. Description of Related Art
Complex data processing applications may be assembled from components by linking the outputs and inputs of various processing stages by means of communication channels (e.g., TCP/IP). In general, such communication channels provide limited data buffering capacity. When a channel's buffer space is exhausted, the channel will "block," such that it is not possible to write additional data to the channel. In most cases, blockage is harmless, e.g., when the output of a fast program is connected to the input of a slower program. Under such circumstances, the finite buffering capacity of the communication channel serves to regulate computation such that the faster program does not get too far ahead of the slower program.
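The blocking behavior can be illustrated with a bounded queue from Python's standard library standing in for a communication channel. This is a minimal sketch, not a model of TCP/IP: the two-block capacity is an assumption, and `try_write` is a hypothetical helper that reports when a write would block instead of actually blocking (a plain `put()` would instead wait until the reader frees space).

```python
import queue

# A channel with room for only two data blocks. The capacity is an
# illustrative assumption; real TCP/IP buffers are sized by the OS.
channel = queue.Queue(maxsize=2)

def try_write(ch, item):
    """Attempt a non-blocking write; return False if the write would block."""
    try:
        ch.put_nowait(item)
        return True
    except queue.Full:
        return False

# A fast writer fills the buffer; the third write would block.
results = [try_write(channel, i) for i in range(3)]
print(results)  # [True, True, False]

# The slower reader consumes one block, freeing space for the writer.
channel.get()
after_drain = try_write(channel, 3)
print(after_drain)  # True
```

The writer can never be more than the channel's capacity ahead of the reader, which is the regulating effect described above.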
However, under certain circumstances, channel blockage can lead to a form of system failure called a "deadlock." FIGS. 1 and 2 are dataflow diagrams showing a simple example of channel blockage. Suppose a first, "upstream" program 100 produces two outputs 101, 102, and a second, "downstream" program 103 requires two inputs 104, 105. Further, suppose that the outputs of the upstream program 100 are linked to the inputs of the downstream program 103 by two communication channels 106, 107.
In the course of the computation, the following set of circumstances may occur, as illustrated in FIG. 2:
The upstream program 100 wishes to write data to its first output 101.
The downstream program 103 wishes to read data from its second input 105.
The first communication channel 106 is full (its buffer space is fully committed).
The second communication channel 107 is empty (it contains no untransmitted data).
Neither the upstream program 100 nor the downstream program 103 can make any further progress, and thus the computation will never complete. This situation is generally known as a deadlock; in the context of this discussion it will be called a "buffer deadlock."
Since the possibility of a buffer deadlock may lead to application failure, a method for preventing buffer deadlocks in dataflow computations would be very useful. The present invention provides a solution to this problem, which operates as follows:
Any "downstream" program having more than one input I is provided with a pool of supplemental buffer space. In the preferred embodiment, each input I is associated with a "deferred input queue" which may refer to a sequence of data blocks in the supplemental buffer space.
The inputs of each downstream program are partitioned into disjoint input sets, such that two inputs are in the same partition if and only if they obtain their input, either directly or indirectly, from a common "upstream" program.
If a downstream program needs to read data from an upstream program via some input I for which no data is available, AND if any other input J in the same input set has data available, THEN the downstream program continuously reads available data from each such input J and stores that data in a supplemental buffer corresponding to such input J until such time as available data is exhausted on all such inputs J OR data becomes available on the desired input I. In the preferred embodiment, this is done by allocating a block of storage from the supplemental buffer, filling that storage block with data, and adding the storage block to the deferred input queue. If the supplemental buffer becomes full, the downstream program aborts rather than risk a deadlock.
If the downstream program needs to read data from some input I which has data in the supplemental buffers, then data is extracted from the supplemental buffers instead of from the corresponding communication channel.
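The partitioning of a downstream program's inputs into disjoint input sets can be sketched as follows. This is a hypothetical illustration: the dataflow graph is assumed to be given as a mapping from each program to the programs that feed it directly, and the names `ancestors` and `input_sets` (and the labels in the examples) are invented here. Because the relation "shares a common upstream program" need not be transitive, the sketch merges overlapping groups with union-find so that the result is always a partition; that choice is an assumption, not something the text specifies.

```python
def ancestors(program, feeds):
    """All programs from which `program` receives data, directly or
    indirectly, including `program` itself."""
    seen, stack = set(), [program]
    while stack:
        p = stack.pop()
        if p in seen:
            continue
        seen.add(p)
        stack.extend(feeds.get(p, []))
    return seen

def input_sets(inputs, feeds):
    """Partition a downstream program's inputs into disjoint input sets.

    inputs: dict mapping input name -> program that directly feeds it.
    feeds:  dict mapping program -> list of programs feeding it directly.
    """
    names = list(inputs)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    upstream = {n: ancestors(inputs[n], feeds) for n in names}
    # Merge any two inputs that share at least one upstream program.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if upstream[a] & upstream[b]:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# FIG. 1: both inputs of program 103 are fed by program 100.
print(input_sets({"in104": "p100", "in105": "p100"}, {}))
# [['in104', 'in105']]

# Input z is fed by C, which is itself fed by A, so x and z share A.
print(input_sets({"x": "A", "y": "B", "z": "C"}, {"C": ["A"]}))
# [['x', 'z'], ['y']]
```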
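The buffer deadlock of FIG. 2, and how deferred input queues avoid it, can be reproduced in a small single-threaded simulation. This sketch is an illustration under simplifying assumptions rather than the invention's implementation: the two programs take turns, each channel holds a fixed number of records, both inputs are assumed to lie in one input set (as in FIG. 1, where both are fed by program 100), and the supplemental buffer is modeled as a record count (`supplemental_limit`, a hypothetical parameter). Because the upstream program cannot write while the downstream program is draining, draining simply continues until the other inputs are empty.

```python
from collections import deque

class BufferOverflow(Exception):
    """Raised when the supplemental buffer is full (abort rather than deadlock)."""

def run(writes, reads, capacity, use_deferral, supplemental_limit=16):
    """Step an upstream and a downstream program until done or stuck.

    writes: (output_index, value) pairs the upstream program emits, in order.
    reads:  input indexes the downstream program consumes, in order.
    Returns (values_read, deadlocked).
    """
    channels = [deque(), deque()]   # bounded channels 106 and 107
    deferred = [deque(), deque()]   # one deferred input queue per input
    w, got = 0, []
    while w < len(writes) or len(got) < len(reads):
        progressed = False
        # Upstream: write the next record if its channel has room.
        if w < len(writes):
            out, value = writes[w]
            if len(channels[out]) < capacity:
                channels[out].append(value)
                w += 1
                progressed = True
        # Downstream: read the next required input.
        if len(got) < len(reads):
            want = reads[len(got)]
            if deferred[want]:
                # Deferred data is older, so it is consumed first.
                got.append(deferred[want].popleft())
                progressed = True
            elif channels[want]:
                got.append(channels[want].popleft())
                progressed = True
            elif use_deferral:
                # Drain the other inputs of the input set into deferred queues.
                for j in range(len(channels)):
                    if j == want:
                        continue
                    while channels[j]:
                        if sum(len(q) for q in deferred) >= supplemental_limit:
                            raise BufferOverflow("supplemental buffer full")
                        deferred[j].append(channels[j].popleft())
                        progressed = True
        if not progressed:
            return got, True        # neither program can proceed
    return got, False

# Upstream writes three records to output 101 before any to output 102,
# while downstream wants input 105 first: the situation of FIG. 2.
WRITES = [(0, "a1"), (0, "a2"), (0, "a3"), (1, "b1"), (1, "b2")]
READS = [1, 0, 1, 0, 0]

print(run(WRITES, READS, capacity=2, use_deferral=False))
# ([], True)
print(run(WRITES, READS, capacity=2, use_deferral=True))
# (['b1', 'a1', 'b2', 'a2', 'a3'], False)
try:
    run(WRITES, READS, capacity=2, use_deferral=True, supplemental_limit=2)
except BufferOverflow as exc:
    print("aborted:", exc)  # aborted: supplemental buffer full
```

Without deferral, both programs stop after two records; with deferral, records from the first input are parked in the deferred queue until the downstream program asks for them, and the per-input order of records is preserved.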