Multi-processor (i.e., parallel processor) architectures employ many interconnected processors to access large amounts of data and to simultaneously process a large number of tasks at high speed. Many multi-processors can execute instructions with operands that are arrays of data and are called vector or array processors. In order to obtain maximum utilization of a multi-processor, as many tasks as possible need to be scheduled for simultaneous execution on the available processors. Furthermore, interrelationships between various tasks must be continuously taken into account so as to assure ready availability of input operands when each task is ready to run. Scheduling of a task must be carefully controlled or else the entire benefit derived from the parallel processing may be lost, because one processor, due to a lack of necessary input data, can delay a plurality of others.
Data Flow Graphs (DFGs) are often used by scientists and engineers to enable visualization of individual procedure tasks and their interrelationships. A DFG is a directed graph wherein edges (interconnections) denote data flows and nodes are functions or tasks that manipulate the data. The execution of a task, called the firing of a node, occurs when enough data is available on each input edge of the node. Associated with each input edge is a property called the threshold. When the amount of data on each of the input edges meets or exceeds that edge's threshold, the node fires. Each time the node fires, it "consumes" some amount of data from each edge that serves as an input to the node. The amount of data the node consumes is less than or equal to the threshold value. Each time a node fires, it produces some amount of data and places it on its output edges. Normally, this data is then consumed by some set of other nodes.
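The firing rule just described can be illustrated with a minimal sketch. The class names `Edge` and `Node`, and the fields `threshold`, `consume`, and `tokens`, are hypothetical names chosen for this illustration only and do not come from any particular DFG tool.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    threshold: int   # minimum number of data items needed before the consumer may fire
    consume: int     # items removed per firing; must not exceed the threshold
    tokens: int = 0  # items currently waiting on the edge

    def __post_init__(self):
        if self.consume > self.threshold:
            raise ValueError("consume amount must not exceed threshold")


@dataclass
class Node:
    name: str
    inputs: list    # list of Edge objects feeding this node
    outputs: list   # list of (Edge, items_produced_per_firing) pairs

    def can_fire(self):
        # A node is ready only when every input edge meets its threshold.
        return all(e.tokens >= e.threshold for e in self.inputs)

    def fire(self):
        # Returns False, leaving all edges untouched, if the node is not ready.
        if not self.can_fire():
            return False
        for e in self.inputs:
            e.tokens -= e.consume
        for e, produced in self.outputs:
            e.tokens += produced
        return True
```

With an input edge of threshold 4 and consume amount 2, a node holding only 3 items does not fire; once a fourth item arrives it fires, leaving 2 items on the input edge for a later firing.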
To aid in an understanding of terms to be used herein, the following definitions are provided:
Primitive/Task--The task or function represented by a node of a DFG is a primitive. A primitive may be any procedure that is preprogrammed and available from a library of procedures to perform a specific task.
Node--A graphical representation of a primitive. While a node in a DFG is performed on a processor, the term node does not represent the processor, but rather the specific primitive that is carried out by the processor.
Threshold--Before a node fires, each input edge must contain at least as many data items as are necessary for the task to execute properly. The threshold parameter specifies the minimum quantity of data needed on a particular input edge before the node can fire.
Subgraph Execution Program (SEP)--A program constructed from one or more primitives that executes sequentially on a processor. A SEP executes as one program; the primitive tasks within it execute sequentially in a predetermined order.
Allocation--The process of dynamically managing physical processor utilization by allocating logical processor subgraph execution programs to available spare processor resources.
Deployment--The distribution of object code to processors in a multi-processor, such code enabling the processors to perform, in parallel, an algorithm defined by a DFG.
Pre-scheduling--A scheduling method wherein a program determines a firing order based upon how much data each primitive consumes and produces.
In general, there are two basic approaches to executing a DFG on a multi-processor. One method is to assign and schedule nodes using a "run time ready" assessment manager. The second method is called prescheduling and defines a "fixed schedule" for each task in advance of executing the DFG.
Run time ready assessment is a scheduling method wherein each input stream of a DFG primitive is examined by a run time program. When the thresholds of all input streams for a primitive are met, the primitive is ready to be executed, i.e., "fired". Several primitives may be ready to fire at one time. The run time program executes ready primitives as it finds them, in any order. In addition to scheduling, the assignment of primitives to processors is necessary on a multi-processor system. If assignment of primitives is accomplished during run time, significant amounts of run time compute and interprocessor communication resources are consumed.
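A minimal single-processor sketch of such a run time ready loop is given below. The function name `run_time_ready`, the dictionary layout of the graph, and the `max_firings` guard are assumptions made for illustration; processor assignment and interprocessor communication are deliberately left out.

```python
def run_time_ready(tokens, nodes, max_firings=100):
    """Fire ready primitives as they are found, in scan order.

    tokens: dict mapping edge name -> number of items currently on that edge
    nodes:  dict mapping primitive name -> (inputs, outputs), where
            inputs  maps edge name -> (threshold, consume_amount) and
            outputs maps edge name -> items produced per firing
    Returns the list of primitive names in the order they fired.
    """
    order = []
    progress = True
    while progress and len(order) < max_firings:
        progress = False
        for name, (inputs, outputs) in nodes.items():
            if len(order) >= max_firings:
                break
            # Ready assessment: every input stream must meet its threshold.
            if all(tokens[e] >= thr for e, (thr, _) in inputs.items()):
                for e, (_, consume) in inputs.items():
                    tokens[e] -= consume
                for e, produce in outputs.items():
                    tokens[e] += produce
                order.append(name)
                progress = True
    return order
```

Every pass re-examines the input streams of every primitive, which hints at why this bookkeeping becomes costly when it must also be coordinated across processors at run time.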
There are significant problems with run time scheduling and assignment procedures. One such problem is termed "hiccup" and results when a series of primitive executions fails to meet a preestablished real-time deadline. A second problem is that of quickly finding an optimal multi-processor assignment and schedule. This is an NP-complete problem.
NP-complete problems form a class of problems for which no efficient (polynomial-time) solution method is known; an optimum solution can, however, be found by an exhaustive search. The classical example is the traveling salesman problem, wherein it is desired to establish the most efficient route over which a salesman can travel and still make all required stops. Such a problem can be solved by an exhaustive search of all possibilities, followed by an analysis of the results of each search iteration to determine an optimum solution. An exhaustive search of all possibilities requires long run times, which grow exponentially as a function of the size of the input set. Thus, a run time ready scheduler requires substantial compute power and results in excessive use of interprocessor communication resources to arrive at optimal scheduling and assignment of primitives.
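The growth of an exhaustive search can be made concrete with a deliberately simplified sketch: N independent tasks with known execution costs are assigned to P processors so as to minimize the load of the busiest processor. Precedence constraints and communication costs, which a real DFG assignment would have to consider, are ignored here, and the function name `exhaustive_assignment` is hypothetical. The search examines P**N candidate assignments.

```python
from itertools import product


def exhaustive_assignment(costs, n_procs):
    """Try every task-to-processor assignment and keep the best one.

    costs:   list of execution costs, one per task
    n_procs: number of processors
    Returns (best_makespan, best_assignment, assignments_examined).
    """
    best_makespan, best_assignment, examined = None, None, 0
    # product(...) enumerates all n_procs ** len(costs) assignments.
    for assignment in product(range(n_procs), repeat=len(costs)):
        examined += 1
        loads = [0] * n_procs
        for cost, proc in zip(costs, assignment):
            loads[proc] += cost
        makespan = max(loads)  # finish time of the busiest processor
        if best_makespan is None or makespan < best_makespan:
            best_makespan, best_assignment = makespan, assignment
    return best_makespan, best_assignment, examined
```

Under these assumptions, four tasks on two processors require 16 evaluations, while forty tasks on two processors would require 2**40, roughly a trillion.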
The problems described for run time scheduling and assignment can be avoided by defining a pre-schedule for each task in advance of executing the DFG. Such pre-scheduling includes a pre-assignment of primitives as one of its functions. Because finding optimal pre-assignments and pre-schedules may incur exponential run times, even though the search takes place prior to deployment of object code for execution, heuristic and probabilistic approximation methods are used. Pre-assignment and pre-scheduling are traditionally handled by the individual programmer, who employs a DFG representation of the required procedure and then attempts a plurality of assignment/schedule iterations to determine one that is near-optimal. This is both time consuming and expensive and has retarded the application of multi-processors to scientific problems.
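As one illustration of a heuristic approximation, the sketch below uses a greedy "longest task first" rule: tasks are taken in order of decreasing cost and each is placed on the currently least-loaded processor. This is a well-known general-purpose heuristic chosen here purely as an example; it is not stated in the text to be the method used by any of the tools or programmers discussed, and it ignores precedence and communication costs. The function name `greedy_preassign` is hypothetical.

```python
import heapq


def greedy_preassign(costs, n_procs):
    """Assign tasks to processors before run time using a greedy rule.

    costs:   list of execution costs, one per task
    n_procs: number of processors
    Returns (makespan, assignment) where assignment maps task index -> processor.
    """
    # Min-heap of (current_load, processor_id); the least-loaded processor is on top.
    heap = [(0, proc) for proc in range(n_procs)]
    heapq.heapify(heap)
    assignment = {}
    # Visit tasks from most to least expensive.
    for task in sorted(range(len(costs)), key=lambda t: -costs[t]):
        load, proc = heapq.heappop(heap)
        assignment[task] = proc
        heapq.heappush(heap, (load + costs[task], proc))
    makespan = max(load for load, _ in heap)
    return makespan, assignment
```

The rule runs in O(N log N + N log P) time rather than exponential time, at the price of offering no guarantee that the resulting assignment is optimal.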
Recently, software packages have become available that enable achievement of a higher level of automated scheduling of tasks on multi-processors. One such software package is titled Ptolemy and is further described in "Ptolemy: A Mixed-Paradigm Simulation/Prototyping Platform in C++", Internal College of Engineering Paper, University of California, Berkeley, Calif. A further such software package is called HYPER described in "Hardware Mapping and Module Selection in the Hyper Synthesis System", C. Chu, Memorandum UCB/ERL M92/46, (1992), College of Engineering, University of California, Berkeley, Calif. and in "Algorithms for High Level Synthesis: Resource Utilization Based Approach", M. Potkonjak, Memo No UCB/ERL M92/10, (1992) College of Engineering, University of California, Berkeley, Calif.
Ptolemy enables a complex scientific problem to be programmed as a DFG, where primitives are then assigned to physical processors in an existing multi-processor architecture. HYPER is similar to Ptolemy except that it maps one DFG to an application specific integrated circuit. HYPER is limited to digital signal processing applications. Both programs employ the concept of pre-assignment and pre-scheduling of tasks and enable allocation of the tasks across known multi-processor or integrated circuit architectures. However, neither of these programs, nor any other known to the inventors hereof, enables run time allocation of pre-assigned and pre-scheduled tasks to a variety of multi-processor architectures. The prior art is thus limited in applicability to fixed processor allocations of pre-scheduled/pre-assigned DFGs.
Accordingly, it is an object of this invention to provide an improved method for allocating tasks of a complex problem across a variety of multi-processor architectures.
It is another object of this invention to provide an improved method for allocation of tasks which have been pre-assigned and pre-scheduled on logical processors and mapped to physical processors in a selected multi-processor architecture.