An example of such computational networks is found in time-domain digital filters. Such filters are commonly constructed within the confines of monolithic integrated circuits.
Computers using programs called "silicon compilers" can determine how electronic circuitry is to be disposed within the confines of a monolithic integrated circuit die. The silicon compiler begins with a behavioral description of the electronic circuitry and proceeds to design the masks for the photolithographic processes by which the monolithic integrated circuits implementing that electronic circuitry are manufactured. In this design process the silicon compiler constructs the design relying on a library of standard circuit designs which are interconnected using suitable intervening delay elements. This construction begins with the design being considered in data-flow graph terms, after which the design is converted to descriptions of standardized cells of integrated circuitry that are oriented vis-a-vis each other using tiling algorithms. When a suitable tiling of the standard cells is decided upon by the silicon compiler, full mask sets for the photolithographic processes are generated in accordance with further software in the silicon compiler.
In the design by the silicon compiler of monolithic integrated circuits that function synchronously, in accordance with shared clocking signals, an important aspect of the design procedure is the minimization of delays through a network of computational elements. There is a desire to reduce "latency", or the time it takes for a response to appear at an output port of the computational network after a stimulus is applied to an input port of the network. Latency is expressed in terms of clock cycles in digital computational networks, which are the computational networks of primary interest in regard to the invention. Often delay has to be inserted into a signal path through the network to synchronize a signal arriving too early with a signal that arrives later, so that the signals can be processed together with each other. Such delay is referred to as "shimming" delay or "shim" delay. In the interest of conserving digital hardware, particularly in large networks, it is desirable to minimize the need for shim delay.
In the prior art the problems of minimizing latency and shimming delay have been dealt with by means of linear programming. See the paper "Behavioral to Structural Translation in a Bit-Serial Silicon Compiler" by R. I. Hartley and J. R. Jasica appearing in IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, Vol. 7, No. 8, pp. 877-886 (August 1988) and the earlier paper "An Optimal and Flexible Delay Management Technique for VLSI" by G. Goosens, R. Jain, J. Vandewalle and H. deMan in PROCEEDINGS, INTERNATIONAL SYMPOSIUM, MATHEMATICAL THEORY OF NETWORKS AND SYSTEMS (Stockholm, 1985). To facilitate linear programming, a structural description of the network called a data-flow graph is prepared by the silicon compiler. The data-flow graph is described in the Hartley and Jasica paper as a bipartite directed graph that has vertices (or nodes) representing operators and operands and has between these vertices directed edges showing the direction of data flow. "Operators" correspond to respective processing elements and "operands" correspond to respective signals the elements process. An operator node in the data-flow graph may, in general, have any number of inputs and any number of outputs. An operand node may have any number of outputs (fan-out), but only one input, because each operand is the output of a single operator. A respective input operator is introduced as the source of each input operand, to assure conformity to this rule. Many data-flow graphs incorporate digital word delay operators.
For a computational network to operate properly, each operator in the circuit must be accorded a "scheduled time" consistent with the timing constraints placed on its associated operands. That is, each of the input operands to the operator must be available at (or before) the scheduled time accorded that operator. The scheduled time of an operator and the processing delays through the operator after all its input operands are available determine a "ready time" for each output operand of the operator. The scheduled time for an operator is invariably at least as late as the latest ready time among the operand(s) supplied as input operand(s) to it. Procedures for scheduling the operators in a network and determining the ready times for the output operands of each operator (which may be the input operands for succeeding operators) are known in the prior art and are described in the Hartley and Jasica paper, which description is incorporated herein by reference.
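The relationship among scheduled times, ready times and shim delay can be illustrated with a minimal sketch. It assumes an acyclic network whose operators are listed in topological order, one output operand per operator, and an as-soon-as-possible schedule; the function name `schedule` and the tuple layout are hypothetical and are not taken from the Hartley and Jasica paper.

```python
def schedule(operators, input_ready):
    """Assign scheduled times and ready times, in clock cycles.

    operators: list of (name, input_operands, output_operand, delay),
               assumed to be in topological order (no loops).
    input_ready: dict mapping each network input operand to its ready time.
    Returns (scheduled, ready, shims), where shims maps
    (operator, operand) to the shim delay needed on that input.
    """
    ready = dict(input_ready)
    scheduled = {}
    shims = {}
    for name, inputs, output, delay in operators:
        # An operator cannot be scheduled before its last input is ready.
        scheduled[name] = max(ready[i] for i in inputs)
        # Inputs that arrive early must be held by shim delay.
        for i in inputs:
            wait = scheduled[name] - ready[i]
            if wait > 0:
                shims[(name, i)] = wait
        ready[output] = scheduled[name] + delay
    return scheduled, ready, shims


# A chain of two unit-delay adders computing (a + b) + c.
ops = [
    ("add1", ["a", "b"], "s1", 1),
    ("add2", ["s1", "c"], "s2", 1),
]
scheduled, ready, shims = schedule(ops, {"a": 0, "b": 0, "c": 0})
```

In this example operand c is ready at time 0 but add2 cannot be scheduled until time 1, so one clock cycle of shim delay is needed on c, and the latency to s2 is two clock cycles.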
Alternatively, the data-flow graph can be represented as a directed hyper-graph having vertices (or nodes) corresponding to the operators and having hyper-edges corresponding to operands flowing between operators. A hyper-edge has one input vertex, the source of a respective operand, and as many output vertices as there are operators receptive of that respective operand. Again operators can have any number of input operands and usually have one output operand. This latter representation of the data-flow graph has recently tended to supplant the representation in the Hartley and Jasica paper, but differences between the representations are a matter of form rather than substance.
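As an illustrative sketch only, the hyper-graph form might be held in a data structure such as the following. The class `HyperEdge`, the function `to_hypergraph` and the operator names are hypothetical. The sketch assumes that every operand, including each network input, is produced by exactly one operator, with an input operator introduced for each input operand.

```python
from dataclasses import dataclass, field


@dataclass
class HyperEdge:
    operand: str                                # the signal carried by this hyper-edge
    source: str                                 # the single operator producing it
    sinks: list = field(default_factory=list)   # operators receiving it (fan-out)


def to_hypergraph(operators):
    """operators: dict mapping operator name -> (input operands, output operands)."""
    edges = {}
    # First pass: each operand gets exactly one source operator.
    for op, (_, outputs) in operators.items():
        for operand in outputs:
            if operand in edges:
                raise ValueError(f"operand {operand!r} has more than one source")
            edges[operand] = HyperEdge(operand, op)
    # Second pass: record every operator that receives each operand.
    for op, (inputs, _) in operators.items():
        for operand in inputs:
            if operand not in edges:
                raise ValueError(f"operand {operand!r} has no source operator")
            edges[operand].sinks.append(op)
    return edges


network = {
    "in_a": ([], ["a"]),            # input operator for operand a
    "in_b": ([], ["b"]),            # input operator for operand b
    "add1": (["a", "b"], ["s"]),
    "delay1": (["s"], ["s_d"]),     # digital word delay operator
    "add2": (["s", "s_d"], ["y"]),
}
edges = to_hypergraph(network)
```

Here the hyper-edge for operand s has add1 as its one input vertex and delay1 and add2 as its output vertices, which corresponds to a fan-out of two.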
Procedures for scheduling the operators in a computational network can be adapted to include modification of the topology of the network of operators. An objective of the invention is to modify the topology of the operators so as to provide improvements in scheduling--namely, reduced latency and savings in shimming delay. Trees of dual-input operators for performing processes that are associative and commutative in nature--e.g., adders--are often encountered in computational networks. Alternatively, such trees can be obtained by simple manipulations of the network structure, such as converting adder and subtractor trees to adder trees by taking advantage of signed arithmetic. Rearrangement of these trees of dual-input operators can provide improvements in the scheduling of operators in computational networks, as is intuitively apparent from the mental efforts of experts in digital filter design with regard to reducing digital filter hardware. The inventors have attempted simply to use linear programming techniques for rearranging networks including trees of dual-input operators, in order to minimize delay in the rearranged network. This direct approach has not been successful, however.
A key to successfully rearranging the networks for minimal delay has been found by the inventors to be replacement of the tree of dual-input operators by an equivalent multiple-input operator, which is then converted to a tree structure that has minimal delay. Where there is at least one loop in the network which includes the multiple-input operator, the scheduling of operators in the modified network is optimized, which is likely to change the ready times of the input operands of the multiple-input operator and thus to require that a new minimal-delay tree of dual-input operators be considered. The rescheduling and construction of minimum-delay trees of dual-input operators continues until optimum scheduling is assured. The resulting network can then be further optimized using normal linear programming techniques. The problem of performing rearrangement of a tree of dual-input operators using linear programming is reformulated in the invention so that the problem is in large part a problem of synthesizing a minimal-delay tree of operators, which latter problem is tractable, especially if the dual-input operators used for constructing the tree are of a type that accepts input operands concurrently.
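One way the conversion of a multiple-input operator into a minimal-delay tree might be sketched is with a greedy rule that repeatedly combines the two operands that are ready earliest. This rule is an assumption made here for illustration and is not stated above to be the inventors' procedure. The sketch further assumes a single associative and commutative dual-input operator with a fixed delay `op_delay`, input operands accepted concurrently, and a non-empty set of operands; the function name `build_min_delay_tree` is hypothetical.

```python
import heapq
from itertools import count


def build_min_delay_tree(ready_times, op_delay=1):
    """Build a tree of dual-input operators from operand ready times.

    ready_times: non-empty dict mapping operand name -> ready time (clock cycles).
    Returns (tree, root_ready), where tree is a nested tuple of operand names.
    """
    tie = count()  # tie-breaker so the heap never has to compare subtrees
    heap = [(t, next(tie), name) for name, t in ready_times.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Combine the two operands that become ready earliest.
        t1, _, left = heapq.heappop(heap)
        t2, _, right = heapq.heappop(heap)
        # The operator fires when both inputs are ready and adds its own delay.
        heapq.heappush(heap, (max(t1, t2) + op_delay, next(tie), (left, right)))
    root_ready, _, tree = heap[0]
    return tree, root_ready


# Three operands are ready at time 0; operand d is not ready until time 3.
tree, root_ready = build_min_delay_tree({"a": 0, "b": 0, "c": 0, "d": 3})
```

For these ready times the sketch yields the tree ((c, (a, b)), d) with its output ready at time 4, whereas the balanced tree ((a, b), (c, d)) would not have its output ready until time 5, because the late operand d would pass through two operators rather than one. Where the multiple-input operator lies in a loop, such a construction would be repeated whenever rescheduling changes the ready times.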
Work has been done on tree-height minimization in the context of software compilers for multi-processors. A summary of early work and an optimal algorithm for producing a minimum number of ranks (or levels) in trees containing multipliers, adders, subtractors, dividers, exponentiation operators and unary minus operators appears in the J. L. Baer and D. P. Bovet paper "Compilation of Arithmetic Expressions for Parallel Computations" appearing in pages 340-346 of IFIP 68, North Holland Publishing Co., Amsterdam, 1969. Parenthesized sub-expressions are also handled. This algorithm, based on the associative and commutative properties of operators, parses infix expressions and exploits precedences specified for each type of operator. Each operation is assumed to require one time unit and all input operands are assumed to arrive at the leaves of the expression trees simultaneously, as would be the case for operands residing in respective registers of a programmable computer. The number of levels in the resulting tree is minimized, with one pass over the expression being required for each level of the tree generated by the algorithm. J. C. Beatty's paper "An Axiomatic Approach to Code Optimization for Expressions", appearing in Journal of the Association for Computing Machinery pages 613-640, Vol. 19, No. 4, October 1972, extends the Baer and Bovet algorithm to take into account parameterized operator delays, and a proof of optimality is offered. An overview of tree-height minimization techniques, including the use of the distributive property, appears in pages 67-107 of the book ALGORITHMS, SOFTWARE AND HARDWARE OF PARALLEL COMPUTERS, edited by J. Miklosko and V. E. Kotov, and published by Springer-Verlag, Berlin, N.Y., 1984. Complete consideration of tree-height minimization takes into account factoring and the distributive properties of operators, as well as their associative and commutative properties.
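The setting assumed in this earlier work, with unit-time operations and simultaneously arriving operands, can be illustrated by a much-simplified sketch for a single associative and commutative operator. It is not the Baer and Bovet algorithm itself, which also handles mixed operator types, precedences and parentheses; the function name `balanced_tree` is hypothetical and a non-empty operand list is assumed.

```python
def balanced_tree(operands):
    """Pair operands level by level, one pass per level of the tree.

    operands: non-empty list of operand names, all assumed ready at once.
    Returns (tree, levels), where tree is a nested tuple of operand names.
    """
    level = list(operands)
    levels = 0
    while len(level) > 1:
        # One pass: combine neighbours pairwise into dual-input operations.
        paired = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # an odd operand waits for the next pass
        level = paired
        levels += 1
    return level[0], levels


tree5, levels5 = balanced_tree(["a", "b", "c", "d", "e"])
tree8, levels8 = balanced_tree(list("abcdefgh"))
```

Under these assumptions five operands need three levels and eight operands also need three levels, in line with the ceiling of the base-2 logarithm of the operand count. When operands arrive at different times, as in a scheduled computational network, a balanced tree of this kind need not be the tree of least delay.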