The number and complexity of datapath operations implemented in various kinds of systems, particularly those on integrated circuit chips, has increased considerably over the years. This is especially true in chips used for graphics, communication, and multimedia processing applications, which have employ parallel implementation of signal processing algorithms such as fast Fourier transforms, finite impulse response filters and other DSP algorithms.
One perennial need in this field is in the optimization of datapath operations to minimize area, power requirements, and delay. Current techniques are limited in scope, permitting only the merging of individual datapath operators such as adders, multipliers, and shifters. For example, datapath-intensive register transfer level (RTL) designs require synthesis techniques that yield optimized implementations of groups of datapath operators instead of individual operators.
One useful technique is operator merging, which refers to clustering of multiple datapath operators so that they can be synthesized together as a unit. In particular, designers and researchers have explored synthesizing a cluster of datapath operators as a sum of addends using carry-save adders and Wallace trees. For example, synthesis of the sum of product expression a*b+c*d using traditional synthesis requires 2 multipliers and an adder. Such an implementation has 2 carry-propagate adders on any input-to-output path. Operator merging can implement such an expression using only one carry-propagate adder by reducing the partial products of the multipliers in a single carry-save reduction tree (CSA-tree).
An algorithm for operator merging to achieve datapath synthesis has also been proposed which first partitions a data flow graph into clusters of datapath operators and then synthesizes each cluster using a CSA-tree, that is, a combination of a reduction tree of carry-save adders and a final adder.
The effectiveness of operator-merging in improving performance of netlists for datapath intensive designs has been demonstrated. Research has also focussed on the optimal implementation of synthesizing clusters of datapath operators as sums of addends using carry-save adders and bit-oriented Wallace trees. Such work has further supported the usefulness of operator merging.
The problem of optimization of datapaths is a deep problem and will continue to demand attention from researchers. There is thus a continuing need for improvements in the various approaches.
Partitioning a data flow graph into clusters is a preliminary step in the optimization of datapaths. Operator merging maximizes the mergeability of operators to permit larger and fewer clusters to be defined by optimization procedures. Each cluster representing a sum of addends is associated with the burdensome delay and area of a final carry-propagate adder. Partitioning of datapaths into larger numbers of small clusters generally means more timing delay and area of the resulting netlist. In contrast, increased merging may provide reductions in the number of carry-propagate adders and consequently reduced critical path delay.
In the present specification, several techniques are proposed for partitioning data flow graphs into clusters. In particular, the techniques allow safe reduction in the bitwidths of datapath operators used in designs. This allows the first pass of synthesis to generate faster and smaller netlists. They also reduce the amount of work at the gate-level logic optimization step required to meet timing and area constraints. Further, the proposed method of partitioning a data flow graph into maximal mergeable clusters also defines criteria for safe partitioning of data flow graph and these may be used in problem scenarios other than operator merging. For example, they may be used for rebalancing computation graphs consisting of associative operators.
Safe clustering of data flow graphs (DFGs) is characterized in terms of required precision and information content of signals. This characterization is applicable to DFGs that have both signed and unsigned extensions of signals. Note that signed extension refers to adding higher significant bits by replicating the sign bit and unsigned refers to adding higher significant bits by adding zeros. The basic formulas and processes, based on notions of required precision and information content of a signal, are used to define safe, functionality-preserving, transformations on the DFGs, which allow the transformed graph to have potentially smaller widths (bitwidths) of datapath operators and potentially greater mergeability of datapath operators. Efficient algorithms for computation of required precision and upper bounds on information content and the related DFG transformations are proposed. These algorithms may be combined in an iterative procedure for partitioning a graph into maximal safe clusters.
The inventions will be described in connection with certain preferred embodiments, with reference to the following illustrative figures so that it may be more fully understood. With reference to the figures, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention or inventions only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention or inventions. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention or inventions, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention or inventions may be embodied in practice.