The present invention generally relates to mapping software program loops to a hardware implementation.
Software-implemented designs, or parts thereof, are sometimes re-implemented in hardware for cost and performance reasons. Program loops within the software are synthesized in hardware as synchronous circuits that include interconnected logic units and registers that are synchronously clocked.
Live-in variables to the loop correspond to primary inputs of the circuit, live-out variables correspond to primary outputs, and recurrences correspond to registers or RAMs. This correspondence allows synchronous circuit optimization techniques, along with compiler techniques, to be applied to the problem of mapping such loops onto efficient synchronous circuit implementations.
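The correspondence can be illustrated with a small sketch. The loop and the clocked model below are hypothetical examples, not taken from the described embodiments; the names `running_sum`, `RunningSumCircuit`, `gain`, and `acc_reg` are assumptions chosen only for illustration.

```python
def running_sum(samples, gain):
    """Software loop: `gain` is live-in, `acc` is a recurrence and is live-out."""
    acc = 0
    for x in samples:
        # acc depends on its own value from the previous iteration (a recurrence).
        acc = acc + gain * x
    return acc


class RunningSumCircuit:
    """Clocked model of the same loop: one register holds the recurrence."""

    def __init__(self, gain):
        self.gain = gain      # primary input (live-in variable)
        self.acc_reg = 0      # register (recurrence)

    def clock(self, x):
        # Combinational logic computes the next register value on each clock.
        self.acc_reg = self.acc_reg + self.gain * x
        return self.acc_reg   # primary output (live-out variable)
```

Clocking the model once per loop iteration leaves the same value in `acc_reg` that the software loop returns in `acc`.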
Generally, the result of loop synthesis is a multi-staged pipeline structure consisting of logic and registers. Data flows both forward from the outputs of registers in one stage to the inputs of logic (or registers) in later stages, and backward from the output of logic (or registers) in one stage to the inputs of logic (or registers) in previous stages.
Pipeline compaction is a known technique for reducing the number of registers in pipelined circuit structures. In a circuit design that is represented as a graph with nodes and edges, pipeline compaction iteratively minimizes the slack on all input edges of all strongly connected components (SCCs), moving each SCC backward to the earliest legal time slot. An SCC is a subset, S, of nodes in a graph such that any node in S is reachable from any other node in S, and S is not a subset of any larger such set.
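The SCC definition above can be made concrete with a short sketch. The text does not name a particular algorithm for finding SCCs; the code below uses Tarjan's algorithm as one common choice, and the adjacency-dictionary graph representation is an assumption.

```python
def strongly_connected_components(graph):
    """Return the SCCs of a directed graph given as {node: [successors]}."""
    index = {}        # discovery order of each visited node
    lowlink = {}      # smallest discovery index reachable from the node
    stack = []        # nodes of the components still being built
    on_stack = set()
    sccs = []

    def visit(v):
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        # v is the root of a component: pop the whole component off the stack.
        if lowlink[v] == index[v]:
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return sccs


# Nodes b and c reach each other, so they form one SCC; a and d stand alone.
example = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
components = strongly_connected_components(example)
```

The recursive form is kept for readability; a very large graph would need an iterative variant to stay within Python's recursion limit.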
Even though pipeline compaction is effective in reducing the register requirements of a pipelined circuit design, pipeline compaction does not always produce an optimally reduced circuit.
A system and method that address the aforementioned problems, as well as other related problems, are therefore desirable.
In various embodiments, the present invention provides a method and apparatus for reducing a number of storage elements in a synthesized synchronous circuit. In one embodiment, the circuit is represented as a directed, partitioned graph. The graph is divided into a plurality of time-ordered time slots that are bounded by storage elements. The strongly connected components (SCCs) in the graph are first identified. For each middle SCC where there is slack between the middle SCC and a first SCC and slack between the middle SCC and a second SCC, a time-slot-relative direction is selected for moving the middle SCC. The direction is selected as a function of a number of storage elements required for moving the middle SCC toward the first SCC versus moving the middle SCC toward the second SCC. The middle SCC is then moved in the selected time-slot-relative direction.
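The direction selection can be sketched as follows. The summary does not specify how the number of storage elements is computed, so the cost model here is an assumption: each connection needs one register bit per bit of width for every time slot it crosses, the first SCC feeds the middle SCC, and the middle SCC feeds the second SCC. The function names and parameters are hypothetical.

```python
def register_bits(in_width, out_width, first_slot, middle_slot, second_slot):
    """Assumed cost model: bit width times the number of time slots crossed."""
    return (in_width * (middle_slot - first_slot)
            + out_width * (second_slot - middle_slot))


def choose_move(in_width, out_width, first_slot, middle_slot, second_slot,
                in_slack, out_slack):
    """Pick the direction that needs fewer register bits.

    in_slack and out_slack are the numbers of time slots the middle SCC can
    legally move toward the first and second SCC, respectively (assumed).
    Returns (direction, new middle slot, register bits after the move).
    """
    slot_if_first = middle_slot - in_slack
    slot_if_second = middle_slot + out_slack
    cost_first = register_bits(in_width, out_width, first_slot,
                               slot_if_first, second_slot)
    cost_second = register_bits(in_width, out_width, first_slot,
                                slot_if_second, second_slot)
    if cost_first <= cost_second:
        return ("toward_first", slot_if_first, cost_first)
    return ("toward_second", slot_if_second, cost_second)


# An 8-bit input and a 32-bit output: moving toward the second SCC is cheaper.
decision = choose_move(in_width=8, out_width=32, first_slot=0, middle_slot=2,
                       second_slot=4, in_slack=2, out_slack=2)
```

Under this assumed model, moving the middle SCC to the earliest legal slot would leave the 32-bit output crossing four time slots (128 register bits), while moving it toward the second SCC leaves only the 8-bit input crossing four slots (32 register bits).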
It will be appreciated that various other embodiments are set forth in the Detailed Description and Claims which follow.