1. Field of the Invention
This invention relates to processors, and more particularly, to techniques for synchronizing functional unit operations within a processor.
2. Description of the Related Art
In principle, complicated computational algorithms may be implemented by mapping the steps of the algorithm to simpler operations or primitives that may then be evaluated by computational hardware. For example, a typical microprocessor instruction set architecture (ISA) is usually sufficiently robust for functional implementation of arbitrarily complex algorithms. However, in many instances, the performance of the resulting implementation may be far from optimal, depending on the set of operations presented by the instruction set and the instruction overhead needed to control execution of the algorithm. For example, an instruction-based implementation of an algorithm that iterates over a small kernel of arithmetic operations may sacrifice a substantial degree of performance to the fetching, decoding and evaluation of instructions (such as branch instructions) used to control iteration.
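The control overhead described above may be illustrated with a simple instruction-count model. The following is a minimal sketch only; the function name and the example instruction counts are hypothetical and are not drawn from any particular ISA. It assumes that every instruction costs the same and ignores effects such as branch misprediction or multiple issue.

```python
def control_overhead_fraction(kernel_ops: int, control_ops: int) -> float:
    """Return the fraction of executed instructions per iteration that only
    steer the loop (e.g. counter update, compare, branch) rather than
    performing the arithmetic kernel itself.

    Assumption: all instructions are weighted equally.
    """
    if kernel_ops < 0 or control_ops < 0:
        raise ValueError("instruction counts must be non-negative")
    total = kernel_ops + control_ops
    if total == 0:
        raise ValueError("at least one instruction is required")
    return control_ops / total


# Hypothetical loop body: 2 arithmetic instructions (multiply, add) and
# 3 control instructions (increment, compare, branch) per iteration.
print(control_overhead_fraction(kernel_ops=2, control_ops=3))  # 0.6
```

Under these assumed counts, more than half of the instructions executed per iteration contribute nothing to the arithmetic result, and the fraction grows as the kernel shrinks.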
In some instances, performance of certain algorithms may be improved by implementing some operations directly in hardware. For example, a direct hardware implementation of the iterative algorithm mentioned above may avoid the control overhead of branch instructions by using a state machine to control iteration. However, implementing the functionality of more complex operations within hardware presents additional challenges. In particular, more complex hardware operations may in some respects be more difficult to schedule for optimal performance than simpler hardware operations. For example, complex operations may have long and variable processing latency, which may frustrate attempts to generate a straightforward schedule of operations that also maximizes utilization of hardware resources.
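The effect of variable latency on a straightforward schedule may be illustrated as follows. This is a minimal sketch under stated assumptions: a hypothetical fixed schedule reserves the worst-case latency for every operation issued to a single functional unit, and the latency figures are invented for illustration rather than measured from any hardware.

```python
from typing import Sequence


def static_schedule_utilization(
    actual_latencies: Sequence[int], worst_case_latency: int
) -> float:
    """Return the fraction of reserved cycles in which the functional unit is
    actually busy when each operation is allotted the worst-case latency.

    Assumption: operations run back to back on one unit, each in a fixed
    slot of worst_case_latency cycles.
    """
    if not actual_latencies:
        raise ValueError("at least one operation is required")
    if any(lat <= 0 or lat > worst_case_latency for lat in actual_latencies):
        raise ValueError("each latency must be in 1..worst_case_latency")
    busy_cycles = sum(actual_latencies)
    reserved_cycles = worst_case_latency * len(actual_latencies)
    return busy_cycles / reserved_cycles


# Hypothetical latencies (in cycles) for four complex operations whose
# worst case is 10 cycles: 24 busy cycles out of 40 reserved.
print(static_schedule_utilization([4, 10, 6, 4], worst_case_latency=10))  # 0.6
```

In this assumed case the unit sits idle for 40% of the reserved cycles, which is the kind of lost utilization that a schedule sized for the worst case may incur when latency varies.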