In a large-scale digital circuit such as, but not limited to, a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), a number of DSP structures often work together to implement complex tasks. To improve performance, these DSP structures are often operated at high speeds. Although FPGA speed, and likewise ASIC processing speed, has improved, one remaining constraint is the propagation delay of signals between two DSP structures, especially when row-based redundancy introduces a variable routing distance between them. For example, when a number of DSP structures or blocks are connected in a systolic mode to improve system throughput, one of the challenges in operating an FPGA at 1 GHz is the efficiency of the interconnection between DSP blocks. Even after an individual DSP block has been designed to run at 1 GHz, multiple such blocks must be connected together to form a single structure that also operates at that speed, so efficient interconnection between the blocks is desired to improve multi-block performance.
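As a rough illustration of why inter-block routing delay matters at these speeds, consider the standard single-cycle setup constraint for a register-to-register path between two blocks. The symbols below are generic static-timing notation, not taken from this disclosure, and clock skew and jitter are ignored for simplicity:

```latex
T_{\text{clk}} = \frac{1}{f_{\text{clk}}} = \frac{1}{1\,\text{GHz}} = 1\,\text{ns},
\qquad
t_{\text{co}} + t_{\text{route}} + t_{\text{su}} \le T_{\text{clk}}
```

Here t_co is the clock-to-output delay of the launching register in one DSP block, t_route is the routing delay to the capturing register in the next block, and t_su is that register's setup time. Because the 1 ns budget is fixed, any increase in t_route, such as a longer path introduced by row-based redundancy, comes directly out of the margin available to the rest of the path.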
One way to improve performance in this case would be to add pipeline stages between the DSP structures. Pipelining can increase processing speed along a critical path of the DSP structure by inserting registers that allow different functional units to operate concurrently. Pipelined systolic structures, however, may not operate correctly, because the enable flow through the pipeline can at times be disrupted. Summing values across DSP structures can then yield an inaccurate result, because the pipeline depths are no longer balanced. Additional balancing registers can be added to equalize the delays, but they incur additional hardware and logic cost.
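The imbalance described above can be illustrated with a minimal cycle-level sketch in Python. It rests on assumptions not stated in the source: each DSP block adds a precomputed product to a cascade input from the previous block and registers the result; "pipeline stages between the DSP structures" are modeled as extra registers on each cascade link; and the balancing registers are modeled as per-block operand delays. The names `DelayLine`, `run_chain`, and `balanced_delays` are hypothetical, enable signals are not modeled, and this is not the architecture of this disclosure.

```python
from collections import deque


class DelayLine:
    """A chain of `depth` registers; depth 0 behaves as a plain wire."""

    def __init__(self, depth):
        self.regs = deque([0] * depth)

    def step(self, value):
        """Clock once: shift `value` in and return the value from `depth` cycles ago."""
        if not self.regs:
            return value
        self.regs.append(value)
        return self.regs.popleft()


def run_chain(samples, extra_link_regs, operand_delays):
    """Simulate a hypothetical systolic sum chain cycle by cycle.

    samples[t][i] is the product that block i should contribute for sample t.
    Each block adds its (delayed) product to the cascade input from the
    previous block; the sum reaches the next block through the block's own
    output register plus `extra_link_regs` added pipeline registers.
    """
    n_blocks = len(samples[0])
    in_lines = [DelayLine(d) for d in operand_delays]
    links = [DelayLine(1 + extra_link_regs) for _ in range(n_blocks - 1)]
    outputs = []
    for sample in samples:
        cascade = 0  # the first block has no cascade input
        for i in range(n_blocks):
            adder_out = cascade + in_lines[i].step(sample[i])
            if i < n_blocks - 1:
                cascade = links[i].step(adder_out)
        outputs.append(adder_out)
    return outputs


def balanced_delays(n_blocks, extra_link_regs):
    """Operand delay for block i that matches the cascade latency upstream of it."""
    return [i * (1 + extra_link_regs) for i in range(n_blocks)]


if __name__ == "__main__":
    data = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    samples = data + [[0, 0, 0]] * 6  # zero padding lets the pipeline drain

    # No extra link registers, operands skewed to match: prints [6, 60, 600]
    print(run_chain(samples, 0, balanced_delays(3, 0))[2:5])
    # One extra register per link, operand skew unchanged: prints [321, 210, 100]
    print(run_chain(samples, 1, balanced_delays(3, 0))[4:7])
    # One extra register per link plus balancing delays: prints [6, 60, 600]
    print(run_chain(samples, 1, balanced_delays(3, 1))[4:7])
```

In the second run, each output mixes products from three different samples (for example, 1 + 20 + 300), which is the kind of inaccurate cross-block sum described above. The third run restores correct sums, but only by adding more operand delay registers as the link pipeline deepens, which reflects the hardware cost of balancing.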