This invention relates to integrated circuits and, more particularly, to a specialized processing block with embedded pipelined floating-point accumulator circuitry in an integrated circuit.
Every transition from one technology node to the next has resulted in smaller transistor geometries and thus potentially more functionality implemented per unit of integrated circuit area. Synchronous integrated circuits have further benefited from this development, as evidenced by reduced interconnect and cell delays, which have led to performance increases. However, more recent technology nodes have seen a significant slowdown in the reduction of delays and thus a slowdown in performance increases.
Solutions such as register pipelining have been proposed to further increase the performance. During register pipelining, additional registers are inserted between synchronous elements, which leads to an increase in latency in exchange for increased clock frequencies and throughput. However, performing register pipelining often involves spending significant time and effort because several iterations of locating performance bottlenecks, inserting or removing registers, and compiling the modified integrated circuit design are usually required.
In recent years, floating-point operations have often been used instead of fixed-point operations because of the increased precision. Situations frequently arise in which a floating-point operation that is implemented in a specialized processing block becomes the performance bottleneck of an application, and register pipelining may increase the performance of some of these applications. However, pipelining may be problematic for specialized processing blocks with floating-point operations that are executed in a loop, such as a floating-point accumulation operation. The pipelining of a floating-point accumulator often requires a significant amount of additional logic, memory circuitry, and complex control structures.
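By way of illustration only, the following Python sketch models why pipelining an accumulation loop may be problematic. It assumes a simplified software model in which the floating-point adder is represented as a queue of `latency` stages and one new input value arrives every clock cycle. The function name `pipelined_accumulate`, the parameter `latency`, and the queue-based model are hypothetical and are introduced here for explanation; they do not describe any particular circuit.

```python
from collections import deque


def pipelined_accumulate(values, latency):
    """Model an accumulator whose adder takes `latency` clock cycles.

    Assumption: one input arrives per cycle and is added to whichever
    result leaves the adder pipeline in that same cycle.
    Returns the partial sums left in the pipeline after the last input.
    """
    # The adder pipeline starts out holding `latency` zero results.
    pipeline = deque([0.0] * latency)
    for x in values:
        # The only result available for feedback is the one that entered
        # the adder `latency` cycles ago.
        feedback = pipeline.popleft()
        # The new sum enters the pipeline and emerges `latency` cycles later.
        pipeline.append(feedback + x)
    return list(pipeline)


# With a single-cycle adder there is one running sum.
single = pipelined_accumulate([1, 2, 3, 4, 5, 6], latency=1)

# With a three-stage adder the inputs split into three interleaved sums:
# (1 + 4), (2 + 5), and (3 + 6).
interleaved = pipelined_accumulate([1, 2, 3, 4, 5, 6], latency=3)

# Extra work is needed to combine the partial sums into the final result.
total = sum(interleaved)
```

In this model, an adder with a latency of more than one cycle no longer produces a single running sum. Instead, it produces as many interleaved partial sums as there are pipeline stages, and these partial sums must subsequently be combined. That final combination step is one plausible source of the additional logic, memory circuitry, and control structures mentioned above.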