Data in processing systems and processing devices can be handled by various architectures and algorithms. Some data processing algorithms involve iterative processing, in which a piece of data is processed one or more times, as in encryption, signal processing, hashing, or other data processing techniques. However, it can be difficult to achieve high throughput with iterative algorithms in data processing devices that have limited logic resources.
Various techniques have been developed to enhance the performance of iterative algorithms implemented on data processing devices, such as conventional single-stage iterative processing, serial pipeline processing, and parallel processing. However, each of these techniques has shortcomings that limit throughput and hinder implementation on small logic devices. For example, an iterative algorithm can be implemented in many parallel logic blocks that process data simultaneously, increasing data processing throughput. However, in logic devices, including many processing blocks in parallel can lead to high fan-out problems or can require a large communication bus to distribute data to, and collect data from, the many parallel data processing blocks. Conventional serial pipeline techniques, which unroll an iterative loop partially or entirely, can also increase throughput, but at the expense of large amounts of serial logic that consume scarce logic resources of a logic device.
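As a rough illustration of the trade-off described above, the following Python sketch uses an idealized cost model that is an assumption, not something taken from this description. Each copy of the round logic (a "round unit") performs one iteration per clock cycle. Hardware cost is approximated by the number of round units. Fan-out is approximated by the number of independent blocks that input data must be distributed to. The names `Design`, `single_stage`, `unrolled_pipeline`, and `parallel` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Idealized model of one hardware arrangement of an iterative algorithm."""
    name: str
    round_units: int      # copies of the round logic (proxy for logic resources)
    fan_out: int          # independent blocks the input must be distributed to
    blocks_per_pass: int  # data blocks completed every `cycles_per_pass` cycles
    cycles_per_pass: int

    def throughput(self) -> float:
        """Steady-state data blocks completed per clock cycle."""
        return self.blocks_per_pass / self.cycles_per_pass


def single_stage(rounds: int) -> Design:
    # One round unit reused `rounds` times: one block finishes every `rounds` cycles.
    return Design("single-stage iterative", 1, 1, 1, rounds)


def unrolled_pipeline(rounds: int, k: int) -> Design:
    # Loop unrolled into k pipelined stages (k == rounds is full unrolling).
    # k blocks are in flight, each circulating rounds // k times through the pipeline.
    if rounds % k:
        raise ValueError("k must divide rounds")
    return Design(f"pipeline, {k} stages", k, 1, k, rounds)


def parallel(rounds: int, p: int) -> Design:
    # p independent single-stage cores: p blocks finish every `rounds` cycles,
    # but data must be fanned out to, and collected from, all p cores.
    return Design(f"parallel, {p} cores", p, p, p, rounds)


if __name__ == "__main__":
    R = 16  # hypothetical number of iterations per data block
    for d in (single_stage(R), unrolled_pipeline(R, 4), unrolled_pipeline(R, R), parallel(R, 4)):
        print(f"{d.name:<24} units={d.round_units:>2} fan_out={d.fan_out:>2} "
              f"throughput={d.throughput():.4f} blocks/cycle")
```

Under this simplified model, both pipelined and parallel designs gain throughput only in proportion to the round logic they replicate. What separates them is the `fan_out` figure, which stands in for the distribution and collection problem of parallel blocks. Real devices also pay for pipeline registers, control logic, and routing, which the model ignores.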