One way to increase the performance of computing systems is to increase parallelism rather than to depend on the reduction in transistor feature size associated with Moore's Law. This approach is limited, however, if the processing elements cannot consume data from memory at the desired processing rate, in which case overall performance degrades significantly.