Loops in programs are a significant source of instruction-level parallelism (ILP), and various architectures attempt to exploit the parallelism available across loop iterations. Some approaches, however, are limited in scope and application. In particular, solutions that use multiple threads handle only loops whose iterations are completely independent, or loops whose cross-iteration communication is non-ordered, explicitly synchronized, and goes through memory.
In strand-based loop processors, a group of strands executes the same loop instructions in parallel, with each strand executing a different iteration of the loop. One challenge is detecting the iteration on which the counted exit should be taken. Another challenge is dynamically verifying that hardware support can be used for loops without a statically verified counter exit.
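To make the first challenge concrete, the following is a minimal software sketch of how iterations might be spread across strands and how the strand that reaches the counted exit could be identified. It assumes a round-robin assignment of iterations to strands and a trip count known before the loop starts; neither assumption comes from the text above, and the names `assign_iterations` and `counted_exit_location` are hypothetical. It models the bookkeeping only, not any hardware mechanism.

```python
def assign_iterations(trip_count, num_strands):
    """Return, for each strand, the iteration indices it executes.

    Assumption (illustrative only): iterations are handed out round-robin,
    so strand s runs iterations s, s + num_strands, s + 2 * num_strands, ...
    """
    if num_strands <= 0:
        raise ValueError("num_strands must be positive")
    return [list(range(s, trip_count, num_strands)) for s in range(num_strands)]


def counted_exit_location(trip_count, num_strands):
    """Return (strand, pass_index) for the iteration that takes the counted exit.

    The counted exit is taken on the last iteration, index trip_count - 1.
    Under the round-robin assumption, that iteration belongs to strand
    (trip_count - 1) % num_strands on its pass (trip_count - 1) // num_strands.
    """
    if trip_count <= 0:
        raise ValueError("trip_count must be positive")
    if num_strands <= 0:
        raise ValueError("num_strands must be positive")
    last = trip_count - 1
    return last % num_strands, last // num_strands


if __name__ == "__main__":
    # A loop with 10 iterations executed by 4 strands.
    for strand, iterations in enumerate(assign_iterations(10, 4)):
        print(f"strand {strand}: iterations {iterations}")
    print("counted exit at (strand, pass):", counted_exit_location(10, 4))
```

In this sketch, strands other than the exiting one may already have started later passes when the exit is detected, which illustrates why identifying the exact exit iteration matters: work belonging to iterations past the exit must not take effect.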