Nested loops, typically two to five levels deep, are very common in high performance computing (HPC) code. Loop collapsing can improve performance by reducing the number of branches and hence the probability of branch mispredictions. A conventional way to collapse a multi-level loop nest is to create a single, non-nested loop controlled by a new loop counter that is incremented on every iteration of the collapsed loop. The new loop counter is incremented (tc_{n-1} * tc_{n-2} * ... * tc_0) times in total, where tc_j is the loop (trip) count of the loop over i_j. However, the values of the individual loop counters must be preserved, because they are needed for computations inside the loop and as indexes to access multi-dimensional arrays.
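The conventional collapsing scheme described above can be illustrated with a minimal C sketch. The trip counts TC2, TC1 and TC0, the function names, and the array contents are all hypothetical and chosen only for illustration. The sketch assumes that the individual counters are recovered from the collapsed counter by integer division and remainder, which is one common way of preserving them and not necessarily the only one.

```c
#include <assert.h>

/* Hypothetical trip counts tc_2, tc_1, tc_0 of a three-level loop nest. */
enum { TC2 = 3, TC1 = 4, TC0 = 5 };

/* Original form: three nested loops, each with its own loop-back branch. */
void fill_nested(int a[TC2][TC1][TC0])
{
    for (int i2 = 0; i2 < TC2; ++i2)
        for (int i1 = 0; i1 < TC1; ++i1)
            for (int i0 = 0; i0 < TC0; ++i0)
                a[i2][i1][i0] = 100 * i2 + 10 * i1 + i0;
}

/* Collapsed form: one loop whose counter k is incremented
 * TC2 * TC1 * TC0 times in total. */
void fill_collapsed(int a[TC2][TC1][TC0])
{
    const int total = TC2 * TC1 * TC0;
    for (int k = 0; k < total; ++k) {
        /* Recover the individual counters from k (assumed scheme). */
        int i0 = k % TC0;
        int i1 = (k / TC0) % TC1;
        int i2 = k / (TC0 * TC1);
        assert(i2 < TC2); /* holds because k < total */
        a[i2][i1][i0] = 100 * i2 + 10 * i1 + i0;
    }
}

/* Returns 1 if both forms write identical array contents, 0 otherwise. */
int collapsed_matches_nested(void)
{
    int x[TC2][TC1][TC0];
    int y[TC2][TC1][TC0];
    fill_nested(x);
    fill_collapsed(y);
    for (int i2 = 0; i2 < TC2; ++i2)
        for (int i1 = 0; i1 < TC1; ++i1)
            for (int i0 = 0; i0 < TC0; ++i0)
                if (x[i2][i1][i0] != y[i2][i1][i0])
                    return 0;
    return 1;
}
```

The collapsed loop has a single loop-back branch, but each iteration now pays for the division and remainder operations that rebuild i2, i1 and i0, which illustrates why preserving the individual counters is a cost of this approach.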
Also, while loop collapsing may improve performance in some cases, current compilers can rarely collapse loops efficiently. The most frequently seen obstacles to collapsing include: non-strided memory accesses to an n-dimensional array A (after collapsing); accesses to a lower-dimensional array B (m dimensions, m<n); and computations over the separate loop counters (i_j).
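The three obstacles listed above can be seen together in a small C sketch. All names and sizes here (N1, N0, nest_sum, collapsed_sum) are hypothetical. The sketch also assumes one particular reading of "non-strided" access, namely an access whose address does not advance uniformly with the collapsed counter; the text itself does not define the term.

```c
/* Hypothetical trip counts of a two-level nest: i1 outer, i0 inner. */
enum { N1 = 4, N0 = 6 };

/* Original nest showing the three obstacles. */
long nest_sum(int a[N0][N1], int b[N1])
{
    long sum = 0;
    for (int i1 = 0; i1 < N1; ++i1)
        for (int i0 = 0; i0 < N0; ++i0)
            sum += a[i0][i1]   /* inner counter indexes the outer dimension */
                 + b[i1]       /* lower-dimensional array (m = 1 < n = 2)   */
                 + (i1 - i0);  /* computation over the separate counters    */
    return sum;
}

/* Collapsed form: every use of i1 or i0 must be rebuilt from k. */
long collapsed_sum(int a[N0][N1], int b[N1])
{
    long sum = 0;
    for (int k = 0; k < N1 * N0; ++k) {
        int i1 = k / N0;  /* outer counter */
        int i0 = k % N0;  /* inner counter */
        /* The address of a[i0][i1] advances by N1 elements while i0 grows
         * and jumps back when i0 wraps, so it is not a uniform stride in k. */
        sum += a[i0][i1] + b[i1] + (i1 - i0);
    }
    return sum;
}
```

In this sketch none of the three operands can be addressed directly by the collapsed counter k, so the collapsed loop computes the same result only by paying for a division and a remainder on every iteration.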