Vectorizing loops containing possible cross-iteration dependences is notoriously difficult. An exemplary loop of this type is:
for (i = 0; i < N; i++) {
    A[i] = B[C[i]];
}
A naïve (and incorrect) vectorization of this loop would be:
for (i = 0; i < N; i += SIMD_WIDTH) {
    zmm0 = vmovdqu32 &C[i]
    k1 = kxnor k1, k1
    zmm1 = vgatherdd B, zmm0, k1
    vmovdqu &A[i], zmm1
}
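However, if the compiler generating the vectorized version of the loop has no a priori knowledge about the addresses or alignment of A, B, and C, then the above vectorization is unsafe. If A overlaps B or C in memory, a store to A in one iteration can change a value that a later iteration reads, whereas the vectorized version performs all of the loads for a group of SIMD_WIDTH iterations before any of that group's stores.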
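The hazard can be reproduced without vector hardware. The following is a minimal sketch in plain C, not an actual compiler output: it emulates the gather-then-store order of the naive vectorization with a temporary array and compares the result against the scalar loop when A is deliberately made to overlap B. The function names, the value chosen for SIMD_WIDTH, and the test data are assumptions made for illustration only.

```c
#include <stddef.h>
#include <string.h>

#define SIMD_WIDTH 4  /* assumed vector width in elements, for illustration */

/* Reference semantics: one element at a time, in program order. */
static void scalar_loop(int *A, const int *B, const int *C, size_t N) {
    for (size_t i = 0; i < N; i++)
        A[i] = B[C[i]];
}

/* Emulates the naive vectorization: gather a whole chunk, then store it. */
static void naive_chunked_loop(int *A, const int *B, const int *C, size_t N) {
    for (size_t i = 0; i < N; i += SIMD_WIDTH) {
        int tmp[SIMD_WIDTH];
        size_t n = (N - i < SIMD_WIDTH) ? N - i : SIMD_WIDTH;
        for (size_t j = 0; j < n; j++)   /* "gather": all loads happen first */
            tmp[j] = B[C[i + j]];
        for (size_t j = 0; j < n; j++)   /* "store": all writes happen after */
            A[i + j] = tmp[j];
    }
}

/* Returns 1 if the two versions disagree when A overlaps B (A == B + 1). */
static int overlap_changes_result(void) {
    int s[9] = {7, 1, 2, 3, 4, 5, 6, 8, 9};
    int v[9];
    int idx[8] = {0, 1, 2, 3, 4, 5, 6, 7};   /* C[i] = i */
    memcpy(v, s, sizeof s);
    scalar_loop(s + 1, s, idx, 8);           /* each store feeds the next load */
    naive_chunked_loop(v + 1, v, idx, 8);    /* loads see stale values */
    return memcmp(s, v, sizeof s) != 0;
}
```

In this setup each scalar iteration reads the element written by the previous one, so the first value propagates through the whole buffer. The chunked version reads four elements before writing any of them and therefore produces a different buffer. With non-overlapping arrays the two functions agree.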