Vectorizing loops containing possible cross-iteration dependences is notoriously difficult. An exemplary loop of this type is:
for (i = 0; i < N; i++) {
    A[i] = B[C[i]];
}
A naïve (and incorrect) vectorization of this loop would be:
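for (i = 0; i < N; i += SIMD_WIDTH) {
    zmm0 = vmovdqu32 &C[i]          // load SIMD_WIDTH indices from C
    k1 = kxnor k1, k1               // set every mask bit (all lanes active)
    zmm1 = vgatherdd B, zmm0, k1    // gather B[C[i]], B[C[i+1]], ...
    vmovdqu32 &A[i], zmm1           // store the gathered values to A
}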
However, if the compiler generating the vectorized version of the loop has no a priori knowledge about the addresses or alignment of A, B, and C, then the above vectorization is unsafe.
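One way to see the hazard is to let A overlap B. The sketch below is a minimal illustration in plain C, not actual vector code: naive_vector_loop emulates the gather-then-store ordering of the vectorized loop with a temporary buffer, and SIMD_WIDTH is set to 4 purely for readability. The helper names (scalar_loop, naive_vector_loop, aliasing_changes_result) and the choice A = B + 1 are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

#define SIMD_WIDTH 4  /* hypothetical vector width, chosen for illustration */

/* Scalar reference: the loop exactly as written in the text. */
static void scalar_loop(int *A, const int *B, const int *C, size_t n) {
    for (size_t i = 0; i < n; i++)
        A[i] = B[C[i]];
}

/* Emulates the naive vectorization: all loads of a chunk happen
 * before any store of that chunk, as with a gather followed by a
 * vector store. */
static void naive_vector_loop(int *A, const int *B, const int *C, size_t n) {
    for (size_t i = 0; i < n; i += SIMD_WIDTH) {
        int tmp[SIMD_WIDTH];
        size_t w = (n - i < SIMD_WIDTH) ? n - i : SIMD_WIDTH;
        for (size_t l = 0; l < w; l++)
            tmp[l] = B[C[i + l]];      /* "gather" */
        for (size_t l = 0; l < w; l++)
            A[i + l] = tmp[l];         /* "vector store" */
    }
}

/* Runs both versions with A aliased to B + 1 (an assumed layout).
 * Returns 1 if the two versions leave different memory contents. */
static int aliasing_changes_result(void) {
    int buf1[5] = {10, 20, 30, 40, 50};
    int buf2[5] = {10, 20, 30, 40, 50};
    const int C[4] = {0, 1, 2, 3};

    scalar_loop(buf1 + 1, buf1, C, 4);        /* each store feeds the next load */
    naive_vector_loop(buf2 + 1, buf2, C, 4);  /* loads see only the old values  */

    return memcmp(buf1, buf2, sizeof buf1) != 0;
}
```

With this layout, the scalar loop reads in iteration i + 1 the element it wrote in iteration i, so the first value propagates through the buffer. The emulated vector loop reads all four elements before writing any of them, so it merely shifts the old values by one position. When A, B, and C do not overlap, the two versions agree.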