Vectorizing loops containing possible cross-iteration dependences is notoriously difficult. An exemplary loop of this type is:
for (i = 0; i < N; i++) { A[i] = B[C[i]]; }
A naïve (and incorrect) vectorization of this loop would be:
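for (i = 0; i < N; i += SIMD_WIDTH) {
    zmm0 = vmovdqu32 &C[i]          // load SIMD_WIDTH indices from C
    k1 = kxnor k1, k1               // set every mask bit (all lanes active)
    zmm1 = vgatherdd B, zmm0, k1    // gather B[C[i]] ... B[C[i + SIMD_WIDTH - 1]]
    vmovdqu &A[i], zmm1             // store the gathered elements to A
}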
However, if the compiler generating the vectorized version of the loop has no a priori knowledge about the addresses or alignment of A, B, and C, then the above vectorization is unsafe.
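The hazard can be reproduced without any vector hardware. The following C sketch is illustrative only: the function names, the width of 4, and the test data are assumptions that do not appear in the text above. It emulates the naive vectorization by gathering a whole chunk before storing it, then runs both versions with A and B referring to the same array. The scalar loop reads elements that earlier iterations have already overwritten, while the chunked version reads the stale values, so the results differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector width (elements per register), chosen for illustration. */
#define SIMD_WIDTH 4

/* Reference semantics: the original scalar loop. */
void scalar_loop(int *A, const int *B, const int *C, size_t n)
{
    for (size_t i = 0; i < n; i++)
        A[i] = B[C[i]];
}

/* Scalar emulation of the naive vectorization: for each chunk, gather all
   SIMD_WIDTH elements first, and only then store them. */
void naive_vector_loop(int *A, const int *B, const int *C, size_t n)
{
    assert(n % SIMD_WIDTH == 0);  /* this sketch has no remainder loop */
    for (size_t i = 0; i < n; i += SIMD_WIDTH) {
        int lanes[SIMD_WIDTH];
        for (size_t l = 0; l < SIMD_WIDTH; l++)  /* gather */
            lanes[l] = B[C[i + l]];
        for (size_t l = 0; l < SIMD_WIDTH; l++)  /* store */
            A[i + l] = lanes[l];
    }
}

/* Returns 1 if the two versions disagree when A and B are the same array. */
int alias_changes_result(void)
{
    int s[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    int v[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    const int C[8] = {0, 0, 1, 2, 3, 4, 5, 6};  /* iteration i reads element i - 1 */

    scalar_loop(s, s, C, 8);        /* A and B alias */
    naive_vector_loop(v, v, C, 8);  /* A and B alias */

    for (size_t i = 0; i < 8; i++)
        if (s[i] != v[i])
            return 1;
    return 0;
}
```

With this data the scalar loop propagates the first element through the whole array, whereas the chunked version does not. When A, B, and C are three distinct, non-overlapping arrays, the two functions produce the same contents in A.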