Vectorizing loops containing possible cross-iteration dependences is notoriously difficult. An exemplary loop of this type is:
for (i = 0; i < N; i++) {
    A[i] = B[C[i]];
}
A naïve (and incorrect) vectorization of this loop would be:
for (i = 0; i < N; i += SIMD_WIDTH) {
    zmm0 = vmovdqu32 &C[i]
    k1 = kxnor k1, k1
    zmm1 = vgatherdd B, zmm0, k1
    vmovdqu32 &A[i], zmm1
}
However, if the compiler generating the vectorized version of the loop has no a priori knowledge about the addresses or alignment of A, B, and C, then the above vectorization is unsafe.
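One way to see the hazard is to model it in plain C. The sketch below is illustrative only and is not taken from the source: the helper names (scalar_loop, chunked_loop, aliasing_changes_result), the chunk width of 4, and the test data are all assumptions. It emulates the vectorized loop by reading a whole chunk before writing any of it, and compares the result with the scalar loop when A and B refer to the same array.

```c
#include <assert.h>
#include <string.h>

#define N 8
#define SIMD_WIDTH 4 /* hypothetical vector width, in elements */

/* Reference semantics: one element at a time, in program order. */
static void scalar_loop(int *A, const int *B, const int *C, int n) {
    for (int i = 0; i < n; i++)
        A[i] = B[C[i]];
}

/* Models the naive vectorization: gather a full chunk from B,
 * then store the full chunk to A. No store can influence a load
 * that belongs to the same chunk. */
static void chunked_loop(int *A, const int *B, const int *C, int n) {
    assert(n % SIMD_WIDTH == 0); /* sketch has no remainder loop */
    for (int i = 0; i < n; i += SIMD_WIDTH) {
        int tmp[SIMD_WIDTH];
        for (int j = 0; j < SIMD_WIDTH; j++)
            tmp[j] = B[C[i + j]];   /* "gather" */
        for (int j = 0; j < SIMD_WIDTH; j++)
            A[i + j] = tmp[j];      /* "store" */
    }
}

/* Returns 1 if the two versions disagree when A aliases B. */
static int aliasing_changes_result(void) {
    const int C[N] = {0, 0, 1, 2, 3, 4, 5, 6};
    int s[N] = {10, 11, 12, 13, 14, 15, 16, 17};
    int v[N];
    memcpy(v, s, sizeof s);
    scalar_loop(s, s, C, N);   /* A and B are the same array */
    chunked_loop(v, v, C, N);
    return memcmp(s, v, sizeof s) != 0;
}

/* Returns 1 if the two versions agree when A and B are distinct. */
static int disjoint_arrays_agree(void) {
    const int C[N] = {0, 0, 1, 2, 3, 4, 5, 6};
    const int B[N] = {10, 11, 12, 13, 14, 15, 16, 17};
    int s[N], v[N];
    scalar_loop(s, B, C, N);
    chunked_loop(v, B, C, N);
    return memcmp(s, v, sizeof s) == 0;
}
```

With these inputs, each scalar iteration reads an element that an earlier iteration has just overwritten, so the scalar loop propagates the first value through the whole array. The chunked version reads stale values within each chunk and produces a different array. When A and B do not overlap, the two versions agree, which is why the transformation is only safe if the compiler can rule out such overlap.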