The present invention relates generally to accessing a memory by a processor, and more specifically, to accessing a block of data in a memory atomically or block concurrently by a processor.
Scalar code expects that a central processing unit (CPU) executing the code will access all of the bytes of a software variable together. In a typical CPU architecture, this expectation is met for scalar code as long as the access is performed on a boundary in memory that is an integral multiple of the size of the data being accessed. When scalar code is vectorized by a compiler, its load and store instructions are often converted to vector load and store instructions. However, vector load and store instructions often carry no consistency guarantee, or consistency is guaranteed only if the vector load or store instruction accesses memory on a boundary equal to the size of the vector register in the CPU. For accesses that are not atomic or block concurrent, if one CPU writes data while another CPU reads the same data, the reading CPU may observe partial updates to the memory locations containing the variables being written. This is inconsistent with the semantics of most programming languages and with programming techniques such as lock-free data structures.
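By way of illustration only, the two conditions discussed above can be sketched in C. The function `is_aligned` tests whether an address lies on a boundary that is an integral multiple of the access size. The remaining functions model, under a purely hypothetical assumption, an 8-byte store that is not block concurrent and is instead carried out as two separate 4-byte stores. The function `torn_read_demo` replays, single-threaded and deterministically, the interleaving in which a reader loads the variable between the two half-stores. All names are illustrative and are not part of any described embodiment; a real race between two CPUs would not be reproducible this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address p lies on a boundary that is an integral multiple
 * of size; size must be a nonzero power of two. */
static int is_aligned(const void *p, size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return ((uintptr_t)p & (size - 1)) == 0;
}

/* Hypothetical model: an 8-byte variable held as two 4-byte halves
 * (index 0 = low half, index 1 = high half), updated by two separate
 * stores rather than one block-concurrent store. */
static void store_low_half(uint32_t halves[2], uint64_t v) {
    halves[0] = (uint32_t)v;
}

static void store_high_half(uint32_t halves[2], uint64_t v) {
    halves[1] = (uint32_t)(v >> 32);
}

/* Reassembles the 8-byte value a reader would observe. */
static uint64_t load_whole(const uint32_t halves[2]) {
    return ((uint64_t)halves[1] << 32) | halves[0];
}

/* Returns the value seen by a reader that loads the variable after the
 * low half of new_value has been stored but before the high half has. */
static uint64_t torn_read_demo(uint64_t old_value, uint64_t new_value) {
    uint32_t var[2] = { (uint32_t)old_value, (uint32_t)(old_value >> 32) };
    store_low_half(var, new_value);
    uint64_t seen = load_whole(var); /* reader runs between the two stores */
    store_high_half(var, new_value);
    return seen;
}
```

For example, with an old value of `0xAAAAAAAABBBBBBBB` and a new value of `0x1111111122222222`, `torn_read_demo` returns `0xAAAAAAAA22222222`, a value the writer never stored. This is the kind of partial update that an atomic or block-concurrent access rules out.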