In parallel computing, several forms of parallelism may be used, including bit-level, instruction-level, data-level, and thread-level parallelism. For example, a single-instruction multiple-data (SIMD) architecture applies one instruction to many data elements at once and can be used to process vectors or matrices, e.g., in image processing. A multiple-instruction multiple-data (MIMD) architecture provides a different form of parallelism, in which processing units may execute different instruction streams, and thus perform different tasks, on different subsets of data. Very long instruction word (VLIW) architectures can implement a specialized form of such processing, in which multiple independent operations are bundled into a single instruction word at program compilation for simultaneous execution. Another technique used in processor architectures for servers, computers, and personal and mobile devices is instruction-level parallelism (ILP). One widely used form of ILP is pipelining, in which the execution stages of successive instructions overlap. A complementary approach, superscalar execution, issues multiple instructions from a single thread in the same clock cycle and, in many designs, re-orders the execution of those instructions to maximize the use of computational resources. Simultaneous multithreading (SMT) is a form of thread-level parallelism in which multiple threads of execution are active simultaneously on one processor core; if execution of one thread stalls while waiting for operands to become available, instructions from another thread can be issued without delay. A multiple-core architecture is a form of hardware parallelism in which the operating system software allocates execution of a process or thread to one of a plurality of processor cores.
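The distinction between data-level and task-level (MIMD-style) parallelism can be illustrated in software. The following is a minimal sketch only, not a description of any particular hardware: the function names `scale`, `split`, `data_parallel`, and `task_parallel` are hypothetical, and Python's standard `concurrent.futures` thread pool stands in for the processing units. Because CPython threads do not generally execute CPU-bound code in parallel, the sketch shows how the work is divided rather than any speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(chunk, factor=2):
    """Apply the same operation to every element of a chunk."""
    return [factor * x for x in chunk]


def split(data, parts):
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, rem = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def data_parallel(data, workers=4):
    """Data-level style: every worker runs the SAME operation on its own chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input chunks.
        results = pool.map(scale, split(data, workers))
        return [y for chunk in results for y in chunk]


def task_parallel(data):
    """MIMD style: workers run DIFFERENT operations on different subsets.

    Assumes `data` has at least two elements so neither subset is empty.
    """
    half = len(data) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        total = pool.submit(sum, data[:half])    # task A: sum of first half
        largest = pool.submit(max, data[half:])  # task B: max of second half
        return total.result(), largest.result()
```

In `data_parallel`, all workers execute one operation over disjoint parts of the input, which mirrors the SIMD idea at a coarse granularity. In `task_parallel`, each worker executes its own operation, which mirrors the MIMD idea of different tasks on different subsets of data.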