Single instruction multiple data (SIMD) processors execute a single instruction on multiple pieces of data simultaneously. SIMD processors may comprise a plurality of computational units (CUs), each of which receives a piece of the data and executes the instruction on that data.
FIG. 1 depicts a prior art SIMD processor and memory. The SIMD processor 100 comprises a plurality of computational units (CUs) 102a-l (referred to collectively as CUs 102) and a memory access control unit 104 that loads data into, and writes data out of, each of the CUs 102. As depicted, each CU operates on n bits of data. The memory access control unit 104 can retrieve data from, and/or store data into, a vector memory 106. The vector memory 106 is capable of providing a vector block of data sufficient to load data into each of the CUs 102. As depicted, there are 12 CUs and, as such, the vector memory can provide a vector of n×12 bits of data for a single memory address. The memory access control unit 104 can receive an address (Rn) 108 for either loading data from the vector memory or storing data to the vector memory. When all of the CUs 102 require data from the same address, the memory access control unit 104 can load data into each of the CUs from the vector memory in a single read or write cycle. However, if different CUs 102 require data from different address locations, the data must be loaded from the vector memory in subsequent read/write cycles. As such, the memory access control unit 104 may provide a relatively simple implementation; however, performance is low when different CUs 102 require data from separate addresses.
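The access behaviour described for FIG. 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of the actual circuitry: the function name `load_fig1`, the representation of the vector memory as a list of 12-element rows, the rule that CU i receives element i of a row, and the rule that one cycle is spent per distinct address are all hypothetical choices made for illustration.

```python
NUM_CUS = 12  # FIG. 1 depicts 12 CUs


def load_fig1(vector_memory, lane_addresses):
    """Illustrative model of a single shared memory access control unit.

    vector_memory  -- list of rows; each row holds NUM_CUS elements
                      (assumed layout: one n x 12 bit vector per address)
    lane_addresses -- the row address required by each CU
    Returns (value loaded into each CU, number of read cycles used).
    """
    if len(lane_addresses) != NUM_CUS:
        raise ValueError("one address is required per CU")

    values = [None] * NUM_CUS
    cycles = 0
    # Assumption: one read cycle is needed per distinct address.
    for address in dict.fromkeys(lane_addresses):
        row = vector_memory[address]  # a single full-width vector read
        cycles += 1
        for lane, wanted in enumerate(lane_addresses):
            if wanted == address:
                values[lane] = row[lane]  # assumed: CU i takes element i
    return values, cycles
```

In this model, a call in which every CU names the same address completes in one cycle, whereas a call in which the CUs name four different addresses takes four cycles, which mirrors the performance limitation noted above.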
FIG. 2 depicts a further prior art SIMD processor and memory. The SIMD processor 200 is similar to the SIMD processor 100 and comprises a plurality of CUs 102 and a vector memory 206. However, rather than a single memory access control unit 104, the SIMD processor 200 comprises a plurality of memory access control units 204a-l (referred to collectively as memory access control units 204). Each of the memory access control units 204 can receive an address 108 and retrieve data from, and/or store data to, the address. The memory access control units 204 may apply a respective offset to the base address (Rn) 108 in order to determine a respective address to load the data from or store the data to. Unlike the vector memory 106, which could provide a vector of n×12 bits of data for a single address, the vector memory 206 is arranged to provide n bits of data, sufficient to load from or into a single CU, for a single address.
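The per-CU addressing described for FIG. 2 can likewise be sketched in software. Again, this is a minimal illustration under assumptions: the function name `load_fig2`, the flat list used to stand in for the vector memory 206, and the choice of offsets in the usage note are hypothetical and are not taken from the figures.

```python
NUM_CUS = 12  # same number of CUs as depicted in FIG. 1


def load_fig2(memory, base_address, offsets):
    """Illustrative model of one memory access control unit per CU.

    memory       -- flat list; each address holds one n-bit element
                    (assumed stand-in for the vector memory 206)
    base_address -- the shared base address (Rn)
    offsets      -- the respective offset applied by each unit
    Returns (value loaded into each CU, address used by each unit).
    """
    if len(offsets) != NUM_CUS:
        raise ValueError("one offset is required per CU")

    # Each unit independently forms its own address from the shared base.
    addresses = [base_address + offset for offset in offsets]
    # Each unit reads a single element for its own CU.
    values = [memory[address] for address in addresses]
    return values, addresses
```

For example, with a base address of 8 and offsets of 0, 3, 6 and so on, the twelve units would read addresses 8, 11, 14 and so on independently. The model contains twelve separate address computations, which corresponds to the additional hardware that the plurality of memory access control units 204 requires.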
The memory access control units 104, 204 can provide data to individual CUs. However, the memory access control unit 104 may provide poor performance when different CUs require data to be loaded from different memory addresses. The memory access control units 204 can provide better performance, even if different CUs require data from different memory addresses; however, the individual memory access control units 204 increase the complexity of the processor 200 and increase the footprint required to provide the memory access control units 204.
It is desirable to have a processor capable of loading data into CUs while mitigating one or more problems associated with previous processors.