The invention relates to the field of memory storage and retrieval, and in particular to reducing memory accesses for enhanced in-memory parallel operations.
Performing simple operations on elements of an array stored in memory is an important component of many computer applications. One example is search: looking for all elements that match some value according to some criterion. Today such operations require fetching large amounts of data from memory and transferring it over electrically long distances into the cache hierarchies of modern processor cores, where it is examined once and then discarded. Furthermore, very often only a small part of the data actually fetched is examined and the rest is ignored. Together, these inefficiencies waste both time and power.
Over the last few decades, a variety of computing systems and chips have been designed with computational units positioned very close to the memory, so that such long-distance transfers can be avoided and data can be examined as soon as it is read from memory. This reduces the need to transfer data, but such designs still suffer from excess data reads, since not all of the data read is used. Alternatives have “rotated” the components of a data field so that they extend “vertically” into the memory rather than “horizontally” as in common memories. When there are large numbers of such array elements to process, this can greatly reduce the wasted memory reads, but it complicates the loading and storing of individual items in such data. Also, current designs provide no programmability in the configuration of such architectures.
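By way of illustration only, the “rotated” or “vertical” arrangement described above can be modeled in software as a set of bit planes, where plane b holds bit b of every array element. The following minimal sketch is not taken from any particular prior design. The function names (`to_bit_planes`, `search_equal`, `read_element`) and the use of Python integers as bit masks are assumptions made for this example, and the search key is assumed to fit within the stated field width.

```python
def to_bit_planes(values, width):
    """Transpose values into `width` bit planes (hypothetical helper).

    Bit i of planes[b] is bit b of values[i], i.e. the "vertical" layout.
    """
    planes = [0] * width
    for i, v in enumerate(values):
        for b in range(width):
            if (v >> b) & 1:
                planes[b] |= 1 << i
    return planes


def search_equal(planes, count, key):
    """Equality search over all elements at once, one bit plane per step.

    Returns (matching indices, number of planes read). The loop stops
    early once no candidate remains, so later planes are never read.
    """
    all_ones = (1 << count) - 1
    match = all_ones
    planes_read = 0
    for b, plane in enumerate(planes):
        planes_read += 1
        if (key >> b) & 1:
            match &= plane              # keep elements whose bit b is 1
        else:
            match &= ~plane & all_ones  # keep elements whose bit b is 0
        if match == 0:
            break
    return [i for i in range(count) if (match >> i) & 1], planes_read


def read_element(planes, index):
    """Reassemble one element; note that this touches every bit plane."""
    return sum(((plane >> index) & 1) << b for b, plane in enumerate(planes))
```

In this sketch, searching the five 4-bit values 5, 3, 5, 12 and 7 for the key 8 stops after three of the four planes, because no candidates remain. By contrast, `read_element` must visit every plane to recover a single value, which reflects the load and store complication noted above.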