In-memory computation is a computation method in which all data is loaded into memory. Because loading all data into memory avoids writing the data to or reading it from a hard disk, the processing rate of the chip increases.
In-memory computation requires a relatively large memory capacity and relatively large bandwidth, and therefore requires that a large quantity of memory modules be connected to a processor. If each memory module is directly connected to the processor, the bandwidth available to each memory module is only 1/N of the total bandwidth (assuming that N memory modules are directly connected to the processor). If, instead, a plurality of memory modules form a memory module set that is connected to the processor through one memory module in the set, the bandwidth available to each memory module set is relatively large; however, the average hop count for the processor to access the memory modules increases, and therefore the rates at which the processor accesses the memory modules decrease.
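The trade-off above can be sketched numerically. The following is a minimal illustration, not part of the source; the function names, the total bandwidth figure, and the assumption that a memory module set is a linear chain (so a module at depth d costs d hops) are all hypothetical.

```python
def direct_attach(total_bw, n_modules):
    """All N modules connect directly to the processor:
    each module gets only total_bw / N, but every module is one hop away."""
    return total_bw / n_modules, 1.0

def chained_sets(total_bw, n_modules, set_size):
    """Modules are grouped into sets; only one module in each set connects
    to the processor. Each set shares a larger slice of bandwidth, but
    modules deeper in the chain cost extra hops (assumed linear chain)."""
    n_sets = n_modules // set_size
    bw_per_set = total_bw / n_sets
    # average distance over chain positions 1..k is (k + 1) / 2
    avg_hops = sum(range(1, set_size + 1)) / set_size
    return bw_per_set, avg_hops

# Hypothetical numbers: 100 units of total bandwidth, 16 memory modules.
bw, hops = direct_attach(100.0, 16)            # 6.25 per module, 1 hop
bw_set, hops_set = chained_sets(100.0, 16, 4)  # 25.0 per set, 2.5 average hops
```

With 16 directly attached modules, each module sees 1/16 of the bandwidth at a single hop; grouping them into four chains of four quadruples the bandwidth per set while raising the average hop count from 1 to 2.5, which is exactly the latency penalty the passage describes.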
Therefore, how to integrate more memory modules on a chip while ensuring high memory bandwidth and a relatively short access latency is a problem that urgently needs to be resolved.