Field
The disclosed embodiments relate generally to computer systems, and in particular to computation memory operations in memory management.
Background Art
Computer systems of various types are ubiquitous in modern society. Common to these computer systems is the storage of data in memory, on which processors execute read, write, and other access instructions. A considerable portion of the resources in a computer system is devoted to executing these instructions.
Computer systems typically use processors, where the term “processor” generically refers to anything that accesses memory in a computing system. Processors typically load data from and store data to memory by issuing addresses and control commands on a per-data-item basis. Here a data item may be a byte, a word, a cache line, or the like, as the particular situation requires. Each such access requires a separate address and one or more commands to be transmitted from the processor to memory, even when the sequence of accesses follows a pre-defined pattern, such as a sequential stream. In some memory technologies, such as DRAM (dynamic random access memory), multiple commands may be required for some or all of the desired accesses.
The transmission of the memory addresses and associated commands consumes power and may introduce performance overheads in cases where the address/command bandwidth becomes a bottleneck. Furthermore, issuing addresses and control commands on a per-data-item basis may limit opportunities to optimize memory accesses and data transfers.
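The per-data-item addressing described above can be illustrated with a minimal sketch. The function names, the 64-byte item size, and the figure of two commands per access are hypothetical values chosen for illustration only; they are not taken from any particular memory interface.

```python
def sequential_addresses(base, item_size, num_items):
    """Addresses a processor would issue for a sequential stream.

    Each address is fully determined by the base address and the item
    index, yet each one is still transmitted separately.
    """
    return [base + i * item_size for i in range(num_items)]


def per_item_transmissions(num_items, commands_per_access=1):
    """Count address and command transmissions for per-data-item access.

    Every data item costs one address plus one or more commands.
    """
    return num_items * (1 + commands_per_access)


# Hypothetical example: four 64-byte cache lines starting at 0x1000.
addresses = sequential_addresses(base=0x1000, item_size=64, num_items=4)

# Hypothetical example: 1024 items, assuming two commands per access.
total = per_item_transmissions(num_items=1024, commands_per_access=2)
```

Under these assumptions the number of transmissions grows linearly with the number of data items, even though the whole address sequence follows from the base address and the item size.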
Existing solutions use the logic layer to implement interconnect networks, built-in self-test, and memory scheduling logic. Support for reduction computations in the logic layer does not appear to have been proposed or disclosed. Past proposals to implement additional logic directly in the memory are expensive and have not proven practical: placing logic in a memory chip significantly increases the cost of the chip, and performance is limited by the inferior characteristics of the transistors used in memory manufacturing processes. Existing solutions either rely on logic and functionality implemented directly in the memory chip, with the disadvantages described above, or are implemented on an external chip (e.g., a memory controller on a CPU/GPU chip), which requires special logic and support on the CPU/GPU/memory controller as well as additional data transfers between the CPU/GPU and memory.