Modern computing and electronic devices rely on their processors performing massive numbers of operations per second to meet performance expectations. Central processing units (CPUs) and graphics processing units (GPUs) execute based on high-frequency clock signals, and typically have multiple cores that each operate on different tasks at the same time. Thus, modern processors rely on multiprocessing and multitasking to perform work. For processors to make productive use of their multiprocessing and multitasking capabilities, computing systems place ever higher expectations on data throughput and bandwidth. Throughput refers to the number of accesses to embedded memories in a given time, and bandwidth refers to the number of bits that can be read and/or written in a single cycle.
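As an illustration only, the two definitions above can be combined into a rough data-rate estimate: accesses per unit time multiplied by bits per access. The sketch below is not part of the described embodiments. The function name, the one-access-per-cycle assumption, and all numeric values are hypothetical and chosen only to show how each quantity contributes.

```python
def data_rate_bits_per_second(accesses_per_second: float, bits_per_access: int) -> float:
    """Rough data rate: throughput (accesses per second) times bandwidth (bits per access).

    Assumes, for illustration, that every access transfers the full width.
    """
    return accesses_per_second * bits_per_access


# Hypothetical baseline: 1e9 accesses per second, 64 bits per access.
baseline = data_rate_bits_per_second(1e9, 64)

# Doubling the bits per access (bandwidth) at the same access rate.
wider = data_rate_bits_per_second(1e9, 128)

# Doubling the access rate (throughput) at the same width.
faster = data_rate_bits_per_second(2e9, 64)

print(baseline, wider, faster)
```

Under these assumptions, doubling either quantity doubles the estimated data rate, which is why both throughput and bandwidth are treated as scaling targets.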
Traditional embedded memories are unable to scale up to the increased performance requirements, due to circuit and design challenges. Multiport memories provide multiple ports of access in a single cycle, which can improve bandwidth, but traditional multiport capability carries a very high area penalty as well as a power consumption penalty. Thus, multiport memories are not generally considered to increase bandwidth and throughput in proportion to the penalties. Additionally, there are limitations on how fast a memory device can reliably perform a read and/or a write operation. There are typically multiple circuit elements along a read/write path, each with tolerances for how quickly it can reliably be expected to provide the expected data. Such limitations directly affect device throughput, even if the bandwidth scaling limitations can be overcome.
Descriptions of certain details and implementations follow, including a description of the figures, which may depict some or all of the embodiments described below, as well as a discussion of other potential embodiments or implementations of the inventive concepts presented herein.