Unless otherwise indicated herein, the materials described in this section are not prior art to the claims of this application and are not admitted to be prior art by inclusion in this section.
Various types of buses may be bandwidth limited. Limited bandwidth can increase the time needed to transfer data between a processor and memory, or another component, thereby reducing the performance of an entire computing system. Bandwidth limitations may also decrease the scalability of computing systems that utilize central processing units (“CPUs”) with more than one processor core (e.g., multi-core processors) when a shared bus with limited bandwidth is used to transfer data between the different cores. Bandwidth limitations can also cause visual and graphical imperfections when a graphics processing unit (“GPU”) is unable to transfer data to and from memory at a sufficient rate to provide an acceptable graphical output. Thus, bandwidth limitations and circuit delays can decrease the performance of a computer.
In one example of a memory read operation illustrating limitations of some conventional memory systems, a core of a processor may send a request to read data from an output buffer or output register of a memory. The memory may decode the requested address and may output the data to its buffer. Then, a memory controller may assert a data ready signal for the processor to read the data on the bus from the buffer on the following edge of a clock. The processor, in response to receiving the data ready signal at a following edge of the clock, may store the data to an input register, or buffer, using edge triggered flip-flops.
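The read sequence described above can be sketched as a small step-by-step model. This is only an illustrative sketch: the names `Memory`, `Processor`, and `simulate_read` are hypothetical, and the assignment of events to specific clock edges (0, 1, 2) is an assumption made for the example rather than something the description fixes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    """Hypothetical memory with an output buffer and a data ready flag."""
    cells: dict
    output_buffer: Optional[int] = None
    data_ready: bool = False


@dataclass
class Processor:
    """Hypothetical processor core with a single input register."""
    input_register: Optional[int] = None


def simulate_read(memory: Memory, processor: Processor, address: int):
    """Walk through one read and return a list of (clock_edge, event) pairs.

    The edge numbers are assumed for illustration only.
    """
    events = []

    # Edge 0: the core sends a read request for the address.
    events.append((0, "read request sent"))

    # The memory decodes the address and places the data in its output buffer.
    memory.output_buffer = memory.cells[address]
    events.append((0, "data placed in output buffer"))

    # Edge 1: the memory controller asserts the data ready signal.
    memory.data_ready = True
    events.append((1, "data ready asserted"))

    # Edge 2: the processor sees data ready and latches the bus value
    # into its input register (modeling edge-triggered flip-flops).
    if memory.data_ready:
        processor.input_register = memory.output_buffer
        memory.data_ready = False
        events.append((2, "data latched into input register"))

    return events
```

In this model a single read occupies several clock edges even before any setup, hold, or output delays are considered.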
In order to receive and process data in the correct order, latches used in a memory read operation typically may need to satisfy timing requirements. One timing requirement may be a setup time, which is a time allotted before the data is read to increase the probability that the data is stable and ready. Another timing requirement may be a hold time, which is a time during which the latches maintain the data so as to increase the probability that the data is read before it changes to other data. In addition to these timing requirements (which result in timing delays), other delays may be present in a memory circuit. For example, there may be a delay between when a flip-flop senses a clock edge and when the flip-flop outputs its data.
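As a rough illustration of these timing requirements, the following sketch checks whether data is stable for long enough around a sampling clock edge. The function names and the use of integer picoseconds are assumptions made for the example; the numbers in the usage note are illustrative and are not taken from any real device.

```python
def meets_setup_and_hold(clock_edge_ps: int,
                         data_stable_from_ps: int,
                         data_stable_until_ps: int,
                         setup_ps: int,
                         hold_ps: int) -> bool:
    """Return True if the data is stable for the whole setup/hold window.

    All times are integer picoseconds measured on a common timeline.
    """
    # Setup: data must already be stable setup_ps before the clock edge.
    setup_ok = data_stable_from_ps <= clock_edge_ps - setup_ps
    # Hold: data must remain stable until hold_ps after the clock edge.
    hold_ok = data_stable_until_ps >= clock_edge_ps + hold_ps
    return setup_ok and hold_ok


def output_valid_time_ps(clock_edge_ps: int, clock_to_output_ps: int) -> int:
    """Time at which a flip-flop's output reflects the data it sampled."""
    return clock_edge_ps + clock_to_output_ps
```

For example, with a clock edge at 1000 ps, a setup time of 80 ps, and a hold time of 40 ps, data that is stable from 900 ps to 1050 ps satisfies both requirements, while data that only becomes stable at 950 ps violates the setup time.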
After waiting for these timing parameters (or delays), the processor can read data from its input register. At a data sender section of the memory system, which may include the memory's output register, the data may also be latched using sender flip-flops, which may have requirements similar to those of receiver flip-flops at a data receiver section of the memory system. Thus, these sender flip-flops at the data sender section may be timed in order to meet the setup and hold times, in addition to any delays between a clock edge and output from one or more flip-flops.
The data to be transferred may typically be latched first in one or more sender flip-flops and then latched in one or more receiver flip-flops. This double latching arrangement may take at least two clock cycles in order to increase the likelihood that the data is sent and received in the same order. When using this arrangement, the clock speed for the bus may need to be reduced so as to account for the total time required for latching data in the flip-flops, including the setup time, the hold time, and the output delay. The reduced clock speed for the bus may result in degraded performance.
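The effect of these delays on bus clock speed can be estimated with a simplified budget. The sketch below follows the accounting in the paragraph above by summing the setup time, hold time, and output delay into a minimum clock period; it is an assumption-laden simplification (it ignores wire propagation delay and clock skew, for instance), and the function names and picosecond units are hypothetical.

```python
def min_clock_period_ps(setup_ps: int, hold_ps: int, clock_to_output_ps: int) -> int:
    """Simplified minimum bus clock period: the sum of the three delays."""
    return setup_ps + hold_ps + clock_to_output_ps


def max_bus_clock_mhz(period_ps: int) -> float:
    """Convert a clock period in picoseconds to a frequency in MHz."""
    # 1 / (period_ps * 1e-12 s) = 1e12 / period_ps Hz = 1e6 / period_ps MHz
    return 1_000_000 / period_ps


def double_latch_latency_ps(period_ps: int, cycles: int = 2) -> int:
    """Latency of a transfer that takes at least two clock cycles
    (one for the sender flip-flops, one for the receiver flip-flops)."""
    return cycles * period_ps
```

With illustrative values of 200 ps setup, 100 ps hold, and 300 ps output delay, the minimum period under this budget is 600 ps, and a double-latched transfer takes at least 1200 ps. Any increase in one of the three delays lengthens the period and lowers the achievable bus clock.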