Presently, high performance systems rely on advanced technology, higher processor clock rates, and higher data rates to and from these processors. One problem area is the external memory interface, which is an intermediary device between the system, such as a processor, and a memory, such as static random access memory (SRAM). There are different types of SRAM available, each type adhering to a particular set of operating protocols. For example, there is double data rate (DDR) SRAM and quad data rate (QDR) SRAM, each requiring a different signaling protocol. For high performance SRAM memories being clocked at high frequencies, it is important that the memory interface can provide the appropriate timing to ensure robust and high speed operation of the memory.
One such high performance SRAM is the second generation quad data rate (QDR2) SRAM. Accordingly, a corresponding QDR2 SRAM interface can be used to adapt processor control signals for the QDR2 protocols. However, interfacing to external memory at high clock rates poses a large challenge due to the reduced clock cycle and various sources of system timing skew, such as voltage and temperature variation, crosstalk, Vref variations, simultaneous-switching output (SSO) noise and simultaneous-switching input (SSI) noise, and clock source jitter. Together, these effects leave only a narrow window in which data can pass between the host device, such as the processor integrated circuit (IC), and the memory, such as SRAM.
QDR2 SRAM interfacing further reduces the window by approximately 50% since the data is clocked on both edges of the system clock, providing twice as much data compared to regular SRAM devices. QDR2 SRAMs are currently being used at a clock rate of 333 MHz, which corresponds to a data rate of 666 Mbps. At these rates, the maximum ideal data eye is 1.5 ns (one bit period at 666 Mbps), leaving a maximum ideal allowance of 750 ps for setup time (tSU) requirements and 750 ps for hold time (tHD) requirements. After the various system timing offsets are applied to these allowances, standard design techniques have proven unable to provide a feasible solution for interfacing with these memories at these data rates. Custom techniques may provide a solution; however, these solutions are complex to implement, do not fit into ASIC design flows, are not readily reusable, and require large efforts to port between foundries or process nodes.
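The arithmetic behind these figures can be sketched as follows. This is only an illustrative calculation, not part of any interface design: the function names are hypothetical, and the even split of the eye between setup and hold follows the ideal allowances stated above. The `remaining_margin_ps` helper simply subtracts whatever timing offsets the caller supplies; no specific offset values are given in the text, so none are assumed here.

```python
def ideal_data_eye_ps(clock_mhz: float, edges_per_cycle: int = 2) -> float:
    """Ideal (skew-free) data eye width in picoseconds.

    With data clocked on both edges of the clock, each bit occupies
    half of one clock period.
    """
    period_ps = 1e6 / clock_mhz  # period of a clock given in MHz, in ps
    return period_ps / edges_per_cycle


def ideal_setup_hold_ps(clock_mhz: float, edges_per_cycle: int = 2):
    """Split the ideal eye evenly into setup and hold allowances (ps)."""
    eye_ps = ideal_data_eye_ps(clock_mhz, edges_per_cycle)
    return eye_ps / 2, eye_ps / 2


def remaining_margin_ps(allowance_ps: float, offsets_ps) -> float:
    """Allowance left after subtracting caller-supplied timing offsets.

    The offsets (skew, jitter, noise, ...) are inputs; a negative
    result means the allowance has been exhausted.
    """
    return allowance_ps - sum(offsets_ps)


if __name__ == "__main__":
    eye = ideal_data_eye_ps(333.0)
    t_su, t_hd = ideal_setup_hold_ps(333.0)
    print(f"ideal data eye: {eye:.1f} ps")
    print(f"ideal tSU / tHD allowance: {t_su:.1f} ps / {t_hd:.1f} ps")
```

For a 333 MHz clock this yields an eye of roughly 1.5 ns and allowances of roughly 750 ps each, matching the figures above.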
Previous approaches have been taken to design and implement high-speed QDR2 SRAM memory interface circuits for higher data rates to and from the processors. One such approach is to fully synthesize the memory interface. Using this approach, synthesizable design code, typically in an HDL (hardware description language) format (e.g., Verilog), is provided as the solution. The memory interface circuit is developed by using a standard cell library to synthesize this design code and map it to a technology-specific set of logic gates. The resulting logic is then mated to the corresponding I/O buffer cells. The advantages of the fully synthesizable approach are ease of implementation at lower data rates and foundry/process node portability. This is mainly because standard cell libraries are conservatively configured for robust operation, but only at lower operating frequencies. The disadvantage of this approach is the inability to meet high data rates. As an example, solutions such as these typically achieve approximately 166 MHz, but fail to operate properly between 200 MHz and 250 MHz, even after a high degree of manual intervention, which is not easy to implement. Therefore, desired high performance clock rates, such as 333 MHz, cannot be easily achieved.
There are three primary issues that must be considered for a synthesized QDR2 SRAM memory interface design. The first is minimizing data skew, the second is clock generation, and the third is testing. A more detailed discussion of each of these issues follows.
High data rates require low pin-to-pin skews, which in turn require matched data paths. This is difficult to achieve with a synthesizable solution due to the pseudo-random nature of synthesis and optimization tools, and of placement and routing tools. In other words, circuit element layout and signal line routing cannot be precisely controlled.
Clock generation can be a particularly difficult issue. The protocol of QDR2 SRAM interfacing requires source clocks to be centered in sent data eyes and translation of echo clocks into received data eyes. The PVT (Process/Voltage/Temperature) sensitive nature of standard cell elements causes these clock/data relationships to be unreliable at high data rates. Some systems require an external phase-locked loop (PLL) to generate a clock with a frequency twice that of the memory interface. This requires a wide distribution of 2× frequency clocks, which complicates chip implementation and verification. Some synthesized solutions rely on the falling edge of the system clock, which introduces duty cycle problems into the design, further reducing the achievable data rate.
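The duty cycle problem noted above can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that one bit is launched on each clock edge so that the two bit windows equal the high and low phases of the clock; the function names and any duty cycle value passed in are hypothetical and are not taken from a particular design.

```python
def half_periods_ps(clock_mhz: float, duty_cycle: float):
    """Return (high_phase_ps, low_phase_ps) for a clock in MHz.

    duty_cycle is the fraction of the period spent high (0 < d < 1).
    """
    if not 0.0 < duty_cycle < 1.0:
        raise ValueError("duty_cycle must be strictly between 0 and 1")
    period_ps = 1e6 / clock_mhz
    high_ps = period_ps * duty_cycle
    return high_ps, period_ps - high_ps


def worst_case_bit_window_ps(clock_mhz: float, duty_cycle: float) -> float:
    """Narrower of the two bit windows when both clock edges launch data.

    Assumption: the shorter clock phase bounds the data eye, so any
    deviation from a 50% duty cycle shrinks the usable window.
    """
    return min(half_periods_ps(clock_mhz, duty_cycle))


if __name__ == "__main__":
    # 0.45 is a hypothetical duty cycle chosen only to show the effect.
    for duty in (0.50, 0.45):
        window = worst_case_bit_window_ps(333.0, duty)
        print(f"duty cycle {duty:.2f}: worst-case bit window {window:.1f} ps")
```

Under this assumption, any departure from a 50% duty cycle directly narrows one of the two data eyes before any other system timing offset is applied.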
Circuit testability is an important feature for validating the operation of high performance systems. Secondary tools within the ASIC flows, for tasks such as DFT (design for test) and boundary scan insertion, tend to add to the problems of pseudo-random gates, placements, and routing. Hence, the pseudo-random nature of the resultant interface adds tremendous overhead to product test and debug tasks, since every pin of the interface has the potential for a differing response to PVT variations, electrical noise, and source clock uncertainties.
Another possible approach includes solidifying, or manually designing, a portion of the circuitry close to the I/O buffer, which directly impacts the system timing performance. This technique can be considered a partially synthesized approach. The advantage of this technique is the removal of the pseudo-random nature of ASIC tool outputs (synthesis/placement/optimization/DFT/routing) from the interface to provide a much more controlled pin-to-pin skew. However, several issues still remain, as described below.
There is a continued reliance on standard design methods for clock translation into the data eyes, as there is a reliance on a differential system clock with a requirement for a highly controlled phase difference. If widely distributed, there is a continued requirement for high control over the phase difference of the differential clock. Accordingly, implementation complexity remains high.