An SRAM compiler is a computer program which can synthesize different memory configurations. Variables which determine a specific memory configuration are word width, number of words, and number of memory blocks. An SRAM compiler program creates a netlist of the memory, simulates the worst case delay path through the memory to generate timing information, builds a symbol of the SRAM for placement in a schematic, builds a simulation model with the timing information, creates a physical layout of the SRAM, builds a routing abstraction of the SRAM, and creates a power grid structure for the SRAM. In general, an SRAM compiler is used to generate memories for application specific integrated circuits (ASICs) such as a gate array or a standard cell circuit. A compiled SRAM may be one of many components which make up an integrated circuit.
An SRAM generated by a compiler contrasts greatly with an SRAM designed for the general marketplace as a stand-alone part. Typically, general product SRAMs are full custom designs which focus on memory density, speed, power, yield, and package size. All of these constraints must be met within a short design cycle time to introduce a successful SRAM in a highly competitive market.
Creating an SRAM compiler is a task which involves both design and computer resources. Memory sizes and word widths on an ASIC can vary drastically depending on the customer application. Initial efforts attempted to take existing full custom memory designs and build them into memory compilers. Writing the computer code to create a configurable memory from a full custom design proved to be an extremely difficult task. Most abandoned this approach and created new memory designs which simplified writing the code to synthesize various memory configurations and reduced the complexity of building the physical layout of the SRAM.
Two features are typical of most SRAM compiler designs. First, the compiler builds a single block of memory for the application. Second, decoding stages are designed to minimize layout changes, which reduces the complexity of the physical layout compiler. For large memory sizes, both of these standard compiler attributes reduce SRAM performance. Larger memory array sizes increase the loading on the outputs of decoder circuits and memory cells, increasing SRAM access times. Building the decoding circuits to simplify layout changes often compromises performance for high row/column counts.
In general, large amounts of memory are required in complex integrated circuit designs. The memory required typically takes the form of a large single SRAM or multiple smaller SRAMs.
The main limitation in an SRAM generated by a computer (compiled SRAM) is performance. The maximum memory size typically is dictated by the largest SRAM that can be formed while still meeting the system speed requirements of an integrated circuit. Presently, compilers have been unable to generate SRAMs approaching the speed/density of full custom SRAM designs.
In the past, techniques for defining subsets of a memory have included compiling a circuit macro by using a set of predetermined circuit blocks. According to such techniques, as illustrated in FIG. 1, in the physical layout of an SRAM, a central block structure may be formed that includes control circuits, word/bit decoder circuits, and a required number of wordline selector circuits. Storage block macros may be formed that include an array of memory cells grown to the required size in the wordline direction, bit selectors, write drivers, sense amplifier circuits, and output latches. The final SRAM may be formed by placing the required number of storage block macros, in the bitline direction, around the central block structure. Multiple bit selector circuits may permit some performance versus area optimization by varying the aspect ratio of the number of wordlines versus the number of bitlines. This technique provides minimal performance optimization, and does not include a technique for optimizing performance over a range of SRAM sizes.
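The tiling described above can be expressed as a minimal sketch. This is an illustration only, not the disclosed compiler; the function name, block size, and the convention of splitting the storage block macros evenly around the central block are all assumptions.

```python
import math

# Illustrative sketch of the FIG. 1 style floorplan: the required number
# of storage block macros is placed, in the bitline direction, around a
# single central block holding the control and decoder circuits.
# (build_sram_floorplan and words_per_block are hypothetical names.)

def build_sram_floorplan(n_words, words_per_block=256):
    """Return a bottom-to-top placement list of macro names."""
    n_blocks = math.ceil(n_words / words_per_block)
    below = n_blocks // 2          # storage blocks below the central block
    above = n_blocks - below       # remainder placed above it
    return (["storage_block"] * below
            + ["central_block"]
            + ["storage_block"] * above)
```

Because only the count of storage block macros changes with memory size, the layout task reduces to stacking identical macros, which is what makes this compilation style tractable.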
The above-described technique may be enhanced for compiling SRAMs with multiple read and write ports. According to the enhanced technique, a silicon compiler approach may be utilized to generate the various layouts for the required subcircuits. The compiler calculates load capacitance and increases buffer transistor sizes, based on technology information, to maintain a pre-programmed signal rise/fall time. Once all buffer sizes are known, the compiler then generates the transistor layout and wiring.
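The buffer-sizing step can be sketched with a first-order RC model. This is a simplified illustration of the idea, not the compiler's actual method; the function name, the linear resistance model, and the technology constant are assumptions.

```python
# Illustrative sketch: size a buffer from its load capacitance so that a
# fixed target edge time is maintained as the load grows.  We assume the
# edge time is approximated by R_driver * C_load, with the driver
# resistance inversely proportional to device width.
# r_ohm_um (effective resistance of a 1 um wide driver) is hypothetical.

def required_buffer_width_um(load_cap_ff, target_edge_ps, r_ohm_um=8000.0):
    """Driver width (um) so that (r_ohm_um / width) * C_load meets the
    pre-programmed target edge time."""
    c_load_f = load_cap_ff * 1e-15      # fF -> F
    target_s = target_edge_ps * 1e-12   # ps -> s
    return r_ohm_um * c_load_f / target_s
```

Under this model the required width scales linearly with load capacitance, which is why the compiler can only finalize the transistor layout after every buffer's load, and hence size, is known.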
While the idea of addressing a subset of a memory array may be known, these techniques do not optimize performance over an entire range of subset sizes. Additionally, known techniques do not disclose how to control the critical timing of the SRAM and, thus, do not disclose an actual technique for determining performance versus the size of the SRAM.
One technique for timing control employs what are commonly called "dummy wordline" and "dummy bitline" control signals. These control signals attempt to mimic the time required to generate enough signal on the bitline so that the sense amplifier operates correctly, as well as to assure the SRAM cycles at a functional rate.
According to this technique, an extra wordline decoder circuit may be selected every cycle to drive a "dummy wordline" net. The "dummy wordline" is constructed of "dummy cells", such that the signal net delay is proportionally equal to the net delay of a selected wordline. The "dummy wordline" net in turn drives the gate of a "dummy device", which discharges a "dummy bitline" net.
The "dummy bitline" is constructed of "dummy cells" similar to the "dummy wordline", such that it is also proportionally equal to the load of a real bitline. This discharging "dummy bitline" net, which is used as a mimic of the signal generation delay, is then used to control the sense amplifier set clock. In a compilable SRAM, such as that illustrated in FIG. 2, the "dummy wordline" and "dummy bitline" nets grow as the SRAM is grown. Thus, in theory, their net delays increase versus the size of the RAM.
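The tracking behavior of the matched dummy scheme can be sketched with a linear delay model. All constants below are assumed for illustration; the point is only that, because the dummy nets are built from the same cells as the real nets, the sense set delay grows with the array at the same rate as the real signal path.

```python
# Illustrative linear-delay model (all constants assumed): delays of the
# real wordline/bitline and of their 100% matched dummy counterparts
# grow proportionally with the array dimensions.

def wordline_delay_ps(n_cols, ps_per_cell=2.0, fixed_ps=50.0):
    """Delay of a real (or matched dummy) wordline vs. column count."""
    return fixed_ps + ps_per_cell * n_cols

def bitline_delay_ps(n_rows, ps_per_cell=1.5, fixed_ps=40.0):
    """Delay of a real (or matched dummy) bitline vs. row count."""
    return fixed_ps + ps_per_cell * n_rows

def sense_set_delay_ps(n_rows, n_cols):
    """The sense amplifier set clock fires after the dummy wordline has
    risen and the dummy bitline has discharged, so it moves out
    automatically as the RAM is grown."""
    return wordline_delay_ps(n_cols) + bitline_delay_ps(n_rows)
```

Because the dummy path reuses the real delay functions, the set clock shifts by exactly the same amount as the real bitline delay when the row count changes, which is the parallel-slope behavior of FIG. 3.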
In a design where a "dummy wordline" and a "dummy bitline" are 100% physically matched to the "real wordline" and "real bitline" structures, the slope of their delays, as a function of array size, will be parallel, as illustrated in FIG. 3. This approach has the advantage of the sense amplifier set signal delay tracking the bitline signal delay versus size, temperature, process, and voltage. However, a large performance penalty is taken because a dual sense scheme is required: the dummy bitline must first be sensed, which in turn initiates the sensing of real data.
An improved approach is to scale the dummy structures such that they are faster than the real wordline and bitline delays. In this case, as illustrated in FIG. 4, the "dummy" delays are set so that the maximum size RAM has the minimum required signal margin, which eliminates the delay penalty. However, this structure no longer has a delay slope parallel to the real wordline/bitline delays. Thus, a large performance penalty results with smaller RAMs. Also, since the physical structures are no longer matched, the delay tracking versus process, temperature, and supply voltage is greatly degraded.
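The small-RAM penalty of the scaled dummy scheme can be made concrete with the same style of linear model. All numbers here are assumed: the dummy slope is shallower than the real bitline slope, and the dummy offset is calibrated so that the maximum-size RAM has exactly the minimum required margin.

```python
# Illustrative comparison (all numbers assumed): a scaled dummy path,
# faster per row than the real bitline, calibrated so the largest RAM
# has exactly the minimum required margin.  For smaller RAMs the dummy
# fires later than necessary, so the excess margin (wasted access time)
# grows as the RAM shrinks -- the FIG. 4 behavior.

REAL_SLOPE_PS = 1.5     # real bitline delay per row (assumed)
DUMMY_SLOPE_PS = 0.75   # scaled dummy delay per row (assumed, faster)
MAX_ROWS = 512
MIN_MARGIN_PS = 20.0

def real_delay_ps(n_rows):
    return 40.0 + REAL_SLOPE_PS * n_rows

# Fixed offset chosen so the margin at MAX_ROWS equals MIN_MARGIN_PS.
DUMMY_OFFSET_PS = (real_delay_ps(MAX_ROWS) + MIN_MARGIN_PS
                   - DUMMY_SLOPE_PS * MAX_ROWS)

def dummy_delay_ps(n_rows):
    return DUMMY_OFFSET_PS + DUMMY_SLOPE_PS * n_rows

def excess_margin_ps(n_rows):
    """Margin beyond the required minimum; pure performance waste."""
    return dummy_delay_ps(n_rows) - real_delay_ps(n_rows) - MIN_MARGIN_PS
```

In this model the excess margin is zero at the maximum size and grows linearly as the row count shrinks, which is the small-RAM performance penalty described above.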
This design technique has proven to be somewhat effective in compilable SRAMs up to now. It tracks process, voltage, and temperature such that signal margin can be guaranteed, while maintaining functionality across the full operational range. However, each variation of this approach results in some overall loss of RAM performance.