1. Technical Field
The present invention is directed generally toward a method and apparatus for implementing a self-timed static random-access memory in an integrated circuit.
2. Description of the Related Art
There are two basic types of semiconductor random-access memory (RAM) circuits in common use. Static random-access memory (SRAM) stores data by way of a feedback circuit. Dynamic random-access memory (DRAM) stores data as electrostatic charge on a capacitor. In general, RAM circuits are configured in two-dimensional arrays of individual memory cells, with each memory cell storing one bit. A word of data may be accessed from one or more memory circuits by addressing the cells that store the data by row and column addresses and reading data from or writing data to the addressed cells. In a typical SRAM array, each memory word is stored in a separate row and addressed by asserting a “word line,” while the individual bits of each word are read from and written to the memory array using “bit lines.” In a typical single-port memory array, all bit lines for a particular bit position are connected together. For example, all memory cells representing bit position 4 of a word typically share common bit lines, but have separate word lines. The generic term for word lines and bit lines is “address lines,” as address lines are used for addressing individual memory cells.
Memory circuits may be single-port or multi-port memory circuits. Single-port circuits allow access to only a single memory location (i.e., one cell or a group of cells at a single memory address) at a time. Multi-port circuits allow two or more memory addresses to be accessed concurrently. Specifically, a “port” is a set of related address lines that together are sufficient to perform one memory access at a particular point in time. Thus, a single-port memory cell, which only has one port, is capable of supporting only one access at a time, while a dual-port memory cell, which has two ports, is capable of supporting two simultaneous memory accesses. Higher-order multi-port cells (e.g., three-port, four-port, etc.), which support larger numbers of simultaneous accesses, are also possible.
FIG. 1 is a diagram of a typical six-transistor single-port complementary metal-oxide semiconductor (CMOS) SRAM circuit 100 as known in the art. SRAM circuit 100 is perhaps the most common circuit topology for a single-port SRAM. SRAM circuit 100 includes a flip-flop circuit, which is formed by cross-coupling two logic inverters formed by transistors Q1-Q4, and two pass-gate transistors (also called access transistors) Q5 and Q6.
Specifically, PMOS (p-channel MOS) transistor Q3 and NMOS (n-channel MOS) transistor Q1 form one CMOS inverter and PMOS transistor Q4 and NMOS transistor Q2 form another CMOS inverter. Referring to the inverter formed by transistors Q3 and Q1, the gates of transistors Q3 and Q1 are connected together to form an input node 110 to the inverter. The drains of transistors Q3 and Q1 are connected together to form an output node 112 of the inverter. The source of transistor Q3 is connected to positive supply rail Vdd 109, making transistor Q3 the “pull-up” transistor of the inverter. The source of transistor Q1 is connected to negative (or “low”) supply rail Vss 108, making transistor Q1 the “pull-down” transistor of the inverter. Transistors Q4 and Q2 are similarly configured as a CMOS inverter. In SRAM circuit 100, the CMOS inverter formed by transistors Q4 and Q2 is cross-coupled with the CMOS inverter formed by transistors Q3 and Q1. Thus, node 110, which is the input node of the inverter formed by transistors Q3 and Q1, forms the output node of the inverter formed by transistors Q4 and Q2, and node 112, which is the output node of the inverter formed by transistors Q3 and Q1, forms the input node of the inverter formed by transistors Q4 and Q2.
Nodes 110 and 112 are referred to as the “internal nodes” of SRAM circuit 100. For the purposes of this document, the term “internal node” is defined as a data-storing node in an SRAM circuit. In the case of circuit 100, nodes 110 and 112, because they form part of the feedback loop of the cross-connected CMOS inverters (transistors Q1-Q4), are data-storing nodes and are, therefore, “internal nodes,” for the purposes of this document.
Pass-gate transistors Q5 and Q6 are MOS transistors configured as switches. The gates of transistors Q5 and Q6 are connected to word line 102. The source and drain of pass-gate transistor Q5 are connected between bit line 104 and node 112. The source and drain of pass-gate transistor Q6 are connected between inverse bit line 106 and node 110. Pass-gate transistors Q5 and Q6 are turned on when word line 102 is selected (i.e., raised in voltage) and connect bit lines 104 and 106 to the flip-flop formed by transistors Q1-Q4. When pass-gate transistors Q5 and Q6 switch bit lines 104 and 106 into connection with internal nodes 112 and 110, respectively, the data stored by memory circuit 100 becomes available on bit line 104, and the complement of that data becomes available on inverse bit line 106, so reading from memory circuit 100 becomes possible. To write data to memory circuit 100, word line 102 is selected, the data to be stored is asserted on bit line 104, and the complement of that data is asserted on inverse bit line 106. Since transistors Q1-Q4 form a bistable circuit (i.e., a circuit with two stable states), asserting the new data on bit lines 104 and 106 places this bistable circuit into the stable state associated with the data being written. When word line 102 is no longer asserted, transistors Q1-Q4 maintain the same stable state, and thus store the written data until power is no longer available from power supply rails 108 and 109.
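The read and write behavior described above can be illustrated with a minimal behavioral sketch in Python. It is not a circuit-level model: the class name `SramCell` and its attribute and method names are hypothetical, and the sketch assumes ideal pass gates and instantaneous switching between the two stable states.

```python
class SramCell:
    """Behavioral stand-in for a six-transistor cell (illustrative names only)."""

    def __init__(self, value=0):
        # In either stable state the two internal nodes hold complementary values.
        self.node_112 = value        # node connected to the bit line through Q5
        self.node_110 = 1 - value    # node connected to the inverse bit line through Q6

    def read(self, word_line):
        """Return (bit_line, inverse_bit_line), or None when the cell is not selected."""
        if not word_line:
            return None              # pass gates off: the cell is isolated
        return self.node_112, self.node_110

    def write(self, word_line, bit_line):
        """Force the cell into the stable state matching the asserted bit line."""
        if not word_line:
            return                   # pass gates off: the stored state is retained
        self.node_112 = bit_line
        self.node_110 = 1 - bit_line


cell = SramCell()
cell.write(word_line=True, bit_line=1)   # word line selected: data is stored
cell.write(word_line=False, bit_line=0)  # word line not selected: no effect
print(cell.read(word_line=True))
```

The second write is ignored because the word line is not asserted, which mirrors the way the cross-coupled inverters hold their state while the pass-gate transistors are off.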
FIG. 2 is a diagram showing how a typical SRAM memory array 200 is configured from individual memory cells. Memory array 200 is a single-port memory array (i.e., it consists of only single-port memory cells and supports only one memory access at a time), although multi-port memory arrays are also common. In memory array 200, words are arranged in rows, and bit positions are arranged in columns. For instance, word line 202 enables access to all of the bits in the memory word represented by that row, while word line 204 enables access to all of the bits in the succeeding memory word in the memory space provided by memory array 200.
Each column in memory array 200 represents a bit position within a word. Thus, bit line 206 and its complement bit line 208 represent a particular bit position, while bit line 210 and its complement bit line 212 represent the succeeding bit position. Note that all of the memory cells corresponding to a particular bit position are connected to the same bit lines. Thus, each individual memory cell in memory array 200 is accessed by row and column.
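The row-and-column organization can be sketched as a small data structure. The following Python fragment is only an illustration under simplifying assumptions: the class `SramArray` and its methods are hypothetical names, each row index stands for one word line, each column index stands for one bit-line pair, and only one word is accessed per call, as in a single-port array.

```python
class SramArray:
    """Single-port array sketch: one row per word, one column per bit position."""

    def __init__(self, num_words, word_width):
        self.word_width = word_width
        # rows[r][c] is the cell selected by word line r on bit-line pair c.
        self.rows = [[0] * word_width for _ in range(num_words)]

    def write_word(self, row, bits):
        """Assert word line `row` and drive every bit-line pair with `bits`."""
        if len(bits) != self.word_width:
            raise ValueError("word width mismatch")
        self.rows[row] = list(bits)

    def read_word(self, row):
        """Assert word line `row` and sample every bit-line pair."""
        return list(self.rows[row])

    def read_bit(self, row, column):
        """Access one cell by its row (word line) and column (bit line)."""
        return self.rows[row][column]


array = SramArray(num_words=4, word_width=8)
array.write_word(2, [1, 0, 1, 1, 0, 0, 1, 0])
print(array.read_word(2), array.read_bit(2, 0))
```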
In “system on a chip” (SoC) applications, where a complete system of components is manufactured on a single integrated circuit (IC), SRAM arrays, such as that depicted in FIG. 2, may serve any of a variety of functions. The six-transistor SRAM cell depicted in FIG. 1 (memory circuit 100) is regarded as being the most common SRAM cell currently in use in industry, since the six-transistor SRAM cell is fast and also suitable for high-density applications, where space in the IC layout is at a premium.
Since memory cells are typically implemented in a two-dimensional array, such as that depicted in FIG. 2, there will generally be some form of wire-length-related delay or latency between the time that a word line is strobed for a read operation and the time that the desired data appears on the bit lines at the periphery of the array (where the data can be latched or otherwise used). Self-timed memory circuits are often used to address this problem. In a typical self-timed memory circuit, a self-timed row decoder circuit is located at the top of the memory array so as to mimic the wire delay from the memory's control block (at the bottom of the array) up to the top row decoder of the memory. In the typical case, the self-timed row decoder circuit drives a signal that is allowed to propagate from the top of the array to the bottom of the array, where the sense amplifiers for the memory array are located. In this way, the maximum wire delay experienced by the data signals being read from the memory would be estimated, since the top row of memory cells would have the highest amount of wire delay from the perspective of the sense amplifiers at the bottom of the array.
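The reasoning behind placing the self-timed row decoder at the top of the array can be illustrated numerically. The Python sketch below assumes, purely for illustration, that wire delay grows linearly with the number of rows traversed; the function names and the per-row delay values are hypothetical and are not taken from any particular design.

```python
def read_delay_ps(row, up_ps_per_row, down_ps_per_row):
    """Delay for reading `row`, where row 0 is nearest the sense amplifiers.

    Assumes the control signal travels up `row` rows to reach the row decoder
    and the data travels back down `row` rows along the bit lines.
    """
    return row * up_ps_per_row + row * down_ps_per_row


def self_timed_delay_ps(num_rows, up_ps_per_row, down_ps_per_row):
    """Delay of the self-timing path, which runs to the top row and back."""
    top_row = num_rows - 1
    return read_delay_ps(top_row, up_ps_per_row, down_ps_per_row)


# Hypothetical figures: 512 rows, 2 ps per row upward, 3 ps per row downward.
worst_case = max(read_delay_ps(r, 2, 3) for r in range(512))
print(worst_case == self_timed_delay_ps(512, 2, 3))
```

Under this linear assumption the self-timing path tracks the slowest possible read, so a latch triggered by the self-timing signal never samples the bit lines too early.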
In some applications, programmability, or at least simplicity of the design process, becomes a priority. When rapid turnaround time or ease of manufacturing is needed, a “programmable” IC, which provides a standardized, generic set of components, such as logic gates or memory cells, can be “programmed” to implement the desired functionality. Thus, rather than laying out each individual transistor circuit in the design, a designer can simply make or break connections between the standard, generic components in the IC to achieve the desired result. Many devices that are called “programmable” may be programmed using some sort of programming apparatus, such as an FPGA (field-programmable gate array) programmer device. Another form of “programming” is “metal programming,” in which one or more metal layers in the layout of an IC are used to form connections between standard components. “Metal programming” is useful for implementing IC designs that are to be commercially manufactured. In general, metal programming allows the designer the convenience of designing a circuit using a programmable device as a basis for the design, but it is also well suited to mass manufacture, as the “programmed” part of the IC can simply be implemented as a layer in the usual fabrication process, rather than by having to “burn” the programmed part into the IC using a special programmer device.
In the design of metal-programmable memories, the self-timed architecture can restrict the number of ways in which the memory can be broken up. FIG. 5 is an example of a memory array design that illustrates this problem. In FIG. 5, a contiguous 512-by-512 array of memory cells with two input/output (I/O) blocks 504 and 506 (i.e., with two sets of sense amplifiers and related read/write circuitry) at the top and the bottom is segmented by a horizontal boundary line at a location of choice between the top and bottom of the array so as to form two adjacent memory arrays 500 and 502. Each of I/O blocks 504 and 506 operates only on its respective part of the original 512-by-512 memory array (i.e., on memory array 500 and memory array 502, respectively). This requires that self-timing row decoders 508 and 510 be located at the boundary line separating row decoder regions 507 and 509, which are the row decoders for memory array 500 and memory array 502, respectively. If a design calls for dividing the memory cell array into many areas by changing only the metal routing layers, then self-timed row decoders and associated dummy bit cells (for reading the self-timing signals) must be placed at numerous locations. This requires that free layout space be reserved at every location at which a possible breakpoint or boundary line in the memory array can be located. This can consume a large amount of layout area if many different possible breakpoints are desired.
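The growth in reserved area can be made concrete with a rough estimate. The Python sketch below is a back-of-the-envelope illustration only: it assumes that every candidate breakpoint reserves the same amount of space, expressed as a number of row-heights, and all names and figures in it are hypothetical.

```python
def reserved_rows(num_breakpoints, rows_per_site):
    """Row-heights of layout reserved for self-timing circuitry at all breakpoints."""
    if num_breakpoints < 0 or rows_per_site < 0:
        raise ValueError("counts must be non-negative")
    return num_breakpoints * rows_per_site


def overhead_percent(array_rows, num_breakpoints, rows_per_site):
    """Reserved space as a percentage of the total array height."""
    reserved = reserved_rows(num_breakpoints, rows_per_site)
    return 100.0 * reserved / (array_rows + reserved)


# Hypothetical figures: a 512-row array with 4 row-heights reserved per site.
for sites in (1, 4, 16):
    print(sites, round(overhead_percent(512, sites, 4), 2))
```

Because the reserved space scales with the number of candidate breakpoints rather than with the number actually used, the overhead is paid even in configurations that never split the array.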
Thus, a need exists for a self-timed memory circuit that allows a memory array to be broken into multiple segments without reserving large portions of layout space within the array for self-timing circuitry.