The present invention relates to first-in-first-out memories (FIFOs), and to analogous types of "smart" memory.
A first-in-first-out memory (FIFO) is a memory which (when the memory is being read) will output data in the same order as the data was originally written in. This functionality is very convenient for many applications, but is not always easy to implement. Some memory devices, such as charge-coupled devices or shift registers, provide an inherently serial hardware structure which can be used to provide FIFO functionality, but even these devices require some sophisticated control logic. Thus, large FIFOs are normally built using a random-access memory architecture, together with "smart" peripheral circuits which control the memory read and write operations to provide the desired functionality. In such an implementation, for example, the control logic can increment a write-address pointer each time a data packet is written, and increment a read-address pointer each time a data packet is read.
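The pointer scheme just described can be pictured with a small software model. This is only an illustrative sketch, not the circuit of the invention: the class name `RamFifo`, the wrap-around of the pointers at the end of the array, and the `count` bookkeeping are assumptions introduced here for the example.

```python
class RamFifo:
    """Toy model of a RAM-based FIFO: a flat array plus two address pointers."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [None] * depth   # stands in for the random-access array
        self.write_ptr = 0          # address of the next location to write
        self.read_ptr = 0           # address of the next location to read
        self.count = 0              # unread words (assumed bookkeeping)

    def write(self, word):
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.ram[self.write_ptr] = word
        # Increment the write-address pointer on every write (wrap assumed).
        self.write_ptr = (self.write_ptr + 1) % self.depth
        self.count += 1

    def read(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        word = self.ram[self.read_ptr]
        # Increment the read-address pointer on every read (wrap assumed).
        self.read_ptr = (self.read_ptr + 1) % self.depth
        self.count -= 1
        return word
```

Because the two pointers advance independently, words come out in the order in which they were written, without any word being moved from one physical location to another.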
Another type of "smart" memory is LIFO memory (also referred to as "stack" memory), where the data is always read out in the opposite order to the order of writing. This architecture too is normally implemented using a random-access memory, with appropriate logic used to increment a write-address pointer each time a data packet is written, and to decrement a read-address pointer each time a data packet is read.
In the presently preferred embodiment, the FIFO architecture is RAM-based. That is, internally, this FIFO architecture uses arrays of memory cells, rather than shift registers or other technology which would require a ripple-through transfer of data from one physical location to another. However, there are many difficulties in implementing a high-speed FIFO using a RAM-based architecture. The present invention provides a significant improvement in features of such architectures, and therefore makes it easier to configure FIFOs (or other smart memories) using a RAM-based architecture.
The present invention provides a smart memory, in which self-timing controls the timing of the data output drivers, by using dummy row elements to provide the self-timing signals to control the timing in the critical data path.
The "smart" memory architecture of such a FIFO (or other RAM-based sequential-access memory) is quite different from conventional architectures for mass-market semiconductor memories, such as SRAMs or DRAMs. More common memory chips will typically have address inputs, and will also have additional control signals. A typical memory chip architecture will have chip enable (CE*) and write enable (WE*) control inputs, and might also have an output enable (OE*) control input. In addition, of course, DRAMs will also have row-address-strobe and column-address-strobe (RAS* and CAS*) inputs. By contrast, this FIFO architecture has far fewer control inputs. Only two control inputs are used during normal operation, namely a W* write control input and an R* read control input.
It is believed that some large FIFOs have used an array of DRAM memory cells, without refresh circuits. Such FIFOs are inherently best suited for video buffers, since their volatility means that data residence time must be stringently limited.
It should also be noted that RAM-based FIFO architectures are significantly different from the architectures of conventional video DRAMs. From a system point of view, the primary function of video RAMs is to provide a random-access memory whose contents can be read out in blocks at very high speed. Thus, the serial access is normally read-only. This architecture is therefore quite different from that of a FIFO or LIFO, where two address pointers are maintained, one for read access operations and one for write access operations. Moreover, although video RAMs are normally dual-ported externally, they are normally not dual-ported at the cell level: each cell will normally be accessible by only one word line.
Note that Schuster et al., "A 20-nsec 64K (4K×16) NMOS RAM," 19 Journal of Solid-State Circuits 564 (1984), describes a memory which includes a dummy row used to define timing relations.
The preferred architecture of the smart memory of the presently preferred embodiment uses a dual-ported array of memory cells, where each cell is connected, through two pass transistor pairs, to two bitline pairs. One of the bitline pairs from each cell is connected to the column logic controlled by the write-access control logic, and the other bitline pair from each cell is connected to the column logic controlled by the read-access control logic. Thus, read-access and write-access are fully independent and asynchronous. The FIFO memory, in the presently preferred embodiment, also provides full, empty, and half-full flags, and unlimited expansion capability in both width and depth. This architecture is generally described in the DS2010 and DS2001 preliminary data sheet, in the 1987 Product Data Book of Dallas Semiconductor Corporation. However, it should be appreciated that the innovative concepts disclosed herein could also be applied to other system contexts, and particularly to other kinds of smart memories. In particular, these innovations may also advantageously be applied to LIFO memories. The disclosed innovative concepts may also, less preferably, be applied to a variety of asynchronous state machines. Less preferably, these concepts can also be applied to other multiport memories generally, to provide asynchronous control of data valid flags.
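The full, empty, and half-full flags mentioned above can be illustrated as simple functions of the number of unread words. This is a hedged sketch only: the function name `fifo_flags` is hypothetical, and the exact threshold at which the half-full flag is asserted (here, at one half of the depth or more) is an assumption, not a statement of the data sheet's definition.

```python
def fifo_flags(count, depth):
    """Derive status flags from the number of unread words in the FIFO."""
    return {
        "empty": count == 0,
        # Assumed threshold: asserted once at least half the locations are used.
        "half_full": count >= depth // 2,
        "full": count == depth,
    }
```

In a dual-ported part the read and write sides run asynchronously, so a real flag circuit must compare two independently changing pointers; the sketch ignores that timing problem and shows only the logical meaning of the flags.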
The present invention provides a RAM-based sequential-access multiport memory, where dummy elements are used to provide a self-timed output. The outputs are not driven active until a time indicated by an internal delay timer. The internal delay timer includes a dummy row line, so that the time constant of the delay line tracks the time constants of the actual integrated circuit elements which provide the memory array.
This has the advantage of providing a time delay, from the time that the read control signal R* goes active (by showing a falling edge), which is minimal for each particular memory chip. That is, as is well known in the art of integrated circuit manufacturing, the normal variation in device parameters during manufacture of integrated circuits will cause some variation in device characteristics. For example, the thickness of polysilicon or metal thin film lines may vary slightly, due to changes in the deposition conditions. The thickness and surface-state charge (Q_SS) of gate oxides or oxynitrides may also vary significantly. A particularly important source of variation is normal linewidth variation during lithography. That is, for a given drawn pattern and target linewidth dimension, the line-to-space ratio of the resulting pattern can easily vary by plus or minus twenty percent or more, depending on the normal variations in the photoresist exposure and development process. These considerations imply that the sheet resistance of thin film conductors may vary, the capacitance of capacitors (such as MOS gates) may vary, and also that the transconductance of MOS transistors may vary. All of these electrical parameter variations (which result from device parameter variations) can lead to a net change in the time constants of various circuits, and therefore to some net change in the delay of the circuit.
By providing an adaptive delay element, the delay until valid data is driven out on the data out lines Q0 through Q8 is kept at the minimum level, for each particular integrated circuit, which is consistent with the desired degree of reliability. This has the advantage that the need for external control logic is minimized. This has the further advantage that the effective net speed of every such smart memory chip will be exactly as fast as is possible for that particular chip. Thus, devices can simply be sorted according to their access time. Moreover, this self-timing capability means that the design of systems to interface with such memory chips is made simpler. The system using such a chip does not need to "know" what the access time of the chips being used is. If faster chips are inserted into such a system in place of slower chips, the system will simply run faster (if the system is able to make use of the extra speed).
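The benefit of a delay element that tracks the array can be illustrated with a deliberately simplified numerical model. Everything in it is an assumption made for illustration: the function names, the single multiplicative `process_factor` that lumps together all process variation, and the numbers used are hypothetical and are not taken from the disclosed circuit.

```python
def array_delay(nominal_ns, process_factor):
    """Time for the real array to produce valid data on one particular chip."""
    return nominal_ns * process_factor


def fixed_timer_delay(nominal_ns, worst_factor, margin):
    """A non-tracking timer must be sized for the slowest expected chip."""
    return nominal_ns * worst_factor * (1 + margin)


def dummy_row_delay(nominal_ns, process_factor, margin):
    """A dummy row is built like the array, so it scales with the same factor."""
    return nominal_ns * process_factor * (1 + margin)


NOMINAL_NS = 10.0     # hypothetical nominal array delay
WORST_FACTOR = 1.2    # hypothetical slowest process corner
MARGIN = 0.1          # hypothetical reliability margin

for factor in (0.8, 1.0, 1.2):
    real = array_delay(NOMINAL_NS, factor)
    tracked = dummy_row_delay(NOMINAL_NS, factor, MARGIN)
    fixed = fixed_timer_delay(NOMINAL_NS, WORST_FACTOR, MARGIN)
    print(f"factor={factor}: array={real:.2f} ns, "
          f"self-timed={tracked:.2f} ns, fixed={fixed:.2f} ns")
```

In this model the self-timed delay always exceeds the real array delay by the chosen margin, yet a fast chip is not held back to the worst-case figure that a fixed timer would impose.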
In the preferred memory architecture, the outputs of the FIFO memory are tristated after every read access. That is, when the R* line is brought low, the output drivers at pins Q0 through Q8 are normally kept in a tristated (high-impedance) condition, except when they are carrying valid data. Note that the tristate output capability, in the presently preferred embodiment, contributes directly to the very simple depth expansion capability of FIFOs according to the present invention.
A further innovative feature relates to the way in which successive bytes are mapped into the memory array. The memory array, in the presently preferred embodiment, includes left and right half-arrays. Each of the half-arrays includes nine groups of columns, with each group of columns used to store the data for one bit-position of a byte. Thus, in the presently preferred embodiment, each byte is contained entirely within one of the two half-arrays. (In the presently preferred embodiment, each group of columns (for one bit-position) on each side of the array includes 8 columns, but of course this can be varied in accordance with the size of the memory.) The three least significant bits of the address are used to select one of these eight columns, and the next most significant bit is used to select the left or right half-array. This means that successive read operations (or successive write operations) will "ping-pong" between the two sides of the array: after a write operation has occurred in a row within one half-array, no further write operation will normally occur in any other row of that half-array until a write operation has also occurred within the corresponding row of the other half-array. Note that this relation could also be achieved with a different organization of the address bits: another way to state this relation is that the address bit which determines left/right selection is less significant than any of the bits which define row selection (within a given subarray). Thus, this advantageous relation could also be achieved if the column-select bits were not entirely confined to the least significant address bits, and/or if subarray-select bits were also used (in an embodiment which included multiple subarrays). (Since the read and write operations are independent, the read and write operations are independently ping-ponged in this fashion.)
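The address mapping described above can be sketched as a small decoding function. The bit assignment follows the text (three least significant bits for the column, the next bit for the half-array, the remaining bits for the row); the function name `decode_address` and the labeling of the half-array bit as 0 for left and 1 for right are assumptions made for illustration.

```python
COLUMN_BITS = 3  # eight columns per bit-position group, as in the text


def decode_address(addr):
    """Split a sequential address into (row, half, column)."""
    column = addr & ((1 << COLUMN_BITS) - 1)  # three least significant bits
    half = (addr >> COLUMN_BITS) & 1          # 0 = left, 1 = right (assumed labels)
    row = addr >> (COLUMN_BITS + 1)           # all more significant bits
    return row, half, column


# Walk sequential addresses and record each change of (row, half).
visited = []
for addr in range(32):
    row, half, _ = decode_address(addr)
    if not visited or visited[-1] != (row, half):
        visited.append((row, half))
print(visited)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Sequential accesses therefore spend eight addresses in one half-array, then eight in the corresponding row of the other half-array, before the row changes, which is the ping-pong relation described above.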
The "ping-ponging" relation is advantageous, because it lowers the current requirements of the memory array's operations. The various peripheral circuits which are required for the memory hardware (such as precharge operations on the read side, or data set-up on the write side) are replicated for the two half-arrays. This means that these circuits can take advantage of the guaranteed 50% idle time which the serial access to the FIFO's array provides. This is also advantageous, because it allows the address pointer to be updated for the half-array, to save access time. (In the presently preferred embodiment, this is performed on the write side but not on the read side.)
In the presently preferred embodiment, the output pins do not all receive data simultaneously. Instead, data is driven onto three pins, and then onto three more pins (while the first three pins continue to be driven with valid data), and then onto the last three pins. Thus, three stages of edges occur before all nine pins have valid data. This separation of the transition times helps to minimize the electrical noise on the power supply lines. Thus, this separation of the transition times simplifies the output buffer design, since the output buffer is not required to include "despike" circuitry.
In the presently preferred embodiment, this separation of transition times is achieved by the use of multiple tap positions on the dummy row line which provides self-timing for the output data. (The sequence of the transitions is actually dependent on the physical position of the columns of the memory array, so that the sequence will actually be different during accesses to the left and right half-arrays.) Only a small separation in time is needed to provide the desired reduction in electrical noise, and this timing arrangement permits the improved electrical characteristics to be achieved without significant degradation in access time: the net access time for the full byte of data will always be determined by the slowest bit. Thus, by the use of this optional innovative feature, the self-timing arrangement provided by the present invention has the further advantage that split timing signals are provided to minimize electrical noise effects of the output buffers, without any significant loss of access speed.
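The staggered enabling of the output drivers can be illustrated with a toy timing model. It is a sketch under stated assumptions only: the function names are hypothetical, the taps are assumed to be evenly spaced in time, and the picosecond figures are invented for the example rather than taken from the disclosed device.

```python
def group_enable_times(first_tap_ps, tap_spacing_ps, groups=3):
    """Enable time of each group of three output pins, one group per tap."""
    return [first_tap_ps + i * tap_spacing_ps for i in range(groups)]


def byte_access_time(enable_times_ps):
    """The full nine-bit word is valid only once the slowest group has switched."""
    return max(enable_times_ps)


times = group_enable_times(first_tap_ps=20000, tap_spacing_ps=500)
print(times)                    # [20000, 20500, 21000]
print(byte_access_time(times))  # 21000
```

At most three drivers switch at any one instant in this model, instead of nine, while the access time for the whole word is still set by the last tap alone.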
In the art of static random access memories (SRAMs), some architectures use delay elements connected so that the different portions of the memory peripherals are activated only at times when they are needed. For example, in a sample SRAM architecture, a transition detector will detect any change in the input address, and will bring up the word line drivers in time for the word line corresponding to the selected row to be driven high as rapidly as possible. At the same time, or shortly thereafter, the precharge circuitry will be activated, to precharge the bit line pairs of each column of cells to equal potentials. Thus, when the selected word line is driven high to open the pass transistors of the selected cells, each selected cell can begin to develop a signal on its bit line pair as rapidly as possible. With a further delay imposed, the sense amplifiers may be driven active, so that the sense amplifiers will rapidly amplify the signal developed on the bit line pair as soon as that signal begins to be developed. That is, where a dynamic sense amplifier is used, it will typically be brought up to its metastable (and amplifying) state as soon as it can reliably be expected that data will be present on the bit lines. If the sense amplifier becomes active later, the access time of the memory chip is thereby degraded. If the sense amplifier becomes active earlier, there is some risk that the sense amplifier may be triggered by electrical noise on the bit line pair, to produce an incorrect data output. Dummy rows and dummy columns have been used to produce appropriate delays for such self-timed SRAM architectures.