1. Field of the Invention
The present invention relates to a data processing system having a few bus masters and many bus slaves connected in parallel to a common bus. In particular, this invention relates to low-latency, high-bandwidth, low-power, high-yield, large-capacity memory devices suitable for data processing and video systems. This invention is particularly suitable for systems organized into multiple identical modules in a very-large-scale or wafer-scale integration environment.
2. Description of the Prior Art
When transmitting signals on traditional bus systems, problems typically arise when either of the following conditions exists: (i) the rise or fall time of the transmitted signal is a significant fraction of the bus clock period, or (ii) reflections of the signal on the bus interfere with the rising or falling transitions of the signal. The data transfer rate is limited in part by whether signal integrity is compromised as a result of these conditions. Therefore, to increase data bandwidth, it is desirable to avoid both conditions.
High frequency data transmission through a bus requires a high rate of electrical charge (Q) transfer on and off the bus to achieve adequate rise and fall times. To avoid condition (i) above, large transistors in the bus drivers are needed to source and sink the large amounts of current required to switch the signal levels. Equation (1) sets forth the relationship between the required current drive capability (I) of the bus drivers, the number of devices (n) attached to the bus, the output capacitance (C) of the bus driver, the signal swing (V) needed to distinguish between logical 1 and 0, and the maximum operating frequency (f) of the bus.
I = nCVf    Eq. (1)
Thus, one way to obtain a higher operating frequency is to increase the drive capability of the bus driver. However, higher drive usually requires a driver with larger size, which in turn translates to increased silicon area, bus capacitance, power consumption and power supply noise. Furthermore, when the output capacitance of the bus driver becomes a substantial part of the bus capacitance, increasing the size of the bus driver does not result in a higher operating frequency.
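As a rough numerical illustration of Equation (1), the sketch below evaluates I = nCVf for one hypothetical bus. The device count, capacitance, swing and frequency values are assumptions chosen only for illustration and do not come from this specification. Because I is linear in each factor, the required drive current scales directly with the signal swing V.

```python
def required_drive_current(n, c_farads, v_swing, f_hz):
    """Equation (1): I = n * C * V * f, returned in amperes."""
    return n * c_farads * v_swing * f_hz

# Assumed, illustrative values (not taken from the specification).
N_DEVICES = 16      # devices attached to the bus
C_OUT = 5e-12       # output capacitance per bus driver: 5 pF
F_BUS = 100e6       # maximum bus operating frequency: 100 MHz

# Same bus, two different signal swings.
full_swing = required_drive_current(N_DEVICES, C_OUT, 5.0, F_BUS)
reduced_swing = required_drive_current(N_DEVICES, C_OUT, 1.0, F_BUS)

print(f"5 V swing: {full_swing * 1e3:.1f} mA")
print(f"1 V swing: {reduced_swing * 1e3:.1f} mA")
```

With these assumed numbers the driver must supply about 40 mA for a 5 V swing and about 8 mA for a 1 V swing.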
Another way to increase the operating frequency is to reduce the signal swing on the bus. Signal swing is defined as the difference between the maximum voltage and the minimum voltage of the signals transmitted on the bus. Many traditional bus systems, including the TTL standard, use reduced-swing signal transmission (i.e., signal swing smaller than the supply voltage), to enable high speed operations. A reduced signal swing reduces the required charge transfer, thereby reducing power consumption, noise and required silicon area. Because reduced signal swing substantially reduces the current required from the bus driver, parallel termination of bus lines is facilitated. Parallel termination is an effective way to suppress ringing in the bus. However, the use of small swing signals requires the use of sophisticated amplifiers to receive the signals. As the signal swing decreases, the required gain of the amplifier increases, thereby increasing the required silicon area and operating power. It would therefore be desirable to have a bus system which utilizes small swing signals, but does not require the use of sophisticated amplifiers.
Prior art small swing (less than 1.5 V peak-to-peak) I/O (input/output) schemes generally have a logic threshold voltage different from Vdd/2 (i.e., one-half of the supply voltage), the logic threshold of a conventional CMOS logic circuit. The logic threshold, or trip point, of a bus signal is the voltage level which delineates a logical 1 from a logical 0. An example of such a scheme is GTL, where a logic threshold of 0.8 volt is used. (R. Foss et al., “Fast interfaces for DRAMs”, IEEE Spectrum, October 1992, pp. 54-57). Other small swing I/O schemes, such as the center-tap terminated (CTT) Interface (JEDEC Standard, JESD8-4, November 1993), have a fixed threshold (e.g., 1.5 volts) which does not track with the supply voltage. To use a bus signal having a logic threshold other than the CMOS logic threshold in a CMOS integrated circuit, a translator circuit must be used to translate the I/O logic threshold to the conventional CMOS logic threshold. These translators consume circuit real estate and power, introduce additional circuit delay and increase circuit complexity.
CMOS circuitry uses a logic threshold of Vdd/2 to permit the CMOS circuitry to operate with symmetrical noise margins with respect to the power and ground supply voltages. This logic threshold also results in symmetrical inverter output rise and fall times as the pull-up and pull-down drive capabilities are set to be approximately equal.
Traditional DRAM devices (ICs) are organized into arrays having relatively small capacities. For example, most commercial 1M bit and 4M bit DRAM devices have an array size of 256K bit. This organization is dictated by the bit-line sense voltage and word line (RAS) access time. However, all arrays inside a DRAM device share a common address decoding circuit. The arrays in DRAM devices are not organized as memory modules connected in parallel to a common bus. Furthermore, each memory access requires the activation of a substantial number (e.g., one quarter to one half) of the total number of arrays, even though most of the activated arrays are not accessed. As a result, power is wasted and the soft-error rate due to supply noise is increased.
Prior art DRAM schemes, such as Synchronous DRAM (JEDEC Standard, Configurations For Solid State Memories, No. 21-C, Release 4, November 1993) and Rambus DRAM (See, PCT Patent document PCT/US91/02590) have attempted to organize the memory devices into banks. In the synchronous DRAM scheme, the JEDEC Standard allows only one bit for each bank address, thereby implying that only two banks are allowed per memory device. If traditional DRAM constraints on the design are assumed, the banks are formed by multiple memory arrays. The Rambus DRAM scheme has a two bank organization in which each bank is formed by multiple memory arrays. In both schemes, due to the large size of the banks, bank-level redundancy is not possible. Furthermore, power dissipation in devices built with either scheme is at best equal to traditional DRAM devices. Additionally, because of the previously defined limitations, neither the Synchronous DRAM scheme nor the Rambus DRAM scheme uses a modular bank architecture in which the banks are connected in parallel to a common internal bus.
Many prior art memory systems use a circuit-module architecture in which the memory arrays are organized into modules and the modules are connected together with either serial buses or dedicated lines. (See, PCT patent document PCT/GB86/00401, M. Brent, “Control System For Chained Circuit Modules” [serial buses]; and K. Yamashita, S. Ikehara, M. Nagashima, and T. Tatematsu, “Evaluation of Defect-Tolerance Scheme in a 600M-bit Wafer-Scale Memory”, Proceedings on International Conference on Wafer Scale Integration, January 1991, pp. 12-18 [dedicated lines]). In neither case are the circuit modules connected in parallel to a common bus.
Prior art memory devices having a high I/O data bandwidth typically use several memory arrays simultaneously to handle the high bandwidth requirement. This is because the individual memory arrays in these devices have a much lower bandwidth capability than the I/O requirement. Examples of such prior art schemes include those described by K. Dosaka et al., “A 100-MHz 4-Mb Cache DRAM with Fast Copy-Back Scheme”, IEEE Journal of Solid-State Circuits, Vol. 27, No. 11, November 1992, pp. 1534-1539; and M. Farmwald et al., PCT Patent document PCT/US91/02590.
Traditional memory devices can operate either synchronously or asynchronously, but not both. Synchronous memories are usually used in systems requiring a high data rate. To meet the high data rate requirement, synchronous memory devices are usually heavily pipelined. (See, e.g., the scheme described in “250 Mbyte/s Synchronous DRAM Using a 3-Stage-Pipelined Architecture”, Y. Takai et al., IEEE JSSC, vol. 29, no. 4, April 1994, pp. 426-431.) The pipelined architecture disclosed in Y. Takai et al. causes the access latency to be fixed at 3 clock cycles at all clock frequencies, thereby making this synchronous memory device unsuitable for systems using lower clock frequencies. For example, when operating at 50 MHz the device has an access latency of 60 ns (compared to an access latency of 24 ns when operating at 125 MHz).
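The latency figures above follow directly from the fixed three-cycle pipeline depth. The short sketch below reproduces that arithmetic; the function name is a hypothetical helper introduced only for this illustration.

```python
def access_latency_ns(pipeline_cycles, clock_mhz):
    """Latency of a fixed-depth pipeline: cycles multiplied by the clock period."""
    period_ns = 1000.0 / clock_mhz   # clock period in nanoseconds
    return pipeline_cycles * period_ns

# Three-stage pipeline at the two clock frequencies quoted in the text.
latency_50 = access_latency_ns(3, 50)     # 3 cycles x 20 ns
latency_125 = access_latency_ns(3, 125)   # 3 cycles x 8 ns

print(f"50 MHz:  {latency_50:.0f} ns")
print(f"125 MHz: {latency_125:.0f} ns")
```

Since the cycle count is fixed, the latency in nanoseconds grows in inverse proportion to the clock frequency.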
Conventional asynchronous memory devices, due to the lack of a pipeline register, maintain a fixed access latency at all operating frequencies. However, the access cycle time can seldom be substantially smaller than the access latency. Consequently, asynchronous devices are unsuitable for high data rate applications.
Thus, it would be desirable to have a memory device which provides a high-throughput, low-latency I/O scheme with high noise immunity and a symmetrical swing around one half of the supply voltage.
It would also be desirable to have a memory device which can be accessed both synchronously and asynchronously using the same set of connection pins.
Moreover, it would be desirable to have a memory device which provides a high data bandwidth and a short access time.
It would also be desirable to have a memory device which is organized into small memory arrays, wherein only one array is activated for each normal memory access, whereby the memory device has low power dissipation.
Additionally, it would be desirable to have a memory device having small, functionally independent modules, in which a defective module can be disabled and replaced by another module, resulting in a memory device having a high defect tolerance.
It would also be desirable to have a memory device in which a single input data stream can be simultaneously written to multiple memory arrays and in which data streams from multiple memory arrays can be multiplexed to form a single output data stream.
Furthermore, it would be desirable to have a memory device in which many memory modules are attached to a high-speed common bus without the necessity of large bus drivers and complex bus receivers in the modules.
The present invention implements a compact, high speed reduced CMOS swing I/O scheme which uses Vdd/2 as the logic threshold. This scheme has the following advantages: (i) The logic threshold tracks with supply voltages, thereby maintaining balance of pull-up and pull-down. (ii) The bus driver and receiver circuits work at a very wide range of supply voltages without sacrificing noise immunity, since the thresholds of the bus driver and receiver circuits track with each other automatically. (iii) The logic threshold is implicit in the logic circuit and does not require an explicit reference generator circuit. (iv) Logic threshold translation is not necessary since the I/O logic threshold is identical to that of the other logic circuitry on-chip.
The present invention groups at least two memory arrays or banks into a memory module and connects all the memory modules in parallel to a common high-speed, directional asymmetrical signal swing (DASS) bus, thereby forming a memory device. The memory modules transmit signals having a reduced swing to a master module coupled to the DASS bus. In one embodiment, this reduced swing is equal to approximately one volt about a center voltage of Vdd/2, where Vdd/2 is the threshold voltage of CMOS circuitry. The signal transmitted from the master device to the memory modules has a full Vdd swing.
The memory modules are equipped with independent address and command decoders so that they function as independent units, each with its own base address. This circuit-module architecture has several advantages: (i) It allows each memory module to replace any other memory module, thereby increasing the defect tolerance of the memory device. (ii) It significantly reduces power consumption of the memory device when compared to traditional memory devices, because each memory access is handled completely by a single memory module with only one of its arrays activated. (iii) Since each memory module is a complete functional unit, the memory module architecture allows parallel accesses and multiple memory module operations to be performed within different memory modules, thereby increasing the performance of the memory device. (iv) The memory module architecture allows the memory device to handle multiple memory accesses at the same time.
The circuit-module architecture of the present invention further allows easy system expansion by connecting multiple memory devices in parallel through a common I/O bus which is an extension of the on-chip bus. In addition, by incorporating redundant memory modules on each memory device and allowing each memory module to have a programmable communication address on the I/O bus system, the resulting memory system has a defect tolerance capability which is better than that of each individual memory device.
In one embodiment of the present invention, the memory arrays include redundant rows and columns. Circuitry is provided within the memory modules to support the testing of these redundant rows and columns. Circuitry is also provided to replace defective rows and columns with the redundant rows and columns during operation of the memory device.
The memory devices in accordance with the present invention are able to span address spaces which are not contiguous by controlling the communication addresses of the memory modules. Furthermore, the address space spanned by the memory devices can be dynamically modified both in location and size. This is made possible by the incorporation, in each memory module, of a programmable identification (ID) register which contains the base address of the memory module and a mechanism which decommissions the module from acting on certain memory access commands from the bus. The present invention therefore provides for a memory device with dynamically reconfigurable address space. Dynamically reconfigurable address space is especially useful in virtual memory systems in which a very large logical address space is provided to user programs and the logical addresses occupied by the programs are dynamically mapped to a much smaller physical memory space during program execution.
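The role of the programmable ID register and the decommissioning mechanism can be illustrated with a minimal software model. This is only a sketch under stated assumptions: the class names, the module size of 1024 words, and the convention that the upper address bits select the module are hypothetical and are not taken from this specification.

```python
from dataclasses import dataclass, field

MODULE_WORDS = 1024  # assumed module capacity, for illustration only


@dataclass
class MemoryModule:
    module_id: int                 # models the programmable ID register
    enabled: bool = True           # False models a decommissioned module
    cells: dict = field(default_factory=dict)

    def responds_to(self, address):
        # A module acts on a command only if it is enabled and the
        # module-select part of the address matches its ID register.
        return self.enabled and address // MODULE_WORDS == self.module_id


class Bus:
    """All modules see every command; each one decodes it independently."""

    def __init__(self, modules):
        self.modules = modules

    def write(self, address, value):
        for m in self.modules:
            if m.responds_to(address):
                m.cells[address % MODULE_WORDS] = value

    def read(self, address):
        for m in self.modules:
            if m.responds_to(address):
                return m.cells.get(address % MODULE_WORDS)
        return None  # no module is mapped at this address


# Two active modules at non-contiguous IDs 0 and 5, plus a disabled spare.
modules = [MemoryModule(0), MemoryModule(5), MemoryModule(0, enabled=False)]
bus = Bus(modules)

addr = 5 * MODULE_WORDS + 3
bus.write(addr, 0xAB)
before = bus.read(addr)                 # served by the module with ID 5
unmapped = bus.read(2 * MODULE_WORDS)   # no module holds ID 2

# Remap at run time: decommission module 5 and give its ID to the spare.
modules[1].enabled = False
modules[2].module_id = 5
modules[2].enabled = True
bus.write(addr, 0xCD)
after = bus.read(addr)                  # now served by the former spare
```

In this model, changing an ID register moves a module within the address space, and clearing the enable flag removes it, which is how the non-contiguous and resizable address space described above could be represented.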
Each memory array in the present design is equipped with its own row and column address decoders and a special address sequencer which automatically increments the address of the column to be accessed. Each memory array has data amplifiers which amplify the signals read from the memory array before the signals are transmitted to the lines of the DASS bus. Both the address sequencer and the data amplifiers increase the signal bandwidth of the memory array. Consequently, each memory array is capable of handling the I/O data bandwidth requirement by itself. This capability makes multiple bank operations such as broadcast-write and interleaved-access possible. For example, a memory device in accordance with the present invention is able to handle a broadcast-write bandwidth of over 36 gigabytes per second and 36 memory operations simultaneously.
Memory devices in accordance with the present invention can be accessed both synchronously and asynchronously using the same set of connection pins. This is achieved using the following techniques: (i) using a self-timed control in connection with the previously described circuit-module architecture; (ii) connecting memory modules in parallel to an on-chip bus which uses source-synchronous clocking; (iii) using a half clock-cycle (single clock-transition) command protocol; and (iv) using an on-chip resynchronization technique. This results in memory devices that have short access latency (about 10 ns) and high data bandwidth (1 gigabyte/sec).
Another embodiment of the present invention provides for the termination of bus lines. In one embodiment, a passive clamp for a bus line is created by connecting a first resistor between the bus line and a first supply voltage and connecting a second resistor between the bus line and a second supply voltage. In one embodiment, the first supply voltage is Vdd, the second supply voltage is ground, and the first and second resistors have the same resistance.
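The effect of the passive clamp can be seen from its Thevenin equivalent: two resistors tied to two supplies behave like a single source with a single series resistance. The sketch below computes that equivalent; the 3.3 V supply and 100 ohm resistor values are assumptions used only for illustration.

```python
def passive_clamp(v1, v2, r1, r2):
    """Thevenin equivalent of R1 to supply V1 and R2 to supply V2 on one bus line.

    Returns (open-circuit voltage in volts, equivalent resistance in ohms).
    """
    v_th = (v1 * r2 + v2 * r1) / (r1 + r2)
    r_th = (r1 * r2) / (r1 + r2)
    return v_th, r_th


# Assumed example: Vdd = 3.3 V, ground = 0 V, two equal 100 ohm resistors.
VDD = 3.3
v_th, r_th = passive_clamp(VDD, 0.0, 100.0, 100.0)

print(f"clamp voltage:    {v_th:.2f} V")
print(f"clamp resistance: {r_th:.0f} ohm")
```

With equal resistors between Vdd and ground, the undriven line rests at Vdd/2 and is terminated by half of one resistor's value, whatever the supply voltage is.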
In an alternate embodiment, an active clamp for a bus line is created by connecting a p-channel transistor between the bus line and a first supply voltage and connecting an n-channel transistor between the bus line and a second supply voltage. The gates of the p-channel and n-channel transistors are driven in response to the bus line.