1. Field of the Invention
The present invention relates to a data processing system having a few bus masters and many bus slaves connected in parallel to a common bus. In particular, this invention relates to low latency, high bandwidth, low power, high-yield, large capacity memory devices suitable for data processing and video systems. This invention is particularly suitable for systems organized into multiple identical modules in a very-large-scale or wafer-scale integration environment.
2. Description of the Prior Art
When transmitting signals on traditional bus systems, problems typically arise when either of the following conditions exists: (i) the rise or fall time of the transmitted signal is a significant fraction of the bus clock period or (ii) there are reflections of the signal on the bus which interfere with the rising or falling transitions of the signal. The data transfer rate is limited in part by whether signal integrity is compromised as a result of these conditions. Therefore, to increase data bandwidth, it is desirable to avoid both conditions.
High frequency data transmission through a bus requires a high rate of electrical charge (Q) transfer on and off the bus to achieve adequate rise and fall times. To avoid condition (i) above, large transistors in the bus drivers are needed to source and sink the large amounts of current required to switch the signal levels. Equation (1) sets forth the relationship between the required current drive capability (I) of the bus drivers, the number of devices (n) attached to the bus, the output capacitance (C) of the bus driver, the signal swing (V) needed to distinguish between logical 1 and 0, and the maximum operating frequency (f) of the bus.

    I = nCVf    Eq. (1)
Thus, one way to obtain a higher operating frequency is to increase the drive capability of the bus driver. However, higher drive usually requires a driver with larger size, which in turn translates to increased silicon area, bus capacitance, power consumption and power supply noise. Furthermore, when the output capacitance of the bus driver becomes a substantial part of the bus capacitance, increasing the size of the bus driver does not result in a higher operating frequency.
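The relationship in Equation (1), and the diminishing return described above, can be sketched numerically. The following Python fragment is a minimal illustration only. The linear model in which driver output capacitance grows with drive current (the hypothetical parameters c_fixed and c_per_amp), and all numeric values, are assumptions chosen for illustration and do not come from the text.

```python
def required_drive_current(n, c, v, f):
    """Eq. (1): I = n * C * V * f, in amperes."""
    return n * c * v * f


def max_frequency(i, n, c, v):
    """Eq. (1) solved for f, in hertz."""
    return i / (n * c * v)


def max_frequency_scaled_driver(i, n, c_fixed, c_per_amp, v):
    """Maximum frequency when a larger driver also has a larger output capacitance.

    ASSUMPTION (not from the text): C = c_fixed + c_per_amp * i.
    """
    c = c_fixed + c_per_amp * i
    return max_frequency(i, n, c, v)


# Illustrative values only: 8 devices, 3.3 V swing.
N_DEVICES = 8
SWING_V = 3.3
C_FIXED = 2e-12      # farads, part of C that does not depend on driver size
C_PER_AMP = 100e-12  # farads of output capacitance added per ampere of drive

# Frequency ceiling of the assumed model as the drive current grows without bound.
ceiling_hz = 1.0 / (N_DEVICES * C_PER_AMP * SWING_V)

for drive_a in (0.01, 0.04, 0.16):
    f_hz = max_frequency_scaled_driver(drive_a, N_DEVICES, C_FIXED, C_PER_AMP, SWING_V)
    print(f"{drive_a * 1e3:6.0f} mA -> {f_hz / 1e6:6.1f} MHz "
          f"(ceiling {ceiling_hz / 1e6:.1f} MHz)")
```

Under this assumed model, each fourfold increase in drive current yields a smaller gain in frequency, and the frequency never exceeds 1/(n * c_per_amp * V), which is consistent with the observation that enlarging the driver stops helping once its own capacitance dominates.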
Another way to increase the operating frequency is to reduce the signal swing on the bus. Signal swing is defined as the difference between the maximum voltage and the minimum voltage of the signals transmitted on the bus. Many traditional bus systems, including the TTL standard, use reduced-swing signal transmission (i.e., signal swing smaller than the supply voltage), to enable high speed operations. A reduced signal swing reduces the required charge transfer, thereby reducing power consumption, noise and required silicon area. Because reduced signal swing substantially reduces the current required from the bus driver, parallel termination of bus lines is facilitated. Parallel termination is an effective way to suppress ringing in the bus. However, the use of small swing signals requires the use of sophisticated amplifiers to receive the signals. As the signal swing decreases, the required gain of the amplifier increases, thereby increasing the required silicon area and operating power. It would therefore be desirable to have a bus system which utilizes small swing signals, but does not require the use of sophisticated amplifiers.
Prior art small swing (less than 1.5 V peak-to-peak) I/O (input/output) schemes generally have a logic threshold voltage different from Vdd/2 (i.e., one-half of the supply voltage), the logic threshold of a conventional CMOS logic circuit. The logic threshold, or trip point, of a bus signal is the voltage level which delineates a logical 1 from a logical 0. An example of such a scheme is GTL, where a logic threshold of 0.8 volts is used. (R. Foss et al., IEEE Spectrum, Oct. 1992, pp. 54-57, "Fast interfaces for DRAMs"). Other small swing I/O schemes, such as the center-tap terminated (CTT) Interface (JEDEC Standard, JESD8-4, November, 1993), have a fixed threshold (e.g., 1.5 volts) which does not track with the supply voltage. To use a bus signal having a logic threshold other than the CMOS logic threshold in a CMOS integrated circuit, a translator circuit must be used to translate the I/O logic threshold to the conventional CMOS logic threshold. These translators consume circuit real estate and power, introduce additional circuit delay and increase circuit complexity.
CMOS circuitry uses a logic threshold of Vdd/2 to permit the CMOS circuitry to operate with symmetrical noise margins with respect to the power and ground supply voltages. This logic threshold also results in symmetrical inverter output rise and fall times as the pull-up and pull-down drive capabilities are set to be approximately equal.
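The effect of a threshold that does not track the supply voltage can be illustrated with a short calculation. The sketch below is a simplified model, assuming ideal rail-to-rail logic levels and an abrupt trip point. The supply voltages of 3.3 V and 5.0 V are illustrative assumptions, while the 0.8 V and 1.5 V thresholds are the GTL and CTT examples cited above.

```python
def cmos_threshold(vdd):
    """Conventional CMOS trip point: one-half of the supply voltage."""
    return vdd / 2.0


def noise_margins(vdd, threshold):
    """Return (margin_to_ground, margin_to_supply) for an ideal trip point.

    Simplification: logic levels are assumed to sit exactly at 0 V and vdd.
    """
    return threshold, vdd - threshold


def threshold_mismatch(vdd, io_threshold):
    """Offset a translator must absorb between the I/O and CMOS thresholds."""
    return cmos_threshold(vdd) - io_threshold


FIXED_THRESHOLDS = {"GTL": 0.8, "CTT": 1.5}  # volts, from the cited examples

for vdd in (3.3, 5.0):  # illustrative supply voltages (assumed)
    low, high = noise_margins(vdd, cmos_threshold(vdd))
    print(f"Vdd = {vdd:.1f} V: CMOS margins {low:.2f} V / {high:.2f} V")
    for name, vth in FIXED_THRESHOLDS.items():
        print(f"  {name}: mismatch {threshold_mismatch(vdd, vth):.2f} V")
```

In this model the CMOS margins are equal at every supply voltage, whereas the mismatch for a fixed threshold changes with the supply, which is the condition that calls for a translator circuit.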
Traditional DRAM devices (ICs) are organized into arrays having relatively small capacities. For example, most commercial 1M bit and 4M bit DRAM devices have an array size of 256K bit. This organization is dictated by the bit-line sense voltage and word line (RAS) access time. However, all arrays inside a DRAM device share a common address decoding circuit. The arrays in DRAM devices are not organized as memory modules connected in parallel to a common bus. Furthermore, each memory access requires the activation of a substantial number (e.g., one quarter to one half) of the total number of arrays, even though most of the activated arrays are not accessed. As a result, power is wasted and the soft-error rate due to supply noise is increased.
Prior art DRAM schemes, such as Synchronous DRAM (JEDEC Standard, Configurations For Solid State Memories, No. 21-C, Release 4, November 1993) and Rambus DRAM (See, PCT Patent document PCT/US91/02590) have attempted to organize the memory devices into banks. In the synchronous DRAM scheme, the JEDEC Standard allows only one bit for each bank address, thereby implying that only two banks are allowed per memory device. If traditional DRAM constraints on the design are assumed, the banks are formed by multiple memory arrays. The Rambus DRAM scheme has a two bank organization in which each bank is formed by multiple memory arrays. In both schemes, due to the large size of the banks, bank-level redundancy is not possible. Furthermore, power dissipation in devices built with either scheme is at best equal to traditional DRAM devices. Additionally, because of the previously defined limitations, neither the Synchronous DRAM scheme nor the Rambus DRAM scheme uses a modular bank architecture in which the banks are connected in parallel to a common internal bus.
Many prior art memory systems use circuit-module architecture in which the memory arrays are organized into modules and the modules are connected together with either serial buses or dedicated lines. (See, PCT patent document PCT/GB86/00401, M. Brent, "Control System For Chained Circuit Modules", [serial buses]; and K. Yamashita, S. Ikehara, M. Nagashima, and T. Tatematsu, "Evaluation of Defect-Tolerance Scheme in a 600M-bit Wafer-Scale Memory", Proceedings on International Conference on Wafer Scale Integration, January 1991, pp. 12-18. [dedicated lines]). In neither case are the circuit modules connected in parallel to a common bus.
Prior art memory devices having a high I/O data bandwidth typically use several memory arrays simultaneously to handle the high bandwidth requirement. This is because the individual memory arrays in these devices have a much lower bandwidth capability than the I/O requirement. Examples of such prior art schemes include those described by K. Dosaka et al, "A 100-MHz 4-Mb Cache DRAM with Fast Copy-Back Scheme", IEEE Journal of Solid-State Circuits, Vol. 27, No. 11, November 1992, pp. 1534-1539; and M. Farmwald et al, PCT Patent document PCT/US91/02590.
Traditional memory devices can operate either synchronously or asynchronously, but not both. Synchronous memories are usually used in systems requiring a high data rate. To meet the high data rate requirement, synchronous memory devices are usually heavily pipelined. (See, e.g., the scheme described in "250 Mbyte/s Synchronous DRAM Using a 3-Stage-Pipelined Architecture", Y. Takai et al, IEEE JSSC, vol. 29, no. 4, April, 1994, pp. 426-431.) The pipelined architecture disclosed in Y. Takai et al causes the access latency to be fixed at 3 clock cycles at all clock frequencies, thereby making this synchronous memory device unsuitable for systems using lower clock frequencies. For example, when operating at 50 MHz the device has an access latency of 60 ns (compared to an access latency of 24 ns when operating at 125 MHz).
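The latency figures in the preceding example follow directly from the fixed pipeline depth. A minimal Python sketch of the arithmetic is given below; the function name is hypothetical and chosen only for illustration.

```python
def pipelined_latency_ns(pipeline_cycles, clock_mhz):
    """Access latency in nanoseconds when latency is a fixed number of clock cycles."""
    # One clock period is 1000 / clock_mhz nanoseconds.
    return pipeline_cycles * 1000 / clock_mhz


PIPELINE_CYCLES = 3  # fixed latency of the 3-stage pipelined device in the text

for clock_mhz in (50, 125):
    latency = pipelined_latency_ns(PIPELINE_CYCLES, clock_mhz)
    print(f"{clock_mhz} MHz: {latency:.0f} ns")
```

Because the cycle count is fixed, the latency in nanoseconds scales inversely with the clock frequency, so a slower system clock pays a proportionally longer access latency.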
Conventional asynchronous memory devices, due to the lack of a pipeline register, maintain a fixed access latency at all operating frequencies. However, the access cycle time can seldom be substantially smaller than the access latency. Consequently, asynchronous devices are unsuitable for high data rate applications.
Thus, it would be desirable to have a memory device which provides a high-throughput, low-latency, high-noise-immunity I/O scheme which has a symmetrical swing around one half of the supply voltage.
It would also be desirable to have a memory device which can be accessed both synchronously and asynchronously using the same set of connection pins.
Moreover, it would be desirable to have a memory device which provides a high data bandwidth and a short access time.
It would also be desirable to have a memory device which is organized into small memory arrays, wherein only one array is activated for each normal memory access, whereby the memory device has low power dissipation.
Additionally, it would be desirable to have a memory device having small, functionally independent modules, in which a defective module can be disabled and replaced by another module, resulting in a memory device having a high defect tolerance.
It would also be desirable to have a memory device in which a single input data stream can be simultaneously written to multiple memory arrays and in which data streams from multiple memory arrays can be multiplexed to form a single output data stream.
Furthermore, it would be desirable to have a memory device in which many memory modules are attached to a high-speed common bus without the necessity of large bus drivers and complex bus receivers in the modules.