1. Field of the Invention
The present invention generally relates to wafer-scale circuit integration, and in particular to a wafer-scale integrated circuit system comprising data processing elements partitioned into modules, a parallel high-speed hierarchical bus, and one or more bus masters which control the bus operation, and to the bus and a bus interface thereof.
2. Description of the Prior Art
Wafer-scale integration provides more transistors in a single large chip, which allows more functions to be integrated in a small printed circuit board area. Systems built with wafer-scale integration therefore have higher performance, higher reliability and lower cost.
The major barrier to a successful wafer-scale system has been defects inherent in the fabrication process which may render a substantial part of or the whole system nonfunctional. Therefore, it is important to have an effective defect tolerant scheme which allows the overall system to function despite failure of some of its functional blocks. One effective way to manage defects is to partition the wafer-scale system into identical small blocks so that defective blocks can be eliminated. The area of each block is usually made small so that the overall block yield is high. If the number of defective blocks is small, the performance of the system as a whole is not substantially affected. The blocks are in general connected together by an interconnect network which provides communication links between each block and the outside. Since the blocks are usually small, information processing within each block is relatively fast and the overall system performance is largely determined by the performance (bandwidth and latency) of the network. Since the network may extend over the entire wafer, its total area is significant and it is highly susceptible to defects. Therefore, it is important for the network to be highly tolerant to defects. Traditionally, high communication performance and defect tolerance are conflicting requirements on the network. High communication performance, such as short latency and high bandwidth, requires large numbers of parallel lines in the network which occupy a large area, making it more susceptible to defects.
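The effect of block size on yield can be illustrated with a rough calculation. The sketch below assumes the simple Poisson yield model Y = exp(-A*D), which is not taken from the text above; the function names, wafer area and defect density are likewise hypothetical and chosen only for illustration.

```python
import math

def block_yield(area_cm2, defects_per_cm2):
    """Probability that a block of the given area has no defects.

    Uses the Poisson yield model Y = exp(-A * D), an assumption made
    for this illustration only.
    """
    return math.exp(-area_cm2 * defects_per_cm2)

def expected_good_blocks(total_area_cm2, n_blocks, defects_per_cm2):
    """Expected number of defect-free blocks when the area is split evenly."""
    block_area = total_area_cm2 / n_blocks
    return n_blocks * block_yield(block_area, defects_per_cm2)

if __name__ == "__main__":
    # Hypothetical numbers: 100 cm^2 of active area, 1 defect per cm^2.
    for n in (1, 10, 100, 1000):
        good = expected_good_blocks(100.0, n, 1.0)
        print(f"{n:5d} blocks: {good / n:.3%} of blocks expected good")
```

Under these assumed numbers a single monolithic block is almost never defect-free, whereas with many small blocks most blocks are good and only the defective few need to be eliminated.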
By limiting the direct connection to be between neighboring blocks only, a serial bus system offers high defect tolerance and simplicity in bus configuration. Systems using a serial bus are described, for instance, in R. W. Horst, "Task-Flow Architecture," IEEE Computer, Vol. 25, No. 4, April 1992, pp. 10-18; McDonald U.S. Pat. No. 4,847,615, and R. C. Aubusson et al, "Wafer-scale Integration - A Fault-tolerant Procedure," IEEE ISCC, Vol. SC-13, No. 3, June 1988, pp. 339-344. These systems have the capability of self configuration and are highly tolerant to defects. However, they inherit the disadvantage of a serial bus and suffer from long access latency because the communication signals have to be relayed from one block to another down the serial bus.
A parallel bus system offers direct connections between all the communicating devices and provides the shortest communication latency. However, a parallel bus system without reconfiguration capability offers the lowest defect tolerance, since any defect on the bus can leave a substantial part of the system without a communication link. Known systems implement parallel buses with limited success. In U.S. Pat. No. 4,038,648 [Chesley], a parallel bus connected to all circuit modules is used to transfer address and control information, but no defect management is provided for the parallel bus. In U.S. Pat. No. 4,007,452 [Hoff, Jr.], a two-level hierarchical bus is used to transfer multiplexed data and address in a wafer-scale memory. Without redundancy and reconfiguration capability in the bus, the harvest rate is relatively low, because defects in the main bus can still cause failure in a substantial part of the system. In both these systems, a separate serial bus is used to set the communication address of each functional module. In each scheme, the serial bus requires a defect management scheme different from that used for the parallel bus. This complicates the overall defect management of the system as a whole and increases the total interconnect overhead.
Many known systems use a tree structure in their bus. By reducing the number of blocks the bus signals have to travel through, buses with tree structures offer higher communication speed than those with linear or serial structures.
In K. N. Ganapathy, et al, "Yield Optimization in Large RAMs with Hierarchical Redundancy," IEEE JSSC, vol. 26, No. 9, 1991, pp. 1259-1264, a wafer-scale memory using a binary-tree bus is described. The scheme uses separate bus lines for address and data. Address decoding is distributed among the tree nodes in the bus. The separation of address and data buses increases the bus overhead and complicates the defect management.
Accordingly, one object of this invention is to provide a defect or fault tolerant bus for connecting multiple functional modules to one or more bus masters, so that performance of the bus is not substantially affected by defects and faults in the bus or in the modules.
Another object of this invention is to provide a high-speed interface in the module so that large amounts of data can be transferred between the module and the bus masters.
Another object of this invention is to provide a method for disabling defective modules so that they have little effect on the rest of the system.
Another object of this invention is to provide a method for changing the communication address of a module when the system is in operation. The technique facilitates dynamic address mapping and provides run-time fault tolerance to the system.
Another object of this invention is to provide programmability in the bus transceivers so that the bus network can be dynamically reconfigured.
In accordance with the present invention, a fault-tolerant, high-speed wafer scale system comprises a plurality of functional modules, a parallel hierarchical bus which is fault-tolerant to defects in an interconnect network, and one or more bus masters. This bus includes a plurality of bus lines segmented into sections and linked together by programmable bus switches and bus transceivers or repeaters in an interconnect network.
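How segmented bus lines and programmable switches might isolate defects can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of the invention: it assumes each bus section carries more physical lines than signals (spare lines), that each line in a section has been tested good or bad, and that the switches at a section boundary can route any signal onto any good line. The function names `configure_section` and `configure_bus` are hypothetical.

```python
def configure_section(good_lines, n_signals):
    """Choose physical lines for one bus section.

    good_lines: list of booleans, one per physical line in the section
                (True means the line tested good).
    Returns the indices of the first n_signals good lines, or None if the
    section has too few good lines and must be isolated.
    """
    chosen = [i for i, ok in enumerate(good_lines) if ok][:n_signals]
    return chosen if len(chosen) == n_signals else None

def configure_bus(sections, n_signals):
    """Return one switch setting per section (None marks an isolated section)."""
    return [configure_section(section, n_signals) for section in sections]

if __name__ == "__main__":
    # Hypothetical test results: three sections, three physical lines each,
    # two signals to carry.
    sections = [
        [True, True, True],    # no defects
        [True, False, True],   # one bad line, a spare takes over
        [False, False, True],  # too many bad lines, section is isolated
    ]
    print(configure_bus(sections, n_signals=2))
```

Because each section is configured independently, a defect affects only the switch settings of the section that contains it rather than the whole bus line.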
In accordance with the present invention, a high-speed, fault-tolerant bus system is provided for communication between functional modules and one or more bus controllers. Structured into a 3-level hierarchy, the bus allows high-frequency operation (greater than 500 MHz) while maintaining low communication latency (less than 30 ns) and high reconfiguration flexibility. Easy incorporation of redundant functional modules and bus masters in the bus allows highly fault-tolerant systems to be built, making the bus highly suitable for wafer-scale integrated systems. The bus employs a special source-synchronous block or packet transfer scheme for data communication and asynchronous handshakes for bus control and dynamic configuration. This source-synchronous scheme allows modules to communicate at different frequencies and increases the overall yield of the system, as it can accommodate both slow and fast memory devices without sacrificing the performance of the fast devices. It also frees the system of the burden of implementing global clock synchronization, which in general consumes a relatively large amount of power and is difficult to achieve with high accuracy in a wafer-scale or large chip environment.
In one embodiment, the functional modules are memory modules and each module consists of DRAM arrays and their associated circuitry. The bus master is the memory controller, which carries out memory accesses requested by other devices such as a CPU, a DMA controller and a graphics controller in a digital system. Such a memory subsystem can be used in, for instance, computers, image processing, and digital and high-definition television.
According to the present invention, the memory module and a substantial part of the bus are integrated in a wafer-scale or large chip environment. One variation is to integrate the whole memory subsystem, including the memory modules, the bus and the memory controller, in a single integrated circuit device. Another variation is to integrate the whole memory subsystem into a few integrated circuit devices connected together using substantially the same bus. The invention can also be used in a system where the circuit modules are each a processor with its own memory and the bus master is an instruction controller which fetches and decodes program instructions from an external memory. The decoded instructions and data are then sent through the bus to the processors. Such a system can be used to perform high-speed, high-throughput data processing.
By grouping the DRAM arrays into logically independent modules of relatively small memory capacity (588 Kbit), a large number of cache lines (128) is obtained at small main memory capacity (4 Mbyte). The large number of cache lines is necessary for maintaining a high cache hit rate (greater than 90%). The small module size also makes high-speed access (less than 30 ns) possible.
High defect tolerance in the hierarchical bus is obtained using the following techniques: 1) Use of a relatively small block size (512K bit or 588K bit with parity) for the memory modules; 2) Use of a programmable identification register to facilitate dynamic address mapping and relatively easy incorporation of global redundancy; 3) Use of a grid structure for the bus to provide global redundancy for the interconnect network; 4) Use of a relatively narrow bus consisting of 13 signal lines to keep the total area occupied by the bus small; 5) Use of segmented bus lines connected by programmable switches and programmable bus transceivers to facilitate easy isolation of bus defects; 6) Use of special circuits for bus transceivers and asynchronous handshakes to facilitate dynamic bus configuration; 7) Use of a programmable control register to facilitate run-time bus reconfiguration; 8) Use of spare bus lines to provide local redundancy for the bus; and 9) Use of spare rows and columns in the memory module to provide local redundancy.
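The role of the programmable identification register in technique 2 can be sketched in software. The model below is a simplified illustration under assumptions, not the register design of the invention: the `Module` class, its fields and the `remap_defective` helper are hypothetical names, and a module is assumed to respond to a bus address only when it is enabled and its identification register matches.

```python
class Module:
    """Toy model of a functional module with a programmable ID register."""

    def __init__(self, module_id, defective=False, enabled=True):
        self.id_register = module_id  # programmable communication address
        self.defective = defective
        self.enabled = enabled

    def responds_to(self, address):
        return self.enabled and self.id_register == address

def remap_defective(modules, spares):
    """Disable defective modules and hand their addresses to spare modules.

    Returns the list of addresses that could not be repaired because the
    spares ran out.
    """
    spares = list(spares)
    unrepaired = []
    for module in modules:
        if not module.defective:
            continue
        module.enabled = False            # defective module leaves the bus
        if spares:
            spare = spares.pop(0)
            spare.id_register = module.id_register  # reprogram the spare's ID
            spare.enabled = True
        else:
            unrepaired.append(module.id_register)
    return unrepaired

if __name__ == "__main__":
    modules = [Module(0), Module(1, defective=True), Module(2)]
    spares = [Module(None, enabled=False)]
    print("unrepaired addresses:", remap_defective(modules, spares))
    for address in range(3):
        hits = [m for m in modules + spares if m.responds_to(address)]
        print(address, "answered by", len(hits), "module(s)")
```

Since the address lives in a register rather than in fixed wiring, the same remapping step could in principle be repeated while the system is running, which is the sense in which the register supports dynamic address mapping.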