The present invention relates to semiconductor memory. In particular, the present invention relates to column redundancy architectures of computational random access memories.
Redundancy circuits are essential to boosting production yield of semiconductor devices, especially high density memory devices, since defects are more likely to occur in the high density memory arrays than in the peripheral circuits. Various redundancy schemes have been developed to repair memories found during testing to have faulty memory cells. Such fault tolerant schemes can involve column redundancy, in which the column having the defect is replaced with a redundant column of the memory array.
Many redundancy schemes have been proposed in the art for increasing semiconductor yield, which are generally implemented as follows. Once the location of a defective memory cell or cells is identified during testing, the column it is part of is effectively removed from the memory array by ensuring that it can no longer be addressed. A spare column of memory cells physically located elsewhere on the chip is programmed to be accessed by the logical address that would have accessed the defective column. Address programming is typically done through the use of laser-blown fuses and anti-fuses, for example.
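A minimal Python sketch of the conventional fuse-programmed repair described above follows. The names `SpareColumn` and `decode_column`, and the layout of eight normal columns plus two spares, are hypothetical. Fuse programming is modeled as simply storing the defective address; a real device performs the comparison in hardware alongside the column decoder.

```python
from typing import Optional


class SpareColumn:
    """A spare column whose fuses hold the address of the column it replaces."""

    def __init__(self, physical_index: int):
        self.physical_index = physical_index
        self.enabled = False                       # enable fuse intact
        self.repair_address: Optional[int] = None  # address fuses intact

    def program(self, defective_address: int) -> None:
        # Stands in for blowing the fuses that encode the defective address.
        self.repair_address = defective_address
        self.enabled = True


def decode_column(address, spares):
    """Return the physical column selected for a logical column address.

    Each enabled spare compares the incoming address with its programmed
    address; on a match the spare is used instead, so the defective column
    can no longer be addressed.
    """
    for spare in spares:
        if spare.enabled and spare.repair_address == address:
            return spare.physical_index
    return address


# Eight normal columns (0-7) with two spares at physical positions 8 and 9.
spares = [SpareColumn(8), SpareColumn(9)]
spares[0].program(3)   # column 3 found defective during test
print([decode_column(a, spares) for a in range(8)])   # [0, 1, 2, 8, 4, 5, 6, 7]
```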
Another type of integrated circuit device that requires column redundancy to increase yield is the computational random access memory (CRAM). CRAM is a memory device having arrayed parallel processors, with hundreds to thousands of processors coupled to a memory array. CRAM is a processor in memory architecture that takes advantage of the large internal memory bandwidth available at the sense amplifiers. By pitch matching each bit serial processing element (PE) to one or more memory columns, CRAM can operate as a massively parallel single instruction stream, multiple data stream (SIMD) computer. CRAM architectures and arrayed parallel processor architectures are well known in the art.
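To illustrate the bit-serial SIMD style of computation, the sketch below has every column's PE add two vertically stored operands one bit per step, with all PEs executing the same step in lockstep. This is a generic bit-serial illustration, not the datapath of any particular CRAM: the 4-bit operand width, the row layout (least significant bit first) and the per-PE carry bit are assumptions made for the example.

```python
N = 4  # assumed operand width in bits


def store(memory, column, base_row, value):
    """Write an N-bit value down one column, least significant bit first."""
    for i in range(N):
        memory[base_row + i][column] = (value >> i) & 1


def load(memory, column, base_row):
    """Read an N-bit value back from one column."""
    return sum(memory[base_row + i][column] << i for i in range(N))


def simd_bit_serial_add(memory, num_columns):
    """A in rows 0..N-1, B in rows N..2N-1, sum written to rows 2N..3N-1."""
    carry = [0] * num_columns            # one carry bit per PE (assumed)
    for i in range(N):                   # one broadcast step per bit position...
        for col in range(num_columns):   # ...applied by every PE in lockstep
            a = memory[i][col]
            b = memory[N + i][col]
            memory[2 * N + i][col] = a ^ b ^ carry[col]
            carry[col] = (a & b) | (a & carry[col]) | (b & carry[col])


columns = 3
memory = [[0] * columns for _ in range(3 * N)]
for col, (a, b) in enumerate([(3, 5), (7, 8), (9, 6)]):
    store(memory, col, 0, a)
    store(memory, col, N, b)
simd_bit_serial_add(memory, columns)
print([load(memory, col, 2 * N) for col in range(columns)])   # [8, 15, 15]
```

Because each step applies the same operation to every column, the number of steps in this model depends on the operand width, not on the number of columns.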
An example of a prior art CRAM is shown in FIG. 1. The CRAM 20 shown in FIG. 1 includes two banks 22 and 24, labeled “Bank 1” and “Bank 2” respectively, although a CRAM can contain any number of banks. Bank 22 includes a memory array 26 coupled to peripheral circuits such as row decoders 28, processing elements (PEs) 30, and column decoders 32. Bank 24 is identically configured to bank 22, and includes a memory array 34 coupled to peripheral circuits such as row decoders 36, PEs 38, and column decoders 40. Memory arrays 26 and 34 can be of any type of memory, such as dynamic random access memory (DRAM) or static random access memory (SRAM), for example, with row decoders 28 and column decoders 32 selecting particular memory cells for read and write operations. Each PE 30 has direct access to a single column of memory for use as its local memory, and is coupled to a common broadcast bus 42. As shown in FIG. 1, PEs 30 and 38 are all coupled to the same broadcast bus 42, which can further extend to other banks of the chip. The PEs 30 are connected to the common broadcast bus 42 in a wired-AND configuration, allowing the common broadcast bus 42 to function as a dynamic zero detect circuit. Furthermore, if at least one PE 30 writes a zero to the common broadcast bus 42, all other PEs 30 receive the zero value for register write back.
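The wired-AND behavior of broadcast bus 42 can be modeled in a few lines of Python. The sketch assumes the bus is precharged high and that any PE writing a zero pulls it low; `wired_and_bus` and `zero_detect` are hypothetical names used only for illustration.

```python
def wired_and_bus(driven_bits):
    """Precharged-high bus: it reads 1 unless at least one PE drives a 0."""
    return int(all(driven_bits))


def zero_detect(driven_bits):
    """True when at least one PE has pulled the bus low."""
    return wired_and_bus(driven_bits) == 0


results = [1, 1, 0, 1]               # the third PE writes a zero
bus = wired_and_bus(results)
write_back = [bus] * len(results)    # every PE receives the bus value
print(bus, zero_detect(results), write_back)   # 0 True [0, 0, 0, 0]
```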
An example of a prior art PE 30 or 38 used in CRAM 20 of FIG. 1 is shown in FIG. 2. A pair of adjacent PEs 30 are shown in FIG. 2, illustrating their interconnections with each other and with the broadcast bus 42. The PEs 30 shown support bit-serial computation, left-right nearest-neighbor communication, wired-AND broadcast bus evaluation and external databus access. PE 30 includes a single-bit memory register 50, a single-bit write enable register 52, a single-bit shift left register 54, a single-bit shift right register 56, an arithmetic logic unit (ALU) 58, and a transceiver 60. Memory register 50 can include well known bitline sense amplifiers, for example. In addition to providing shifting functionality, registers 54 and 56 can be used as temporary storage locations for the results provided by the ALU 58. Each of the four single-bit registers is implemented as a six transistor dual rail RAM cell in the present embodiments, but can be implemented with any equivalent register circuit. The registers can include additional gating circuits that receive control signals for controlling the input and output of data; these circuits are not shown in FIG. 4 to simplify the schematic, but those of skill in the art will understand that they are required for proper operation. For example, shift left register 54 can receive a right shift control signal for storing data from the PE to the right. In this particular example, the registers store data as complementary logic levels, and complementary signal lines carry the data between the PE components. It should be obvious to those of skill in the art that single-ended logic levels and signal lines can be used in alternate embodiments.
Memory register 50 stores data received from, or to be written to, a memory cell of its associated memory column, and write enable register 52 includes combinational logic to determine whether a PE 30 should write its result to its local memory via memory register 50. Shift left register 54 receives result data from the PE 30 to its right, while shift right register 56 receives result data from the PE 30 to its left. ALU 58 receives an 8-bit opcode and a single bit of data from each of memory register 50, shift left register 54 and shift right register 56, and provides a result at its output. ALU 58 consists of a multiplexer that can implement any Boolean function of the three single-bit inputs it receives. The result output of ALU 58 is provided to each register of PE 30, and to transceiver 60 for communicating data between the PE 30 and the broadcast bus 42. The bus transceiver 60 is implemented with static CMOS NOR gates that connect to NMOS pull-down transistors. The memory can also be accessed externally through a conventional databus 62.
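Because ALU 58 selects among eight truth-table entries using three single-bit inputs, the 8-bit opcode can be read as the truth table of the desired function. The Python sketch below models that behavior; which input forms the most significant select bit, and the helper `truth_table`, are assumptions made for illustration rather than details of the described circuit.

```python
def alu(opcode, m, sl, sr):
    """Evaluate a 3-input Boolean function stored as an 8-bit truth table.

    The three single-bit inputs (memory, shift-left and shift-right registers)
    form a 3-bit select value that picks one opcode bit, as an 8:1 mux would.
    """
    select = (m << 2) | (sl << 1) | sr
    return (opcode >> select) & 1


def truth_table(fn):
    """Build the 8-bit opcode that makes alu() compute fn(m, sl, sr)."""
    opcode = 0
    for select in range(8):
        m, sl, sr = (select >> 2) & 1, (select >> 1) & 1, select & 1
        opcode |= fn(m, sl, sr) << select
    return opcode


XOR3 = truth_table(lambda m, sl, sr: m ^ sl ^ sr)                      # sum bit
MAJ3 = truth_table(lambda m, sl, sr: (m & sl) | (m & sr) | (sl & sr))  # carry bit
print(hex(XOR3), hex(MAJ3))   # 0x96 0xe8
```

With these two opcodes, the sum and carry of a full adder could each be produced in a single ALU operation.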
Because memory array 26 of FIG. 1 is no different from the memory arrays of commodity semiconductor memory devices, a column redundancy scheme to correct memory defects is necessary to maximize manufacturing yield. Unfortunately for CRAM devices, replacing the defective column of memory cells with a spare column of memory cells inherently requires replacement of the PE 30 coupled to the defective column. This is significant because the communication lines between adjacent PEs must remain uninterrupted for proper operation. As can be seen in FIG. 2, the PEs adjacent to the PE to be removed (disabled) will need to communicate with each other in order to maintain proper operation of the CRAM.
It should be noted that manufacturing defects can occur in the PE 30 itself, even when its associated memory column has no defective cells. Correspondingly, a defective PE 30 requires replacement of both the PE 30 and its associated memory column.
Several column redundancy schemes that can be applied to CRAM are known in the art. For example, address remapping circuits can be used to preserve the sequential address space of the memory columns, with defective columns being “bypassed” when addressed. These remapping circuits tend to have high latency, which negatively impacts the performance of the CRAM.
Another column redundancy scheme uses nearest-neighbor interconnect for PE fault tolerance, but is not useful in situations in which an arbitrary number of series-adjacent PEs are defective.
Redundancy can be provided with bypass switches that are used as conductors which are shorted or blown by a laser. Unfortunately, these bypass switches consume large circuit area and cannot be incorporated into the pitch limited PE area.
It is, therefore, desirable to provide a CRAM column redundancy scheme in which non-adjacent PEs can communicate with each other after any number of PEs have been effectively removed from the memory array. It is further desirable to provide a CRAM column redundancy scheme with high speed address remapping, and PE redundancy circuits that fit within the pitch limited PE area.
It is an object of the present invention to obviate or mitigate at least one disadvantage of previous CRAM column redundancy schemes. In particular, it is an object of the invention to provide a CRAM column redundancy architecture in which non-adjacent PEs can communicate with each other after defective PEs have been removed.
In a first aspect, the present invention provides a redundancy enabled processing element. The redundancy enabled processing element includes a logic circuit and a bypass circuit. The logic circuit receives data from adjacent processing elements and generates result data corresponding to the function of the logic circuit. The bypass circuit receives the data from the adjacent processing elements and the result data. The bypass circuit is settable for passing the result data to one of the adjacent processing elements in a normal mode of operation and settable for passing the data from the one of the adjacent processing elements to the other of the adjacent processing elements in a bypass mode of operation.
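The normal and bypass modes can be sketched as a linear chain of PEs in which each PE's bypass circuit either drives its own result data toward a neighbor or forwards the neighbor's data straight through. The `PE` class, the mux helpers and the zero fill value at the array edges are hypothetical modeling choices; the sketch captures only the data routing, not timing or circuit details.

```python
from dataclasses import dataclass


@dataclass
class PE:
    result: int = 0      # result data produced by the logic circuit
    skip: bool = False   # skip register output: True selects bypass mode


def right_mux(pe, from_left):
    """Value the PE drives toward its right-hand neighbor."""
    return from_left if pe.skip else pe.result


def left_mux(pe, from_right):
    """Value the PE drives toward its left-hand neighbor."""
    return from_right if pe.skip else pe.result


def shift_right(pes, fill=0):
    """Value each PE receives from the left during a right shift."""
    received = []
    wire = fill                      # value entering at the left edge
    for pe in pes:
        received.append(wire)
        wire = right_mux(pe, wire)   # a skipped PE forwards the wire unchanged
    return received


def shift_left(pes, fill=0):
    """Value each PE receives from the right during a left shift."""
    received = [0] * len(pes)
    wire = fill                      # value entering at the right edge
    for i in reversed(range(len(pes))):
        received[i] = wire
        wire = left_mux(pes[i], wire)
    return received


# PE 2 is defective and has been set to bypass mode.
pes = [PE(10), PE(11), PE(99, skip=True), PE(13)]
print(shift_right(pes))   # [0, 10, 11, 11]: PE 3 receives PE 1's result
print(shift_left(pes))    # [11, 13, 13, 0]: PE 1 receives PE 3's result
```

In the model, a run of several consecutive skipped PEs simply forwards the value through each bypass in turn, so the PEs on either side of the run still exchange data.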
In an embodiment of the present aspect, the bypass circuit includes a skip register for providing a skip register output for setting the bypass circuit to the bypass mode of operation. The skip register can include a pair of cross-coupled inverters, where the output of one of the cross-coupled inverters provides the skip register output, and a pair of access transistors for coupling the cross-coupled inverters to complementary skip data.
In other aspects of the present embodiment, the redundancy enabled processing element further includes a memory register for providing the complementary skip data to the skip register, and a transceiver circuit for transferring the result data between a broadcast bus and the logic circuit, where the transceiver circuit is disabled in response to the skip register output.
In yet another aspect of the present embodiment, the bypass circuit includes a first multiplexer for receiving the data from one of the adjacent processing elements and the result data, and a second multiplexer for receiving the data from the other of the adjacent processing elements and the result data. The first and second multiplexers pass one of the data and the result data in response to the skip register output.
In a second aspect, the present invention provides a method of disabling a redundancy enabled processing element. The method includes the steps of loading a skip register of a bypass circuit with skip data, disabling a transceiver circuit in response to the skip data stored in the skip register, and coupling data communication lines of a first adjacent processing element to a second adjacent processing element in response to the skip data stored in the skip register.
In an embodiment of the present aspect, the step of loading includes driving a databus with the skip data, loading a memory register with the skip data from the databus, and activating access transistors of the skip register for storing the skip data provided by the memory register.
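The loading and transceiver-disabling steps can be sketched as follows. The model collapses the complementary skip data into a single bit, assumes the broadcast bus behaves as a precharged wired-AND bus, and uses hypothetical method names for the steps; the coupling of neighboring PEs through the bypass multiplexers is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class RedundantPE:
    result: int = 1           # result data this PE would place on the bus
    memory_register: int = 0  # models the memory register
    skip_register: int = 0    # 1 selects bypass mode (PE disabled)

    def load_memory_register(self, databus_bit: int) -> None:
        # The skip data driven on the databus is latched by the memory register.
        self.memory_register = databus_bit

    def activate_skip_access(self) -> None:
        # Turning on the skip register's access transistors copies the data
        # held by the memory register into the cross-coupled inverter pair.
        self.skip_register = self.memory_register

    def bus_drive(self) -> int:
        # With its transceiver disabled, the PE never pulls the bus low,
        # which on a precharged wired-AND bus is the same as driving a 1.
        return 1 if self.skip_register else self.result


def broadcast(pes) -> int:
    """Wired-AND of every PE's contribution to the broadcast bus."""
    return int(all(pe.bus_drive() for pe in pes))


pes = [RedundantPE(result=1), RedundantPE(result=0), RedundantPE(result=1)]
print(broadcast(pes))         # 0: the faulty middle PE pulls the bus low

pes[1].load_memory_register(1)
pes[1].activate_skip_access()
print(broadcast(pes))         # 1: its transceiver no longer drives the bus
```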
In other alternate embodiments of the present aspect, the step of coupling includes switching first and second multiplexers to a bypass state and activating a spare processing element to replace the disabled redundancy enabled processing element. According to an aspect of the present alternate embodiment, the step of activating includes storing an address location of the disabled redundancy enabled processing element, and remapping a logical address for generating a physical address offset by the stored address location.
In another aspect of the present alternate embodiment, the step of remapping can include comparing the logical address with the stored address location, generating an offset value if the logical address is greater than the stored address location, and adding the offset value to the logical address for generating the physical address.
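The compare, offset and add steps for a single stored address location can be sketched as below. The comparison is written as greater-than-or-equal so that the logical address equal to the disabled column is also moved off it; the strictly greater-than comparison worded above yields the same mapping if the stored location is taken to be the last good column before the fault. Which convention the hardware uses is not stated here, nor is the assumption, made for this sketch, that the spare column sits at the high end of the array so that shifted addresses land on it. `remap_single` is a hypothetical name.

```python
def remap_single(logical, faulty):
    """Map a logical column address around one disabled physical column."""
    # Comparator: has the logical address reached the disabled column?
    offset = 1 if logical >= faulty else 0   # offset value
    return logical + offset                  # adder


# Physical column 2 is disabled; logical 0-4 land on physical 0, 1, 3, 4, 5.
print([remap_single(a, faulty=2) for a in range(5)])   # [0, 1, 3, 4, 5]
```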
In an alternate embodiment of the present aspect, the step of remapping can include comparing the logical address with the stored address location and generating a selection signal corresponding thereto, generating pre-computed physical addresses in parallel with the step of comparing, and selecting one of the pre-computed physical addresses as the physical address in response to the selection signal.
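For a single stored address location, the pre-computed alternative can be sketched by forming both candidate physical addresses up front and letting the comparator output act as the selection signal, similar in spirit to a carry-select adder. In hardware, this would take the addition out of the path that follows the comparison. The function name, the greater-than-or-equal comparison and the single-fault restriction are assumptions of this sketch.

```python
def remap_precomputed(logical, faulty):
    """Select between pre-computed physical addresses around one disabled column."""
    # Both candidates can be formed while the comparison is in progress.
    candidates = (logical, logical + 1)
    select = int(logical >= faulty)      # comparator output = selection signal
    return candidates[select]            # 2:1 multiplexer


print([remap_precomputed(a, faulty=2) for a in range(5)])   # [0, 1, 3, 4, 5]
```

The resulting mapping is the same as that of the compare, offset and add method; only the order of operations differs.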
In a third aspect, the present invention provides a computational random access memory having a plurality of memory columns. The computational random access memory includes row decoders, processing elements and spare processing elements. The row decoders access memory cells in each memory column. The processing elements are coupled to the memory columns, where each processing element includes a logic circuit for receiving data from adjacent processing elements and for generating result data corresponding to the function of the logic circuit, and a bypass circuit for receiving the data from the adjacent processing elements and the result data. The bypass circuit is settable for passing the result data to one of the adjacent processing elements in a normal mode of operation and settable for passing the data from the one of the adjacent processing elements to the other of the adjacent processing elements in a bypass mode of operation. Spare memory columns and associated spare processing elements are located in the memory for replacing faulty memory columns and associated disabled processing elements.
In embodiments of the present aspect, each processing element is coupled to a single memory column or to more than one memory column.
In another embodiment of the present aspect, the bypass circuit includes a skip register for providing a skip register output for setting the bypass circuit to the bypass mode of operation. The skip register can include a pair of cross-coupled inverters, where the output of one of the cross-coupled inverters provides the skip register output, and a pair of access transistors for coupling the cross-coupled inverters to complementary skip data. The bypass circuit can further include a first multiplexer for receiving the data from one of the adjacent processing elements and the result data, and a second multiplexer for receiving the data from the other of the adjacent processing elements and the result data, such that the first and second multiplexers pass one of the data and the result data in response to the skip register output.
In yet another embodiment of the present aspect, the processing element further includes a memory register for providing the complementary skip data to the skip register, and a transceiver circuit for transferring the result data between a broadcast bus and the logic circuit. The transceiver circuit can be disabled in response to the skip register output.
In another embodiment of the present aspect, the computational random access memory further includes an address remapping circuit for generating a physical address offset by addresses of the faulty memory columns and associated disabled processing elements. The address remapping circuit can include a comparator, a priority encoder and an adder. The comparator compares a logical address to a faulty address location corresponding to the disabled processing element. The priority encoder generates an offset value if the logical address is greater than the faulty address location. The adder generates the physical address corresponding to the sum of the logical address and the offset value.
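A behavioral sketch of the comparator, priority encoder and adder, generalized to several disabled columns, follows. It assumes that each stored faulty location is kept in logical terms, computed as the fault's physical position minus the number of faults below it, so that the comparator outputs form a thermometer code and the priority encoder's result equals the number of disabled columns to skip. This storage convention, the greater-than-or-equal comparison and all function names are assumptions, not details taken from the described circuit.

```python
def comparator_bank(logical, thresholds):
    """One comparator per stored faulty location (thermometer code if sorted)."""
    return [int(logical >= t) for t in thresholds]


def priority_encoder(bits):
    """Return 1 + the index of the highest asserted bit, or 0 if none is set."""
    for i in reversed(range(len(bits))):
        if bits[i]:
            return i + 1
    return 0


def remap(logical, thresholds):
    offset = priority_encoder(comparator_bank(logical, thresholds))
    return logical + offset      # adder: physical = logical + offset


# Hypothetical example: physical columns 2, 3 and 6 have been disabled.
faulty_physical = [2, 3, 6]
# Store each fault in logical terms: physical position minus faults below it.
thresholds = [f - i for i, f in enumerate(sorted(faulty_physical))]
print(thresholds)                                # [2, 2, 4]
print([remap(a, thresholds) for a in range(6)])  # [0, 1, 4, 5, 7, 8]
```

Logical addresses 0 to 5 land on physical columns 0, 1, 4, 5, 7 and 8, skipping the disabled columns 2, 3 and 6.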