Modern digital signal processors, microprocessors, and network chips process large amounts of information and store the processed data in memory. Memory typically occupies almost half of the chip area. As technology nodes shrink, more memory cells are packed closely together, which increases the frequency and number of memory faults detected. Each new technology gives rise to new fault models, which require new sets of patterns to test the different kinds of faults in the memories. These new sets of patterns, which may be added on top of legacy patterns from older technologies, require more test time, thereby increasing the test cost and the cost of the chip.
A conventional built-in self-test (BIST) architecture tests the memory at-speed by sending one “burst” of instructions at a time. The burst may include, for example, four (4) instructions. The number of instructions is chosen to minimize the area and physical design turnaround time. The BIST engine operates using a slow clock; it generates the burst of instructions and sends the burst to the memory interface block. The memory interface block then applies the burst of instructions to the memory using a high speed (or fast) clock. While the current instructions are being executed by the memory, and the result of any read operation is being compared with the expected data in the memory interface logic, the BIST engine generates the instructions for the next burst.
While performing a write sweep on the full address space of the memory, the BIST writes to a different address location for each instruction of every burst operation. For example, the BIST may write to 4 different address locations in the 4 instructions of every burst. During reading of the memory, however, data from only 1 of the 4 instructions in each burst is read and compared with the expected data. This is because it is desirable to compare the read data in the low speed domain, not the high speed domain.
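The difference in coverage between the write sweep and the read sweep can be illustrated with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: the function names, the burst length constant, and the `capture_slot` parameter (which instruction of the burst is compared) are hypothetical, and the model assumes that every instruction of a read burst issues a read even though only one result is compared.

```python
BURST_LEN = 4  # instructions per burst, matching the example of four


def write_sweep(memory, pattern):
    """Write `pattern` to every address, BURST_LEN addresses per burst.

    Returns the number of bursts used (hypothetical helper).
    """
    bursts = 0
    for base in range(0, len(memory), BURST_LEN):
        for slot in range(BURST_LEN):
            addr = base + slot
            if addr < len(memory):
                memory[addr] = pattern  # each instruction hits a new address
        bursts += 1
    return bursts


def read_sweep(memory, expected, capture_slot=BURST_LEN - 1):
    """Read every address, but compare only one instruction per burst.

    `capture_slot` is an assumed parameter selecting which read is compared.
    Returns (addresses compared, addresses that failed the compare).
    """
    compared, failures = [], []
    for base in range(0, len(memory), BURST_LEN):
        for slot in range(BURST_LEN):
            addr = base + slot
            if addr >= len(memory):
                continue
            data = memory[addr]  # read executed at speed
            if slot == capture_slot:  # only this read is compared
                compared.append(addr)
                if data != expected:
                    failures.append(addr)
    return compared, failures
```

In this model, a 16-word memory is written in 4 bursts, yet a single read sweep with the default `capture_slot` compares only addresses 3, 7, 11, and 15, so a fault at address 5 is not observed in that sweep.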
One reason a high speed comparison is undesirable is that it becomes prohibitive from an area perspective (e.g., the comparator circuitry gets bigger, the logic for assigning redundant elements to make repairs gets larger and more complicated, and at-speed diagnostics require a cycle counter and two-pass testing to stop on the correct cycle). However, to perform the comparison at a low speed, the test circuitry would need to capture the data in the high speed domain and hold it until the burst completes. It would be preferable to capture data from multiple cycles, but this would normally mean multiple capture registers, which is again prohibitive from an area perspective. This leads to the current architectural limitation that captured data is available from only one read instruction per burst.
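The capture-and-hold arrangement described above can be sketched as a simple behavioral model. This is a minimal sketch and not an actual memory interface design: the class `SingleCaptureInterface`, its method names, and the `capture_slot` argument are hypothetical names introduced only for illustration.

```python
class SingleCaptureInterface:
    """Hypothetical memory-interface model with one capture register."""

    def __init__(self, capture_slot):
        self.capture_slot = capture_slot  # assumed: which read to capture
        self.capture_reg = None           # the only data held across a burst

    def fast_cycle(self, slot, read_data):
        # High speed domain: latch read data only for the selected instruction.
        if slot == self.capture_slot:
            self.capture_reg = read_data

    def slow_compare(self, expected):
        # Low speed domain: compare the held data after the burst completes.
        return self.capture_reg == expected


def run_read_burst(interface, burst_data, expected):
    """Apply one burst of read results, then compare in the slow domain."""
    for slot, data in enumerate(burst_data):
        interface.fast_cycle(slot, data)
    return interface.slow_compare(expected)
```

With `capture_slot` set to 2, a burst whose second read returns bad data still passes the compare, while bad data on the third read fails it. Observing every read of the burst in this style would take one capture register per instruction, which is the area cost noted above.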