IEEE Standard 1149.1 and 1149.1a, entitled "IEEE Standard Test Access Port and Boundary-Scan Architecture," published Oct. 21, 1993 by the IEEE under ISBN 1-55937-350-4, relates to circuitry that may be built into an IC device to assist in testing the device as well as the printed circuit board on which the device is placed. In particular, the standard provides for testing IC devices that are connected in series on a standard control bus (commonly referred to as a daisy chain).
FIG. 1 shows a structure comprising three devices controlled by four signals: a test data input signal TDI applied to DEVICE 1; a test data output signal TDO applied by DEVICE 1 to DEVICE 2 and chained through DEVICE 3; a test mode select signal TMS; and a test clock signal TCK. This structure complies with IEEE Standard 1149.1. The data output port TDO of each device is connected to the data input port TDI of the next device to create the daisy chain, so all data and instructions for all devices are loaded through the data input port of the first device in the chain.
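The loading order this implies can be sketched with a small Python model. It is not part of IEEE 1149.1: each register is a list of bits whose index 0 is the TDI side and whose last index is the TDO side (an assumed convention), and the register lengths of 4, 8, and 16 bits are hypothetical. The model shows why the bits destined for the device farthest from TDI must be shifted in first.

```python
from typing import List


def shift_chain(registers: List[List[int]], stream: List[int]) -> List[int]:
    """Clock `stream` into the chain, one bit per TCK, and return the bits seen at the last TDO.

    registers[0] belongs to DEVICE 1 (fed by TDI); registers[-1] drives the chain's TDO.
    Within each register, index 0 is the TDI-side bit and index -1 the TDO-side bit.
    """
    tdo_bits = []
    for bit in stream:
        carry = bit
        for reg in registers:
            reg.insert(0, carry)   # the new bit enters at the TDI side
            carry = reg.pop()      # the TDO-side bit passes to the next device's TDI
        tdo_bits.append(carry)
    return tdo_bits


def stream_for(targets: List[List[int]]) -> List[int]:
    """Order in which bits must be applied at TDI so that register i ends up equal to targets[i]."""
    flat = [b for reg in targets for b in reg]
    return flat[::-1]  # the TDO-side bit of the last device goes in first


targets = [[1, 0, 1, 1], [0, 1, 1, 0, 0, 1, 0, 1], [1] * 8 + [0] * 8]
chain = [[0] * len(t) for t in targets]
old_contents = shift_chain(chain, stream_for(targets))
assert chain == targets            # every device now holds its own data
assert old_contents == [0] * 28    # the previous register contents emerged at TDO
```

Treating the chain as one long shift register is the key design choice here: the three device registers behave exactly like their concatenation, so one reversed stream loads them all.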
The test mode select signal TMS and the test clock signal TCK control a 16-state state machine, shown in FIG. 2, that is built into each IC device meeting IEEE Standard 1149.1 and that controls the shifting in of data. On each rising edge of the test clock signal TCK, the state machine inspects the level of the test mode select signal TMS. (Such state machines are well known and are not discussed in detail here.) FIG. 2 shows movement through the states based on the TMS signal at the rising edge of TCK. As shown in FIG. 2, five consecutive high (logic 1) TMS signals place the state machine into STATE 1, the Test-Logic-Reset state, regardless of the starting state. From there, a single low signal or a continuous low signal places the state machine into STATE 2, the Run-Test/Idle state, in which no action occurs but from which action can be initiated more quickly.
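The transitions described above can be captured in a table-driven model of the TAP controller. This Python sketch assumes that FIG. 2 numbers the sixteen states in the same order as the standard's state diagram (so STATE 6 and STATE 8 are taken to be Exit1-DR and Exit2-DR, and STATE 13 and STATE 15 to be Exit1-IR and Exit2-IR); the function names `step` and `run` are illustrative. The checks at the end confirm that five TMS=1 edges reach Test-Logic-Reset from every state.

```python
from typing import Dict, Iterable, Tuple

# state -> (name, next state when TMS=0, next state when TMS=1); numbering assumed to follow FIG. 2
TAP: Dict[int, Tuple[str, int, int]] = {
    1: ("Test-Logic-Reset", 2, 1),
    2: ("Run-Test/Idle", 2, 3),
    3: ("Select-DR-Scan", 4, 10),
    4: ("Capture-DR", 5, 6),
    5: ("Shift-DR", 5, 6),
    6: ("Exit1-DR", 7, 9),
    7: ("Pause-DR", 7, 8),
    8: ("Exit2-DR", 5, 9),
    9: ("Update-DR", 2, 3),
    10: ("Select-IR-Scan", 11, 1),
    11: ("Capture-IR", 12, 13),
    12: ("Shift-IR", 12, 13),
    13: ("Exit1-IR", 14, 16),
    14: ("Pause-IR", 14, 15),
    15: ("Exit2-IR", 12, 16),
    16: ("Update-IR", 2, 3),
}


def step(state: int, tms: int) -> int:
    """State after one rising edge of TCK with TMS at the given level."""
    _, on_low, on_high = TAP[state]
    return on_high if tms else on_low


def run(state: int, tms_bits: Iterable[int]) -> int:
    """Apply a sequence of TMS levels, one per TCK rising edge."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


# Five consecutive TMS=1 edges reach Test-Logic-Reset from any state...
assert all(run(s, [1] * 5) == 1 for s in TAP)
# ...and one TMS=0 edge then enters Run-Test/Idle, where further 0s hold it.
assert run(1, [0]) == 2 and run(1, [0, 0, 0]) == 2
```

The longest path to reset starts in a Pause state (STATE 7 or STATE 14), which is why five edges, and not fewer, are needed.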
Loading data into the data registers of the devices will now be discussed. From STATE 2 (FIG. 2), a single logic 1 moves the state machine to STATE 3, the Select-DR-Scan state, which is a path-select state from which loading of data registers can be initiated. A logic 0 then initiates STATE 4, the Capture-DR state, in which initializing data are loaded in parallel from an internal register. Next, a logic 0 initiates STATE 5, the Shift-DR state, which is held by logic 0 TMS signals while serial data are shifted into a shift register or registers. After serial shifting of data, a logic 1 followed by a logic 0 causes a pause at STATE 7, and another logic 1 followed by a logic 0 returns to STATE 5 for more loading of serial data. Following STATE 5 or STATE 7, two logic 1 signals initiate STATE 9, in which the appropriate data registers are actually updated: while the state machine is in STATE 9, data that have been shifted into the IC are latched into the data registers on the falling edge of TCK. From here, continuous high signals return the state machine to STATE 1, the Test-Logic-Reset state, and continuous low signals return it to STATE 2, the Run-Test/Idle state.
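A minimal Python sketch of the DR-scan sequence just described: it generates the TMS levels for loading a given number of bits, optionally pausing once in STATE 7, and replays them through a next-state table to count the bits actually shifted. The numbering of the states not named in the text (STATE 6 and STATE 8) is an assumption, and `dr_scan_tms` and `simulate` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

# state -> (next state on TMS=0, next state on TMS=1), numbered as in FIG. 2 (assumed ordering)
NEXT = {1: (2, 1), 2: (2, 3), 3: (4, 10), 4: (5, 6), 5: (5, 6), 6: (7, 9),
        7: (7, 8), 8: (5, 9), 9: (2, 3), 10: (11, 1), 11: (12, 13), 12: (12, 13),
        13: (14, 16), 14: (14, 15), 15: (12, 16), 16: (2, 3)}
SHIFT_DR = 5


def dr_scan_tms(n_bits: int, pause_after: Optional[int] = None) -> List[int]:
    """TMS levels that shift n_bits through Shift-DR, starting and ending in Run-Test/Idle."""
    if n_bits < 1:
        raise ValueError("need at least one bit")
    if pause_after is not None and 0 < pause_after < n_bits:
        segments = [pause_after, n_bits - pause_after]
    else:
        segments = [n_bits]
    tms = [1, 0, 0]                      # STATE 2 -> 3 -> 4 -> 5
    for i, k in enumerate(segments):
        if i > 0:
            tms += [0, 1, 0]             # Exit1-DR -> Pause-DR (7) -> Exit2-DR (8) -> Shift-DR (5)
        tms += [0] * (k - 1) + [1]       # k shifts; the last edge also moves to Exit1-DR (6)
    tms += [1, 0]                        # Exit1-DR -> Update-DR (9) -> Run-Test/Idle (2)
    return tms


def simulate(tms_bits: List[int], start: int = 2) -> Tuple[int, int, List[int]]:
    """Return (final state, number of bits shifted, states visited)."""
    state, shifted, visited = start, 0, [start]
    for tms in tms_bits:
        if state == SHIFT_DR:            # each edge spent in Shift-DR moves one bit from TDI toward TDO
            shifted += 1
        state = NEXT[state][tms]
        visited.append(state)
    return state, shifted, visited


final, shifted, visited = simulate(dr_scan_tms(28, pause_after=12))
assert final == 2 and shifted == 28
assert 7 in visited and 9 in visited     # paused once, then updated
```

Note that the edge that leaves STATE 5 still shifts a bit, so the last data bit is always presented together with TMS=1.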
Loading instruction data into the instruction registers of the devices will now be discussed. From STATE 2, two logic 1 signals prepare for capturing instructions into the instruction register by moving the state machine to STATE 10, the Select-IR-Scan state. A logic 0 then initiates STATE 11, the Capture-IR state, and another logic 0 initiates STATE 12, the Shift-IR state, in which instruction data are shifted into the instruction register while the TMS signal remains at logic 0. STATE 14, the Pause-IR state, allows for a pause in the shifting of instructions into the instruction register, and STATE 16, the Update-IR state, causes the actual latching of the instructions into the instruction register on the falling edge of TCK. Once the new instruction has been latched, it becomes the current instruction.
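The IR-scan path can also be modeled in a few lines of Python. This sketch assumes a single device whose instruction register is as long as the instruction, and assumes the instruction is given least-significant bit first (the bit that ends up nearest TDO); `ir_scan` is a hypothetical name, and the latch is modeled at cycle level rather than on the exact falling edge of TCK.

```python
from typing import List, Optional, Tuple

# state -> (next state on TMS=0, next state on TMS=1), numbered as in FIG. 2 (assumed ordering)
NEXT = {1: (2, 1), 2: (2, 3), 3: (4, 10), 4: (5, 6), 5: (5, 6), 6: (7, 9),
        7: (7, 8), 8: (5, 9), 9: (2, 3), 10: (11, 1), 11: (12, 13), 12: (12, 13),
        13: (14, 16), 14: (14, 15), 15: (12, 16), 16: (2, 3)}
SHIFT_IR, UPDATE_IR = 12, 16


def ir_scan(instruction: List[int]) -> Tuple[int, Optional[List[int]]]:
    """Shift `instruction` (LSB first) into the instruction register, starting from Run-Test/Idle.

    Returns the final TAP state and the instruction latched in Update-IR.
    """
    n = len(instruction)
    if n < 1:
        raise ValueError("empty instruction")
    # 2 -> 3 -> 10 -> 11 -> 12, shift n bits (last edge exits to 13), then 13 -> 16 -> 2
    tms = [1, 1, 0, 0] + [0] * (n - 1) + [1] + [1, 0]
    tdi = iter(instruction)
    state, shift_reg, current = 2, [0] * n, None
    for level in tms:
        if state == SHIFT_IR:
            # bits move toward the TDO end (index 0); the new bit enters at the TDI end
            shift_reg = shift_reg[1:] + [next(tdi)]
        state = NEXT[state][level]
        if state == UPDATE_IR:
            current = list(shift_reg)    # latched in Update-IR: this is now the current instruction
    return state, current


state, latched = ir_scan([1, 0, 1, 1, 0, 0, 0, 0])
assert state == 2 and latched == [1, 0, 1, 1, 0, 0, 0, 0]
```

Separating the shift register from the latched `current` value mirrors the point made above: shifting alone does not change the active instruction until Update-IR.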
Programming, erasing, or reading back data from the devices will now be discussed. Some CPLD devices are programmed by nonvolatile means such as EPROM cells or flash cells (transistors). Generally, these devices can be programmed using the IEEE standard discussed above. The programming step involves raising the voltages at certain transistor gates to a high level and maintaining that level until sufficient charge has flowed onto or away from the floating gate of each transistor to cause the transistor to maintain a certain state when the high voltage is removed. Typically, a stream of data from ten to several hundred bits long can be shifted into several devices in less time than is required to program one transistor (cell) in a device. Thus a practical and widely used programming procedure is to serially shift an instruction and then a unit of programming data through a daisy chain of devices (STATE 12 and STATE 5 of FIG. 2, respectively) and then move into a programming mode (which usually occurs in STATE 2 of FIG. 2 when entered from STATE 9 or STATE 16), during which all addressed EPROM, EEPROM, or flash transistors (cells) are programmed simultaneously as specified by the programming data. This method is practical and efficient when all devices in the daisy chain are the same size and have the same requirements for programming time and programming voltage. However, the devices are often unequal in size.
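To put the shift-versus-program comparison in numbers, a rough Python estimate follows. It counts one TCK edge per shifted bit plus the edges needed to walk the state machine from Run-Test/Idle through an IR scan and a DR scan and back, per the transitions of FIG. 2. The TCK frequency, register lengths, and 200 ms wait are assumed values, not figures from the standard, and `program_cycle_s` is a hypothetical name.

```python
from typing import Tuple

# TCK edges spent navigating the TAP controller, beyond one edge per shifted bit:
# IR scan from Run-Test/Idle: TMS 1,1,0,0 to reach Shift-IR, then 1,0 back out;
# DR scan: TMS 1,0,0 to reach Shift-DR, then 1,0 back out.
IR_SCAN_OVERHEAD = 6
DR_SCAN_OVERHEAD = 5


def program_cycle_s(ir_chain_bits: int, dr_chain_bits: int,
                    tck_hz: float, wait_s: float) -> Tuple[float, float]:
    """(shift time, total time) for one instruction + data + program cycle."""
    edges = ir_chain_bits + IR_SCAN_OVERHEAD + dr_chain_bits + DR_SCAN_OVERHEAD
    shift_s = edges / tck_hz
    return shift_s, shift_s + wait_s


# Hypothetical chain: three devices with 8-bit instruction registers, 300 data bits
# in total, a 1 MHz TCK, and a 200 ms programming wait.
shift_s, total_s = program_cycle_s(ir_chain_bits=24, dr_chain_bits=300,
                                   tck_hz=1e6, wait_s=0.2)
print(f"shift {shift_s * 1e3:.3f} ms of {total_s * 1e3:.3f} ms per cycle")
```

Under these assumptions the shifting takes well under 1% of each cycle, which is why the wait period dominates the total programming time.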
One prior art method for programming a daisy chain of devices of unequal size is disclosed in U.S. Pat. No. 5,635,855 to Tang. Tang discloses a method for simultaneously programming a plurality of in-system programmable devices connected in series. If three devices of unequal size are to be programmed, Tang teaches a method by which all three devices are programmed simultaneously until the first is done, and the remainder continue until they are also done (see Tang FIG. 9). Such a method can significantly reduce programming, erase, and readback times compared with accessing each device in sequence, especially for a large number of devices. Tang's method is satisfactory when all such devices have programmable cells that are accessed (programmed, erased, or read back) in about the same amount of time and that are substantially free of programming omissions or otherwise do not require retries. However, Tang's method is not compatible with IEEE Standard 1149.1, and it is not optimal when the devices have unequal access times (wait periods). The wait period is the time that a programmable device normally takes to respond to programming data by altering its cell states (for programming and erase operations) or reporting its cell states (for a readback operation), and then to generate an output signal indicating completion of that process. Since the wait period is typically much longer than the time required to input the programming data, the wait period of a device is the principal factor in the overall time required to program that device.
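Under the conditions named above for Tang's method (equal access time, no retries) and ignoring data-shift time, the saving over sequential access is easy to quantify. The following Python sketch is a simplified timing model, not Tang's actual procedure; the address counts and the common 100 ms wait are hypothetical.

```python
from typing import List


def sequential_s(addresses: List[int], wait_s: float) -> float:
    """Access the devices one after another; each address costs one wait period."""
    return sum(addresses) * wait_s


def concurrent_s(addresses: List[int], wait_s: float) -> float:
    """Access all devices together: each wait period serves one address on every device
    that still has work, so the chain keeps cycling until the largest device is done."""
    return max(addresses) * wait_s


sizes = [500, 1000, 2000]            # hypothetical address counts
print(sequential_s(sizes, 0.1), concurrent_s(sizes, 0.1))   # about 350 s versus 200 s
```

The model only holds because every device shares the same wait; once the waits differ, a single common wait must be chosen, which is the difficulty raised next.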
Typically, because of internal cell overhead, devices having larger numbers of programmable cells can generate programming voltages more quickly, and therefore have shorter wait periods for programming a cell or set of cells, than devices having fewer programmable cells. Thus, a device that is, say, eight times as large as a smaller device will not take eight times as long to program.
If all of the devices are programmed using the longest wait period, the time needed to program them is longer than necessary. However, if a shorter wait period is used, the devices with longer wait periods will not be programmed properly. Thus there is a need for an improvement that accommodates serially connected devices having different wait periods and numbers of cells while reducing the overall time required to program the devices.
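A small Python model makes this trade-off concrete. It ignores data-shift time and assumes each device needs one wait period per address; the address counts and wait periods are illustrative (they match the FIG. 1 example used later for Jacobson '014), and the function names are hypothetical. With these numbers, using the longest wait for every cycle (about 400 s) is slower than even programming the devices one at a time (about 300 s), while a shorter common wait leaves the slowest device under-programmed.

```python
from typing import List


def common_wait_s(addresses: List[int], wait_s: float) -> float:
    """Concurrent access in which every cycle uses the same wait period."""
    return max(addresses) * wait_s


def sequential_s(addresses: List[int], waits_s: List[float]) -> float:
    """Each device accessed alone at its own wait period."""
    return sum(a * w for a, w in zip(addresses, waits_s))


def underprogrammed(waits_s: List[float], wait_s: float) -> List[int]:
    """Indices of devices whose cells need longer than the wait period actually used."""
    return [i for i, w in enumerate(waits_s) if w > wait_s]


addresses = [500, 1000, 2000]       # illustrative address counts
waits = [0.200, 0.100, 0.050]       # illustrative per-address wait periods (s)

print(common_wait_s(addresses, max(waits)))   # longest wait used for every device
print(sequential_s(addresses, waits))         # one device at a time
print(underprogrammed(waits, 0.100))          # a 100 ms wait is too short for device 0
```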
Another prior art method that addresses the problem mentioned above is disclosed in U.S. Pat. No. 5,999,014 to Jacobson et al. (Jacobson '014). Jacobson '014 discloses a method for concurrently accessing in-system PLDs for program, erasure, or readback, and accommodates retries to assure completion of programming even when the initial attempt is not entirely successful. According to the method disclosed in Jacobson '014, when the devices have different numbers of programmable memory cells and their memory cells require different wait periods to carry out programming, only the devices requiring programming are programmed, at the rate required by the slowest of those devices. For example, referring again to FIG. 1, assume DEVICE 1 includes 500 addresses (#A=500), each address having a programming time TP=200 ms, where TP is the time required to program one address location. DEVICE 1 also includes a four-bit data register 11 (RL=4) that stores shifted-in data; each selected address A0-A499 of DEVICE 1 corresponds to a group of four bits that are written from data register 11. Further, DEVICE 1 includes an instruction register 12 for storing instructions shifted in through the boundary-scan chain. Assume also that DEVICE 2 has 1000 addresses, a TP=100 ms, and an eight-bit data register 21, and that DEVICE 3 has 2000 addresses, a TP=50 ms, and a sixteen-bit data register 31. DEVICES 2 and 3 have instruction registers similar to instruction register 12. The number of addresses defines the logic capacity of the PLD.
In accordance with the method disclosed by Jacobson '014, since DEVICE 1 is the slowest of the three and requires 200 ms to program each address, each programming cycle initially lasts 200 ms. That is, configuration data are shifted into the data registers 11, 21, and 31 of the three devices, and then programming is performed for 200 ms. This data shifting and programming is repeated until programming of the slowest device (i.e., DEVICE 1) is completed. When the programming of the slowest device is completed, the programming rate is increased to that of the next slowest device that still has addresses to program (i.e., DEVICE 2) and maintained at this rate until programming of that device is completed. Finally, when all slower devices have been programmed, the programming rate increases to that of the fastest device (i.e., DEVICE 3), which is typically the device having the largest number of programmable cells, until programming is completed.
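The stepped-rate behavior can be simulated directly. This Python sketch follows the description above (each cycle uses the TP of the slowest device that still has addresses to program, and that rate holds until that device is done) while ignoring data-shift time and retries; `stepped_rate_schedule` is a hypothetical name, not code from Jacobson '014. For the FIG. 1 values it gives three phases (500 cycles at 200 ms, 500 at 100 ms, 1000 at 50 ms), about 200 s in all, compared with about 300 s for programming the devices one at a time.

```python
from typing import Dict, List, Tuple


def stepped_rate_schedule(
        devices: List[Tuple[str, int, float]]) -> Tuple[List[Tuple[float, int, List[str]]], float]:
    """Simulate the stepped-rate schedule described for Jacobson '014.

    devices: (name, number of addresses, per-address programming time TP in seconds).
    Returns the phases as (TP used, cycles, devices taking part) and the total wait time.
    Data-shift time and retries are ignored.
    """
    remaining: Dict[str, int] = {name: a for name, a, _ in devices}
    tp: Dict[str, float] = {name: t for name, _, t in devices}
    phases, total_s = [], 0.0
    while any(remaining.values()):
        active = [n for n, left in remaining.items() if left > 0]
        rate = max(tp[n] for n in active)            # slowest device still needing programming
        cycles = max(remaining[n] for n in active if tp[n] == rate)
        for n in active:                             # faster devices may finish early and drop out
            remaining[n] -= min(remaining[n], cycles)
        phases.append((rate, cycles, active))
        total_s += cycles * rate
    return phases, total_s


fig1 = [("DEVICE 1", 500, 0.200), ("DEVICE 2", 1000, 0.100), ("DEVICE 3", 2000, 0.050)]
phases, total_s = stepped_rate_schedule(fig1)
for rate, cycles, active in phases:
    print(f"{cycles:5d} cycles at {rate * 1e3:.0f} ms: {', '.join(active)}")
print(f"total about {total_s:.0f} s")
```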
Although the method disclosed by Jacobson '014 generally provides better throughput, applying this methodology arbitrarily does not necessarily result in optimal throughput. For instance, concurrently accessing devices with very long programming burn times together with devices having very short programming burn times may not be efficient. In addition, if the time it takes to shift in the data is very close to the programming burn time, there may be little benefit to using the concurrent approach disclosed by Jacobson '014. In other words, when it comes to concurrent programming in a heterogeneous device environment, one size does not fit all.
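The second point can be illustrated with a simplified Python model of identical devices. The assumptions are stated in the code: sequential access places the other devices in their 1-bit BYPASS registers, concurrent access shifts every data register before a single burn, and instruction scans and TAP navigation are ignored. The device count, register length, and TCK rate are hypothetical. With a 1 ms shift per device, the speedup from concurrent access drops from about 2.9 for a 100 ms burn to about 1.5 when the burn time equals the shift time.

```python
def sequential_s(n_dev: int, addresses: int, dr_bits: int, tck_hz: float, burn_s: float) -> float:
    """Program identical devices one at a time; the others sit in their 1-bit BYPASS registers."""
    cycle = (dr_bits + (n_dev - 1)) / tck_hz + burn_s
    return n_dev * addresses * cycle


def concurrent_s(n_dev: int, addresses: int, dr_bits: int, tck_hz: float, burn_s: float) -> float:
    """Program all devices together: shift every data register, then burn once."""
    cycle = n_dev * dr_bits / tck_hz + burn_s
    return addresses * cycle


def speedup(burn_s: float, n_dev: int = 3, addresses: int = 1000,
            dr_bits: int = 1000, tck_hz: float = 1e6) -> float:
    """Ratio of sequential to concurrent time under the hypothetical defaults above."""
    return (sequential_s(n_dev, addresses, dr_bits, tck_hz, burn_s)
            / concurrent_s(n_dev, addresses, dr_bits, tck_hz, burn_s))


for burn_ms in (100.0, 10.0, 1.0, 0.1):
    print(f"burn {burn_ms:5.1f} ms -> speedup {speedup(burn_ms / 1e3):.2f}")
```

The speedup approaches the device count only when the burn time dwarfs the shift time; as the two converge, concurrency buys progressively less.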
What is needed is an improved method of concurrently accessing in-system PLDs for program, erasure, or readback that optimizes programming times by taking into account the programming burn times and data shift times.