In the implementation of digital design systems or application specific integrated circuits (ASICs), one important criterion is timing closure across the various interconnected design components. High performance designs benefit from increased implementation efficiency to reach higher frequency targets in the resulting circuit. High frequency operation implies the use of fewer gates between flip-flops and an increasing use of pipelining techniques. Pipelining permits higher operating frequencies by breaking up paths into one or more stages separated by flip-flops. The use of pipelining techniques typically increases the number of sequential cells used in a design. Accordingly, the realization of high frequency designs focuses on providing more efficient sequential cells that allow rapid timing closure during chip implementation.
A traditional multiplexed scan flip-flop design has a number of characteristic features associated with the system level implementation of the design. The efficiency of sequential cells is traditionally measured through observation of parameters such as data setup time, clock to "Q" delay, data hold time and cell area. The operating frequency of a chip is typically limited by intrinsic delays, setup times and tolerance variations in clock duty cycles. One measure to determine the limitations on the chip operating frequency is to observe the minimum operating clock as a sum of delays, setup times and tolerance variations. The following equation provides one measure of determining a minimum operating clock.

$$T_{\mathrm{CLK\_MIN}} = \Delta T_{\mathrm{CLK-Q}} + \Delta T_{g} + \Delta T_{\mathrm{setup}} + \mathrm{CLK\_SKEW} + \mathrm{CLK\_JITTER} \qquad (1)$$

Where:
    $\Delta T_{\mathrm{CLK-Q}}$ = flip-flop clock-to-Q output delay
    $\Delta T_{g}$ = path gate delays plus RC
    $\Delta T_{\mathrm{setup}}$ = flip-flop data to clock setup time
    $\mathrm{CLK\_SKEW}$ = the variation in clock tree insertion delay and OCV induced clock insertion delay differences
    $\mathrm{CLK\_JITTER}$ = duty cycle variations (cycle to cycle).
Traditional sequential cell design focuses on optimization of cell area, ΔTCLK-Q and ΔTsetup. However, at a system level, optimization of the clock period requires minimizing each term in the minimum clock period of equation (1).
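As a rough numerical illustration of equation (1), the following sketch sums the five terms for a single hypothetical path. The function name `min_clock_period` and every delay value are assumptions chosen for illustration only; they do not come from any particular cell library or tool.

```python
def min_clock_period(t_clk_q, t_g, t_setup, clk_skew, clk_jitter):
    """Return TCLK_MIN per equation (1); all arguments share one time unit."""
    return t_clk_q + t_g + t_setup + clk_skew + clk_jitter


# Hypothetical values in picoseconds (illustrative only).
terms_ps = {
    "t_clk_q": 120,    # flip-flop clock-to-Q delay
    "t_g": 600,        # path gate delays plus RC
    "t_setup": 80,     # flip-flop setup time
    "clk_skew": 60,    # clock skew
    "clk_jitter": 40,  # clock jitter
}

t_min_ps = min_clock_period(**terms_ps)

# A period in ps converts to a frequency in MHz as 1e6 / period.
f_max_mhz = 1e6 / t_min_ps

print(f"TCLK_MIN = {t_min_ps} ps, f_max = {f_max_mhz:.2f} MHz")
```

With these assumed numbers the cell-level terms (clock-to-Q and setup) account for 200 ps of the 900 ps period, which is why the text argues that the remaining terms also deserve attention at the system level.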
Referring to FIG. 1, a conventional flip-flop 100 is shown. The flip-flop 100 generally comprises a multiplexer 102, transparent latches 104, and inverters 124 and 126. The latches 104 generally comprise inverters 106, 108, 110, 112, 114, and 116 and transmission gates 118 and 120. Multiplexer 102 is provided at an input to the latches 104 to permit selection between functional data D and scan data SD. The presence of multiplexer 102 has an impact on the ΔTg term, which can be viewed from different perspectives with respect to path optimization. First, multiplexer 102 may be viewed as increasing the data path delay of the functional mode data. Second, multiplexer 102 may be viewed as consuming one gate delay of the path's gate delay budget. Third, multiplexer 102 may be viewed as increasing the setup time for the flip-flop. In any case, optimization of the minimum clock period is influenced by the presence of multiplexer 102. In addition, multiplexer 102 increases the power consumed in functional mode, because both the functional and scan data inputs toggle when the flip-flop 100 changes logic states.
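The behavior described above can be sketched as a small model: a 2:1 multiplexer in front of an edge-triggered storage element, followed by a check that the multiplexer delay costs the same clock period whether it is booked as path delay or as setup time. The class name `MuxScanFlop`, the `scan_enable` select signal, and all delay values are assumptions made for this sketch; the source names only the D and SD inputs.

```python
class MuxScanFlop:
    """Behavioral model of a multiplexed scan flip-flop (no timing, no latches)."""

    def __init__(self):
        self.q = 0

    def clock(self, d, sd, scan_enable):
        # The multiplexer selects scan data SD in scan mode and
        # functional data D otherwise; the result is captured on the clock edge.
        self.q = sd if scan_enable else d
        return self.q


def tclk_min(t_clk_q, t_g, t_setup, clk_skew, clk_jitter):
    """Minimum clock period as the sum of the terms of equation (1)."""
    return t_clk_q + t_g + t_setup + clk_skew + clk_jitter


T_MUX = 50  # hypothetical multiplexer delay in ps

# Same multiplexer delay, booked two different ways (values in ps, illustrative).
as_path_delay = tclk_min(120, 600 + T_MUX, 80, 60, 40)
as_setup_time = tclk_min(120, 600, 80 + T_MUX, 60, 40)
without_mux = tclk_min(120, 600, 80, 60, 40)

flop = MuxScanFlop()
functional_q = flop.clock(d=1, sd=0, scan_enable=False)  # captures D
scan_q = flop.clock(d=1, sd=0, scan_enable=True)         # captures SD
```

Under these assumptions both accounting views yield the same period, and that period exceeds the multiplexer-free case by exactly the multiplexer delay.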
Additionally, turning to FIG. 2, a circuit 200 having several scan logic paths is shown. Each path generally comprises a flip-flop 100-1 to 100-3 and scan path delay buffers 204-1 to 204-3 (respectively). As can be seen in the third path, though, a scan signal regeneration buffer 202 and a scan path delay buffer 204-3 are connected to a scan multiplexer of a downstream flip-flop 100-4. The path is impacted by the scan logic as illustrated by ΔTg delay 206. When upstream flip-flop 100-3 changes state, the scan path logic switches in relation to downstream flip-flop 100-4, thereby consuming additional power.
Another difficulty is observed in the potential race conditions when the same clock edge is used to both launch and capture data. When the ΔTCLK-Q and ΔTsetup delay values are minimized to increase operating frequency, there is an increased probability that two back-to-back flip-flops can experience data race-through problems. Race-through occurs whenever an upstream flip-flop launches data before the downstream flip-flop stops capturing data. This effect is observed as a data hold time violation on the data input of the downstream flip-flop. In addition, clock skew between launching and capturing flip-flops often creates and/or exacerbates hold time violations. Furthermore, scan chain reordering is often conducted to reduce routing congestion in the chip. However, scan chain reordering has the potential to create a large set of scan mode hold violations, since scan data would then be routed to the closest possible flip-flop. The shorter routes lead to less propagation delay, which in turn leads to a higher likelihood of scan mode hold violations.
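The race-through condition can be illustrated with a conventional hold check, in which the earliest data arrival must exceed the hold requirement plus the clock skew between the capturing and launching flip-flops. This is a common static timing formulation rather than something stated in the text above, and the function name `hold_slack` and all numbers are assumptions for illustration.

```python
def hold_slack(t_clk_q_min, t_path_min, t_hold, clk_skew):
    """Hold slack in ps; a negative value means race-through (a hold violation).

    clk_skew is how much later the capture clock arrives than the launch clock.
    """
    return (t_clk_q_min + t_path_min) - (t_hold + clk_skew)


# Hypothetical scan path before reordering: a longer route between flip-flops.
long_route = hold_slack(t_clk_q_min=60, t_path_min=45, t_hold=40, clk_skew=50)

# After reordering, scan data goes to the nearest flip-flop: a shorter route.
short_route = hold_slack(t_clk_q_min=60, t_path_min=20, t_hold=40, clk_skew=50)

print(f"long route slack  = {long_route} ps")
print(f"short route slack = {short_route} ps")
```

With these assumed values the longer route passes the hold check, while the shorter route produced by reordering fails it, even though nothing about the flip-flops themselves has changed.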
While hold violations can occur in any functional operating mode, the greatest number of violations typically occurs during scan shift/capture modes. The number of hold violations induced by scan chain reordering can be enormous. Hold violations are usually fixed by inserting delay buffers in the path containing the hold violation. The delay buffers (such as delay buffers 204-1 to 204-3 of FIG. 2) are cells that are specially designed to have a greater than normal intrinsic cell delay. The delay buffers are typically inserted through design software, or automatically, in the hold violation path, usually immediately before the flip-flop that has the hold violation. When the delay buffers are inserted before the receiving flip-flop, all the timing arcs that terminate in the data input port of the flip-flop with the hold violation are delayed. Accordingly, the introduction of delay buffers to overcome hold time violations has an impact on the entire system.
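The side effect of this fix can be sketched numerically: the buffers needed to cure one hold violation add the same delay to every timing arc ending at that data input, which can push an otherwise passing arc into a setup violation. The helper `buffers_needed`, the arc names, and all slack and delay values below are hypothetical.

```python
import math


def buffers_needed(hold_slack_ps, buffer_delay_ps):
    """Number of delay buffers required to bring a hold slack up to zero or better."""
    if hold_slack_ps >= 0:
        return 0
    return math.ceil(-hold_slack_ps / buffer_delay_ps)


BUFFER_DELAY_PS = 30  # hypothetical intrinsic delay of one delay buffer

n_buffers = buffers_needed(hold_slack_ps=-70, buffer_delay_ps=BUFFER_DELAY_PS)
added_delay_ps = n_buffers * BUFFER_DELAY_PS

# Setup slack (ps) of every arc terminating at the same data input port
# before the buffers are inserted (illustrative values).
setup_slack_before = {"scan_arc": 400, "func_arc_a": 120, "func_arc_b": 50}

# Buffers placed immediately before the flip-flop delay all of these arcs.
setup_slack_after = {
    arc: slack - added_delay_ps for arc, slack in setup_slack_before.items()
}

print(n_buffers, added_delay_ps, setup_slack_after)
```

In this example three buffers fix the hold violation but leave `func_arc_b` with negative setup slack, which is the kind of knock-on effect that can trigger another timing closure iteration.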
The drawbacks that typically occur with the insertion of delay buffers include additional usage of chip area, which grows with the number of hold buffers added, an increase in routing congestion, diminished signal integrity through increased crosstalk, and increased power consumption. Any of these drawbacks can also trigger additional implementation or timing closure iterations that add to the implementation costs of the circuit.
Scan logic also consumes a certain amount of power during normal functional mode operation, and circuit designs typically take into account the routing and connectivity of scan logic paths and their impact on the paths and operation of the normal functional mode logic. Conventionally, flip-flops with multiplexed scan inputs often reuse the primary flip-flop data output, such as "Q" and/or "QB", to propagate the scan data input to the next flip-flop in the scan logic chain. The reused outputs often result in metal routing and buffers in the scan data path that create parasitic loads on the paths, which can impact path delays in the circuit. One technique to overcome parasitic loading calls for a dedicated scan output in the flip-flop architecture. Such a technique is implementation sensitive, and a ΔTCLK-Q delay may be difficult to avoid in the path of the flip-flop. Scan logic implemented in a conventional flip-flop also typically has metal routing that is associated with the slave stage of the flip-flop to take advantage of some of the architectural features of the flip-flop. Accordingly, conventional scan logic draws power during normal operating mode because the scan logic path toggles when the slave stage of the flip-flop changes state. Power consumption in the scan logic is also observed with a series of flip-flops, where the scan logic path switches with every change in an upstream flip-flop state.