Although modern communication protocols enable the transmission of billions of bits per second, conventional backplane switching systems and related components do not have comparable clock rates. For example, the System Packet Interface 4 (SPI4) protocol requires a minimum throughput rate of 10 gigabits per second over a native bus width of 16 bits using Double Data Rate (DDR) techniques. At a throughput rate of 10 gigabits per second, such a bus is thus sampled at a 625 MHz rate. Because of the DDR sampling (sampling at both the rising and falling edges of the clock), the bus is clocked at 312.5 MHz. However, many ASICs and FPGAs cannot achieve even a 312.5 MHz clocking rate. Thus, external SPI4 buses routed to such devices must be demultiplexed according to a slower single-edge clock rate that is a fraction of the external 625 MHz sampling rate. For example, an FPGA having a single-edge clock rate that is one-fourth the sampling rate of the external SPI4 bus (156.25 MHz) receives four 16-bit words (typically denoted as tokens) from the SPI4 bus per FPGA clock cycle. The four tokens are then routed within the FPGA on a four-token-wide (64-bit) bus that is clocked at the lower clock rate.
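The rate arithmetic above can be checked with a short calculation. The following Python sketch is illustrative only; the function and variable names are assumptions introduced here and do not come from the SPI4 specification.

```python
def spi4_rates(throughput_bps, bus_width_bits, demux_factor):
    """Derive the rates discussed above from the bus parameters.

    All parameter names are hypothetical and chosen for this sketch.
    """
    # Each sample transfers one full bus word.
    sample_rate_hz = throughput_bps / bus_width_bits
    # DDR samples on both clock edges, so the clock runs at half the sample rate.
    ddr_clock_hz = sample_rate_hz / 2
    # A single-edge internal clock at 1/demux_factor of the sample rate.
    internal_clock_hz = sample_rate_hz / demux_factor
    # The internal bus must carry demux_factor tokens per internal clock cycle.
    internal_bus_bits = bus_width_bits * demux_factor
    return sample_rate_hz, ddr_clock_hz, internal_clock_hz, internal_bus_bits


# Values from the example in the text: 10 Gb/s, 16-bit bus, divide-by-four clock.
sample, ddr, internal, width = spi4_rates(10e9, 16, 4)
print(sample / 1e6, ddr / 1e6, internal / 1e6, width)
```

With the figures given in the text, the calculation yields the 625 MHz sampling rate and 312.5 MHz DDR clock stated above, along with a 156.25 MHz internal clock and a 64-bit internal bus for the one-fourth-rate example.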
In a wire-line-level protocol such as SPI4, these tokens must be parsed one at a time according to their order of arrival to ensure that they comply with the SPI4 protocol. Typically, a finite state machine implemented within the FPGA is used to parse the tokens. For example, this parsing may be expressed in the form of IF-THEN statements such as “If in state A and input B occurs, then transition to state C.” As is known in the art, a user must configure an FPGA using one of a variety of available software tools before it can implement the desired function. For example, with respect to the just-described finite state machine, the necessary IF-THEN statements may be written at the register transfer level (RTL) in a hardware description language suitable for these tools. The RTL code is then converted by the programming tool to Boolean logic that may be implemented using primitive logic gates (e.g., a 4-input AND gate). The programming tool programs a configuration memory within the FPGA so as to instantiate programmable blocks within the device to implement these primitive logic gates.
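The IF-THEN formulation above can be sketched as a transition table. The Python model below is a software illustration only, not RTL and not the SPI4 state machine; the state names, input names, and the rule that unlisted combinations hold the current state are all assumptions made for this example.

```python
# Hypothetical states and inputs. They mirror the "state A, input B, state C"
# wording in the text and are not taken from the SPI4 protocol.
TRANSITIONS = {
    ("A", "B"): "C",  # If in state A and input B occurs, then transition to state C.
    ("C", "D"): "A",  # Assumed second rule, added only so the table has a return path.
}


def next_state(state, event):
    """Return the next state for one input.

    Unlisted (state, input) pairs keep the current state; this default is an
    assumption of the sketch.
    """
    return TRANSITIONS.get((state, event), state)


def parse(events, state="A"):
    """Apply the inputs strictly in arrival order, one at a time."""
    for event in events:
        state = next_state(state, event)
    return state
```

Each call to `next_state` corresponds to one IF-THEN evaluation; `parse` shows why order of arrival matters, since every step consumes the state left by the step before it.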
However, this conventional FPGA programming process often proves to be problematic. The parsing of tokens within the FPGA (such as the four in the above example) must occur within one FPGA clock cycle because of the difference between the FPGA clock rate and the external SPI4 bus sampling rate. Current state and next state information required at each token level must then appear as combinatorial nodes between each token produced by identical state flow processors. This structure results in deep combinatorial logic that is at least N levels deep, where N corresponds to the number of tokens processed. During a clock cycle, the state variables resulting from the last-processed token are sampled and appear as inputs to the next group of tokens received in the next clock cycle. Processing of a token cannot begin, however, until the disposition of the immediately preceding token is known, since that disposition is a required input for the processing of the token.
A conventional finite state machine 10 for processing multiple tokens during a single clock cycle is shown in FIG. 1a. In this example, the state machine's clock 15 cycles at a rate one-fourth that of an external wire-line-level bus (not illustrated) such as a SPI4 bus. After demultiplexing, finite state machine 10 must thus process four tokens registered in an input register 20 at every cycle of clock 15. Because of the wire-line-level protocol, the tokens must be parsed by state machine 10 in their arrival order. To indicate this arrival order, the tokens are denoted as token_1 through token_4. Each token is processed in view of the current state machine state (corresponding to the preceding token) and the current input conditions (derived from the current token) to generate a “next state” state value for the subsequent token. With respect to the subsequent token, this “next state” becomes the current state, and so on. Combinatorial nodes L1 through L4, each implemented in primitive logic gates as described previously, perform the processing for corresponding tokens token_1 through token_4. For example, combinatorial node L2 processes token_2 using a current state 30 from combinatorial node L1 and current token inputs or conditions 35 derived from token_2. Similarly, combinatorial node L3 processes token_3 using the current state 40 from combinatorial node L2 and input conditions 45 from token_3. Combinatorial node L4 processes token_4 using a current state 50 from combinatorial node L3 and input conditions 55 from token_4. Combinatorial node L1 processes token_1 using a current state 60 from combinatorial node L4 and input conditions 65 from token_1. Because current state 60 is generated in the preceding clock cycle with respect to the processing performed by combinatorial node L1, a state register 70 is necessary to store current state 60 so it may be used by combinatorial node L1 in the subsequent clock cycle.
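The data flow of FIG. 1a can be modeled in software as a chain of four dependent evaluations per clock cycle, with one registered state carried between cycles. The Python sketch below is a behavioral illustration under stated assumptions: the `node` function is an arbitrary, order-sensitive toy standing in for combinatorial nodes L1 through L4, and none of the names or formulas come from the SPI4 protocol or from the figure.

```python
TOKENS_PER_CYCLE = 4  # four tokens per clock cycle, as in the example above


def node(current_state, token):
    """Toy stand-in for one combinatorial node (L1..L4).

    The formula is arbitrary; it is chosen only so that the result depends on
    the order in which tokens are applied.
    """
    return (3 * current_state + token) & 0xF


def clock_cycle(state_register, input_register):
    """Model one clock cycle: each node consumes the state from the node before it."""
    state = state_register            # value held over from the previous cycle
    for token in input_register:      # token_1 .. token_4, in arrival order
        state = node(state, token)    # cannot start until the prior state is known
    return state                      # captured by the state register at the clock edge


def run(tokens, state_register=0):
    """Feed a token stream through the model, four tokens per clock cycle."""
    for i in range(0, len(tokens), TOKENS_PER_CYCLE):
        group = tokens[i:i + TOKENS_PER_CYCLE]
        state_register = clock_cycle(state_register, group)
    return state_register
```

In software the loop simply runs four times. In hardware the same dependency chain is unrolled into four cascaded blocks of logic, so the critical path within a single clock period passes through all four nodes in series.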
Although implementing such a finite state machine has been manageable for ASIC technologies, such an implementation has proven to be extremely challenging for FPGA technologies. With current software development tools, difficulties arise as early as the synthesis phase in that the absolute minimum number of logic levels (N) is not always realizable nor easily controllable from one synthesis run to another. Further difficulties arise in the back-end mapping phase, which must draw on limited wide-function logic resources. The placement phase, even when floor-planning is used, does not produce ideal placement or reproducible results. This may be seen in FIG. 1b, where a programmable logic device 100 having a plurality of logic blocks 105 is instantiated to perform the combinatorial logic described with respect to FIG. 1a. Each combinatorial node L1 through L4 will be spread across multiple logic blocks in the non-ideal fashion just described. The same can be said for the routing of combinatorial outputs from one node to the next. The resulting design produces a performance level that is lower than optimal and subject to wide fluctuations from build to build as a design evolves. Moreover, because of the required constant wire-line speeds, pipelining or parallel processing techniques cannot be used to avoid the problem of multiple logic levels.
Accordingly, for this and other reasons, there is a need in the art for an improved finite state machine design that can sequence through multiple states during a single PLD clock cycle.