Although modern communication protocols enable the transmission of billions of bits per second, conventional backplane switching systems and related components do not have comparable clock rates. For example, the System Packet Interface 4 (SPI4) Phase 2 (SPI4-2) protocol requires a minimum throughput rate of 10 gigabits per second over a SPI4-2 native bus having a width of 16 bits using Double Data Rate (DDR) techniques. At a throughput rate of 10 gigabits per second, such a bus is thus sampled at a 625 MHz rate. Because of the DDR sampling (sampling at both the rising and falling edges of the clock), the bus is clocked at 312.5 MHz. However, many application specific integrated circuits (ASICs) and field programmable gate arrays (FPGAs) cannot achieve even a 312.5 MHz clocking rate. Thus, external SPI4-2 buses routed to such devices must be demultiplexed according to a slower single edge clock rate that is a fraction of the external 625 MHz sampling rate for the native SPI4-2 bus. For example, an FPGA having a single edge clock rate that is one-fourth the sampling rate of the native SPI4-2 bus receives four 16-bit words (typically denoted as tokens) per FPGA clock cycle. The four tokens are then routed within the FPGA on a four-token wide bus that is clocked at the lower clock rate. In general, the native SPI4-2 bus is demultiplexed according to an FPGA clock that is 1/nth the sampling rate of the bus, where n is a positive integer. As just discussed, a value of n=4 is typical, although n may be increased to, for example, n=8 if the FPGA clock rate is relatively slow. At each cycle of the FPGA clock, n words or tokens are demultiplexed from the SPI4-2 native bus.
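The rate arithmetic described above may be checked with a short sketch. The function name bus_rates and the dictionary keys are hypothetical labels chosen for illustration; only the figures (10 gigabits per second, 16 bits, n=4 or n=8) come from the discussion above.

```python
def bus_rates(throughput_bps, bus_width_bits, n):
    """Derive the clock figures for a DDR bus demultiplexed by a factor of n.

    All names here are illustrative; the arithmetic follows the text.
    """
    # One bus-wide word is transferred per sample.
    sample_rate_hz = throughput_bps / bus_width_bits
    # DDR samples on both clock edges, so the clock runs at half the sample rate.
    ddr_clock_hz = sample_rate_hz / 2
    # The demultiplexed (single edge) clock is 1/n of the sampling rate.
    fpga_clock_hz = sample_rate_hz / n
    # n tokens arrive per FPGA clock cycle, so the internal bus is n tokens wide.
    internal_bus_bits = n * bus_width_bits
    return {
        "sample_rate_hz": sample_rate_hz,
        "ddr_clock_hz": ddr_clock_hz,
        "fpga_clock_hz": fpga_clock_hz,
        "internal_bus_bits": internal_bus_bits,
    }


if __name__ == "__main__":
    for n in (4, 8):
        print(n, bus_rates(10e9, 16, n))
```

With n=4 the internal bus is 64 bits wide and is clocked at 156.25 MHz; with n=8 it is 128 bits wide and is clocked at 78.125 MHz.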
This demultiplexing of the native SPI4-2 bus causes a number of complications when implementing a SPI4-2 interface using a programmable logic device (PLD) such as an FPGA. For example, the SPI4-2 standard uses a diagonal interleaved parity (DIP) scheme for point-to-point error detection. In a SPI4-2 interface, a SPI4-2 packet such as packet 100 shown in FIG. 1 includes a variable number of sixteen-bit data words 105 that are followed by a single sixteen-bit control word 110. Packet 100 (which may also be denoted as a SPI4-2 burst) thus does not include a control word 115 from a previously transmitted packet. As illustrated in FIG. 1, packet 100 includes eight data words 105, but it will be appreciated that the number of data words in a given SPI4-2 packet will vary depending upon the application. However, regardless of the number of data words 105 included in a SPI4-2 packet, the packet's end is demarcated by control word 110.
Having received the control word 110 for packet 100, a sixteen-bit parity word 120 may be calculated using the DIP scheme. Each bit of parity word 120 corresponds to a diagonal exclusive OR (XOR) calculation chain starting at the first data word 105 in packet 100. For example, a diagonal XOR calculation chain 121 starts from the most significant bit (bit position 15) of the first data word 105 and propagates through the remaining data words 105 and control word 110 to produce the value for bit position 7 of parity word 120. Calculation chain 121 begins with the XOR of the most significant bit of the first data word 105 and the next-most-significant bit (bit position 14) of the second data word 105. As can be seen from FIG. 1, bit position 15 of the first data word 105 holds a logical one whereas bit position 14 of the second data word 105 holds a logical zero. The XOR product is thus a logical one. This XOR product propagates through calculation chain 121 by being XORed with the bit stored in bit position 13 of the third data word 105, the resulting XOR product then being XORed with the bit stored in bit position 12 of the fourth data word 105, and so on, until the final XOR product is XORed with the bit stored in bit position 7 of control word 110 to produce a value for bit position 7 of parity word 120. It may be seen that the XOR product of the resulting bit sequence {1,0,0,1,0,0,0,0,1} in calculation chain 121 produces a value of logical one for bit position 7 of parity word 120, because the sequence contains an odd number of logical ones.
The remaining XOR calculation chains are processed analogously. For example, XOR calculation chain 122 starts at bit position 14 of the first data word 105 and propagates through the remaining data words 105 and control word 110. In chain 122, the starting bit is XORed with the bit stored in bit position 13 of the second data word 105. The resulting XOR product is XORed with the bit stored in bit position 12 of the third data word 105, and so on, until the value for bit position 6 of parity word 120 is produced. Note that the four least significant bits of control word 110 are replaced with logical ones during the calculation of the four least significant bits for parity word 120.
There will always be XOR calculation chains that must wrap around in a circular modulo-16-bit fashion. For example, XOR calculation chain 123 starts at bit position 2 of the first data word 105 before propagating through the remaining data words 105 and control word 110. By the third data word 105, chain 123 is at the least significant bit (bit position 0). Thus chain 123 must wrap around to the most significant bit (bit position 15) as it propagates through the fourth data word 105.
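The sixteen diagonal chains, including the modulo-16 wrap-around, may be modeled in software as a single accumulator that is rotated right by one bit position before each new word is XORed in. The following is a minimal sketch under that assumption; the names rotr16 and dip16 are hypothetical, and the example words are not the contents of FIG. 1 but merely set the three bits that reproduce the sequence {1,0,0,1,0,0,0,0,1} of chain 121, with all other bits zero.

```python
MASK16 = 0xFFFF


def rotr16(value):
    """Rotate a 16-bit value right by one; bit 0 wraps around to bit 15."""
    return ((value >> 1) | (value << 15)) & MASK16


def dip16(data_words, control_word):
    """Return the 16-bit diagonal parity over the data words and control word.

    The four least significant bits of the control word are treated as
    logical ones during the calculation, as described in the text.
    """
    acc = 0
    for word in list(data_words) + [control_word | 0x000F]:
        # Rotating the accumulator moves every chain one bit position lower,
        # so XORing the next word extends all sixteen diagonals at once.
        acc = rotr16(acc) ^ (word & MASK16)
    return acc


if __name__ == "__main__":
    # Hypothetical packet: bit 15 of word 1, bit 12 of word 4, bit 7 of control.
    data = [0x8000, 0, 0, 0x1000, 0, 0, 0, 0]
    parity = dip16(data, 0x0080)
    print(f"{parity:#06x}", "bit 7 =", (parity >> 7) & 1)
```

In this sketch a chain that starts at bit position 2 of the first data word reaches bit position 0 at the third data word and wraps to bit position 15 at the fourth, matching the behavior described for chain 123.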
After sixteen-bit parity word 120 has been calculated, its most significant byte is XORed with its least significant byte to produce 8-bit parity word 130. In turn, parity word 130 is folded and its two halves XORed to produce a DIP4 parity word 135. In this fashion, sixteen-bit parity word 120 is collapsed to produce DIP4 parity word 135. In a receive function, DIP4 parity word 135 is compared to the original value stored in the four least significant bits of control word 110 (which had been treated as being all logical ones for the DIP calculation) to determine if the data words 105 and control word 110 were received correctly. Conversely, in a transmit function, DIP4 parity word 135 would replace these four bits in control word 110.
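The collapse from sixteen bits to four may be sketched as two successive folds. The function names below are hypothetical, and the sketch assumes that "folding" parity word 130 means XORing its upper four bits with its lower four bits; the input is any already-calculated sixteen-bit parity word.

```python
def fold_dip4(parity16):
    """Collapse a 16-bit parity word to a 4-bit DIP4 value by two XOR folds."""
    # First fold: most significant byte XOR least significant byte (8 bits).
    parity8 = ((parity16 >> 8) ^ parity16) & 0xFF
    # Second fold: upper nibble XOR lower nibble (4 bits).
    return ((parity8 >> 4) ^ parity8) & 0xF


def receive_check(parity16, control_word):
    """Receive side: compare the folded value to the control word's low nibble."""
    return fold_dip4(parity16) == (control_word & 0xF)


def transmit_insert(parity16, control_word):
    """Transmit side: place the folded value in the control word's low nibble."""
    return (control_word & 0xFFF0) | fold_dip4(parity16)


if __name__ == "__main__":
    print(hex(fold_dip4(0x1234)))
```

A control word produced by transmit_insert passes receive_check for the same sixteen-bit parity word, which mirrors the transmit and receive functions described above.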
The calculation of DIP4 parity word 135 becomes problematic when performed by a programmable logic device such as an FPGA as a result of the demultiplexing of the native SPI4-2 bus. Because of the demultiplexing, the position of the control word cannot be readily determined, requiring in prior approaches that a number of sets of calculation chains be calculated.
As discussed above, to implement a SPI4-2 interface in an FPGA, there will be n 16-bit words from packet 100 received for every FPGA clock cycle. Should the received packet contain more than n words, the XOR calculation chains cannot be finished in just one FPGA clock cycle. For example, assume that n equals four as discussed previously and that the packet corresponds to packet 100 of FIG. 1. At each FPGA clock cycle, four words from packet 100 will be received into a register 200 as shown in FIG. 2. The four words stored within register 200 may be designated word 3 through word 0 according to their sequence within packet 100. For example, if this FPGA clock cycle is such that the beginning of packet 100 is captured, then word 3 corresponds to the first data word 105, word 2 corresponds to the second data word 105, word 1 corresponds to the third data word 105, and word 0 corresponds to the fourth data word 105. Given just these four words, it is clear that the XOR calculation chains such as chains 121, 122, and 123 of FIG. 1 cannot be completed during this FPGA clock cycle.
Instead, diagonal XOR calculation chains 210 will be propagated through words 3, 2, 1, and 0 and the results stored, such as in an inter-slice parity summing register 205. For example, a diagonal XOR calculation chain 210a begins at the most significant bit of word 3 and continues through bit position 14 of word 2 and bit position 13 of word 1 to include bit position 12 of word 0. This resulting value is then stored in bit position 12 of inter-slice parity summing register 205. Similarly, another diagonal XOR calculation chain 210b begins at bit position 14 of word 3 and continues through bit position 13 of word 2 and bit position 12 of word 1 to include bit position 11 of word 0. This resulting value is then stored in bit position 11 of inter-slice parity summing register 205. At the next FPGA clock cycle, the values stored in inter-slice parity summing register 205 will be loaded into the diagonal XOR calculation chains 210. But note that it will not be known where control word 110 will be placed within register 200. For example, with respect to packet 100, register 200 would contain the first four data words 105 in the initial FPGA clock cycle. At the second FPGA clock cycle, register 200 would contain the next four data words 105. Finally, at the third FPGA clock cycle, register 200 would store control word 110. Because there were eight data words 105 preceding control word 110 in packet 100, control word 110 would be received as word 3 in register 200. However, if register 200 were processing a packet having nine data words 105, then control word 110 would be received as word 2 in register 200. It thus follows that control word 110 may be received as any one of word 3 through word 0 in register 200, depending upon the size of the packet being processed.
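The per-cycle behavior of register 200 and inter-slice parity summing register 205 may be illustrated with a small software model. This is a sketch only: the name clock_slice is hypothetical, the register 205 value is assumed to be zero at the start of a packet, and the list passed as register_200 is ordered word 3 first through word 0 last.

```python
MASK16 = 0xFFFF


def rotr16(value):
    """Rotate a 16-bit value right by one; bit 0 wraps around to bit 15."""
    return ((value >> 1) | (value << 15)) & MASK16


def clock_slice(register_205, register_200):
    """Model one FPGA clock cycle for a slice holding only data words.

    register_205: 16-bit value carried over from the previous cycle.
    register_200: the n words of this cycle, ordered word 3 ... word 0.
    Returns the new value to store in the inter-slice register.
    """
    acc = register_205
    for word in register_200:
        # Each word extends every diagonal chain by one bit position.
        acc = rotr16(acc) ^ (word & MASK16)
    return acc


if __name__ == "__main__":
    # Chain 210a: a one at bit 15 of word 3 lands in bit 12 after the slice.
    first = clock_slice(0, [0x8000, 0, 0, 0])
    # After a second all-zero slice the same chain has reached bit 8.
    second = clock_slice(first, [0, 0, 0, 0])
    print(f"{first:#06x} {second:#06x}")
```

In this model, after two slices of four data words the chain that began at bit position 15 sits at bit position 8, so a control word arriving as word 3 of the third slice would complete it at bit position 7, consistent with chain 121.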
Because it cannot be predicted where control word 110 will end up in register 200, it cannot be predicted where a diagonal XOR calculation chain will end when register 200 contains control word 110. For example, diagonal XOR calculation chain 210 could end at any one of four extraction points 220a, 220b, 220c, and 220d, depending upon where control word 110 was received. If control word 110 is received as word 3, diagonal XOR calculation chain 210 would end at extraction point 220a. Alternatively, if control word 110 is received as word 2, diagonal XOR calculation chain 210 would end at extraction point 220b. As yet another alternative, if control word 110 is received as word 1, diagonal XOR calculation chain 210 would end at extraction point 220c. Finally, if control word 110 is received as word 0, diagonal XOR calculation chain 210 would end at extraction point 220d. In this fashion, the number of XOR calculation chains is multiplied by n because each extraction point must be considered. For example, with respect to a value of n=4 such as used in register 200, there would thus be four sets of diagonal XOR calculation chains, each set having 16 chains corresponding to the sixteen bits for each word in packet 100. This is very inefficient because only one set will provide the DIP4 parity word 135, as determined by the position in which control word 110 ends up in register 200. The 16-bit value from this set of XOR calculation chains forms parity word 120, which is then collapsed to form DIP4 parity word 135 as discussed with respect to FIG. 1. However, the 16-bit values from the remaining XOR calculation chain sets would be of no use with respect to forming DIP4 parity word 135. This inefficiency worsens as the value of n increases.
Accordingly, there is a need in the art for improved DIP parity word calculation techniques.