Many memory devices include combination logic to perform discrete operations and/or calculations within the memory device itself. In traditional memory devices, the combination logic may be broken down into stages, which may be implemented within a chain of flip-flops (e.g., D flip-flops). The flip-flops share a common clock, so the combination logic between each pair of flip-flops in the chain must complete before the next clock cycle. That is, a first flip-flop detects a clock edge and provides a data output to the combination logic. The combination logic performs its calculation and provides the result to the data input of the next flip-flop in the chain before the next clock edge. One drawback to the chained flip-flop architecture is that time is wasted at every flip-flop stage, because each flip-flop has an associated setup time and a clock-to-output delay (the time between detection of the clock edge and delivery of the data at the output terminal). These delays are characteristic of the flip-flops themselves and are therefore unavoidable in traditional chained flip-flop architectures.
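The chained flip-flop behavior described above can be sketched with a simple behavioral model (this is an illustrative sketch, not any device's implementation; the function names and the inverter logic are assumptions for the example):

```python
# Behavioral model of a chain of D flip-flops with combination logic
# between stages. On each clock edge, every flip-flop latches the output
# of the previous stage's logic, so each stage's logic must settle
# within one clock period.

def clock_edge(regs, stage_logic):
    """Advance the flip-flop chain by one clock cycle.

    regs        -- current flip-flop outputs; regs[0] is the input register
    stage_logic -- stage_logic[i] maps regs[i] to the D input of regs[i+1]
    """
    new_regs = list(regs)
    # All flip-flops capture simultaneously on the common clock edge,
    # so every D input is computed from the *old* register values.
    for i, logic in enumerate(stage_logic):
        new_regs[i + 1] = logic(regs[i])
    return new_regs

# Example: three stages, each stage simply inverts its input bit.
invert = lambda b: b ^ 1
regs = [1, 0, 0, 0]
for _ in range(3):
    regs = clock_edge(regs, [invert, invert, invert])
print(regs)  # [1, 0, 1, 0] -- the held input 1, inverted at each stage
```

The model makes the drawback visible: each value advances only one stage per clock edge, so every extra flip-flop in the chain adds a full cycle of latency on top of its setup and clock-to-output overhead.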
One example of combination logic is a command parity error calculation, such as that performed in double data rate 4 (DDR4) memory devices. An example command parity calculation includes a five-stage XOR tree that operates on parity data provided with a command to a memory device. The parity error calculation is performed within a defined parity latency period that specifies the number of clock cycles during which the parity error calculation must be completed. Therefore, the parity latency determines the number of flip-flops in the chain, as well as the number of logic stages that must be completed between each flip-flop. For example, if the parity latency is 5 clock cycles (e.g., the result of the calculation must be available at the fifth rising clock edge) and the XOR tree has five stages, then two stages must be completed during one clock period between adjacent flip-flops in the chain. Because there are only four clock periods in which to perform five stages' worth of calculations, one of the clock periods must double up, with two stages calculated during that clock period, to ensure that the calculation is completed within the parity latency period. In general, if the latency is N cycles, then the calculation is performed in N−1 cycles so that the result is available for output on the Nth cycle.
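A five-stage XOR tree of this kind can be illustrated as follows (the 32-bit input width is an assumption chosen so that pairwise reduction yields exactly five stages; it is not taken from the DDR4 specification):

```python
# Illustrative XOR reduction tree: each stage XORs adjacent pairs of
# bits, halving the width. For 32 inputs the widths go
# 32 -> 16 -> 8 -> 4 -> 2 -> 1, i.e., five stages.

def xor_tree(bits):
    """Reduce a list of bits to a single parity bit, counting stages."""
    stages = 0
    while len(bits) > 1:
        bits = [a ^ b for a, b in zip(bits[0::2], bits[1::2])]
        stages += 1
    return bits[0], stages

bits = [1, 0, 1, 1] + [0] * 28   # three set bits among 32 inputs
parity, stages = xor_tree(bits)
print(parity, stages)  # 1 5 -- odd parity, computed in five stages
```

Each `while` iteration corresponds to one stage of combination logic between flip-flops; the scheduling question in the text is how many such iterations must fit into each clock period.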
Alternatively, the parity latency may be greater than the number of stages. For example, if the parity latency is set to 8 clock cycles, then seven clock periods are available, but only five stages of calculation are needed. In this scenario, the parity calculation cannot take advantage of the additional clock periods available for calculation, and the result of the calculation is simply passed from flip-flop to flip-flop for the last two clock cycles.
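The two latency scenarios above amount to a simple stage-to-clock-period assignment, which can be sketched as follows (the round-robin assignment here is one illustrative policy, not a description of any particular device):

```python
# Assign XOR-tree stages to the clock periods available under a given
# parity latency. With latency N, only N - 1 clock periods are usable,
# since the result must be ready at the Nth rising edge.

def schedule_stages(num_stages, latency_cycles):
    """Return a list whose i-th entry is the number of logic stages
    evaluated during clock period i (0 means a pass-through flip-flop)."""
    periods = latency_cycles - 1
    per_period = [0] * periods
    for s in range(num_stages):
        # Double up stages when there are fewer periods than stages.
        per_period[s % periods] += 1
    return per_period

print(schedule_stages(5, 5))  # [2, 1, 1, 1] -- one period doubles up
print(schedule_stages(5, 8))  # [1, 1, 1, 1, 1, 0, 0] -- two pass-throughs
```

The zeros in the second schedule correspond to the final clock cycles in which the finished result is simply re-latched from flip-flop to flip-flop until the parity latency expires.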