In many sliced ALU designs, pipelining is used to improve performance. For example, in a two-slice design, in which each addend is divided into a most significant slice and a least significant slice, an add operation can be performed in two cycles. In the first cycle, partial results are calculated for each slice of the addends. For the least significant slice, a sum and a carry-out are calculated. For the most significant slice, a partial sum and a partial sum+1 are calculated, covering both possible outcomes: the partial sum is used when there is no carry-in from the least significant slice, and the partial sum+1 is used when there is a carry-in. In addition, a carry generate signal is produced and sent to the upper slice to indicate whether there is a carry-in. In the second cycle, the carry generate signal crosses the ALU from chip to chip to produce a carry-in signal for the most significant slice add operation. The carry-in signal selects which of the most significant slice's partial results from the previous cycle is output. Thus, the entire result is complete in two cycles. Because the operations performed in the two cycles are independent, the ALU can be pipelined: although each addition takes two cycles, a new result is produced every cycle. However, for the most significant slice, this pipeline efficiency is lost whenever the result of one add operation is needed as an input to the next add operation.
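The two-cycle add described above can be illustrated with a minimal software sketch. The 16-bit slice width, the 32-bit addend width, and the function names `cycle_one`, `cycle_two`, and `sliced_add` are assumptions chosen for illustration; the text does not specify them, and the sketch models only the arithmetic, not the chip-to-chip timing.

```python
SLICE_BITS = 16                 # assumed slice width (not given in the text)
MASK = (1 << SLICE_BITS) - 1    # keeps a value within one slice

def cycle_one(a, b):
    """First cycle: compute the partial results for both slices."""
    a_lo, a_hi = a & MASK, (a >> SLICE_BITS) & MASK
    b_lo, b_hi = b & MASK, (b >> SLICE_BITS) & MASK

    # Least significant slice: sum and carry-out.
    lo_total = a_lo + b_lo
    lo_sum = lo_total & MASK
    carry = lo_total >> SLICE_BITS          # 0 or 1

    # Most significant slice: both candidate results, computed
    # before the carry from the lower slice is known.
    hi_sum = (a_hi + b_hi) & MASK
    hi_sum_plus_1 = (a_hi + b_hi + 1) & MASK
    return lo_sum, carry, hi_sum, hi_sum_plus_1

def cycle_two(lo_sum, carry, hi_sum, hi_sum_plus_1):
    """Second cycle: the carry selects one of the two upper partial sums."""
    hi = hi_sum_plus_1 if carry else hi_sum
    return (hi << SLICE_BITS) | lo_sum

def sliced_add(a, b):
    """Two-slice add; the result wraps at 2 * SLICE_BITS bits."""
    return cycle_two(*cycle_one(a, b))
```

In this sketch, `sliced_add(0x0001FFFF, 0x00000001)` generates a carry out of the lower slice, so the second cycle selects the partial sum+1 for the upper slice.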
Prior solutions inserted a no-operation (NOP) instruction, or an ALU operation on different data, between the successive adds to delay the second add until the previous result is available. However, there is often no useful work to be done in this period, so the cycle is wasted.
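The cost of the NOP approach can be seen in a simplified scheduling sketch. The function name `schedule_with_nops` and the representation of dependencies as a list of booleans are hypothetical, and the model assumes exactly one idle slot is needed before each dependent add, as the two-cycle add latency described earlier suggests.

```python
def schedule_with_nops(depends_on_previous):
    """Return the issue slots for a sequence of adds, padded with NOPs.

    depends_on_previous[i] is True when add i needs the result of add i-1.
    Assumption: one NOP is enough to cover the two-cycle add latency.
    """
    slots = []
    for i, dependent in enumerate(depends_on_previous):
        if i > 0 and dependent:
            slots.append("NOP")     # wasted cycle while the prior result completes
        slots.append(f"ADD{i}")
    return slots
```

Under these assumptions, three independent adds occupy three issue slots, while a chain in which each add depends on the one before it occupies five, two of which do no useful work.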