The core of a programmable circuit, such as a field-programmable gate array (FPGA), includes programmable logic components, such as lookup tables (LUTs) and flip-flops, as well as “hard” components, such as IOs, memories, and DSP blocks. Conventionally, these circuits have been used to build cycle-accurate pipelined machines. As performance targets increase, these machines become more deeply pipelined in order to meet the higher operating frequency (i.e., shorter cycle time) requirements. At the same time, the time a signal takes to traverse a fixed proportion of a chip grows relative to the cycle time, not only because the cycle time is decreasing, but also because wire delay is not improving at the rate the process scaling curve would otherwise suggest.
It has become increasingly desirable to divide long signal paths into pipelined segments, as well as to decompose logic clouds into smaller and smaller pipelined pieces. Unfortunately, such comprehensive, rigid conventional pipelining is not fully compatible with another growing trend in system design, namely the proliferation of subcomponents running at different effective data rates, which requires handshaking between the subcomponents to keep the separate data rates matched. Such matching may be performed with a conventional first-in, first-out (FIFO) buffer. However, such a “heavyweight” solution may not always be desirable.
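
By way of illustration only, the conventional FIFO-based rate matching described above may be sketched as a cycle-by-cycle behavioral model. The sketch below is not taken from any particular design. The names `BoundedFifo` and `simulate`, the `ready`/`valid` naming, and the parameters `depth` and `consumer_period` are all assumptions introduced for this example. A producer offers one word per cycle, a consumer accepts one word every `consumer_period` cycles, and the producer stalls whenever the FIFO reports that it is not ready.

```python
from collections import deque


class BoundedFifo:
    """Hypothetical fixed-depth FIFO with ready/valid style flow control."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def ready(self):
        # Producer side: there is room for at least one more word.
        return len(self.items) < self.depth

    def valid(self):
        # Consumer side: at least one word is waiting to be read.
        return len(self.items) > 0

    def push(self, word):
        assert self.ready(), "push attempted while FIFO is full"
        self.items.append(word)

    def pop(self):
        assert self.valid(), "pop attempted while FIFO is empty"
        return self.items.popleft()


def simulate(num_words, depth, consumer_period, max_cycles=10_000):
    """Model a fast producer and a slower consumer joined by a FIFO.

    Returns the words received by the consumer, in order, and the number
    of cycles in which the producer had to stall (backpressure).
    """
    fifo = BoundedFifo(depth)
    received = []
    next_word = 0
    stalls = 0
    cycle = 0
    while len(received) < num_words and cycle < max_cycles:
        # Consumer accepts a word only on its own (slower) schedule.
        if cycle % consumer_period == 0 and fifo.valid():
            received.append(fifo.pop())
        # Producer tries to send every cycle and stalls when not ready.
        if next_word < num_words:
            if fifo.ready():
                fifo.push(next_word)
                next_word += 1
            else:
                stalls += 1
        cycle += 1
    return received, stalls
```

For instance, calling `simulate(8, 2, 3)` delivers the eight words in order while the producer records a nonzero number of stall cycles, whereas `simulate(8, 2, 1)` completes with no stalls. The storage and full/empty bookkeeping required at every such boundary is one way to view the cost that makes this approach “heavyweight.”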