Programmable logic devices (PLDs) are a well-known type of programmable integrated circuit (IC) that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (FPGA), typically includes an array of programmable tiles. These programmable tiles comprise various types of logic blocks, which can include, for example, input/output blocks (IOBs), configurable logic blocks (CLBs), dedicated block random access memories (BRAMs), multipliers, digital signal processing blocks (DSPs), processors, clock managers, delay-locked loops (DLLs), bus or network interfaces such as Peripheral Component Interconnect Express (PCIe) and Ethernet, and so forth.
Each programmable tile typically includes both programmable interconnect and programmable logic. The programmable interconnect typically includes a large number of interconnect lines of varying lengths interconnected by programmable interconnect points (PIPs). The programmable logic implements the logic of a user design using programmable elements that can include, for example, function generators, registers, arithmetic logic, and so forth.
The programmable interconnect and programmable logic are typically programmed by loading a stream of configuration data into internal configuration memory cells that define how the programmable elements are configured. The configuration data can be read from memory (e.g., from an external programmable read only memory (PROM)) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
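As an illustration of how the states of configuration memory cells determine function, the following is a minimal Python sketch of a single function generator modeled as a lookup table. The class name `Lut` and its methods are hypothetical and chosen only for this example; a real device loads its configuration through a vendor-specific bitstream format that is not modeled here.

```python
from typing import Sequence


class Lut:
    """Toy k-input lookup table (hypothetical model, not a vendor primitive)."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        # One configuration memory cell per input combination.
        self.cells = [0] * (1 << num_inputs)

    def configure(self, bits: Sequence[int]) -> None:
        """Load configuration data into the memory cells."""
        if len(bits) != len(self.cells):
            raise ValueError("expected %d configuration bits" % len(self.cells))
        self.cells = [b & 1 for b in bits]

    def evaluate(self, *inputs: int) -> int:
        """Return the cell selected by the inputs (input 0 is the LSB)."""
        if len(inputs) != self.num_inputs:
            raise ValueError("expected %d inputs" % self.num_inputs)
        index = 0
        for position, value in enumerate(inputs):
            index |= (value & 1) << position
        return self.cells[index]


lut = Lut(2)
lut.configure([0, 0, 0, 1])   # cells now implement a 2-input AND
and_result = lut.evaluate(1, 1)
lut.configure([0, 1, 1, 0])   # same element, reconfigured as XOR
xor_result = lut.evaluate(1, 0)
```

The same element computes AND or XOR depending only on the loaded bits, which is the sense in which the collective cell states define the function of the device.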
Some programmable ICs include separate blocks of memory that can be programmably connected to implement a larger memory. For example, the UltraRAM (URAM) is a high-density 288-Kbit memory building block in the Xilinx UltraScale+ FPGA architecture. The 288-Kbit blocks are cascadable to implement deeper memories. Each URAM has a dedicated built-in vertical cascade that can be used to create a column of URAMs. Several columns of URAMs can be connected via horizontal cascade circuitry to form a URAM matrix. The horizontal cascade can be implemented using lookup tables (LUTs) and flip-flops (FFs) of the FPGA. Using these cascade connections, several URAMs can be connected to implement a deep memory.
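The cascade arrangement can be sketched with some simple arithmetic. The Python below assumes each 288-Kbit block is organized as 4,096 words of 72 bits, and it takes the number of URAMs per column as a hypothetical `column_height` parameter, since the text above does not specify one. The function names are illustrative only.

```python
URAM_DEPTH = 4096   # words per block (assumed 4K x 72 organization)
URAM_WIDTH = 72     # bits per word (assumed)

# Sanity check: 4096 x 72 bits is exactly 288 Kbit.
assert URAM_DEPTH * URAM_WIDTH == 288 * 1024


def plan_cascade(depth_words: int, column_height: int) -> tuple:
    """Return (blocks, columns) needed for a memory of the given depth.

    column_height is a hypothetical limit on vertically cascaded URAMs.
    """
    blocks = -(-depth_words // URAM_DEPTH)    # ceiling division
    columns = -(-blocks // column_height)
    return blocks, columns


def locate(address: int, column_height: int) -> tuple:
    """Map a word address to (column, row, offset) in the URAM matrix."""
    block, offset = divmod(address, URAM_DEPTH)
    column, row = divmod(block, column_height)
    return column, row, offset


blocks, columns = plan_cascade(65536, column_height=8)
position = locate(40000, column_height=8)
```

Under these assumptions a 64K-word memory needs 16 blocks, which fill two columns of eight, and word 40000 falls in the second column. Filling each column before starting the next is one possible mapping, not necessarily the one a given tool uses.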
Logic delay accumulates as URAMs are cascaded vertically, so deep cascade structures can result in large clock-to-out delays for access to the memory. To reduce the logic delay and support a desired operating frequency, each URAM has built-in pipeline registers that can be programmably enabled. Achieving optimal pipeline packing is therefore important for high-speed memory access.
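To make the delay trade-off concrete, the following Python sketch uses a deliberately simplified timing model: a fixed base clock-to-out delay plus a fixed delay per cascade hop, both hypothetical values in picoseconds. It greedily enables a pipeline register whenever the next hop would exceed the clock period. This is an illustrative heuristic under those assumptions, not the packing method of any particular tool, and `choose_pipeline_registers` is a made-up name.

```python
def choose_pipeline_registers(num_blocks: int, base_delay_ps: int,
                              hop_delay_ps: int, clock_period_ps: int) -> list:
    """Return indices of cascaded blocks whose output register is enabled.

    Simplified model: an unbroken run of blocks costs base_delay_ps plus
    hop_delay_ps per cascade hop; enabling a register starts a new run.
    """
    if base_delay_ps > clock_period_ps:
        raise ValueError("base delay alone exceeds the clock period")
    if num_blocks > 1 and base_delay_ps + hop_delay_ps > clock_period_ps:
        raise ValueError("a single cascade hop does not fit in one period")

    enabled = []
    segment_delay = base_delay_ps
    for block in range(1, num_blocks):
        if segment_delay + hop_delay_ps > clock_period_ps:
            # Break the path at the output of the previous block.
            enabled.append(block - 1)
            segment_delay = base_delay_ps
        segment_delay += hop_delay_ps
    return enabled


# Hypothetical numbers: 8 cascaded blocks, 700 ps base, 300 ps per hop,
# 1600 ps clock period (625 MHz).
registers = choose_pipeline_registers(8, 700, 300, 1600)
```

With these example numbers the sketch enables registers after blocks 3 and 6, so no run of cascaded blocks exceeds the 1600 ps period. Each enabled register adds a cycle of latency, which is why the choice of registers matters.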