The present disclosure relates generally to integrated circuits, such as field programmable gate arrays (FPGAs). More particularly, the present disclosure relates to efficiently utilizing instantiated hardware implemented on the integrated circuit (e.g., an FPGA).
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Integrated circuits (ICs) take a variety of forms. For instance, field programmable gate arrays (FPGAs) are integrated circuits that are intended as relatively general-purpose devices. FPGAs may include logic that may be programmed (e.g., configured) after manufacturing to provide any desired functionality that the FPGA is designed to support. Thus, FPGAs contain programmable logic, or logic blocks, that may be configured to perform a variety of functions on the FPGAs, according to a designer's design. Additionally, FPGAs may include input/output (I/O) logic, as well as high-speed communication circuitry. For instance, the high-speed communication circuitry may support various communication protocols and may include high-speed transceiver channels through which the FPGA may transmit serial data to and/or receive serial data from circuitry that is external to the FPGA.
In ICs such as FPGAs, the programmable logic is typically configured using low-level programming languages such as VHDL or Verilog. Unfortunately, these low-level programming languages provide little abstraction and, thus, may present a development barrier for programmable logic designers. Higher-level programming languages, such as OpenCL, have become useful for easing programmable logic design. Programs written in the higher-level languages are used to generate code in the low-level programming languages. Kernels may be useful to bridge the low-level programming languages into executable instructions that may be performed by the integrated circuits. Accordingly, OpenCL programs typically require at least one hardware implementation for each kernel in the OpenCL program. In many cases, pipelining may enable more efficient execution by dividing a process into stages (e.g., each stage performing a single instruction). Dividing the process into stages may enable parallel processing, because new data may enter a stage as soon as that stage completes processing of the prior data.
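The staged behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not hardware or OpenCL code: the function name `run_pipeline`, the one-item-per-stage rule, and the one-cycle-per-stage timing are all hypothetical simplifications introduced for illustration.

```python
from collections import deque


def run_pipeline(items, num_stages):
    """Model a linear pipeline in which each stage holds at most one item.

    Assumption: every stage takes exactly one cycle, and an item moves to
    the next stage as soon as that stage has passed its prior item along.
    Returns the items in completion order and the number of busy cycles.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    pending = deque(items)
    stages = [None] * num_stages  # stages[0] is the entry stage
    completed = []
    cycles = 0
    while pending or any(slot is not None for slot in stages):
        # Retire the item that finished the last stage in the prior cycle.
        if stages[-1] is not None:
            completed.append(stages[-1])
        # Shift every item forward by one stage.
        for i in range(num_stages - 1, 0, -1):
            stages[i] = stages[i - 1]
        # New data enters the first stage as soon as it has been vacated.
        stages[0] = pending.popleft() if pending else None
        # Count a cycle only when at least one stage is doing work.
        if any(slot is not None for slot in stages):
            cycles += 1
    return completed, cycles


completed, cycles = run_pipeline(["a", "b", "c"], num_stages=3)
print(completed)  # ['a', 'b', 'c']
print(cycles)     # 5
```

In this model, three items pass through three stages in five cycles, whereas processing each item to completion before admitting the next would take nine cycles under the same one-cycle-per-stage assumption.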
In traditional systems, when a loop is present, a counter typically tracks the number of data sets that are allowed to enter the loop. The maximum loop occupancy is set based upon the minimum number of stages on any branch within the loop. For example, when a loop has two branches, one with five stages and another with three, the maximum loop occupancy may be set to three, because allowing a fourth data set to enter the loop on the three-stage branch may cause a stall. Unfortunately, this approach limits throughput, because the branches in the loop body may be able to accept more data than the limit allows. Accordingly, process efficiency is diminished using the maximum loop occupancy approach.
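The traditional approach described above can be sketched as follows, using the five-stage and three-stage branches from the example. The names `max_loop_occupancy` and `LoopEntryCounter` are hypothetical and chosen only for illustration; the sketch models the counting rule in software and does not represent any particular hardware implementation.

```python
def max_loop_occupancy(branch_stage_counts):
    """Return the occupancy limit: the stage count of the shortest branch."""
    if not branch_stage_counts:
        raise ValueError("a loop needs at least one branch")
    return min(branch_stage_counts)


class LoopEntryCounter:
    """Hypothetical counter that admits data sets until the limit is reached."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0

    def try_enter(self):
        """Admit one data set if the loop is below its occupancy limit."""
        if self.in_flight >= self.limit:
            return False  # entry is held back to avoid a possible stall
        self.in_flight += 1
        return True

    def leave(self):
        """Record that one data set has exited the loop."""
        if self.in_flight == 0:
            raise RuntimeError("no data set is in the loop")
        self.in_flight -= 1


branches = [5, 3]  # stages per branch, as in the example above
limit = max_loop_occupancy(branches)
counter = LoopEntryCounter(limit)
admitted = [counter.try_enter() for _ in range(4)]
print(limit)                  # 3
print(admitted)               # [True, True, True, False]
print(max(branches) - limit)  # 2
```

The last printed value shows the lost capacity in the worst case for this example: if every admitted data set takes the five-stage branch, two of its stages remain idle even though the fourth data set was refused entry.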