Configurable platforms are being deployed in data centers and are promising candidate architectures for accelerating certain classes of workloads. However, these configurable platforms typically outperform graphics processing units (GPUs) and other types of compute-dense processing units only for certain types of calculations (e.g., those with irregular data flows or computations on non-standard bit-width data types). To address this drawback, field-programmable gate array (FPGA) vendors have started incorporating hardened logic, hardcoded logic blocks, or hardened compute blocks (collectively “hardened logic blocks”) in the FPGAs for a number of compute element types, including central processing units (CPUs), floating point (FP) units, and the like. This approach, however, requires the FPGA vendors to produce a wide variety of devices with different mixes of hardened logic blocks for various market segments. Even with such hardened logic blocks, the efficiency of the configurable platform is sub-optimal for applications that have sections of code that can benefit from dense compute engines such as GPUs. One conventional technique incorporates multiple discrete devices of each kind in the system at the board level (e.g., a CPU and a GPU along with the FPGA). This technique increases system-level cost and complexity because multiple discrete devices need to be incorporated and coordinated. Another conventional technique incorporates hardened logic blocks on the FPGA device. This technique requires the manufacture of many different devices with varying mixes of hardened logic block types and still runs the risk of not being optimal for any particular workload.