Various bit-serial multiplier circuit topologies may consume a large number of flip-flops, which generally translates into a large physical footprint when such circuits are implemented using a Field-Programmable Gate Array (FPGA) device. This can occur because a “slice” of the FPGA device in which a flip-flop is used, but in which the accompanying combinational logic may not be required, is deemed “occupied,” and the look-up tables (LUTs) in that slice may then be unavailable for other use. Additionally, some circuit topologies may consume a large area because they use a larger number of control sets. Generally available serial multiplier implementations can be inefficient in terms of the number of clock cycles consumed per multiply operation, or such approaches may not start generating a partial result immediately; each of these considerations can pose a serious computational bottleneck. Another approach can include use of a hybrid bit-serial/parallel multiplier implementation, but such an approach can be inefficient in its use of input/output facilities on the FPGA device, such as by consuming excess pins on the device package.
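As a rough illustration of the clock-cycle cost mentioned above, the following software sketch models a generic textbook shift-and-add multiplier that examines one multiplier bit per clock cycle. It is an assumption made purely for illustration and does not represent any particular implementation discussed here; the function name `shift_add_multiply` and the `width` parameter are hypothetical.

```python
def shift_add_multiply(a: int, b: int, width: int = 8):
    """Software model of a generic shift-and-add multiplier (illustrative only).

    One bit of the multiplier `b` is examined per simulated clock cycle,
    so the cycle count grows linearly with the operand width.
    Returns a tuple of (product, cycles).
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in `width` unsigned bits")

    acc = 0      # accumulated partial product
    cycles = 0   # simulated clock cycles consumed

    for i in range(width):
        # If the current multiplier bit is set, add the shifted multiplicand.
        if (b >> i) & 1:
            acc += a << i
        cycles += 1  # one multiplier bit handled per cycle

    return acc, cycles


product, cycles = shift_add_multiply(13, 11, width=8)
print(product, cycles)
```

In this simplified model, a multiply of two `width`-bit operands always takes `width` simulated cycles regardless of the operand values, which is one way to picture why the per-operation cycle count can become a bottleneck.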