Co-processors have been used to accelerate computational performance. For example, some early microprocessors did not include floating-point circuitry due to integrated circuit die area limitations. As used herein, “include” and “including” mean including without limitation. Unfortunately, performing floating-point computations in software can be quite slow.
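By way of illustration and not limitation, the cost of performing a floating-point computation in software may be sketched as follows. The sketch multiplies two single-precision values using only integer operations, as a microprocessor lacking floating-point circuitry would have to do. The function names `float_to_bits`, `bits_to_float`, and `soft_mul` are hypothetical and do not come from any particular library. The sketch assumes IEEE 754 single-precision encoding and is deliberately simplified: it handles only normal, nonzero, finite operands, truncates rather than rounds, and does not check for overflow or underflow.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the 32-bit single-precision encoding of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Return the float whose 32-bit single-precision encoding is b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def soft_mul(a_bits: int, b_bits: int) -> int:
    """Multiply two single-precision encodings using integer operations only.

    Simplifying assumptions (hypothetical sketch): normal, nonzero, finite
    inputs; truncation instead of rounding; no overflow/underflow handling.
    """
    # The sign of the product is the XOR of the operand signs.
    sign = ((a_bits ^ b_bits) >> 31) & 1

    # Extract the biased 8-bit exponents.
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF

    # Extract the 23-bit fractions and restore the implicit leading 1.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    # Add exponents, removing one copy of the bias (127).
    exp = exp_a + exp_b - 127

    # The product of two 24-bit significands needs up to 48 bits.
    prod = man_a * man_b

    # Normalize: if the product reached bit 47, shift right and bump exponent.
    if prod & (1 << 47):
        prod >>= 1
        exp += 1

    # Keep the top 23 fraction bits (truncating), dropping the leading 1.
    man = (prod >> 23) & 0x7FFFFF

    return (sign << 31) | ((exp & 0xFF) << 23) | man


result = bits_to_float(soft_mul(float_to_bits(1.5), float_to_bits(2.0)))
```

Even in this simplified form, a single multiplication expands into a sequence of shifts, masks, additions, and a wide integer multiply, whereas floating-point circuitry performs the operation as one instruction.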
Accordingly, a co-processor configured to work with a microprocessor was created. Instructions for the co-processor, such as instructions for performing a floating-point computation, could thus be passed through the microprocessor. As integrated circuit technology improved, the microprocessor and the co-processor were combined on a single die. So, for example, some recent microprocessors are capable of performing floating-point operations.
Still, conventional microprocessors have a fixed set of circuitry for carrying out instructions from their Instruction Set Architecture (“ISA”). So while instructions from an ISA may be used for carrying out computational algorithms in a conventional microprocessor, the execution of such instructions is limited to the fixed set of circuitry of the microprocessor. In short, microprocessors may not be well suited for carrying out some complex algorithms or highly specialized algorithms, and thus execution of such algorithms as program applications using a microprocessor may be slow.
More recently, multi-microprocessor computing systems have been implemented. In such systems, one microprocessor may act as a Central Processing Unit (“CPU”) and one or more others of such microprocessors may act as auxiliary processors to improve computational throughput. However, such microprocessors are still limited to their fixed set of circuitry and associated ISA, and thus may still be relatively slow when executing complex algorithms or highly specialized algorithms.
A microprocessor interface conventionally has more available pins than an edge connector associated with a peripheral circuit board interface. Conventionally, a socket may be attached to a microprocessor interface of a motherboard to facilitate addition of a microprocessor, which may be added after manufacture of the motherboard. Thus, in some instances, motherboards are sold separately from microprocessors.
Programmable Logic Devices (“PLDs”), such as those that have field programmable gates which may be arrayed as in Field Programmable Gate Arrays (“FPGAs”) for example, have programmable logic that may be tailored for carrying out various tasks. For purposes of clarity by way of example and not limitation, FPGAs are described below; however, it should be understood that other integrated circuits that include programmable logic, such as field programmable gates, may be used.
Execution of complex algorithms or highly specialized algorithms may be done in hardware via programmable logic tailored to carry out such algorithms. Execution of complex algorithms or highly specialized algorithms instantiated, in whole or in part, in programmable logic may be substantially faster than executing them in software using a microprocessor or microprocessors.
However, motherboards or system boards capable of handling one or more microprocessors are more common in computing systems than PLDs, such as FPGAs for example, for a variety of known reasons. Accordingly, some developers have created FPGA accelerators implemented as expansion cards that plug into one or more peripheral circuit board edge connection slots of a motherboard. However, expansion board FPGA accelerators (“peripheral accelerators”) are limited by the edge connection interface pin density and associated performance of the peripheral communication interface with which they interconnect. An example of a peripheral interface is a Peripheral Component Interconnect (“PCI”) bus. A peripheral circuit board interface, such as a PCI bus for example, is relatively slow as compared with a microprocessor interface. Examples of microprocessor interfaces include a Front Side Bus (“FSB”) and a HyperTransport (“HT”) link, among other types of microprocessor interfaces.
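For purposes of clarity by way of example and not limitation, the difference in interface performance may be estimated from data path width and transfer rate. The sketch below computes theoretical peak bandwidth as width multiplied by transfers per second. The function name `peak_bandwidth_mb_s` is hypothetical, and the figures are illustrative assumptions rather than values taken from this description: a 32-bit PCI bus clocked at approximately 33.33 MHz, and a 16-bit HT link operating at 1600 megatransfers per second in one direction. Actual throughput depends on protocol overhead and on the particular interface version and configuration.

```python
def peak_bandwidth_mb_s(width_bits: int, transfers_per_second: float) -> float:
    """Theoretical peak bandwidth in megabytes per second (1 MB = 10**6 bytes).

    Ignores protocol overhead, arbitration, and bus turnaround.
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e6


# Assumed, illustrative configurations (not taken from the description).
pci_mb_s = peak_bandwidth_mb_s(width_bits=32, transfers_per_second=33.33e6)
ht_mb_s = peak_bandwidth_mb_s(width_bits=16, transfers_per_second=1600e6)

# Ratio of the assumed HT link to the assumed PCI bus, per direction.
ratio = ht_mb_s / pci_mb_s

print(f"PCI (assumed): {pci_mb_s:.0f} MB/s")
print(f"HT  (assumed): {ht_mb_s:.0f} MB/s per direction")
print(f"Ratio: about {ratio:.0f}x")
```

Under these assumptions, the microprocessor interface offers more than an order of magnitude greater peak bandwidth than the peripheral interface, even though the assumed HT link has a narrower data path.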
A configuration bitstream or a partial bitstream may be pre-designed to provide one or more functional blocks when instantiated in programmable logic. Such a pre-designed bitstream or partial bitstream is conventionally derived from what is generally referred to as a “core.” For example, an HT link core is available from Xilinx, Inc. for providing a configuration bitstream that may be instantiated in an FPGA from that vendor. Conventionally, a core is usable in a variety of applications; however, a core may include pre-defined placement or pre-defined routing, or a combination thereof. These types of pre-designed cores are sometimes known as “floor-planned” cores. Such floor-planned cores may be pre-designed for a particular family of products. Additionally, cores may allow a user to enter parameters to activate functionality, change functionality, and adjust interface parameters, among other known parameterizations.