Programmable logic devices (“PLDs”) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (“FPGA”), may include an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (“IOBs”), configurable logic blocks (“CLBs”), dedicated random access memory blocks (“BRAMs”), multipliers, digital signal processing blocks (“DSPs”), processors, clock managers, delay lock loops (“DLLs”), and so forth. As used herein, “include” and “including” mean including without limitation.
Each programmable tile conventionally includes both programmable interconnect and programmable logic. The programmable interconnect may include a large number of interconnect lines of varying lengths interconnected by programmable interconnect points (“PIPs”). The programmable logic implements the logic of a user design using programmable elements that can include, for example, function generators, registers, arithmetic logic, and so forth.
The programmable interconnect and programmable logic are conventionally programmed by loading a stream of configuration data into internal configuration memory cells that define how the programmable elements are configured. The configuration data can be read from memory (e.g., from an external PROM) or written into the FPGA by an external device. The collective states of the individual memory cells then determine the function of the FPGA.
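By way of illustration only, the manner in which configuration memory cells may define the function of a programmable element can be sketched in software. The sketch below models a function generator as a lookup table whose truth table is a list of configuration bits. The name `make_lut` and the bit ordering are assumptions made for this example and are not taken from any particular device.

```python
from typing import Callable, Sequence


def make_lut(config_bits: Sequence[int]) -> Callable[..., int]:
    """Return a function generator whose truth table is config_bits.

    Hypothetical model: config_bits[i] is the output when the inputs,
    read as a binary number with the first input as the least
    significant bit, equal i.
    """
    n_inputs = (len(config_bits) - 1).bit_length()
    if len(config_bits) != 1 << n_inputs:
        raise ValueError("number of configuration bits must be a power of two")
    cells = tuple(bit & 1 for bit in config_bits)  # the "memory cells"

    def lut(*inputs: int) -> int:
        if len(inputs) != n_inputs:
            raise ValueError(f"expected {n_inputs} inputs")
        index = 0
        for position, bit in enumerate(inputs):
            index |= (bit & 1) << position
        return cells[index]

    return lut


# The same element performs different functions depending only on the
# configuration data loaded into it.
and_gate = make_lut([0, 0, 0, 1])
xor_gate = make_lut([0, 1, 1, 0])
```

In this sketch, loading a different list of bits changes the logic function without changing the element itself, which is the sense in which the collective states of the memory cells determine the function.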
Another type of PLD is the Complex Programmable Logic Device, or CPLD. A CPLD includes two or more “function blocks” connected together and to input/output (“I/O”) resources by an interconnect switch matrix. Each function block of the CPLD includes a two-level AND/OR structure similar to those used in Programmable Logic Arrays (“PLAs”) and Programmable Array Logic (“PAL”) devices. In CPLDs, configuration data is conventionally stored on-chip in non-volatile memory. In some CPLDs, configuration data is stored on-chip in non-volatile memory, then downloaded to volatile memory as part of an initial configuration (programming) sequence.
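The two-level AND/OR structure mentioned above can likewise be sketched, under simplifying assumptions, as a sum of products: each product term is an AND of selected inputs or their complements, and the output is the OR of all product terms. The function name `sum_of_products` and the dictionary encoding of a product term are hypothetical and chosen only for this example.

```python
from typing import Iterable, Mapping


def sum_of_products(
    product_terms: Iterable[Mapping[str, bool]],
    inputs: Mapping[str, bool],
) -> bool:
    """Evaluate a two-level AND/OR (sum-of-products) function.

    Each product term maps an input name to the value that input must
    have (True for the input itself, False for its complement). Inputs
    not named in a term do not participate in that term.
    """
    return any(
        all(inputs[name] == required for name, required in term.items())
        for term in product_terms
    )


# Example: XOR expressed as (a AND NOT b) OR (NOT a AND b).
xor_terms = [
    {"a": True, "b": False},
    {"a": False, "b": True},
]
```

Here the list of product terms plays the role of the configuration data for one function block output.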
For all of these PLDs, the functionality of the device is controlled by data bits provided to the device for that purpose. The data bits can be stored in volatile memory (e.g., static memory cells, as in FPGAs and some CPLDs), in non-volatile memory (e.g., FLASH memory, as in some CPLDs), or in any other type of memory cell.
Other PLDs are programmed by applying a processing layer, such as a metal layer, that programmably interconnects the various elements on the device. These PLDs are known as mask programmable devices. PLDs can also be implemented in other ways, e.g., using fuse or antifuse technology. The terms “PLD” and “programmable logic device” include but are not limited to these exemplary devices, and also encompass devices that are only partially programmable. For example, one type of PLD includes a combination of hard-coded transistor logic and a programmable switch fabric that programmably interconnects the hard-coded transistor logic.
Microprocessors are being embedded in Application Specific Integrated Circuits (“ASICs”), Application Specific Standard Products (“ASSPs”), and System-On-Chips (“SoCs”). These SoCs may be PLDs, such as FPGAs, that may contain one or more embedded microprocessors. An application run exclusively on an embedded processor ties up the processor and thus does not have the advantage of off-loading tasks to a coprocessor. Alternatively, a coprocessor unit may be implemented in FPGA programmable resources (“FPGA fabric”) and coupled to an embedded microprocessor for off-loading tasks to the coprocessor. The term “coprocessor” as used herein means a coprocessor instantiated in whole or in part in programmable logic resources.
A conventional microprocessor core embedded in dedicated hardware of an FPGA may include multiple pipelines. These pipelines may be relatively independent from one another. For example, one pipeline may be for executing an instruction and another pipeline may be for accessing data from cache. An auxiliary processor unit (“APU”) controller may be coupled to a pipeline of such an embedded microprocessor. An example of an APU controller is described in U.S. Pat. No. 7,243,212 B1, which is incorporated by reference herein in its entirety for all purposes.
Heretofore, an APU controller executed one instruction at a time, in order. Thus, an instruction provided to a microprocessor and targeted for an auxiliary coprocessor coupled via an APU controller had to be completely executed by both the coprocessor and the APU controller before another instruction for such coprocessor could be passed to the APU controller for execution by the coprocessor. Consequently, when back-to-back APU instructions were provided to a microprocessor, the later of such instructions was stalled until the earlier of such instructions had completely executed. This stalling of the microprocessor occurred even if the later instruction was for processing via a different pipeline of such microprocessor than the earlier instruction. Accordingly, back-to-back transactions could not be processed without at least one wait state, namely at least one “dead” microprocessor system clock cycle, between such transactions.
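The effect of this serialization on timing can be illustrated with a minimal cycle-count sketch. The function `serialized_schedule`, the latency values, and the default of exactly one wait state are assumptions made for the example; they are not figures for any actual APU controller.

```python
from typing import List, Sequence, Tuple


def serialized_schedule(
    latencies: Sequence[int], wait_states: int = 1
) -> List[Tuple[int, int]]:
    """Return (issue_cycle, completion_cycle) for each instruction.

    Hypothetical model of strictly in-order, one-at-a-time execution:
    an instruction issues only after the previous one has completed and
    the given number of dead clock cycles has elapsed.
    """
    schedule = []
    cycle = 0
    for latency in latencies:
        issue = cycle
        done = issue + latency
        schedule.append((issue, done))
        cycle = done + wait_states  # the next instruction stalls until here
    return schedule


# Three back-to-back instructions with assumed latencies of 3, 2, and 4 cycles.
example = serialized_schedule([3, 2, 4])
```

In this model the total time is the sum of the individual latencies plus one dead cycle between each pair of consecutive instructions, regardless of which pipeline each instruction would otherwise have used.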
Moreover, heretofore out-of-order execution was not supported. Thus, even if a microprocessor having multiple pipelines supported out-of-order execution, out-of-order execution of instructions provided to an APU controller was not supported; such instructions would instead stall the microprocessor.