Programmable logic devices (“PLDs”) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (“FPGA”), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (“IOBs”), configurable logic blocks (“CLBs”), dedicated random access memory blocks (“BRAMs”), multipliers, digital signal processing blocks (“DSPs”), processors, clock managers, delay lock loops (“DLLs”), and so forth. Notably, as used herein, “include” and “including” mean including without limitation.
One such FPGA is the Xilinx Virtex® FPGA available from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124. Another type of PLD is the Complex Programmable Logic Device (“CPLD”). A CPLD includes two or more “function blocks” connected together and to input/output (“I/O”) resources by an interconnect switch matrix. Each function block of the CPLD includes a two-level AND/OR structure similar to those used in Programmable Logic Arrays (“PLAs”) and Programmable Array Logic (“PAL”) devices. Other PLDs are programmed by applying a processing layer, such as a metal layer, that programmably interconnects the various elements on the device. These PLDs are known as mask programmable devices. PLDs can also be implemented in other ways, for example, using fuse or antifuse technology. The terms “PLD” and “programmable logic device” include, but are not limited to, these exemplary devices, and also encompass devices that are only partially programmable.
For purposes of clarity, FPGAs are described below, though other types of PLDs may be used. FPGAs may include one or more embedded microprocessors. For example, a microprocessor may be located in an area reserved for it, generally referred to as a “processor block.”
A more recent addition to FPGA architecture is the inclusion of an Auxiliary Processor Unit (“APU”). The APU provides a high-bandwidth interface between programmable logic of an FPGA and an embedded processor of the FPGA. Although an APU is generally intended for coupling a co-processor to the embedded processor, it may be used for other applications.
In the APU provided in Virtex-4 FPGAs available from Xilinx, Inc. of San Jose, Calif., information may be read from or written to cache memory via an embedded processor and the APU of the FPGA. All types of information, such as addresses, data, instructions, control signals, and the like, are referred to hereinafter as data for purposes of clarity and not limitation. In an example, cache memory may be read one wordline at a time, where each wordline is 128 bits long; that is, one quadword at a time, where each word is 32 bits long. However, there may be situations where data to be obtained from cache memory is not quadword-aligned.
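For purposes of illustration only, the quadword arithmetic in the example above may be sketched as follows. The sketch assumes byte-addressed memory, which is not stated above, and the function names are hypothetical; it does not describe any APU circuitry.

```python
# Sizes taken from the example above: 32-bit words, four words per quadword.
WORD_BITS = 32
WORDS_PER_QUADWORD = 4
WORD_BYTES = WORD_BITS // 8                        # 4 bytes per word
QUADWORD_BYTES = WORD_BYTES * WORDS_PER_QUADWORD   # 16 bytes = 128 bits


def is_quadword_aligned(byte_address: int) -> bool:
    """True when the address falls on a 128-bit wordline boundary.

    Assumption: addresses count bytes.
    """
    return byte_address % QUADWORD_BYTES == 0


def quadword_base(byte_address: int) -> int:
    """Address of the first byte of the wordline containing byte_address."""
    return byte_address - (byte_address % QUADWORD_BYTES)


def word_offset_in_quadword(byte_address: int) -> int:
    """Index (0 to 3) of the 32-bit word that contains byte_address."""
    return (byte_address % QUADWORD_BYTES) // WORD_BYTES
```

Under these assumptions, an address such as 0x1000 is quadword-aligned, whereas 0x1004 lies one word into the wordline that begins at 0x1000.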
A limitation of the APU interface is that high-bandwidth operation generally requires data to be quadword-aligned. Thus, if data is not quadword-aligned, the APU may indicate that a high-bandwidth operation is invalid, or the APU may transfer the data from an incorrect location in memory. Data may instead be pre-aligned using embedded processor instructions; however, doing so consumes embedded processor cycles and complicates instruction programming.
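The cost of handling data that is not quadword-aligned may be illustrated with a simple software model. The sketch below is an assumption-laden illustration only: memory is modeled as a Python bytes object, addresses are assumed to count bytes, and the function names are hypothetical. It models neither the behavior of the APU nor the actual processor instructions used for pre-alignment.

```python
QUADWORD_BYTES = 16  # one 128-bit wordline, assuming byte addressing


def spanned_wordlines(byte_address: int, length: int = QUADWORD_BYTES) -> list[int]:
    """Base addresses of every wordline touched by the requested bytes."""
    first = (byte_address // QUADWORD_BYTES) * QUADWORD_BYTES
    last = ((byte_address + length - 1) // QUADWORD_BYTES) * QUADWORD_BYTES
    return list(range(first, last + 1, QUADWORD_BYTES))


def read_unaligned_quadword(memory: bytes, byte_address: int) -> bytes:
    """Assemble 16 bytes starting at an arbitrary address.

    Each touched wordline is read whole, then the wanted bytes are
    selected; this extra read-and-select work stands in for pre-alignment.
    """
    lines = b"".join(
        memory[base:base + QUADWORD_BYTES]
        for base in spanned_wordlines(byte_address)
    )
    offset = byte_address % QUADWORD_BYTES
    return lines[offset:offset + QUADWORD_BYTES]
```

In this model, a quadword starting at byte address 32 touches a single wordline, whereas one starting at byte address 20 touches the wordlines at 16 and 32, so two reads and a byte selection are needed to produce the same amount of data.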