Some conventional programmable processors include dedicated memory blocks embedded in their programmable logic arrays to increase performance of the processor-oriented functions. Normally, these memories are intended to implement a wide range of functions, and therefore, are embedded into the arrays of programmable logic without adaptation. While functional, this general approach to implementing memory blocks in programmable logic arrays has several drawbacks.
The architectures of most well-known programmable logic arrays tend to under-utilize the capabilities of the embedded memory blocks. Also, these arrays generally lack the control mechanisms and the data paths that are necessary to rectify the deficiencies in using the embedded memory blocks. To illustrate, consider that a register file (“RF”) in a load-store based architecture normally maintains data, such as filter coefficients, that is subject to reuse during repetitious computations. Consequently, the one or more registers holding that data are deemed restricted, and thus are inaccessible for use by other computations. Either the other computations stall until the registers become available, or the contents of the registers are jettisoned and then reloaded by performing multiple load and store instructions. Both outcomes hinder processor performance by increasing instruction processing time and by consuming bandwidth in the data buses.
Memory blocks are also under-utilized because data paths do not efficiently introduce data into those memory blocks. Inefficiencies of loading data into embedded memory arise because reconfigurations of the programmable logic array are typically performed in series with the execution of instructions rather than in parallel. In addition, most known programmable processor architectures lack an efficient path over which to exchange input/output (“I/O”) data streams between a peripheral device and the embedded memory blocks, other than by interrupting the continuous streams of I/O data (and a processor) to temporarily store the I/O data streams until the data can be discretely copied from external main memory to its destination. Also, there are generally no provisions to load I/O data into a memory block while an instruction is being executed in adjacent logic.
Further, scarce programmable resources that might otherwise be used to perform computations are usually reserved for interfacing the embedded memory blocks with the functionalities of the programmable logic. To implement “double buffering,” for example, programmable resources must be dedicated to synthesize circuitry (e.g., multiplexers, etc.) to implement the swapping of buffers. Consider, too, that wide Boolean functions are typically implemented with small-sized (i.e., few-input) look-up tables (“LUTs”). But wide Boolean functions do not generally map efficiently to these small-sized LUTs.
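The double buffering mentioned above can be illustrated with a minimal software sketch. It is only an analogy for the hardware arrangement, not a description of any particular array: the class name `DoubleBuffer`, its methods, and the buffer size are hypothetical and chosen for illustration. The `swap` step stands in for the selection that the synthesized multiplexers would perform.

```python
class DoubleBuffer:
    """Two equally sized buffers: one is filled while the other is read.

    Hypothetical illustration only; names and sizes are assumptions.
    """

    def __init__(self, size):
        self._buffers = [[0] * size, [0] * size]
        self._write_index = 0  # selects the buffer currently being filled

    @property
    def write_buffer(self):
        return self._buffers[self._write_index]

    @property
    def read_buffer(self):
        return self._buffers[1 - self._write_index]

    def swap(self):
        # Exchange the roles of the two buffers. In hardware, this
        # selection is the job of the multiplexing circuitry.
        self._write_index = 1 - self._write_index


db = DoubleBuffer(4)
db.write_buffer[:] = [1, 2, 3, 4]  # producer fills the first buffer
db.swap()                          # roles are exchanged
db.write_buffer[:] = [5, 6, 7, 8]  # producer fills the second buffer
first = list(db.read_buffer)       # consumer still sees the first buffer
```

In this sketch the consumer reads a complete buffer while the producer writes the other one. In a programmable logic array, the equivalent role selection consumes logic resources that could otherwise perform computation.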
Thus, there is a need for a system, an apparatus and a method to overcome the drawbacks of the above-mentioned implementations of embedded memory in traditional programmable logic arrays, and in particular, to effectively use embedded memory to increase processor performance and to preserve reconfigurable computation resources.