The present invention relates generally to improvements in parallel processing, and more particularly to techniques for loading very long instruction word (VLIW) memory in an indirect VLIW processor in a manner that allows the load latency to be hidden by computation.
An indirect VLIW processor is organized with a VLIW memory (VIM) that is separate from its short instruction word (SIW) memory. The VIM is defined for an instruction set architecture by way of specific SIWs that control the loading and execution of the VLIWs stored in the VIM. For example, a load VLIW (LV) instruction is defined which acts as a setup control delimiter instruction for the processor logic. The LV instruction specifies the VIM address where a VLIW is to be stored and the number of SIWs following the LV instruction that are to be stored at the specified VIM address in VLIW fashion. Another special SIW is the execute VLIW (XV) instruction. The XV instruction causes a VLIW to be read out of the VIM at the XV-specified address.
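The LV and XV behavior described above can be illustrated with a small software model. This is a minimal sketch only, not the processor's actual instruction encoding: the class name `IndirectVliwModel`, the tuple representation of instructions, and the mnemonics `ADD`, `MPY`, and `SUB` are all hypothetical, and the model assumes an LV carries just a VIM address and a count of following SIWs.

```python
from dataclasses import dataclass, field


@dataclass
class IndirectVliwModel:
    """Toy model of an indirect VLIW processor (all names are hypothetical)."""
    vim: dict = field(default_factory=dict)    # VIM address -> list of SIWs
    trace: list = field(default_factory=list)  # what was issued at each step

    def run(self, stream):
        i = 0
        while i < len(stream):
            op = stream[i]
            if op[0] == "LV":
                # LV is a delimiter: the next `count` SIWs are stored
                # at VIM address `addr` instead of being executed.
                _, addr, count = op
                self.vim[addr] = list(stream[i + 1:i + 1 + count])
                i += 1 + count
            elif op[0] == "XV":
                # XV reads the VLIW at `addr` and issues its SIWs together.
                _, addr = op
                self.trace.append(tuple(self.vim[addr]))
                i += 1
            else:
                # Any other SIW is issued on its own.
                self.trace.append((op,))
                i += 1
        return self.trace


program = [("LV", 3, 2), ("ADD",), ("MPY",), ("SUB",), ("XV", 3)]
model = IndirectVliwModel()
issued = model.run(program)
```

In this sketch the two SIWs after the LV never appear in the trace as individual instructions; they are issued only when the later XV names VIM address 3.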
The ManArray processor defines two preferred architectures for indirect VLIW memories. One approach treats the VIM as one composite block of memory using one common address interface to access any VLIW stored in the VIM. The second approach treats the VIM as made up of multiple smaller memories, each individually associated with the functional units and each individually addressable for loading and for reading during XV execution. It will be recognized that improved techniques for loading VLIW memory will be highly desirable.
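The difference between the two VIM organizations can be sketched as two small data structures. This is an illustrative model under stated assumptions: the unit names in `UNITS`, the class names, and the `load`/`read` method signatures are placeholders chosen for the example and are not taken from the ManArray definition.

```python
UNITS = ("load", "alu", "store")  # placeholder functional-unit names


class CompositeVim:
    """One memory block: a single address selects an entire VLIW."""

    def __init__(self, depth):
        self.rows = [dict.fromkeys(UNITS, "nop") for _ in range(depth)]

    def load(self, addr, unit, siw):
        self.rows[addr][unit] = siw

    def read(self, addr):
        # One common address yields every unit's slot from the same row.
        return dict(self.rows[addr])


class PartitionedVim:
    """One small memory per functional unit, each addressed on its own."""

    def __init__(self, depth):
        self.mem = {unit: ["nop"] * depth for unit in UNITS}

    def load(self, unit, addr, siw):
        self.mem[unit][addr] = siw

    def read(self, addrs):
        # Each unit supplies its own address, so a VLIW can be composed
        # from slots stored at different locations.
        return {unit: self.mem[unit][addrs[unit]] for unit in UNITS}


composite = CompositeVim(depth=4)
composite.load(1, "alu", "add")
row = composite.read(1)

partitioned = PartitionedVim(depth=4)
partitioned.load("alu", 0, "add")
partitioned.load("load", 2, "ld")
mixed = partitioned.read({"load": 2, "alu": 0, "store": 1})
```

The composite form needs only one address per access, while the partitioned form needs one address per functional unit and in exchange lets slots be reused across different VLIWs.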
The present invention covers techniques to independently load the VIMs concurrently with SIW or indirect VLIW (iVLIW) execution on the sequence processor (SP) or on the processing elements (PEs), thereby allowing the load latency to be hidden by the computation. The ManArray processor, which is a scalable indirect VLIW array processor, is the presently preferred processor for implementing these concepts. The VIMs contained in each PE are accessible by the same type of LV and XV SIWs as in a single-processor instantiation of the indirect VLIW architecture. In the ManArray architecture, the control processor, also called the SP, fetches the instructions from the SIW memory and dispatches them to itself and the PEs. By using the LV instruction, VLIWs can be loaded into VIMs in the SP and the PEs. Because the LV instruction is supplied by the SP through the instruction stream, no other processing takes place while VLIWs are being loaded into any VIM. In addition, as defined in the ManArray architecture, when the SP is processing SIWs, such as control and other sequential code, the PE array is not executing any instructions. With the techniques presented herein, the latency to load the VIM can be hidden by the computation.
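The benefit of overlapping VIM loading with computation can be shown with a back-of-the-envelope cycle count. This is a simplified model, not a description of the actual mechanism: it assumes the load path is fully independent of the instruction stream, and the function names and cycle figures are hypothetical.

```python
def serial_cycles(load_cycles, compute_cycles):
    """LVs issued through the instruction stream: nothing else runs meanwhile."""
    return load_cycles + compute_cycles


def overlapped_cycles(load_cycles, compute_cycles):
    """VIM loaded independently while computation proceeds (idealized)."""
    return max(load_cycles, compute_cycles)


def hidden_cycles(load_cycles, compute_cycles):
    """Load cycles that no longer add to the total run time."""
    return (serial_cycles(load_cycles, compute_cycles)
            - overlapped_cycles(load_cycles, compute_cycles))


# Hypothetical figures: 6 cycles of VIM loading, 20 cycles of computation.
total_serial = serial_cycles(6, 20)
total_overlapped = overlapped_cycles(6, 20)
saved = hidden_cycles(6, 20)
```

Under this idealized model the load latency is completely hidden whenever the computation takes at least as long as the load; otherwise only the portion equal to the computation time is hidden.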