This invention pertains to the field of VLIW (Very Long Instruction Word) microprocessors and digital signal processors (DSPs). This type of processor is characterized by its capability to exploit a high degree of instruction level parallelism (ILP) when the current application makes it available. Instruction level parallelism is the degree to which instructions are independent of each other and can be performed in parallel rather than in series. Software applications developed for these processors exhibit varying degrees of latent ILP, and applications with low latent ILP cannot exploit the full capabilities of a VLIW processor at all times. Performance over a period of execution having low latent ILP could often be improved if task-level parallelism (TLP) were available as an alternative. Task-level parallelism refers to the capability of the processor to perform more than one independent task or instruction thread simultaneously. Such independent tasks typically include at least dozens or hundreds of instructions directed to independent problems. The improvement comes from the fact that, during a period of low latent ILP, many of the functional units of the processor are idle and thus could be used to run code from another task (or thread).
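The idle-unit argument above can be made concrete with a small calculation. The following is only an illustrative sketch: the machine width of 8 functional units and the per-cycle usage figures are hypothetical values chosen for the example, not measurements from any processor.

```python
def slot_utilization(used_per_cycle, num_units):
    """Return (used_slots, idle_slots, utilization) over a run of cycles.

    used_per_cycle[i] is the number of functional units kept busy in cycle i.
    """
    if any(u < 0 or u > num_units for u in used_per_cycle):
        raise ValueError("a cycle cannot use more units than the machine has")
    total = num_units * len(used_per_cycle)
    used = sum(used_per_cycle)
    return used, total - used, used / total


NUM_UNITS = 8  # hypothetical VLIW width (assumption for this example)

# Hypothetical traces: a DSP inner loop with high ILP, and control code with low ILP.
dsp_kernel = [8, 8, 7, 8]
control_code = [1, 2, 1, 1, 2, 1]

dsp_stats = slot_utilization(dsp_kernel, NUM_UNITS)
control_stats = slot_utilization(control_code, NUM_UNITS)

print("DSP kernel   (used, idle, utilization):", dsp_stats)
print("control code (used, idle, utilization):", control_stats)
```

In this invented trace the control code leaves 40 of 48 issue slots idle; those are the slots that a second task or thread could in principle occupy.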
VLIWs are highly effective for regular, loop-oriented tasks such as are typical of the performance-sensitive aspects of digital signal processing and other "number-crunching" applications. Many modern applications require that one processor serve a mixture of programming paradigms. For example, a real-time embedded DSP application mixes both DSP and control processing tasks. The latter tasks typically have little latent ILP. Multi-thread execution would better suit the application when it is not solely involved in time-critical DSP kernel inner loop execution.
This problem has been addressed in a number of ways. One example of the prior art is the VLIW approach to multithreading as shown in U.S. Pat. No. 5,574,939 entitled "MULTIPROCESSOR COUPLING SYSTEM WITH INTEGRATED COMPILE AND RUN TIME SCHEDULING FOR PARALLELISM" by Keckler et al. Keckler et al. show a VLIW system that can execute multiple threads that have been intermixed at compile time into a single VLIW word. In this approach a number of different instruction streams, which would otherwise have needed separate program counters, are statically scheduled together and run as a single combined instruction stream under control of a single program counter.
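The general idea of statically combining several instruction streams into one stream of wide words can be sketched as follows. This is a simplified illustration and not the method of the cited patent: the function name `merge_streams`, the greedy packing rule, and the sample operations are all assumptions made for the example, and the sketch ignores functional-unit types and any dependences between streams.

```python
def merge_streams(streams, width):
    """Greedily pack per-cycle bundles from several streams into shared wide words.

    Each stream is a list of bundles; a bundle is a list of operations that the
    stream wants to issue together. Bundles within a stream stay in order.
    The result is one list of words, indexed by a single program counter.
    """
    for stream in streams:
        for bundle in stream:
            if len(bundle) > width:
                raise ValueError("bundle is wider than the machine word")

    positions = [0] * len(streams)  # next bundle to place, per stream
    words = []
    while any(pos < len(s) for pos, s in zip(positions, streams)):
        word = []
        for i, stream in enumerate(streams):
            if positions[i] < len(stream):
                bundle = stream[positions[i]]
                # Place the bundle only if it fits in the remaining slots.
                if len(word) + len(bundle) <= width:
                    word.extend((i, op) for op in bundle)
                    positions[i] += 1
        words.append(word)
    return words


# Hypothetical streams: thread 0 and thread 1, merged for a 4-slot machine.
thread_a = [["add", "mul"], ["ld"], ["st"]]
thread_b = [["cmp"], ["br"]]

words = merge_streams([thread_a, thread_b], width=4)
for pc, word in enumerate(words):
    print(pc, word)
```

Each entry in a word is tagged with the index of the stream it came from, which shows how operations from separate threads end up side by side in one word fetched under a single program counter.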
For superscalar processors (which can be considered a form of hardware-assembled VLIW system), running multiple time-interleaved code streams has been proposed: fetch N instructions from stream A, then N instructions from stream B; or fetch from A until processing stalls, then switch to B until processing stalls, and so on. An example of this type of processor operation is shown in U.S. Pat. No. 3,771,138 entitled "APPARATUS AND METHOD FOR SERIALIZING INSTRUCTIONS FROM TWO INDEPENDENT INSTRUCTION STREAMS" by Celtruda et al. Celtruda et al. teach a processor with dual instruction buffers whose contents are executed in a time-multiplexed fashion. U.S. Pat. No. 4,320,453 entitled "DUAL SEQUENCER MICROPROCESSOR" by Roberts et al. also teaches multi-threading by time multiplexing a processor. The primary multi-threaded machine patent from Denelcor, the HEP patent U.S. Pat. No. 4,229,790 entitled "CONCURRENT TASK AND INSTRUCTION PROCESSOR AND METHOD" by Gilliland et al., is also based on time multiplexing.
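The fixed-quantum form of time multiplexing described above (N instructions from one stream, then N from the next) can be sketched as below. This is a minimal illustration only: the function name `interleave_fixed` and the instruction labels are hypothetical, and the sketch does not model the switch-on-stall variant or any of the cited hardware.

```python
def interleave_fixed(streams, n):
    """Fetch up to n instructions from each stream in turn until all are drained.

    Returns the issue order as (stream_index, instruction) pairs.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    positions = [0] * len(streams)  # next instruction to fetch, per stream
    order = []
    while any(pos < len(s) for pos, s in zip(positions, streams)):
        for i, stream in enumerate(streams):
            chunk = stream[positions[i]:positions[i] + n]
            order.extend((i, instr) for instr in chunk)
            positions[i] += len(chunk)
    return order


# Hypothetical instruction streams A and B, interleaved two at a time.
stream_a = ["a0", "a1", "a2", "a3", "a4"]
stream_b = ["b0", "b1", "b2"]

order = interleave_fixed([stream_a, stream_b], n=2)
issued = [instr for _, instr in order]
print(issued)
```

Only one stream issues at any given time in this scheme, so functional units left idle by the active stream remain idle; the streams share the processor in time rather than within a single cycle.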
In addition, a VLIW processor is particularly ill suited to interrupts. Interrupt handlers exhibit little ILP, and switching the processor state to interrupt a VLIW's execution is costly and slow. Further, interrupt routines normally run only for a very short time, so the cost of the state switch is large relative to the useful work performed. Thus a VLIW processor typically wastes resources handling interrupts.