The invention relates to a processor architecture comprising a first plurality of data processing elements, which are operatively interconnected by means of a data link structure, and is particularly concerned with improving the use of resources in a multi-threaded VLIW (Very Long Instruction Word) processor with variable word length.
In the present context, a processor designates an arbitrary information processing device, which is adapted to process one or more instruction sequences.
In the present context, a processor instance designates a virtual processor which processes an instruction sequence or thread. A processor instance is created by suitably configuring available hardware modules and may be deleted or destroyed by means of reconfiguration.
In the present context, a configuration designates a number of data bits, which is stored in a sequential logic system and which affects its behaviour.
In the present context, the term “reconfigurable” denotes that the configuration of a sequential logic system can be changed at runtime.
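The two definitions above can be illustrated with a minimal sketch. The class `ConfigurableUnit`, its two-bit configuration and the four operations it selects are hypothetical and chosen only for illustration; they are not taken from the invention itself.

```python
class ConfigurableUnit:
    """Hypothetical processing element whose behaviour is selected by
    two stored configuration bits (an assumption for illustration)."""

    def __init__(self, config_bits: int = 0b00) -> None:
        # The stored bits are the "configuration" in the sense defined above.
        self.config_bits = config_bits & 0b11

    def reconfigure(self, config_bits: int) -> None:
        # Changing the stored bits at runtime is what makes the unit
        # "reconfigurable" in the sense defined above.
        self.config_bits = config_bits & 0b11

    def execute(self, a: int, b: int) -> int:
        # The same hardware behaves differently depending on the configuration.
        if self.config_bits == 0b00:
            return a + b
        if self.config_bits == 0b01:
            return a - b
        if self.config_bits == 0b10:
            return a & b
        return a | b


unit = ConfigurableUnit()
sum_result = unit.execute(6, 3)    # configured as an adder
unit.reconfigure(0b01)
diff_result = unit.execute(6, 3)   # same unit, now configured as a subtractor
```

The point of the sketch is only that the unit's function is determined by stored state rather than by fixed wiring, so that it can be changed without replacing the unit.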
The invention may be of particular importance for so-called “embedded systems”. An embedded system is a computer system designed to perform one or a few dedicated functions, often with real-time computing constraints. It may also be referred to as a special-purpose computer or special-purpose processor (SPP). It is embedded as part of a complete device, often including hardware and mechanical parts. By contrast, a general-purpose computer, such as a personal computer, or a general-purpose processor (GPP) is designed to be flexible and to meet a wide range of end-user needs. Embedded systems control many devices in common use today. The multiplicity of applications that may potentially be executed on next-generation embedded systems will not only lead to more diversified processing behaviour; their dynamic runtime characteristics will also lead to hardly predictable behaviour, which makes state-of-the-art embedded systems, composed of GPPs and SPPs, inefficient as far as cost, performance, power and time-to-market constraints are concerned. This creates the need for a novel architecture that is able to cope efficiently with increasingly diversified processing behaviour by dynamically adapting hardware resources at runtime to data-flow characteristics and processing type, much more flexibly than state-of-the-art approaches.
Applications of different types, or parts of said applications, are executed with different efficiencies on different processors as far as, e.g., energy consumption, use of resources, and duration of execution are concerned. The above-mentioned GPPs perform well for a large variety of applications, e.g., x86 processors for general-purpose desktop computers. SPPs, on the other hand, are designed to perform well only with a small number of highly specialized applications. The present invention is particularly aimed at reducing the gap between GPPs and SPPs in order to provide a processor architecture which can be flexibly employed to perform both general-purpose and special-purpose tasks with high efficiency.
The prior art in the technical field of increasing processor efficiency and flexibility comprises a technique known as parallel instruction execution. With this technique, two or more instructions or operations are executed simultaneously by means of parallel processing in order to increase processor performance.
U.S. Pat. No. 4,833,599 discloses an example of parallel instruction execution. In particular, said document discloses a processor which executes a plurality of instructions simultaneously by using VLIWs. Although the number of individual instructions, which may be comprised in a single VLIW instruction, can be as high as 20 or more, the length of the VLIW instruction words is fixed, thus greatly reducing the flexibility of use of the known processor.
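One commonly cited consequence of a fixed instruction word length can be sketched as follows. This is an illustrative model only and not a description of the processor of U.S. Pat. No. 4,833,599: the function names, the `nop` padding scheme and the example operations are assumptions. Whenever fewer independent operations are available than the word has slots, the remaining slots are filled with no-operations and the corresponding execution resources stay idle.

```python
NOP = "nop"


def pack_fixed_width(bundles, width):
    """Pad each group of independent operations to a fixed number of slots."""
    words = []
    for ops in bundles:
        if len(ops) > width:
            raise ValueError("bundle does not fit into one instruction word")
        # Unused slots must still be encoded, here as explicit no-operations.
        words.append(list(ops) + [NOP] * (width - len(ops)))
    return words


def slot_utilization(words):
    """Fraction of slots that carry a useful operation."""
    total = sum(len(word) for word in words)
    used = sum(1 for word in words for op in word if op != NOP)
    return used / total


# Hypothetical instruction stream with varying amounts of parallelism.
bundles = [["add", "mul", "load", "store"], ["add"], ["sub", "load"]]
words = pack_fixed_width(bundles, width=4)
utilization = slot_utilization(words)  # 7 useful operations in 12 slots
```

In this toy stream only 7 of 12 slots do useful work, which is the kind of inefficiency that a variable word length is intended to avoid.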
Program code to be executed by a given processor may sometimes be split into smaller components, which are called threads. A thread is a sequence of instructions, the execution of which achieves a given task or result. For instance, in a video conferencing application the processor could be called upon to execute code for processing audio and video data. There could be separate code sequences for processing said audio data and said video data, respectively. Thus a first thread would comprise instructions for processing video data, and a second thread would comprise instructions for processing audio data. In other words, a thread is an independent program typically associated with a thread identifier, and during execution in a hardware multi-threaded environment, the architectural state of the processor core for one thread may be maintained while instructions of another thread are being executed.
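The notion that the architectural state of one thread is maintained while another thread executes can be sketched as follows. The model is deliberately simplified and entirely hypothetical: each thread's state consists of a program counter and a single accumulator, and each "instruction" merely adds a constant to the accumulator.

```python
from dataclasses import dataclass


@dataclass
class ThreadContext:
    """Hypothetical per-thread architectural state."""
    thread_id: int
    pc: int = 0    # index of the next instruction of this thread
    acc: int = 0   # single accumulator register


def step(ctx, program):
    """Execute one instruction of the thread, if any remain."""
    if ctx.pc < len(program):
        ctx.acc += program[ctx.pc]
        ctx.pc += 1


def interleave(programs, schedule):
    """Run the threads in the order given by `schedule` (a list of thread ids)."""
    contexts = {tid: ThreadContext(tid) for tid in programs}
    for tid in schedule:
        # Only the selected thread advances; the others keep their state.
        step(contexts[tid], programs[tid])
    return contexts


# Thread 0 stands in for the video thread, thread 1 for the audio thread.
programs = {0: [1, 2, 3], 1: [10, 20]}
contexts = interleave(programs, schedule=[0, 1, 0, 1, 0])
```

After the interleaved run each thread holds the same result it would have produced if executed alone, because its context was preserved whenever the other thread was active.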
U.S. Pat. No. 5,890,008 discloses a system and a method for dynamically reconfiguring a processor between single-processor and selected multiple-processor configurations. Said document teaches a processor architecture which is devised to adapt the processor hardware in such a way as to support a plurality of applications executed in parallel on a single processor chip. The processor of U.S. Pat. No. 5,890,008 can be dynamically reconfigured so that it presents one or more virtual processor units or processor elements (so-called “strands”). Said document further describes various types of processor units, e.g., an instruction calling unit, an instruction renaming unit, an instruction scheduling unit, and an instruction execution unit, as well as further units. Said units may comprise pipeline stages, or may together constitute a pipeline. Further disclosed is the execution of instructions which belong to different threads. To this end, some of said units comprise memory locations for storing thread identifications in connection with said instructions from different threads. However, at least some execution resources are shared between strands in a time-multiplexed manner and are not made exclusively available to one strand, e.g., by replication of the resources. Therefore, disadvantageously, the performance of a thread executed by a strand varies depending on the number of strands.
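The stated disadvantage can be illustrated with a toy model. The sketch assumes a single shared execution unit that is granted to the strands in strict round-robin order, one instruction per cycle; this scheduling policy and the function name are assumptions made for illustration and are not taken from U.S. Pat. No. 5,890,008.

```python
def cycles_to_finish(instructions, strands):
    """Cycles until strand 0 has executed `instructions` instructions on a
    shared execution unit granted round-robin to `strands` strands."""
    if strands < 1:
        raise ValueError("at least one strand is required")
    remaining = instructions
    cycle = 0
    while remaining > 0:
        # The shared unit serves strand 0 only in every `strands`-th cycle.
        if cycle % strands == 0:
            remaining -= 1
        cycle += 1
    return cycle


one_strand = cycles_to_finish(100, strands=1)    # 100 cycles
two_strands = cycles_to_finish(100, strands=2)   # 199 cycles
four_strands = cycles_to_finish(100, strands=4)  # 397 cycles
```

Under these assumptions the execution time of the same thread roughly scales with the number of active strands, whereas with replicated, exclusively assigned resources it would remain at 100 cycles.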