Several emerging applications require sustained battery-less operation based on energy scavenging. A very important domain consists of in-vivo biomedical applications that execute complex biomedical analysis of sensor data in order to produce higher-level information that can then be transmitted to a centralized emergency/information-gathering service. In the longer term, this information could even be used to directly trigger on-line activation of recovery means, such as administering specific quantities of medicine in-vivo. The potential of this technology is enormous, but several basic research challenges currently prevent this ambitious vision from being realized. One issue is the complexity and robustness of reliable in-vivo biomedical analysis systems. In order to reduce the false positive rate in detectors of clinical events, classical signal processing algorithms need to be extended to non-stationary signals, and complex advanced adaptive filtering techniques are needed (based upon multilinear algebra, e.g. Singular Value Decomposition, Total Least Squares, Principal Component Analysis and Independent Component Analysis).
It is crucial that an intelligent sensor system such as the one indicated above stays below the scavenging energy limit of about 50 μW continuous supply. In order to provide sufficient algorithmic flexibility and easy updates after the implant has been placed inside the body, the market would strongly prefer a highly programmable or configurable platform. Using state-of-the-art architecture styles and instances would lead to an energy budget problem that cannot be overcome. The relationship between energy and task is relevant for these domains and can be expressed in MIP/mJ or, equivalently, MIPS/mW, where MIP stands for Million RISC Instructions and MIPS for Million RISC Instructions Per Second. A rough estimate of the efficiency required for such an intelligent system running a seizure-detection-like algorithm under the given power constraint is about 1000 to 10000 MIPS/mW.
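The figures above can be combined in a short back-of-the-envelope calculation. The following Python sketch is illustrative only: the function name `throughput_mips` is hypothetical, and the resulting 50 to 500 MIPS workload range is simply what the quoted 50 μW limit and the 1000 to 10000 MIPS/mW estimate imply together, not a figure stated in the text.

```python
def throughput_mips(power_uw: float, efficiency_mips_per_mw: float) -> float:
    """Sustained throughput in MIPS at a given power (uW) and efficiency (MIPS/mW)."""
    # MIPS/mW and MIP/mJ are the same quantity, since 1 mW = 1 mJ/s.
    return power_uw * efficiency_mips_per_mw / 1000.0  # convert uW to mW


# Continuous supply limit quoted in the text.
SCAVENGING_LIMIT_UW = 50.0

# Lower and upper bound of the efficiency estimate quoted in the text.
for efficiency in (1000.0, 10000.0):
    mips = throughput_mips(SCAVENGING_LIMIT_UW, efficiency)
    print(f"{efficiency:.0f} MIPS/mW at {SCAVENGING_LIMIT_UW:.0f} uW "
          f"sustains about {mips:.0f} MIPS")
```

Read the other way around, a workload in this implied range only fits within the scavenging limit if the platform reaches the stated efficiency.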
The power breakdown for a representative signal processing application based on such VLIW-DSP templates shows that the data register file (also called foreground memory) and the level-1 data and instruction memories now form the main bottlenecks. Similar studies have been done for mobile multimedia terminal applications. The requirements in that case are 10000 MIPS, and the limit on the power consumption would be about 300 mW. So the MIPS/mW figure should also exceed 1000 MIPS/mW. Other low-power applications will also benefit from the processor architecture according to the present invention.
VLIW (Very Long Instruction Word) architectures execute multiple instructions per cycle, packed into a single large “instruction word” or “packet”, and use simple, regular instruction sets. However, even the most power-efficient ASIPs (Application-domain Specific Instruction-set Processors) available today that are based on VLIW DSP (Digital Signal Processing) templates reach only about 50 MIPS/mW. A huge gain of a factor of about 20 to 200 is thus required.
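The required gain follows directly from the ratio between the target efficiency and the roughly 50 MIPS/mW of current VLIW-based ASIPs. The Python sketch below reproduces this arithmetic and also converts each efficiency into energy per instruction. The helper names are hypothetical, and the 1000 to 10000 MIPS/mW target range is taken from the estimate given earlier for the biomedical case.

```python
def gain_factor(target_mips_per_mw: float, current_mips_per_mw: float) -> float:
    """Ratio by which the efficiency must improve to reach the target."""
    return target_mips_per_mw / current_mips_per_mw


def energy_per_instruction_pj(efficiency_mips_per_mw: float) -> float:
    """Energy per RISC instruction in picojoules for a given efficiency."""
    # 1 MIPS/mW = 1e6 instr/s per 1e-3 J/s = 1e9 instr/J,
    # i.e. 1 nJ (1000 pJ) per instruction.
    return 1000.0 / efficiency_mips_per_mw


CURRENT_MIPS_PER_MW = 50.0            # state-of-the-art figure quoted in the text
TARGETS_MIPS_PER_MW = (1000.0, 10000.0)  # target range quoted in the text

print(f"current: {energy_per_instruction_pj(CURRENT_MIPS_PER_MW):.1f} pJ/instruction")
for target in TARGETS_MIPS_PER_MW:
    print(f"target {target:.0f} MIPS/mW: "
          f"gain x{gain_factor(target, CURRENT_MIPS_PER_MW):.0f}, "
          f"{energy_per_instruction_pj(target):.1f} pJ/instruction")
```

Under these assumptions, 50 MIPS/mW corresponds to 20 pJ per instruction, whereas the target range corresponds to 1 pJ down to 0.1 pJ per instruction.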
A traditional design strategy for ASIPs consists of two steps: (1) the design of the basic processor with its basic instruction set, and (2) the design of the custom instructions with their corresponding custom hardware. Tensilica's Xtensa as described by R. E. Gonzalez in “Xtensa: A configurable and extensible processor”, IEEE Micro, 20(2), 2000; and HP's and STMicroelectronics' Lx as described by P. Faraboschi et al. in “Lx: a technology platform for customizable VLIW embedded processing”, Proc. of ISCA, 2000, are some of the presently commercially available ASIPs.
At the compiler end, not much is available either. Tools like Target Compiler's Chess framework or Coware's LisaTek allow the design of the custom instruction set and the hardware required by these instructions, but they do not reduce the energy consumption a great deal. Academic research in the design of ASIPs has focused on the problem of identifying and implementing an efficient set of instruction set extensions. Examples thereof are described by P. Biswas, V. Choudhary, K. Atasu, L. Pozzi, P. Ienne and N. Dutt in “Introduction of Local Memory Elements in Instruction Set Extensions”, Proceedings of DAC, June 2004, pp. 729-734; by P. Yu and T. Mitra in “Characterizing Embedded Applications for Instruction Set Extensible Processors”, Proceedings of DAC, June 2004, pp. 723-728; and by P. Yu and T. Mitra in “Scalable Instructions Identification for Instruction Set Extensible Processors”, Proc. of CASES, September 2004. Most of this work has focused on improving performance; not much work has been done specifically in the area of reducing energy consumption. J. Lee, K. Choi and N. D. Dutt do present, in “Energy-Efficient Instruction Set Synthesis for Application-Specific Processors”, Proc. of ISLPED, August 2003, a way to extend the instruction set based on the energy-efficiency of the new instructions.
Most energy-efficient techniques currently in use reduce the power consumption of ASIPs but do not attack the core bottlenecks of the power problem, viz. the instruction memory hierarchy and the register file.
The power consumption of the register file is a growing problem, as stated by J. L. Ayala, M. L. Vallejo, A. Veidenbaum and C. A. Lopez in “Energy Aware Register File Implementation through Instruction Predecode”, Proc. of ASAP, 2003. This is because of the trend towards highly parallel architectures, which impose a large port requirement on the register file. FIG. 1 plots the energy consumption per access of a 32-bit register file with respect to the number of read and write ports. It can be clearly seen that, as the number of ports increases, the energy per access increases drastically. The authors of the cited document address the problem of reducing the energy consumption in processors with a hardware-based approach that turns unused registers into low-power states.
N. S. Kim and T. Mudge, in “The Microarchitecture of a Low Power Register File”, Proc. of ISLPED, 2003, pp. 384-389, also highlight the problem of the register file and introduce a technique that reduces the register file power consumption, but at the cost of a loss in performance.