The present invention relates generally to compiled instructions, and, more particularly, to dynamically selecting compiled instructions for execution.
A compiler is a specialized computer program that converts source code written in one programming language into another language, usually machine language (also called machine code), so that it can be understood by processors (i.e., logic chips). Source code is the version of software (usually an application program or an operating system) as it is originally written (i.e., typed into a computer) by a human in plain text (i.e., human readable alphanumeric characters). Source code can be written in any of numerous programming languages, some of the most popular of which are C, C++, Java, Perl, PHP, Python and Tcl/Tk. The output of a compiler is referred to as object code.
Compilers create programs that are optimized to target the processors and the fixed functions of their architecture. However, an architecture that is tuned very well for one application type may penalize others. Current architectures are optimized around the most typical coding sequences, or worse, towards benchmarks used in market comparisons. As a result, achieving optimum performance for multiple instruction sequence types is too broad an endeavor for current architecture and compiler methods.
Previous architectures had a fixed structure. The performance of fixed architectures can be very restrictive with their static execution units. It is nearly impossible for a generalized fixed architecture to be ideal for all problems. Custom execution units are not ideal due to their limited usefulness, chip area and power consumption.
Dynamic compilation is a process used by some programming language implementations to gain performance during program execution. The best-known language that uses this technique is Java. Dynamic compilation originated in Self. It allows optimizations to be made that depend on information known only at runtime. In runtime environments that use dynamic compilation, programs typically run slowly for the first few minutes; after that, most of the compilation and recompilation is done and the program runs quickly. Because of this initial performance lag, dynamic compilation is undesirable in certain cases. In most implementations of dynamic compilation, some optimizations that could be done at the initial compile time are delayed until further compilation at runtime, causing additional unnecessary slowdowns.
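By way of illustration only, the warm-up behavior described above can be sketched as a function that runs on a slow generic path until it has been called often enough to be considered hot, and then switches to an optimized path. This is a minimal sketch and not any particular runtime's implementation: the names TieredFunction and HOT_THRESHOLD, the threshold value, and the use of a closed-form formula as a stand-in for compiled code are all assumptions made for the example.

```python
# Hypothetical sketch of tiered (dynamic) compilation. All names and the
# threshold value are assumptions for illustration only.
HOT_THRESHOLD = 3  # assumed number of calls before a function counts as "hot"


class TieredFunction:
    """Runs a slow path first, then switches to an optimized path once hot."""

    def __init__(self, slow_impl, make_fast_impl):
        self.slow_impl = slow_impl            # generic path, available immediately
        self.make_fast_impl = make_fast_impl  # stand-in for the runtime compiler
        self.calls = 0                        # slow-path invocations so far
        self.fast_impl = None                 # filled in once the function is hot

    def __call__(self, *args):
        if self.fast_impl is not None:
            return self.fast_impl(*args)
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            # "Compile" now; later calls take the fast path.
            self.fast_impl = self.make_fast_impl()
        return self.slow_impl(*args)


def slow_sum_squares(n):
    # Generic loop, standing in for unoptimized code.
    return sum(i * i for i in range(n + 1))


def make_fast_sum_squares():
    # Closed form, standing in for optimized compiled code.
    return lambda n: n * (n + 1) * (2 * n + 1) // 6


sum_squares = TieredFunction(slow_sum_squares, make_fast_sum_squares)
results = [sum_squares(10) for _ in range(5)]
```

In this sketch the first three calls pay the slow-path cost, which mirrors the initial performance lag noted above; both paths return the same value, so only the timing changes.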
Attempts have been made to improve dynamic compilation. For example, just-in-time compilers have been developed that compile architecture-independent code (Java bytecode) into an architecture-dependent application based solely on the history of execution at runtime. This type of compiling performs optimization based on the target architecture.
Tensilica compilers create custom logic based on application specific needs to solve a particular problem, like an Application Specific Integrated Circuit (ASIC). Software routines are mapped to hardware macros through a tool. This optimization yields higher performance but only for a fixed problem domain.
Transmeta compilers convert Intel x86 code into an internal very long instruction word (VLIW) format, recompiling frequently used parts of the code for the best optimization. They then replace the translated code with optimized translated code based on historical usage patterns. Because the code is replaced, the previous code is no longer available when circumstances change and the optimized code ceases to be optimal.
Field Programmable Gate Arrays (FPGAs) have been used historically by hardware design engineers to design, validate, and test circuitry as an intermediate step, ultimately targeting the design for use in an ASIC, such as a custom digital signal processor (DSP) or other special-purpose chip. ASICs are fast and highly specialized, and thus very efficient. However, they are very costly to bring to market, and thus are usually reserved for mass-market applications. For the past twenty years, text-based hardware design languages (HDLs), such as VHDL and Verilog, have been used for designing or programming such custom circuitry. FPGAs have had much slower clock speeds than processors, and thus were not originally intended for use as processing elements themselves.
Over the years, FPGAs have been catching up to processors and have outstripped Moore's law, becoming denser, faster, and cheaper at a much faster rate than microprocessors. In fact, the majority of designs for custom circuitry can now remain on an FPGA for execution instead of going through the long and expensive process of bringing a custom ASIC to market.
Although today's C-based FPGA programming environments allow an application programmer to place circuitry (cores) into FPGA-based hardware by making simple redirected function calls, they were never really designed as parallel hardware design languages for creating optimal cores. In particular, current C-based FPGA programming techniques are not suited for creating complex designs. C and C++ were never designed for parallel programming in reconfigurable FPGA hardware, nor for being mixed with hardware design languages such as VHDL. Using current C-based FPGA programming techniques to accomplish tasks for which they were never designed can be an awkward and challenging experience for programmers. These solutions are sufficient for placing cores and simple single-chip designs into a single FPGA, but in order to maximize performance with larger parallelized applications in a deep-scaling environment (including, e.g., multiple-FPGA designs and multiple FPGA board-to-board designs), these tools will need to greatly evolve their capabilities.
With current compiler architecture, it is nearly impossible for a generalized fixed architecture to be ideal for all problems. Custom execution units are not ideal due to their limited usefulness, chip area, and power consumption. The choice of execution units would ideally be a dynamic choice that is based on the current state of the execution unit at the time of execution.
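By way of illustration only, a dynamic choice of execution unit can be sketched as a dispatcher that keeps several compiled variants of the same routine and, at the time of execution, picks one according to the current state of the units. This is a minimal sketch under stated assumptions and is not presented as the method of the invention: the names Variant and select_variant, the unit names "vector" and "scalar", the busy-flag model of unit state, and the relative cost figures are all hypothetical.

```python
# Hypothetical sketch: choose among retained compiled variants at run time.
# Unit names, costs, and the busy-flag state model are assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Variant:
    unit: str                        # execution unit this variant targets
    cost: int                        # assumed relative cost, lower is better
    run: Callable[[List[int]], int]  # stand-in for the compiled instructions


def select_variant(variants: List[Variant], unit_busy: Dict[str, bool]) -> Variant:
    """Return the cheapest variant whose execution unit is currently free."""
    available = [v for v in variants if not unit_busy.get(v.unit, False)]
    if not available:
        raise RuntimeError("no execution unit is currently available")
    return min(available, key=lambda v: v.cost)


# Two variants of the same routine (summing a list), one per assumed unit.
variants = [
    Variant(unit="vector", cost=1, run=lambda xs: sum(xs)),
    Variant(unit="scalar", cost=4, run=lambda xs: sum(x for x in xs)),
]

data = [1, 2, 3, 4]

# While the vector unit is busy, the scalar variant is selected.
chosen_when_busy = select_variant(variants, {"vector": True, "scalar": False})
result_when_busy = chosen_when_busy.run(data)

# Once the vector unit is free, the cheaper vector variant is selected.
chosen_when_free = select_variant(variants, {"vector": False, "scalar": False})
result_when_free = chosen_when_free.run(data)
```

Because every variant is retained rather than replaced, the selection can change from one invocation to the next as the state of the units changes.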