Over time, computational system capabilities change along with the capabilities of their processors. For example, different processor generations can have different clock speeds, cache sizes, instruction sets, and other capabilities. Many of these capabilities can only be exploited by applications that are written and/or compiled to use them. However, optimizing applications to use newer processor capabilities can often limit the compatibility of those applications with older processors that do not have those capabilities. Accordingly, many of those newer processor capabilities are slow to be fully utilized.
Various techniques exist for exploiting newer processor capabilities while maintaining backwards compatibility with older processors. One such technique involves dynamic compilation, such as in the JAVA programming language, which effectively generates the compiled application code (e.g., the “binary”) at execution time according to the capabilities of the system on which the application is running. While this can exploit the capabilities of the target system, it also has a number of limitations, including appreciable overhead and risk in performing the compilation at runtime. Another such technique involves compiling the application code to link to platform-specific libraries, which can include versions of common functions (e.g., matrix multiplication) for different types of platforms. This technique is often limited to accelerating only a small set of computations (e.g., those involving particular common functions). Yet another such technique involves compiling the entire application code for each of multiple platforms and stitching the resulting binaries together into one large file. This technique, sometimes referred to as “fat binaries,” typically results in very large files, and its use tends to be limited to providing compatibility among distinct instruction set architectures (ISAs), rather than among variations within a given ISA.
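As a rough illustration of the platform-specific library technique described above, the following minimal Python sketch selects between two versions of a matrix multiplication function according to a set of reported processor capabilities. The names `matmul_baseline`, `matmul_wide_vector`, `select_matmul`, and the `"wide_vector"` capability label are hypothetical and are used only for illustration; a real library would typically query the hardware directly and supply versions built with processor-specific instructions.

```python
from typing import Callable, FrozenSet, List

Matrix = List[List[float]]


def matmul_baseline(a: Matrix, b: Matrix) -> Matrix:
    """Portable version intended to run on any processor."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def matmul_wide_vector(a: Matrix, b: Matrix) -> Matrix:
    """Stand-in for a version tuned to a newer processor (assumption).

    It only transposes b so that each dot product walks two rows; a real
    platform-specific library would use wider vector instructions instead.
    """
    b_columns = list(zip(*b))
    return [
        [sum(x * y for x, y in zip(row, col)) for col in b_columns]
        for row in a
    ]


def select_matmul(capabilities: FrozenSet[str]) -> Callable[[Matrix, Matrix], Matrix]:
    """Pick the library version that matches the reported capabilities."""
    if "wide_vector" in capabilities:  # hypothetical capability label
        return matmul_wide_vector
    return matmul_baseline


# The application calls whichever version the selector returns.
matmul = select_matmul(frozenset({"wide_vector"}))
product = matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

Only the call that goes through the selector benefits from the newer capability; the rest of the application code is unchanged, which reflects why this technique tends to accelerate only a small set of computations.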