Mobile electronic devices (e.g., cellular phones, watches, headphones, remote controls, etc.) have become more complex than ever, and now commonly include multiple processors, systems-on-chip (SoCs), and other resources that allow mobile device users to execute complex and power-intensive software applications (e.g., video streaming, video processing, etc.) on their mobile devices. With this rise in complexity and power consumption, new and improved processing technologies that better utilize the mobile device's resources and capabilities are beginning to emerge.
These emerging technologies include systems capable of compiling code that is designed for execution on a general purpose applications processor so that the code is suitable for execution on an auxiliary processor, such as a digital signal processor (or DSP). In particular, an application program may be partitioned into units or chunks, and the units/chunks may be distributed to different processing components based on the identified efficiencies/capabilities of the processing components (e.g., a DSP, graphics processing unit or GPU, etc.). This allows the main or central processing unit (CPU) or applications processor to offload some of its operations to an auxiliary processor to conserve power and/or improve performance.
However, determining how the application program is to be partitioned, and which partitions are best suited for execution on an auxiliary processor, is often a difficult design task. That is, offloading operations to an auxiliary processor may improve the performance and power consumption characteristics of the mobile device only so long as there is an efficient way to recognize and partition a given code segment into components that are well suited for execution in different types of cores or processing units.
Existing technologies may utilize different techniques for identifying and/or processing code. Some techniques may utilize automatic code partitioning and may represent application code by program dependence graphs, partitioning the code based on its inherent parallelism and known communication costs. These techniques do not utilize predefined patterns that may be known to benefit particular processing units, such as a digital signal processor (DSP). Other techniques may detect idioms (or known/predefined sets of instructions) within code (or binaries) and replace the idioms with hardware-assist instructions (i.e., complex instruction set computing or “CISC” instructions). These techniques can typically handle only a limited granularity (mostly straight-line sequences of instructions) and simple patterns, such as exact patterns or patterns with a limited degree of freedom. Additionally, certain techniques exist for finding duplicate code and detecting clones using high-level source code. Further, graph pattern matching has been used in database systems.
Other techniques employ instruction selection algorithms that utilize tree pattern matching to adjust code to include low-cost instructions. In particular, bottom-up rewrite system (BURS) algorithms may be used to determine the best instruction sets for input codes (e.g., applications, routines, etc.) by iteratively matching various subtrees within input trees related to the input codes in order to find the best-cost sets of instructions (i.e., combinations of instructions that cover the entire trees and yet provide the lowest costs/highest benefits). Based on the pattern matching, new, improved instruction sets may be generated for execution on computing devices.
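For illustration only, the tree-covering idea behind such instruction selectors may be sketched as follows. This is a simplified dynamic-programming cover in the spirit of BURS, not a full BURS implementation (which would precompute matching tables). The rule names, operator names, and costs (e.g., "mac" as a fused multiply-add with cost 2) are hypothetical and are not taken from any particular instruction set.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """An expression-tree node: an operator and its child subtrees."""
    op: str
    kids: Tuple["Node", ...] = ()


# Hypothetical rules: (instruction name, tree pattern, cost).
# In a pattern, the string "reg" is a hole that matches any subtree,
# which must then be covered separately by other rules.
RULES = [
    ("load_const", ("CONST",), 1),
    ("load_var", ("VAR",), 1),
    ("add", ("ADD", "reg", "reg"), 1),
    ("mul", ("MUL", "reg", "reg"), 2),
    ("mac", ("ADD", ("MUL", "reg", "reg"), "reg"), 2),  # assumed fused op
]


def match(pattern, node, holes):
    """Return True if pattern matches at node; collect hole subtrees."""
    if pattern == "reg":
        holes.append(node)
        return True
    op, *kid_patterns = pattern
    if op != node.op or len(kid_patterns) != len(node.kids):
        return False
    return all(match(p, k, holes) for p, k in zip(kid_patterns, node.kids))


@lru_cache(maxsize=None)
def best_cover(node):
    """Return (total cost, rule names) of the cheapest cover, or None."""
    best = None
    for name, pattern, cost in RULES:
        holes = []
        if not match(pattern, node, holes):
            continue
        total, used = cost, (name,)
        for sub in holes:
            sub_best = best_cover(sub)
            if sub_best is None:  # a hole that cannot be covered
                break
            total += sub_best[0]
            used += sub_best[1]
        else:
            if best is None or total < best[0]:
                best = (total, used)
    return best


# Example input tree for the expression a * b + c.
var = Node("VAR")
tree = Node("ADD", (Node("MUL", (var, var)), var))
print(best_cover(tree))
```

In this sketch, covering the multiply and the add separately would cost more than the single hypothetical "mac" rule, so the selector picks the fused instruction. Note that the approach operates on trees; it does not by itself handle shared subexpressions in a graph.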
However, the known techniques may not be suitable for offloading portions of complex code using graph-based representations. In other words, existing technologies may not use compiler back-end solutions that match directed acyclic graph representations of code to identify the best offloading for heterogeneous multicore or distributed systems.