Parallel programming is a technique in which a computing device splits computations into small chunks of work (referred to as tasks) in order to provide responsive, high-performance software. In a multi-core or multi-processor computing device (e.g., a heterogeneous system-on-chip (SOC)), different tasks may be assigned to (or offloaded to) various processing units of the device, with some tasks being specified to run only after others finish due to task dependencies. Typically, a runtime engine (or task scheduler) determines the processing unit to which each task is assigned, and such determinations may be based on various device, processing unit, and/or task characteristics or conditions.
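As a non-limiting illustration of the scheduling behavior described above, the following minimal sketch orders a small task graph so that each task runs only after the tasks it depends on, and assigns each task to a processing unit. The task names, the `preferred_unit` table, the `schedule` function, and the fallback to the CPU when a preferred unit is absent are all hypothetical and chosen only for illustration; an actual runtime engine may weigh many other device and task conditions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each task maps to the set of tasks it must wait for.
dependencies = {
    "load": set(),
    "filter": {"load"},
    "transform": {"load"},
    "merge": {"filter", "transform"},
}

# Hypothetical per-task preference for a processing unit.
preferred_unit = {
    "load": "CPU",
    "filter": "DSP",
    "transform": "GPU",
    "merge": "CPU",
}


def schedule(dependencies, preferred_unit, available_units):
    """Return (task, unit) pairs in an order that respects task dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    plan = []
    while sorter.is_active():
        # Every task returned here has had all of its dependencies finished.
        for task in sorted(sorter.get_ready()):
            unit = preferred_unit.get(task)
            if unit not in available_units:
                unit = "CPU"  # assumed fallback policy, for illustration only
            plan.append((task, unit))
            sorter.done(task)  # marks the task finished, unblocking dependents
    return plan


# Example device that has a CPU and a GPU but no DSP.
plan = schedule(dependencies, preferred_unit, {"CPU", "GPU"})
```

In this sketch the tasks are assigned one at a time for clarity. Tasks that become ready together (here, "filter" and "transform") have no dependency on one another and could be dispatched to different processing units concurrently.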
Some tasks may be directed to or designed for particular processing units. For example, a first task may be designed for execution by a central processing unit (CPU), a second task by a graphics processing unit (GPU), and a third task by a digital signal processor (DSP). Tasks meant for different processing units are often written in different programming languages or to different specifications. For example, the code implementing a vector addition calculation as a CPU task and the code implementing a matrix multiplication calculation as a GPU task may use different languages and/or syntax. To capitalize upon the different processing units in a computing device, multiple versions of common general-purpose tasks may be supported concurrently. A “multi-versioned” task may be associated with or otherwise include multiple implementations of the same logical function or routine, with each implementation specialized for execution by a particular processing unit. For example, a vector addition calculation may be implemented as both a CPU task and a GPU task, each using a different language and/or syntax.
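As a non-limiting illustration, a multi-versioned task may be sketched as one logical routine that carries several implementations keyed by processing unit. The class `MultiVersionedTask`, its methods, and the unit labels below are hypothetical names introduced only for this example. Both implementations are written in Python so that the sketch is self-contained; in practice the GPU version would typically be written in a separate kernel language, and the function labeled as the GPU version here is merely a stand-in.

```python
class MultiVersionedTask:
    """One logical routine with an implementation per processing unit (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._versions = {}  # maps a unit label such as "CPU" to a callable

    def add_version(self, unit, implementation):
        self._versions[unit] = implementation

    def run(self, units_by_preference, *args):
        """Run the first version whose processing unit appears in the preference list."""
        for unit in units_by_preference:
            implementation = self._versions.get(unit)
            if implementation is not None:
                return unit, implementation(*args)
        raise LookupError(f"no version of {self.name!r} for {units_by_preference}")


def vector_add_cpu(a, b):
    # CPU-style version: an explicit element-by-element loop.
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out


def vector_add_gpu_stand_in(a, b):
    # Stand-in for a GPU version; a real one would be written as a GPU kernel.
    return [x + y for x, y in zip(a, b)]


vector_add = MultiVersionedTask("vector_add")
vector_add.add_version("CPU", vector_add_cpu)
vector_add.add_version("GPU", vector_add_gpu_stand_in)

# A GPU is preferred and a GPU version exists, so that version is selected.
on_gpu = vector_add.run(["GPU", "CPU"], [1, 2, 3], [10, 20, 30])

# No DSP version exists, so selection falls through to the CPU version.
on_cpu = vector_add.run(["DSP", "CPU"], [1, 2, 3], [10, 20, 30])
```

Both calls compute the same logical result; only the implementation that is selected differs, which is the property that distinguishes a multi-versioned task from separate, unrelated tasks.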