As multi-core and other multiprocessor computing devices have become more widely available and less expensive, multi-thread processing has become an increasingly effective approach to improving the speed and efficiency with which computing devices process data. This is especially true for applications such as image processing and voice encoding, wherein a specified algorithm is applied repeatedly to a large number of data frames or “batches” in a fixed timing order, with little or no interdependence between the processing of separate data frames.
In many cases, the functions performed by a given application can be divided into “host” or “control” functions, which have an unknown timing order and may have interactive inputs, and one or more “helper” functions, which have no interactive inputs and for which the input order is strictly defined. One example is a computer game. Functions that are directly associated with user interaction have an unknown timing order and may have interactive inputs. Other functions perform background tasks associated with the game, such as image processing or voice encoding, and accept and process data frames in a defined order without user interaction. Another example is a voice-enabled application running on a smart appliance or on a mobile device such as a cellular telephone.
In such cases, multi-thread processing can be implemented by assigning the function(s) that perform operations with an unknown timing order, including those that support user interaction, to a “host” or “control” thread, while assigning at least some of the helper functions to one or more “helper” threads. When coprocessors are available, the control functions are often executed on a host processor, while some or all of the helper threads are offloaded to one or more coprocessors.
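As a generic illustration of the host/helper division only (not a description of any particular synchronization method), the arrangement can be modeled in Python with two threads. A host thread hands data frames to a helper thread in a defined order, and the helper thread applies a fixed per-frame algorithm without interactive input. The names `run_host` and `helper_loop`, the per-frame `sum` placeholder, and the use of the standard library's lock-based `queue.Queue` as the hand-off mechanism are all hypothetical choices for this sketch. The example also runs both threads on a single processor rather than offloading the helper to a coprocessor.

```python
import queue
import threading


def helper_loop(inbox: queue.Queue, results: list) -> None:
    """Hypothetical helper thread: processes frames strictly in arrival order."""
    while True:
        frame = inbox.get()
        if frame is None:  # sentinel from the host: no more frames will arrive
            break
        # Placeholder for the fixed per-frame algorithm (e.g., an encoder stage).
        results.append(sum(frame))


def run_host(frames: list) -> list:
    """Hypothetical host thread: hands frames to the helper in a defined order."""
    inbox: queue.Queue = queue.Queue()  # FIFO hand-off; synchronized internally with locks
    results: list = []
    helper = threading.Thread(target=helper_loop, args=(inbox, results))
    helper.start()
    for frame in frames:
        inbox.put(frame)  # the order of submission fixes the order of processing
    inbox.put(None)       # tell the helper to finish
    helper.join()         # wait until every frame has been processed
    return results


print(run_host([[1, 2], [3, 4], [5]]))  # [3, 7, 5]
```

Because the helper drains a single FIFO queue, frames are processed strictly in the order the host submits them. The `None` sentinel tells the helper that no more frames will arrive. The queue's internal locking stands in for the host/helper synchronization discussed in this section.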
While this multi-thread, host/coprocessor approach can be very powerful, its successful implementation requires a robust and efficient method of synchronizing and coordinating the program execution of the host processor and the coprocessors.
One approach is to use hardware interrupts to synchronize the actions of the host processor and coprocessors. However, this approach is “costly” in terms of hardware utilization, and may be limited if the hardware platform does not provide sufficient interrupts with suitable functionality.
Another approach is to use a “message passing interface” (“MPI”) protocol implemented in shared memory. However, this approach is costly in terms of execution time.
What is needed, therefore, is an efficient method of synchronizing and coordinating the program execution of a host processor and one or more coprocessors that does not depend on interrupts and that avoids the execution-speed penalty of an MPI implementation.