Many software- and firmware-based systems support multiple concurrent threads of execution. Doing so has the potential to increase concurrency and hence throughput. This approach must be used with care when the relative ordering of work done by different threads is important. Race conditions are a well-known risk, and synchronization methods are needed to prevent undesired relative ordering of events. Simple synchronization schemes, such as a single lock guarding all shared state, are relatively straightforward to implement, but they constrain parallelism.
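As an illustration of the trade-off described above, the following is a minimal sketch of the simplest synchronization scheme: one lock around every access to shared state. The function name `run_with_single_lock` and its parameters are hypothetical and chosen only for this example. The result is always correct, but only one thread at a time can make progress inside the locked region.

```python
import threading


def run_with_single_lock(num_threads=4, increments_per_thread=10_000):
    """Increment a shared counter from several threads under one global lock.

    Hypothetical helper for illustration only.
    """
    lock = threading.Lock()
    state = {"counter": 0}

    def worker():
        for _ in range(increments_per_thread):
            # The read-modify-write below is the critical section. Holding
            # the lock prevents a race, but it also means that only one
            # thread can execute this statement at any given time.
            with lock:
                state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


if __name__ == "__main__":
    print(run_with_single_lock())
```

Because every update passes through the same lock, adding threads does not add throughput for the locked work; the threads simply take turns.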
One way around the limitation of a single lock, which in effect disables parallelism entirely, is to adopt pipelining. Pipelining is most commonly done in hardware, where each subpart of a piece of hardware performs a different function, and data to be processed moves from one pipeline stage to the next, much like on an assembly line. A software system can mimic this by having a separate software thread implement the function of each pipeline stage, with each thread running on its own CPU. Work passing from one pipeline stage to the next is handed from one software thread to another, possibly through queues that accommodate different rates of processing in each stage. This is sometimes done in embedded systems that employ firmware running on a multiple-processor design. However, this approach has a number of limitations. First, when functionality is implemented in software and there are more pipeline stages than processors (i.e., more stages than threads that can execute at once), the model breaks down and has to be modified by merging pipeline stages. In addition, when different pipeline stages take different amounts of time, processing efficiency suffers: throughput is limited by the slowest stage, and some processors will be underutilized. Unbalanced pipeline stages degrade performance in hardware pipelines as well, and hardware designers generally try hard to make sure each pipeline stage takes a similar amount of time. However, this is only possible because each hardware pipeline stage is typically very simple and has little dynamic variability. Software-implemented functions generally have high dynamic variability. Furthermore, when code performing different tasks has to be time-multiplexed onto a more limited number of processors, it becomes impractical, if not impossible, to balance pipeline stages. At the very least, it becomes a complex scheduling problem.
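The strict software pipeline described above can be sketched as follows, with one thread per stage and a bounded queue between adjacent stages. This is only an illustrative sketch: the names `stage_worker`, `run_pipeline`, and `queue_size` are hypothetical, error handling is omitted, and in standard CPython builds the global interpreter lock prevents CPU-bound threads from running truly in parallel, so the code shows the structure of the approach rather than its performance.

```python
import queue
import threading

_DONE = object()  # sentinel passed down the pipeline to signal end of input


def stage_worker(func, inbox, outbox):
    """Run one pipeline stage: take an item, process it, hand it onward."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))  # blocks if the next stage is falling behind


def run_pipeline(stage_funcs, items, queue_size=8):
    """Push items through stage_funcs, one thread per stage (hypothetical helper)."""
    # One queue in front of each stage, plus one for the final results.
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=stage_worker, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stage_funcs)
    ]

    def feed():
        # Feed from a separate thread so the caller can drain results
        # concurrently; otherwise full bounded queues could stall everything.
        for item in items:
            queues[0].put(item)
        queues[0].put(_DONE)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([lambda x: x + 1, lambda x: x * 2], range(5)))
```

Every item crosses a queue, and therefore a thread boundary, once per stage. A slow stage causes the queue in front of it to fill, at which point the faster upstream stages block on `put` and sit idle.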
Another disadvantage of such a prior art solution, in which software implements a strict pipeline model, is that handing work between software threads running on different processors as the work proceeds through the pipeline is inefficient. It incurs coordination and synchronization overhead, and quite possibly a fair amount of state transfer between the CPUs running the threads.