Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Efficient management of interacting threads and workers requires fine granularity and awareness of process state, and newer processors offer hardware acceleration for thread management. When interacting threads share hardware, various optimizations may be applied to hand off demand between them. These optimizations are typically available only if smooth, fast worker handoff can be established between the interacting threads; when it can, they enable improvements such as pipeline interleaving and fast task switching upon successful speculative execution.
Meanwhile, datacenter multi-worker architectures are currently highly network-centric. They are designed around multiple minimum-size worker/service instances that intercommunicate via messaging or queues, so that multiple source workers can place tasks onto a queue while multiple workers pull tasks off the queue and turn them into next-stage product output. Thus, existing multicore software can use multiple cores well. Existing web services, however, are designed to communicate via messaging mechanisms such as message queues, so even if two web services run on neighboring cores, they send data to each other via the network, passing through at least a virtual router at the Virtual Machine Manager (VMM) level. This approach may be about 4-6 orders of magnitude slower than communicating through intercore hardware.