Modern processors, such as central processing units (CPUs), expose a number of execution units at several levels (hardware threads, cores, sockets, etc.). Experimentation with various benchmarks shows that not all execution units are equally efficient for a given set of tasks. For example, accessing memory across sockets incurs a cost. Additionally, assigning multiple tasks to hardware threads of the same core may be inefficient when other cores are available, because those threads share the resources of that core. When tasks are assigned to the available execution units randomly or in an otherwise naive manner, performance is not maximized.
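The contrast with a naive assignment can be illustrated with a minimal sketch. It assumes a hypothetical, uniform topology described only by socket, core, and hardware-thread indices. The names `ExecUnit`, `build_topology`, `placement_order`, and `assign` are illustrative and do not come from any particular scheduler. The ordering policy (stay on one socket, and use a distinct core before a sibling hardware thread) is likewise an assumption; the best priority depends on the workload.

```python
from typing import Dict, List, NamedTuple


class ExecUnit(NamedTuple):
    """One schedulable hardware thread, identified by its position."""
    socket: int
    core: int
    thread: int


def build_topology(sockets: int, cores_per_socket: int,
                   threads_per_core: int) -> List[ExecUnit]:
    """Enumerate every execution unit of a hypothetical uniform machine."""
    return [
        ExecUnit(s, c, t)
        for s in range(sockets)
        for c in range(cores_per_socket)
        for t in range(threads_per_core)
    ]


def placement_order(units: List[ExecUnit]) -> List[ExecUnit]:
    """Order units so that, within a socket, the first hardware thread of
    every core is used before any sibling thread, and one socket is filled
    before the next one is touched (assumed policy, not a universal rule)."""
    return sorted(units, key=lambda u: (u.socket, u.thread, u.core))


def assign(tasks: List[str], units: List[ExecUnit]) -> Dict[str, ExecUnit]:
    """Map each task to one execution unit following placement_order."""
    ordered = placement_order(units)
    if len(tasks) > len(ordered):
        raise ValueError("more tasks than execution units")
    return dict(zip(tasks, ordered))


if __name__ == "__main__":
    topology = build_topology(sockets=2, cores_per_socket=2, threads_per_core=2)
    for task, unit in assign(["a", "b", "c"], topology).items():
        print(task, unit)
```

With two sockets, two cores per socket, and two hardware threads per core, the first two tasks land on different cores of socket 0, and only the third task shares a core with an earlier one. A random assignment could instead place two tasks on sibling threads of one core, or spread them across sockets, while other cores sit idle.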