Processors have become highly complex machines, incorporating numerous structures and sophisticated control techniques that allow instructions and data to traverse the machine and perform the requested operations. To improve performance, some processors exploit instruction-level parallelism (ILP). While ILP techniques may increase performance, they may also increase power consumption and design complexity. Accordingly, some processors are being designed to run multiple cooperative threads on architectures that support and exploit thread-level parallelism (TLP). Such processors may include multiple cores, often many small cores such as small in-order simultaneous multithreading (SMT) cores.
However, such in-order cores may be less effective than out-of-order cores at exploiting ILP. That is, while in-order processors may handle parallel applications efficiently, single-threaded applications and the serial portions of parallel applications may not perform efficiently on such architectures. Accordingly, certain processors may break such applications apart into fine-grained threads, keeping complexity minimal while improving efficiency. However, excessive overhead can occur when a first thread seeks to use information of a second thread.