Various techniques have been proposed for dynamically parallelizing software code at run-time. For example, Akkary and Driscoll describe a processor architecture that enables dynamic multithreading execution of a single program, in “A Dynamic Multithreading Processor,” Proceedings of the 31st Annual International Symposium on Microarchitecture, December 1998, which is incorporated herein by reference.
Marcuello et al. describe a processor microarchitecture that simultaneously executes multiple threads of control obtained from a single program by means of control speculation techniques that do not require compiler or user support, in “Speculative Multithreaded Processors,” Proceedings of the 12th International Conference on Supercomputing, 1998, which is incorporated herein by reference.
Marcuello and Gonzales present a microarchitecture that spawns speculative threads from a single-thread application at run-time, in “Clustered Speculative Multithreaded Processors,” Proceedings of the 13th International Conference on Supercomputing, 1999, which is incorporated herein by reference.
Marcuello and Gonzales analyze the benefits of different thread speculation techniques and the impact of value prediction, branch prediction, thread initialization overhead, and connectivity among thread units, in “A Quantitative Assessment of Thread-Level Speculation Techniques,” Proceedings of the 14th International Parallel and Distributed Processing Symposium, 2000, which is incorporated herein by reference.
Ortiz-Arroyo and Lee describe a multithreading architecture called Dynamic Simultaneous Multithreading (DSMT) that executes multiple threads from a single program on a simultaneous multithreading processor core, in “Dynamic Simultaneous Multithreaded Architecture,” Proceedings of the 16th International Conference on Parallel and Distributed Computing Systems (PDCS'03), 2003, which is incorporated herein by reference.
U.S. Patent Application Publication 2014/0282601, whose disclosure is incorporated herein by reference, describes a method for dependency broadcasting through a block-organized source-view data structure. The method includes receiving an incoming instruction sequence using a global front end, and grouping the instructions to form instruction blocks. A plurality of register templates is used to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions. A block-organized source-view data structure is populated, wherein the source-view data structure stores sources corresponding to the instruction blocks as recorded by the plurality of register templates. Upon dispatch of one block of the instruction blocks, a number belonging to the one block is broadcast to a column of the source-view data structure that relates to that block, and the column is marked accordingly. The dependency information of remaining instruction blocks is updated in accordance with the broadcast.
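By way of illustration only, the block-level dependency tracking summarized above may be sketched in software as follows. The sketch is a simplified model and is not taken from the cited publication: the names `Block`, `SourceView`, `add_block`, `dispatch` and `ready_blocks` are hypothetical, a single register template is assumed rather than a plurality, and each column of the source-view structure is modeled as membership of a block number in per-block sets of pending producers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """A group of instructions, identified by a block number (hypothetical model)."""
    number: int
    sources: set[str]       # registers read by the block
    destinations: set[str]  # registers written by the block


class SourceView:
    """Toy block-organized dependency tracker (assumed layout, for illustration)."""

    def __init__(self) -> None:
        # Register template: register name -> number of the last block writing it.
        self.template: dict[str, int] = {}
        # Source view: block number -> producer blocks it still waits on.
        self.pending: dict[int, set[int]] = {}
        # Blocks whose numbers have already been broadcast.
        self.dispatched: set[int] = set()

    def add_block(self, block: Block) -> None:
        # Resolve each source register to its producer block via the template,
        # ignoring producers that have already been dispatched.
        self.pending[block.number] = {
            self.template[reg]
            for reg in block.sources
            if reg in self.template and self.template[reg] not in self.dispatched
        }
        # Record this block as the newest writer of its destination registers.
        for reg in block.destinations:
            self.template[reg] = block.number

    def ready_blocks(self) -> list[int]:
        # A block is ready when it no longer waits on any producer block.
        return sorted(n for n, deps in self.pending.items() if not deps)

    def dispatch(self, number: int) -> None:
        deps = self.pending.get(number)
        if deps is None or deps:
            raise ValueError(f"block {number} is not ready for dispatch")
        del self.pending[number]
        self.dispatched.add(number)
        # Broadcast: clear this block's number from every remaining block,
        # which corresponds to marking its column in the source view.
        for remaining in self.pending.values():
            remaining.discard(number)
```

In this model, adding a block that writes `r1`, a second block that reads `r1`, and then dispatching the first block leaves the second block as the only entry returned by `ready_blocks()`.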