Memory latency has become the critical bottleneck to achieving high performance on modern processors. Many large applications today are memory intensive because their memory access patterns are difficult to predict and their working sets are becoming quite large. Multithreading technology is now available in processors, for example as a Simultaneous Multi-Threading (SMT) architecture feature, as in the Intel Pentium® 4 processor with Hyper-Threading technology, or as a chip multiprocessor (CMP). To leverage these emerging multithreading capabilities, a set of new techniques has been introduced, including new compiler transformations that generate efficient helper thread code to parallelize single-threaded applications so that they can run on a multithreaded machine, such as a machine with an SMT architecture. These techniques are based on helper thread technology for speculative multithreading and are geared towards adaptive data prefetching. In a typical system, a thread switch has to save and restore a fixed number of registers, which may waste register resources.