1. Field of the Invention
Embodiments of the present invention generally relate to instruction execution for multi-threaded processing and, more specifically, to using a cache memory to store the top entries of a stack.
2. Description of the Related Art
Conventional multi-threaded processing systems use stacks in memory to store data or subroutine return addresses. Each stack is typically configured to store a large number of entries, e.g., hundreds or thousands of entries, and a separate stack is used for each processing thread. Consequently, storing all of the stacks on the same die as the multi-threaded processing units requires a large amount of memory, which may increase the cost of producing the processing system. To reduce this cost, some conventional systems place the stack memory off the die that contains the multi-threaded processing units. In those systems, however, the latency incurred when accessing the stacks may reduce the processing performance of the multi-threaded processing units.
Accordingly, there is a desire to support large stack sizes during multi-threaded processing without the loss of processing performance caused by the latency incurred when accessing large stacks.