1. Field of the Invention
Embodiments of the present invention generally relate to computer processors.
2. Description of the Related Art
Computers typically include a main memory for storing programs and a processor for executing the programs stored in the main memory. In some cases, a processor may include multiple processing cores which may be used to simultaneously process data from multiple threads of execution (e.g., from multiple programs, from multiple processes, and/or from multiple threads). Each processing core may itself be used to process multiple threads of execution, for example, by processing the threads simultaneously (simultaneous multithreading) or by processing each thread for a short amount of time (e.g., as determined by a priority) before processing a subsequent thread, as is known to those skilled in the art.
There is generally a desire to have as many processing cores as possible, each concurrently processing as many threads as possible, in order to obtain the greatest processing power and efficiency from the processor. For example, a plurality of threads may be used to execute an application such as a video game which performs three-dimensional graphics rendering, sound effects, physics simulations, player input/output, and other functions. To provide the most realistic experience to the video game player, there may be a desire to have each thread perform a given function (e.g., one thread may draw a three-dimensional scene, also referred to as rendering, while another thread performs a physics calculation) requiring a certain amount of processing power for a set amount of time. For example, if the processor is being used to render a three-dimensional sequence of an action being performed by a video game player in a video game, there may be a desire to render each picture (referred to as a frame) in the sequence quickly such that the action appears to occur smoothly (e.g., if the action being performed by a video game player is a jump, there may be a desire for the jump to appear to move smoothly up and down, as a real-life jump would).
In order to maintain simultaneously executing threads of execution, the processor may be configured to efficiently retrieve data and/or instructions for each executing thread from the computer's main memory. In some cases, the retrieved data and instructions may be placed in one or more small memories referred to as caches which may be located on the same chip as the processor. The caches may also be arranged hierarchically, for example, such that a first cache (referred to as a level two cache, or L2 cache) is shared by each processing core in a processor while multiple smaller caches (referred to as level one caches, or L1 caches) are provided for a given processing core or group of processing cores. Where data and instructions requested by a thread are not available in one of the processor caches, the processor may request the data and instructions from the main memory.
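For purposes of illustration only, the hierarchical lookup described above may be sketched as follows. The sketch assumes a least-recently-used replacement policy, capacities counted in entries rather than bytes, and a main memory modeled as a dictionary; the names LRUCache and load and the statistics keys are hypothetical and are not drawn from any particular processor.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical cache model: holds at most `capacity` entries and
    evicts the least recently used entry when full (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, address):
        """Return (True, value) on a hit, or (False, None) on a miss."""
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            return True, self.entries[address]
        return False, None

    def fill(self, address, value):
        """Place a value in the cache, evicting the oldest entry if needed."""
        self.entries[address] = value
        self.entries.move_to_end(address)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def load(address, l1, l2, main_memory, stats):
    """Look in the L1 cache, then the shared L2 cache, then main memory."""
    hit, value = l1.lookup(address)
    if hit:
        stats["l1_hits"] += 1
        return value
    hit, value = l2.lookup(address)
    if hit:
        stats["l2_hits"] += 1
    else:
        # Not in either cache: request the data from main memory.
        stats["memory_accesses"] += 1
        value = main_memory[address]
        l2.fill(address, value)
    l1.fill(address, value)
    return value
```

In such a sketch, a request reaches the main memory only when both cache levels miss, which corresponds to the case in which the requesting thread may be paused.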
While the requested data and instructions are retrieved from main memory, execution of the thread requesting the data and instructions may be temporarily paused by the processing core to provide time for the request to be fulfilled. In some cases, other threads may be executed while the thread requesting data and instructions is paused. However, if too many threads are paused waiting for data and instructions, one or more processing cores in the processor may remain idle while the data and instructions are retrieved from the main memory.
Where a processor provides multiple cores executing multiple threads, each thread may also be in competition with other threads for use of the processor's cache space. For example, because the cache space in the processor may be smaller than the computer's main memory, the cache space may not be large enough to hold all of the data and instructions for each thread being executed by each of the processing cores. Thus, when a given processing core switches from executing a first thread to executing a second thread, the data and instructions for the first thread may be removed from the cache and replaced with data and instructions for the second thread. If execution of the first thread is subsequently resumed, the first thread may again be paused while data and instructions for the first thread are retrieved from the main memory and placed back in the processor's caches. Pausing threads of execution while data and instructions are retrieved from the main memory may decrease efficiency of the processor.
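As a simplified illustration of the competition for cache space described above, the following sketch counts cache misses when a single cache is used alternately by two threads. It assumes a least-recently-used replacement policy and a capacity counted in entries; the function names count_misses and interleave are hypothetical.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Count misses for a sequence of addresses in an LRU cache
    holding at most `capacity` entries (assumed policy)."""
    cache = OrderedDict()
    misses = 0
    for address in accesses:
        if address in cache:
            cache.move_to_end(address)  # hit: mark as most recently used
        else:
            misses += 1  # miss: data would be retrieved from main memory
            cache[address] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used entry
    return misses


def interleave(working_set_a, working_set_b, rounds):
    """Build an access sequence in which a core alternates between a
    first thread and a second thread for the given number of rounds."""
    schedule = []
    for _ in range(rounds):
        schedule.extend(working_set_a)
        schedule.extend(working_set_b)
    return schedule
```

For example, with two threads that each touch three distinct addresses over two rounds, a cache of four entries misses on all twelve accesses under these assumptions, whereas a cache of six entries misses only on the first six. The first case corresponds to the situation in which each resumed thread finds its data and instructions already removed from the cache.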
Where multiple threads in the processor are accessing data and instructions from the main memory, the amount of data being transferred to and from the main memory (referred to as the consumed memory bandwidth) may increase significantly as each thread transfers data to and from the main memory. When the consumed memory bandwidth is increased, each subsequent access by a thread may be performed slowly as other accesses are performed (e.g., slowly relative to individual accesses performed when the main memory is not being accessed by multiple threads and the consumed memory bandwidth is low). As described above, threads waiting for a memory access to be performed may be paused, thereby decreasing efficiency of the processor.
Accessing of data and instructions by threads of execution may be further complicated where a given thread of execution attempts to access data and/or instructions of another thread of execution being executed by the processor. Where threads of execution share data and instructions with each other, it may be difficult to efficiently share the data and instructions in memory without removing other data and instructions in the processor's typically limited cache space. As described above, as data and instructions are removed from the processor's cache space, and as other data and instructions are retrieved from main memory, the consumed memory bandwidth as well as the access time may be increased, thereby decreasing efficiency of the processor.
Where threads of execution being executed by the processor pause too frequently, performance of applications being executed by the processor may also suffer. For example, as described above with respect to a video game which renders a three-dimensional sequence of an action being performed by the video game player, there may be a desire to have the action appear smooth and without any pauses. However, where a thread performing the rendering pauses due to slow memory access caused, for example, by one of the situations described above, the sequence being rendered may suffer from pauses which result in an action sequence that does not appear smooth.
Accordingly, what are needed are improved methods and apparatuses for managing memory access in a processor. What are also needed are improved methods and apparatuses for rendering three-dimensional scenes with the processor.