The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for arranging binary code based on call graph partitioning to reduce instruction cache conflict misses.
Many modern computing devices utilize a multiprocessor architecture in which multiple processors are provided to increase the computation power of the computing device. One example of a modern multiprocessor architecture is the Cell Broadband Engine (CBE) available from International Business Machines Corporation or Armonk, N.Y. With the CBE, a primary control processor, referred to as the PPE, is provided along with a plurality of controlled processors, referred to as synergistic processing elements (SPEs). Each SPE has a local memory, or local store, into which instructions and data are copied so that the SPE may execute instructions in the local store on data brought into the location store from main memory. Thus, the local store serves as both an instruction and data cache for the SPE. Other multiprocessor architectures utilize similar configurations in which the processors may have a local instruction cache and data cache into which data and instructions are brought before executing on the processor or having the processor operate on the data.
Typically, the local store, or cache, of a modern day multiprocessor architecture is designed to be much smaller in storage size than the main memory. Thus, executing code larger than the processor's local store or cache size requires a strategy for swapping pieces of code, or code segments, into the local store or cache before use. In some cases, a code segment may include branch instructions whose target instruction is located in a different code segment that may not be currently present in the local store or cache. This would require a strategy for bringing in the code segment corresponding to the target instruction from main memory.