1. Field of the Invention
The invention relates generally to computing systems, and more particularly to systems and methods for improving the efficiency of multiprocessor systems by enabling a first processor to direct data to be loaded into the cache memory of a second processor, and thereafter providing the data directly from the second processor to the first processor through intervention.
2. Related Art
As the complexity of data processing applications increases, there is a need for increased processing power. This need for increased processing power drives the development of new technologies and new techniques for using existing technologies.
The need for increased processing power can be addressed in a number of ways. For example, it may be possible to increase the clock speed of a processor and, correspondingly, the number of instructions that can be executed by the processor in a given amount of time. Another way to improve the performance of a processor is to improve the performance of other systems that interact with the processor. For instance, the speed with which data can be accessed in a memory system can be increased (e.g., by caching data), thereby reducing the amount of time spent by a processor waiting for data to be accessed and increasing the throughput that can be achieved by the processor.
Another way to increase the processing power of a computing system is to use multiple processors rather than providing a single processor to execute an application. While the performance of each individual processor may not be improved, there are more processors (hence more processing power) available to execute the application. It may be convenient to use multiple processors to execute applications such as multimedia applications because of the many different types of tasks that may need to be performed and the ability to configure the different processors so that they are optimized to perform these different tasks.
Some of these techniques can be combined with others, or can be used in different ways to further improve the performance of a computing system. In one system, multiple processors can be provided. Each of the processors in this multi-processor system may have a cache memory that is configured to store data that has recently been used by the processor. Because of the likelihood that recently used data will be used again by the processor in the near future, storing this data in the cache memory makes the data more readily available to the processor. That is, the data can be retrieved more quickly from the cache memory than from a main memory. The latency of the data (the time required to retrieve the data) is thereby reduced. In some multi-processor systems, each processor can also retrieve data from the caches of the other processors in the system, which can also reduce the latency of the data.
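The lookup order described above can be illustrated with a minimal sketch, offered only as an aid to understanding and not as a description of any particular system. The names Cache and load, the least-recently-used replacement policy, and the cycle counts LOCAL_HIT, PEER_HIT and MEMORY are all assumptions introduced for illustration. The sketch assumes that stored values are never None.

```python
from collections import OrderedDict

# Hypothetical latencies in cycles; illustrative values only.
LOCAL_HIT, PEER_HIT, MEMORY = 1, 10, 100


class Cache:
    """A tiny least-recently-used cache keyed by address (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def get(self, addr):
        """Return the cached value, or None on a miss (values assumed non-None)."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        return None

    def put(self, addr, value):
        self.lines[addr] = value
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # cast out the least recently used line


def load(cpu, addr, caches, memory):
    """Return (value, latency) for a load issued by processor `cpu`."""
    # 1. The processor's own cache gives the lowest latency.
    value = caches[cpu].get(addr)
    if value is not None:
        return value, LOCAL_HIT
    # 2. Otherwise another processor's cache may supply the data.
    for other, cache in enumerate(caches):
        if other != cpu:
            value = cache.get(addr)
            if value is not None:
                caches[cpu].put(addr, value)
                return value, PEER_HIT
    # 3. Otherwise the data comes from main memory, the slowest path.
    value = memory[addr]
    caches[cpu].put(addr, value)
    return value, MEMORY
```

In this sketch, a first load of an address by processor 0 pays the main-memory latency, a repeated load by processor 0 pays only the local latency, and a subsequent load of the same address by processor 1 is satisfied from the cache of processor 0 at the intermediate latency.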
Another technique that can be used to improve the performance of the processors in the multi-processor system described above is to enable prefetching of data that will be needed by the processors. If it can be determined that particular data will be needed by a processor prior to execution of the instruction that actually uses the data, the data can be retrieved prior to execution of the instruction. The retrieved data can be stored in the processor's cache memory so that it is available for quick access by the processor.
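The effect of prefetching can be sketched as follows. This is a simplified model under stated assumptions: the class Processor, its methods prefetch and load, and the cycle counts CACHE_HIT and MEMORY_ACCESS are hypothetical, and cache capacity is ignored for brevity.

```python
# Hypothetical latencies in cycles; illustrative values only.
CACHE_HIT, MEMORY_ACCESS = 1, 100


class Processor:
    """A processor with a private cache (capacity ignored for brevity)."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}  # address -> value

    def prefetch(self, addr):
        """Bring data into this processor's own cache before it is needed."""
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]

    def load(self, addr):
        """Return (value, latency) as seen by the instruction that uses the data."""
        if addr in self.cache:
            return self.cache[addr], CACHE_HIT
        value = self.memory[addr]
        self.cache[addr] = value
        return value, MEMORY_ACCESS
```

In this model, a load that follows a prefetch of the same address sees only the cache latency, whereas a load of an address that was not prefetched sees the full main-memory latency the first time it is issued.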
While several techniques to improve the performance of the multi-processor system are described above, there are some limitations on the improvement that is possible. One shortcoming in particular is the fact that the technique of prefetching data is limited to a processor's own cache memory. In other words, when a particular processor prefetches data, the data can only be stored in the cache owned by that processor. While some conventional multi-processor systems enable their processors to use data stored in other processors' cache memories, these systems do not allow the processors to store data in the other processors' cache memories. This limits the usefulness of the ability to use the data in other processors' cache memories. It also limits the usefulness of caching in the first processor when, for example, a general load instruction causes data to be cast out of the processor's cache memory (i.e., the cast out data cannot be stored by the processor in another processor's cache memory).
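The cast-out limitation can be illustrated with a simplified model of a conventional system. The class ConventionalSystem, the least-recently-used replacement policy, and the cycle counts are assumptions made for illustration only. In this model a processor may read from another processor's cache but may fill only its own cache, so a cast-out line is simply dropped.

```python
from collections import OrderedDict

# Hypothetical latencies in cycles; illustrative values only.
LOCAL_HIT, PEER_HIT, MEMORY = 1, 10, 100


class ConventionalSystem:
    """Processors may read peer caches but may fill only their own cache."""

    def __init__(self, num_cpus, capacity, memory):
        self.caches = [OrderedDict() for _ in range(num_cpus)]
        self.capacity = capacity
        self.memory = memory

    def _fill_own(self, cpu, addr, value):
        cache = self.caches[cpu]
        cache[addr] = value
        cache.move_to_end(addr)
        if len(cache) > self.capacity:
            # The cast-out line is dropped; it cannot be placed in a peer cache.
            cache.popitem(last=False)

    def load(self, cpu, addr):
        """Return (value, latency) for a load issued by processor `cpu`."""
        own = self.caches[cpu]
        if addr in own:
            own.move_to_end(addr)
            return own[addr], LOCAL_HIT
        for other, cache in enumerate(self.caches):
            if other != cpu and addr in cache:
                value = cache[addr]
                self._fill_own(cpu, addr, value)
                return value, PEER_HIT
        value = self.memory[addr]
        self._fill_own(cpu, addr, value)
        return value, MEMORY
```

With two processors and a capacity of one line per cache, processor 0 loading address 1, then address 2, then address 1 again pays the main-memory latency all three times in this model, even though the cache of processor 1 remains empty throughout.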
It would therefore be desirable to provide systems and methods for enabling processors in a multiprocessor system to load data into the cache memories of other processors.