1. Field of the Invention
The invention relates generally to a memory management scheme and, more particularly, to using a cache memory to transfer data via an on-chip internal bus.
2. Description of the Related Art
In a large-configuration computer system, application data is transferred from a system memory to the processors, and the computed data must then be transferred back to the system memory before other processors can reuse the same set of computed data. The time consumed by transferring data back and forth to the system memory can significantly limit system performance. If the system design is not well tuned, a processor will spend most of its time waiting for data to become available.
In a large system configuration, there is a hierarchy of different memories, such as a level one (L1) cache, a level two (L2) cache, a level three (L3) cache, and a system memory. An L1 cache is closest to the processor and is usually not shared with other processors in a multi-processor system. Typically, an L1 cache resides within a processor, whereas an L2 cache resides outside a processor. Two or more processors may share an L2 cache; usually, however, each L2 cache is coupled to a different processor. An L3 cache is further away from the processor than an L2 cache but closer to the processor than the system memory. These caches keep data close to the processors, so that the data can be reused with much lower latency than an access to the system memory.
In a multi-processor system, however, a processor may request data that is held in a cache to which the processor is not directly coupled. For example, a first processor may request data that is stored in an L2 cache coupled to a second processor but not directly coupled to the first processor. In this example, the requested data in the L2 cache cannot be transmitted to the first processor directly. The requested data first has to be transmitted to a system memory (or an L3 cache) and then to the first processor. This degrades the performance of the multi-processor system, because the first processor has to wait for the requested data to be transferred first from the cache to the system memory (or the L3 cache) and then from the system memory (or the L3 cache) to the first processor.
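The two-step path described in the preceding example may be illustrated by the following minimal sketch. It is a hypothetical software model, not a description of any particular hardware; the names `system_memory`, `l2_cache_of_processor_2`, and `conventional_request`, as well as the address and data values, are assumptions made only for illustration.

```python
# Hypothetical model of the conventional (indirect) transfer path.
# All names and values are illustrative assumptions.
system_memory = {}
l2_cache_of_processor_2 = {0x100: "computed data"}


def conventional_request(address, log):
    """Fetch `address` for the first processor when no direct link to the
    second processor's L2 cache exists."""
    if address in l2_cache_of_processor_2:
        # Step 1: the L2 cache transmits the data to the system memory.
        system_memory[address] = l2_cache_of_processor_2[address]
        log.append("L2 cache of processor 2 -> system memory")
    # Step 2: the first processor receives the data from the system memory.
    value = system_memory[address]
    log.append("system memory -> processor 1")
    return value


transfer_log = []
data = conventional_request(0x100, transfer_log)
print(data)
for step in transfer_log:
    print(step)
```

In this model, every request for data held in the other processor's cache results in two logged transfers, which corresponds to the wait described above.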
Therefore, a need exists for a system and method for improving performance of a computer system by directly transferring data from a cache to whichever processor requests the data.
The present invention provides a system and method for improving performance of a computer system by providing a direct data transfer between different processors. The system includes a first processor and a second processor. The first processor is in need of data. The system also includes a directory in communication with the first processor. The directory receives a data request for the data and contains information as to where the data is stored. A cache is coupled to the second processor. An internal bus is coupled between the first processor and the cache to transfer the data from the cache to the first processor when the data is found to be stored in the cache.
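The arrangement summarized above may be sketched in software as follows. This is a minimal, hypothetical model under stated assumptions: the directory is represented as a mapping from an address to a location label, the internal bus is represented only by a list of recorded transfers, and a fallback to the system memory is assumed for data that the directory does not locate in the cache. The class names `Directory` and `System`, the location labels, and the addresses are illustrative and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Records where each piece of data is stored (assumed representation)."""
    locations: dict = field(default_factory=dict)

    def lookup(self, address):
        # Assumption: data not tracked by the directory is in system memory.
        return self.locations.get(address, "system_memory")


@dataclass
class System:
    directory: Directory
    cache: dict            # cache coupled to the second processor
    system_memory: dict
    internal_bus_transfers: list = field(default_factory=list)

    def request(self, address):
        """Handle a data request issued by the first processor."""
        location = self.directory.lookup(address)
        if location == "cache" and address in self.cache:
            # Data found in the cache: transfer it over the internal bus
            # directly to the first processor.
            self.internal_bus_transfers.append(address)
            return self.cache[address], "internal_bus"
        # Otherwise the data is taken from the system memory.
        return self.system_memory[address], "system_memory"


system = System(
    directory=Directory(locations={0x100: "cache"}),
    cache={0x100: "computed data"},
    system_memory={0x200: "application data"},
)

direct_result = system.request(0x100)
memory_result = system.request(0x200)
print(direct_result)
print(memory_result)
```

In this sketch, the request for address `0x100` is served over the modeled internal bus without passing through the system memory, whereas the request for address `0x200` is served from the system memory because the directory does not locate that data in the cache.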