1. Field of the Invention
The present invention relates generally to the field of distributed computing and more specifically to a system and method for maintaining cache coherency across a serial interface bus, such as Peripheral Component Interconnect Express (PCI Express or PCIe).
2. Description of the Related Art
In conventional computing systems, a processor reads data from an external memory unit and stores a copy of the data in a low-latency cache memory unit (cache) for later use. The processor may then read the copy of the data from the cache instead of reading the data from the external memory unit when executing operations using the data. Since data accesses between the processor and the external memory unit have a higher latency than data accesses between the processor and the cache, retrieving data from the cache allows the processor to execute instructions more quickly, and, ultimately, increases the performance of the processor. Caches are usually implemented as static random-access memory (SRAM) or another type of low-latency memory unit.
A typical cache is organized into a plurality of lines in which data is stored. Each line is marked with a tag that describes the validity of the cached data stored in that line. For example, when a processor copies data from the external memory unit to a particular cache line, that cache line is marked as “valid,” since the cached copy of the data is identical to the original data stored in the external memory unit. Alternatively, when the processor reads data from the cache line, modifies the data, and writes the modified data back to the cache line, the cache line may be marked as “modified,” because the cached copy of the data differs from the original data stored in the external memory unit. The modified data may then be written back to the external memory unit so that the data stored in the external memory unit is again identical to the cached data. When the data stored in the external memory unit is identical to the corresponding data stored in the cache, the cache is considered to be “coherent.” A cache is also coherent if the cached data differs from the data stored in the external memory unit, as long as the cached data is marked as “modified.” Cached data is “incoherent” if the cache stores data that differs from the data stored in the external memory unit but is not marked as modified, or if different caches have the same data marked as modified.
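By way of illustration only, the tagging and coherency conditions described above may be sketched as follows. This is a minimal model under stated assumptions and not an implementation of any particular cache: the names Tag, CacheLine, line_is_coherent, caches_are_coherent, and write_back are hypothetical, the external memory unit is modeled as a dictionary keyed by address, and each cache is modeled as a list of lines.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    """Hypothetical line tags matching the two states described above."""
    VALID = "valid"
    MODIFIED = "modified"


@dataclass
class CacheLine:
    address: int  # location in the external memory unit (assumed model)
    data: int     # cached copy of the data
    tag: Tag      # validity marker for this line


def line_is_coherent(line, memory):
    """A line is coherent if it matches memory or is marked as modified."""
    return line.data == memory[line.address] or line.tag is Tag.MODIFIED


def caches_are_coherent(caches, memory):
    """Check every line, and reject the same address marked modified twice."""
    modified_addresses = set()
    for cache in caches:
        for line in cache:
            if not line_is_coherent(line, memory):
                return False
            if line.tag is Tag.MODIFIED:
                if line.address in modified_addresses:
                    return False
                modified_addresses.add(line.address)
    return True


def write_back(line, memory):
    """Copy modified data to memory so the line is identical to memory again."""
    memory[line.address] = line.data
    line.tag = Tag.VALID
```

In this sketch, a line whose data differs from memory while still tagged as valid fails the check, as does a pair of caches that both hold the same address tagged as modified.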
Caches may be implemented in distributed computing systems that include multiple processors interconnected by a PCI Express (PCIe) bus. Each processor may read data from one or more external memory units and store a copy of the data in a cache associated with the processor. The processor may then use the cached copy of the data to perform processing operations. If the processor modifies the cached copy of the data, then the processor may write the modified data back to the external memory unit in order to maintain cache coherency with the external memory unit, as described above. However, multiple central processing units (CPUs) in conventional systems cannot read and cache data across PCIe because no mechanism currently exists for maintaining cache coherency across PCIe.
Each processor may also receive data through the PCIe bus. When a processor receives data via the PCIe bus, that data is typically marked as either “uncacheable” (UC) or “write-combining” (WC). Data marked UC cannot be stored in a cache because the state of the computing system may depend on this data. In such a case, referencing this data may have effects that are expected by the computing system and required for normal operation. Accordingly, the data must remain directly accessible to the computing system so that modifications to the data are known to the computing system. Thus, this data cannot be stored in an intervening cache without risking unpredictable operation of the computing system. Data marked WC is received into a buffer as data fragments. Those fragments are then combined to reproduce the data, and the combined data is delivered as one large write instead of multiple smaller writes. Data marked WC also cannot be cached because reads to a location marked WC are treated in the same way as reads to a location marked UC.
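The combining of WC data fragments into a single larger write may be sketched, purely as an example, as follows. The function name combine_fragments and the representation of each fragment as an (offset, bytes) pair are assumptions made for illustration; actual write-combining buffers are implemented in hardware and differ in detail.

```python
def combine_fragments(fragments):
    """Merge (offset, bytes) fragments into the fewest contiguous writes.

    Fragments are sorted by offset, and each fragment that begins exactly
    where the previous combined write ends is appended to that write.
    """
    combined = []
    for offset, data in sorted(fragments):
        if combined and combined[-1][0] + len(combined[-1][1]) == offset:
            start, existing = combined[-1]
            combined[-1] = (start, existing + bytes(data))
        else:
            combined.append((offset, bytes(data)))
    return combined
```

Under these assumptions, three adjacent fragments arriving out of order are delivered as one write, while fragments separated by a gap remain separate writes.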
One problem with the aforementioned configuration is that when a processor reads data that is marked UC or WC, that data cannot be cached by the processor. For example, if the processor executes multiple processing operations using the same data that was marked UC or WC, then the processor would be required to read the data from an external (non-cache) memory unit multiple times, thus introducing significant latencies. Another problem is that when a processor executes instructions using data that is marked UC, the processor serializes the execution of those instructions, which reduces the efficiency of the processor. Importantly, data that is received over a PCIe bus (e.g., data copied from the cache of another processor in a multiprocessor system) is marked as UC or WC. This data, therefore, cannot be cached, which introduces additional latencies. One solution to this problem is to avoid connecting processors with a PCIe bus. However, this solution greatly limits the possible configurations of the computing system.
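The cost of repeatedly reading data that cannot be cached may be illustrated with the following simplified model. The class ExternalMemory and the functions read_uncacheable and read_cacheable are hypothetical names introduced only for this sketch, and the count of external reads stands in for latency; no actual timing figures are implied.

```python
class ExternalMemory:
    """Hypothetical external memory unit that counts high-latency reads."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.read_count = 0

    def read(self, address):
        self.read_count += 1
        return self.contents[address]


def read_uncacheable(memory, address, times):
    """Data marked UC or WC: every use goes to the external memory unit."""
    return [memory.read(address) for _ in range(times)]


def read_cacheable(memory, address, times):
    """Cacheable data: only the first use goes to the external memory unit."""
    cache = {}
    values = []
    for _ in range(times):
        if address not in cache:
            cache[address] = memory.read(address)
        values.append(cache[address])
    return values
```

In this model, four uses of the same uncacheable data produce four external reads, whereas four uses of cacheable data produce a single external read.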
Accordingly, there remains a need in the art for an improved caching technique across a bus such as a PCIe bus.