The present invention relates generally to reducing broadcast messages in a multi-processor system and, more particularly, to using a translation lookaside buffer having page table extensions to reduce such broadcasts.
Modern computers use a virtual addressing scheme which allows the computer to address an address space larger than its internal memory. Before memory can be accessed in such a scheme, however, each virtual address must be translated to a physical address. Unfortunately, the translation process ordinarily requires multiple accesses to page and segment tables in the computer's memory, which significantly degrades the computer's performance.
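The multiple memory accesses described above can be illustrated with a minimal sketch of a two-level page table walk. The 10/10/12 bit split of the virtual address and the dictionary-based tables are illustrative assumptions, not details of the invention:

```python
# Sketch of a two-level page table walk (illustrative assumptions:
# 32-bit virtual addresses, 4 KiB pages, 10-bit indices per level).
# Each level of the walk costs one access to memory-resident tables,
# which is why translation without a TLB degrades performance.

def page_walk(vaddr, root_table):
    l1_index = (vaddr >> 22) & 0x3FF   # index into the top-level table
    l2_index = (vaddr >> 12) & 0x3FF   # index into the leaf table
    offset = vaddr & 0xFFF             # offset within the page
    leaf_table = root_table[l1_index]  # memory access #1
    frame = leaf_table[l2_index]       # memory access #2
    return (frame << 12) | offset      # physical address

# Example: virtual page (0, 1) maps to physical frame 0x5.
root = {0: {1: 0x5}}
paddr = page_walk(0x1034, root)        # -> 0x5034
```

Each additional table level adds another memory access to every translation, which motivates caching the final mapping.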
To overcome this problem, a translation lookaside buffer (TLB) is used to maintain the most recently used virtual-to-physical address mappings. Each TLB entry ordinarily contains a virtual address, the physical address mapped to that virtual address, and control information such as validity flags. Typically, the TLB is searched to determine whether a physical address mapping for a given virtual address is present. If a mapping is present in the TLB, the physical address may be obtained directly from the TLB, thus avoiding the long-latency lookup in the page and segment tables.
A TLB is like a cache that memory management hardware uses to improve the latency of the virtual-to-physical address translation. A TLB has a fixed number of slots that contain page table entries, which map virtual addresses to physical addresses. It is typically a content-addressable memory (CAM), in which the search key is the virtual address and the search result is a physical address. If the requested address is present in the TLB, the CAM search yields a match quickly, after which the physical address can be used to access the cache or memory. This is called a TLB hit. If the requested address is not in the TLB, the translation proceeds by looking up the page table in a process called a page walk. The page walk is a high-latency process, as it involves reading the contents of multiple memory locations and using them to compute the physical address. After the physical address is determined, the virtual-to-physical address mapping and the protection bits are stored in the TLB for future use.
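The hit/miss behavior described above can be sketched as follows. This is a minimal software model, not the CAM hardware itself; the slot count, eviction policy, and page-table representation are illustrative assumptions:

```python
# Minimal model of TLB lookup with a page-walk fallback (assumptions:
# 4 KiB pages, a dict standing in for the CAM, simple FIFO-like eviction).

PAGE_SHIFT = 12  # 4 KiB pages

class TLB:
    def __init__(self, slots=64):
        self.slots = slots
        self.entries = {}  # virtual page number -> (physical frame, valid)

    def translate(self, vaddr, page_table):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        entry = self.entries.get(vpn)
        if entry is not None and entry[1]:
            pfn = entry[0]                 # TLB hit: fast path
        else:
            pfn = page_table[vpn]          # TLB miss: high-latency page walk
            if len(self.entries) >= self.slots:
                self.entries.pop(next(iter(self.entries)))  # evict oldest
            self.entries[vpn] = (pfn, True)  # cache mapping for future use
        return (pfn << PAGE_SHIFT) | offset

tlb = TLB()
page_table = {0x1: 0x80}
paddr = tlb.translate(0x1234, page_table)  # miss, walks the table -> 0x80234
paddr = tlb.translate(0x1234, page_table)  # hit, served from the TLB
```

The second lookup avoids the page walk entirely, which is the latency benefit the TLB provides.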
To provide the multiple processors in a system with a consistent view of the contents of memory, cache coherence must be maintained. Two classes of cache coherence protocols are well known in the literature: directory-based protocols and bus-based snoopy protocols. Directory-based protocols serialize each memory request through a global directory lookup, followed by a multicast to the owners of the data so that they can take the necessary action (invalidate or update) in response to the current request; this places the directory access on the critical path and makes it a bottleneck. Snoopy protocols broadcast each request to all the cache directories and wait for their responses to determine the action (invalidate or update) to take; as a result, the total number of snoop probes grows as the square of the number of processors in the system, producing a substantial increase in traffic and power consumption as systems grow.
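The quadratic growth of snoop traffic can be shown with simple arithmetic. The model below is an illustrative assumption (every request is broadcast to the other N - 1 caches, and each of N processors issues the same number of requests), not a description of any particular protocol implementation:

```python
# Assumed model: in a snoopy protocol, each request probes the other
# n_procs - 1 cache directories, and every processor issues
# requests_per_proc requests, so total probes grow as n_procs squared.

def snoop_probes(n_procs, requests_per_proc):
    return n_procs * requests_per_proc * (n_procs - 1)

for n in (4, 8, 16):
    print(n, snoop_probes(n, 1000))
# 4  -> 12000
# 8  -> 56000
# 16 -> 240000
```

Doubling the processor count roughly quadruples total probe traffic under this model, which is the scaling problem the invention targets.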
In multiprocessor systems, snoop traffic generated to maintain cache coherence consumes a significant portion of on-chip and off-chip bandwidth and power. The present invention describes a method and an apparatus to reduce the broadcast messages generated by the snoopy cache coherence protocol in a typical multiprocessor system.