Multiprocessor systems use atomic read-modify-write operations to process data structures shared among multiple processors or threads. These operations may target cacheable or noncacheable memory. When the operation targets cacheable memory and does not cross a boundary (e.g., a cache line boundary), the processor may use internal means, such as cache line locking, to keep the operation atomic. When the operation targets noncacheable memory, or crosses a boundary such that the processor cannot use internal means, the processor requires another way to perform the atomic read-modify-write.
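By way of illustration only, the need for atomicity can be shown with a minimal software model. The names below (Memory, interleaved_increments, atomic_increments) are hypothetical and are not part of any described hardware. The sketch assumes two processors that each add one to a shared location, and it fixes the interleaving by hand so that the outcome is deterministic.

```python
class Memory:
    """A single shared memory location (hypothetical model)."""

    def __init__(self):
        self.value = 0


def interleaved_increments(mem):
    """Two processors each read, add 1, and write, with their steps interleaved."""
    a = mem.value        # processor A reads 0
    b = mem.value        # processor B reads 0 before A has written
    mem.value = a + 1    # processor A writes 1
    mem.value = b + 1    # processor B also writes 1, so A's update is lost
    return mem.value


def atomic_increments(mem):
    """The same work, but each read-modify-write completes before the next begins."""
    for _ in range(2):
        mem.value = mem.value + 1
    return mem.value
```

In the interleaved case one of the two updates is lost, which is the failure that an atomic read-modify-write is meant to prevent.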
A common solution to provide the necessary atomicity for noncacheable atomic transactions is to “lock” the interconnect fabric (i.e., the wiring and the signaling protocols by which processors, caches, and memory communicate with each other), reserving sole use of it for one processor and stalling all others. Conventionally, this has been done in the fabric by arbitrating for, and enforcing, the lock condition at each switch point in the topology of the fabric.
Additionally, processors use virtual-to-physical address translation schemes and commonly cache these translations in Translation Lookaside Buffers (TLBs). When software changes one of these translations, such as to invalidate a virtual address, change protections on a page, or move a page, all cached (TLB) copies of the translation have to be removed before the software can put the changed translation into effect.
One conventional solution used to synchronize changes to translations is to have software explicitly invalidate TLBs on multiple processors by interrupting the processors and running a task on each one to invalidate the TLB entry, or entries, that changed. The processor initiating the translation change interrupts every other processor, and each receiving processor runs an interrupt handler that flushes the changing translation from its TLB. Another conventional method used to synchronize changes to translations is direct hardware communication from processor to processor (e.g., the software uses explicit TLB invalidate instructions to send hardware messages to every other processor describing the translation that is changing). Each initiating processor then sends a “synchronize” message to every other processor and receives a handshake response back from each once all prior TLB-invalidate messages have had their full effect at that processor. Dedicated hardware ensures that the synchronize operation will not finish until every processor has stopped using every translation that was invalidated before the synchronization operation began.
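The interrupt-based approach described above can be sketched as a small software model. This is a sketch under stated assumptions, not an actual operating system or hardware interface: the names Processor, handle_invalidate_interrupt, and change_translation are hypothetical, the page table is modeled as a dictionary, and each interrupt is modeled as a direct, synchronous call to the handler.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    cpu_id: int
    tlb: dict = field(default_factory=dict)  # virtual page -> physical frame

    def translate(self, page_table, vpage):
        """Return the frame for vpage, filling the TLB on a miss."""
        if vpage not in self.tlb:
            self.tlb[vpage] = page_table[vpage]
        return self.tlb[vpage]

    def handle_invalidate_interrupt(self, vpage):
        """Interrupt handler: drop the stale entry, then acknowledge."""
        self.tlb.pop(vpage, None)
        return True


def change_translation(initiator, processors, page_table, vpage, new_frame):
    """Change one translation and interrupt every other processor.

    Returns the number of acknowledgements collected by the initiator.
    """
    page_table[vpage] = new_frame
    initiator.tlb.pop(vpage, None)  # the initiator flushes its own copy
    acks = 0
    for cpu in processors:
        if cpu is not initiator and cpu.handle_invalidate_interrupt(vpage):
            acks += 1
    return acks
```

In this model the initiator does work proportional to the number of other processors for every changed translation, and every other processor is interrupted even if it never cached the entry.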
In the conventional bus lock solution, every intermediate switch point in the interconnect fabric must be aware of the lock and implement hardware for it. Each switch point arbitrates between competing lock requestors, and each switch point enforces a granted lock by interdicting traffic from non-locked processors. Accordingly, as larger systems require more complex interconnect topologies, the hardware needed to support the lock grows as well.
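The per-switch-point behavior described above can be sketched as a minimal software model. The class SwitchPoint and its methods are hypothetical names chosen for illustration, and the first-come, first-served queue is an assumed arbitration policy; the model is not a description of any particular fabric.

```python
from collections import deque


class SwitchPoint:
    """One switch point that arbitrates for and enforces a fabric lock."""

    def __init__(self):
        self.lock_owner = None   # processor currently holding the lock
        self.pending = deque()   # requestors waiting, in arrival order

    def request_lock(self, cpu_id):
        """Grant the lock if it is free; otherwise queue the requestor."""
        if self.lock_owner is None:
            self.lock_owner = cpu_id
            return True
        self.pending.append(cpu_id)
        return False

    def release_lock(self, cpu_id):
        """Release the lock and hand it to the next waiting requestor, if any."""
        if self.lock_owner != cpu_id:
            raise ValueError("only the lock owner may release the lock")
        self.lock_owner = self.pending.popleft() if self.pending else None

    def may_forward(self, cpu_id):
        """Traffic passes only when no lock is held or the sender owns it."""
        return self.lock_owner is None or self.lock_owner == cpu_id
```

In the conventional approach, logic of this kind would have to be present at every switch point in the topology, which is the source of the added hardware.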
The conventional synchronization solution for TLB invalidation requires point-to-point communication. This solution may not scale up well because it requires wiring or transactions proportional to the square of the number of processors involved. Additionally, it may result in lower performance through serialization of invalidate/sync sequences issued by multiple processors at the same time.
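The quadratic growth can be made concrete with a short calculation. The function names below are hypothetical, and the sketch assumes a fully connected arrangement in which every processor has a direct link to every other processor and, in the worst case, every processor sends one message to each of the others.

```python
def point_to_point_links(n):
    """Number of distinct processor pairs, each needing its own link."""
    return n * (n - 1) // 2


def messages_if_all_invalidate(n):
    """Messages sent when each of n processors notifies the other n - 1."""
    return n * (n - 1)
```

Under these assumptions, 4 processors need 6 links and 12 messages, while 64 processors need 2016 links and 4032 messages, so a 16-fold increase in processors costs roughly 336 times as much.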
It would therefore be beneficial to provide a method and apparatus for performing a bus lock and/or a TLB invalidation that is not subject to the limitations of the conventional solutions.