For a multiprocessor system, it may be necessary to perform multiple atomic operations on a small data structure. Traditionally, caches are used to reduce the bandwidth and latency of modifying multiple fields. Many processors provide instructions that either perform operations atomically or acquire locks so that a sequence of operations can be performed atomically. These mechanisms are not suitable for high-performance critical code sections. In general, the known prior approach is to dedicate a processor to performing the specific operation and to have the other processors send messages to it. If the data structure is copied into a local cache, the latency of transferring the data is visible to any other processor waiting for atomic access to that data structure. If the atomic operations are dispatched to the memory system individually, then the latency of sending each operation is visible to the processor, and the bandwidth consumed at the memory system increases. What is needed is a method for reducing the cost of updating data structures. Note that nothing described or referenced in this document is admitted as prior art to this application unless explicitly so stated.
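
By way of illustration only, two of the approaches noted above, namely acquiring a lock before modifying multiple fields and dedicating one processor to the update while others send messages to it, may be sketched in software as follows. This is a minimal sketch under stated assumptions and not an implementation taken from this document: threads stand in for processors, a queue stands in for the message path, and the names `FlowStats`, `run_locked`, and `run_dedicated_owner` are hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Hypothetical small data structure whose two fields must change together."""
    packets: int = 0
    octets: int = 0


def run_locked(workers=4, repeats=1000, size=64):
    """Approach 1 (illustrative): every worker takes a lock, then updates both fields."""
    stats = FlowStats()
    lock = threading.Lock()

    def update():
        for _ in range(repeats):
            with lock:  # other workers wait here while the fields are modified
                stats.packets += 1
                stats.octets += size

    threads = [threading.Thread(target=update) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


def run_dedicated_owner(workers=4, repeats=1000, size=64):
    """Approach 2 (illustrative): one owner thread performs all updates on request."""
    stats = FlowStats()
    inbox = queue.Queue()

    def owner():
        # Only this thread ever touches `stats`, so no lock is required.
        while True:
            message = inbox.get()
            if message is None:  # sentinel: no more update requests
                return
            stats.packets += 1
            stats.octets += message

    def sender():
        for _ in range(repeats):
            inbox.put(size)  # each update costs one message to the owner

    owner_thread = threading.Thread(target=owner)
    owner_thread.start()
    senders = [threading.Thread(target=sender) for _ in range(workers)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    inbox.put(None)  # all requests are queued; tell the owner to stop
    owner_thread.join()
    return stats
```

In both variants the two fields end up mutually consistent, but every update either waits on the lock or pays for a message to the owner, which corresponds to the per-update cost discussed above.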