Network processors (NPs) may be used for packet processing. However, the latency of a single external memory access in a network processor may be larger than the worst-case service time. Network processors may therefore have a parallel multiprocessor architecture and perform asynchronous (non-blocking) memory access operations, so that the latency of memory accesses can be overlapped with computation in other threads. For instance, a network processor may process packets in a Microengine cluster, which consists of multiple Microengines (programmable processors with packet processing capability) running in parallel. Every memory access instruction may be non-blocking and associated with an event signal. That is, after a memory access instruction is issued, the instructions following it may continue to run while the memory access is in progress. Execution may then be blocked at a wait instruction for the associated event signal. When the associated event signal is asserted, the wait instruction may clear the event signal and execution may resume. Consequently, all the instructions between the memory access instruction and the wait instruction may be overlapped with the latency of the memory access.
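
The issue-then-wait pattern described above may be illustrated with a minimal software sketch. It is only a simulation under stated assumptions, not Microengine code: the names `EventSignal`, `ExternalMemory`, `read_async`, and `process_packet` are hypothetical, a background thread stands in for the memory controller, and a short sleep stands in for the external memory latency.

```python
import threading
import time


class EventSignal:
    """Hypothetical event signal: asserted on completion, cleared by wait()."""

    def __init__(self):
        self._event = threading.Event()

    def assert_signal(self):
        self._event.set()

    def is_asserted(self):
        return self._event.is_set()

    def wait(self):
        # Block until the signal is asserted, then clear it and resume.
        self._event.wait()
        self._event.clear()


class ExternalMemory:
    """Hypothetical external memory with a fixed, simulated access latency."""

    def __init__(self, contents, latency_s=0.05):
        self._contents = dict(contents)
        self._latency_s = latency_s

    def read_async(self, address, dest, signal):
        # Non-blocking read: returns immediately; the data arrives later in
        # dest["value"], after which the associated signal is asserted.
        def complete():
            time.sleep(self._latency_s)  # stand-in for memory latency
            dest["value"] = self._contents[address]
            signal.assert_signal()

        threading.Thread(target=complete, daemon=True).start()


def process_packet(memory, address):
    signal = EventSignal()
    dest = {}

    # Memory access instruction: issued, then execution continues.
    memory.read_async(address, dest, signal)

    # Instructions that do not depend on the read result run here and are
    # overlapped with the memory latency.
    checksum = sum(range(1000))

    # Wait instruction: block until the event signal is asserted.
    signal.wait()

    return dest["value"], checksum, signal.is_asserted()
```

In this sketch, calling `process_packet(ExternalMemory({0x10: 42}), 0x10)` returns the value read from memory together with the result of the overlapped computation, and the signal is no longer asserted once the wait has completed. Only work that does not depend on the read result may be placed between the access and the wait.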