For reasons of speed, throughput, and cost, a high-speed network flow processor integrated circuit generally does not buffer, on the integrated circuit itself, all packets that pass through it. Rather, the payloads of at least some packets are buffered temporarily in external memory. When such a packet is to be output, the packet is assembled on the integrated circuit and is then output from the integrated circuit. Methods and structures are sought for improving this merging and packet outputting process.
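By way of illustration only, the buffering and assembly flow described above may be modeled in software as follows. This is a minimal sketch, not a description of any particular circuit: the class name `FlowProcessorSketch`, the two dictionaries, and the `receive` and `transmit` methods are hypothetical, and the sketch assumes that the header of each packet is held on the integrated circuit while only the payload is placed in external memory.

```python
from dataclasses import dataclass, field


@dataclass
class FlowProcessorSketch:
    """Toy software model of the buffer-then-assemble flow (hypothetical names)."""

    # Stand-in for external (off-chip) memory: packet id -> payload bytes.
    external_memory: dict = field(default_factory=dict)
    # Assumption: headers are kept on-chip while the payload waits off-chip.
    on_chip_headers: dict = field(default_factory=dict)

    def receive(self, packet_id: int, header: bytes, payload: bytes) -> None:
        """Keep the header on-chip and buffer the payload in external memory."""
        self.on_chip_headers[packet_id] = header
        self.external_memory[packet_id] = payload

    def transmit(self, packet_id: int) -> bytes:
        """Fetch the payload, merge it with the header, and return the packet."""
        header = self.on_chip_headers.pop(packet_id)
        payload = self.external_memory.pop(packet_id)
        # The merge step: the whole packet exists on-chip only at output time.
        return header + payload


if __name__ == "__main__":
    processor = FlowProcessorSketch()
    processor.receive(1, b"HDR", b"payload")
    print(processor.transmit(1))
```

In this model the complete packet exists on the integrated circuit only at the moment of output, which is the merging step for which improvement is sought.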