Many of today's telecommunication networks consist of a series of interconnected packet networks. As a packet travels through these networks, it is processed by numerous network elements. Various network elements inspect, prioritize, discard, and/or forward the packet through the networks on which the network elements exist. These operations are performed by processing units within the network elements, and oftentimes the processing units are specially designed packet processing cores. Quite often, a packet processing core comprises some on-chip packet store (e.g., cache or RAM) and is coupled to some off-chip packet store (e.g., RAM or disk). As each packet arrives at a network element, the packet processing core responsible for that packet transfers the packet from the on-chip packet store to the off-chip packet store, where the packet remains until it is ready to be transmitted out of the network element. At that time, the packet processing core retrieves the packet from the off-chip packet store and transmits the packet out of the network element. Because each packet is transferred between the on-chip and off-chip packet stores, the packet processing core must be coupled to the off-chip packet store over a data bus that is capable of carrying each and every packet at line rate, so that congestion does not occur during network saturation. This means that as packet processing core speed and the bandwidth of the network element increase, so must the data bus bandwidth between the packet processing core and the off-chip packet store.
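The scaling relationship described above can be illustrated with a rough back-of-the-envelope sketch. The function name, the example line rates, and the default of two bus crossings per packet (one write to the off-chip packet store on arrival, one read back before transmission) are assumptions made for illustration only; they are not figures taken from this description, and a real design would also account for headers, descriptors, and bus overhead.

```python
def required_bus_bandwidth_gbps(line_rate_gbps: float,
                                crossings_per_packet: int = 2) -> float:
    """Estimate the off-chip data bus bandwidth needed to avoid congestion.

    Hypothetical model: every packet arriving at line rate crosses the bus
    `crossings_per_packet` times. The default of 2 assumes one write to the
    off-chip packet store and one read back; this is an illustrative
    assumption, not a stated requirement.
    """
    if line_rate_gbps < 0 or crossings_per_packet < 0:
        raise ValueError("line rate and crossing count must be non-negative")
    return line_rate_gbps * crossings_per_packet


if __name__ == "__main__":
    # Example line rates are arbitrary and chosen only to show the trend.
    for line_rate in (10, 40, 100):
        needed = required_bus_bandwidth_gbps(line_rate)
        print(f"line rate {line_rate} Gb/s -> bus bandwidth {needed} Gb/s")
```

Under this simplified model the required bus bandwidth grows linearly with the line rate of the network element, which is the dependency the passage describes.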