In recent years, the performance and capabilities of computer systems have continued to advance rapidly owing to improvements in processor architecture and instruction execution. In particular, reduced-instruction-set computers (RISC) have continued to improve significantly and have become more popular for various applications. To minimize hardware size and increase clock speed, a RISC architecture typically includes a set of simple instructions and control flows. When targeting a specific application, a RISC instruction set can be augmented with instructions that accelerate and/or enhance the functionality needed for the application. These instructions typically improve overall system performance by reducing the number of cycles needed for operations commonly used in the target application, while attempting to preserve the clock speed.
Packet processing for voice applications typically requires the conversion of packets across different protocol formats. For example, one common application is the conversion and/or transport of payloads from a protocol format that supports variable-size packets (e.g., Internet Protocol (IP) format) to one that supports fixed-size cells (e.g., Asynchronous Transfer Mode (ATM) format). When transmitting IP packets over an ATM network, the IP packets need to be segmented into fixed-size blocks that are placed in ATM cells. At the receiving end, the fixed-size blocks need to be extracted from the ATM cells and reassembled into the IP packet. Similar segmentation and reassembly (SAR) operations need to be performed whenever a packet must be converted from IP format to ATM format.
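The segmentation step described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not an implementation of any particular ATM adaptation layer: the 48-byte payload size matches the cell size used in the example later in this section, the names `CELL_PAYLOAD`, `cells_needed`, and `segment_packet` are hypothetical, and cell headers, trailers, and checksums are omitted. The unused tail of the last block is zero-filled as an assumed padding choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CELL_PAYLOAD 48 /* assumed number of payload bytes per fixed-size cell */

/* Number of fixed-size blocks needed to carry packet_len bytes (rounds up). */
size_t cells_needed(size_t packet_len)
{
    return (packet_len + CELL_PAYLOAD - 1) / CELL_PAYLOAD;
}

/*
 * Segment a variable-size packet into consecutive fixed-size payload blocks.
 * Returns the number of blocks written, or 0 if max_cells is too small
 * (an empty packet also yields 0 blocks).
 */
size_t segment_packet(const unsigned char *packet, size_t packet_len,
                      unsigned char payloads[][CELL_PAYLOAD], size_t max_cells)
{
    assert(packet != NULL && payloads != NULL);

    size_t n = cells_needed(packet_len);
    if (n > max_cells)
        return 0;

    for (size_t i = 0; i < n; i++) {
        size_t off = i * CELL_PAYLOAD;        /* offset into the packet */
        size_t len = packet_len - off;        /* bytes still to be placed */
        if (len > CELL_PAYLOAD)
            len = CELL_PAYLOAD;

        memcpy(payloads[i], packet + off, len);
        /* Zero-fill the unused tail of the last block (assumed padding). */
        memset(payloads[i] + len, 0, CELL_PAYLOAD - len);
    }
    return n;
}
```

Under these assumptions, a 100-byte packet occupies three blocks: two full 48-byte blocks and a third carrying the remaining 4 bytes followed by padding.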
One approach in implementing software SARs is to copy data from one memory location to another memory location. The memory copy process is implemented as the body of a loop construct (also called loop or looping instruction herein). When implemented with a traditional RISC instruction set, there are two constraints with respect to a memory copy based SAR:
1. In the memory copy operation, at least one memory address does not change linearly:
   a. In packet reassembly, the source (cell) addresses change non-linearly while the destination (packet) addresses change linearly;
   b. In packet segmentation, the source (packet) addresses change linearly while the destination (cell) addresses change non-linearly.
2. Within the body of the loop construct, on each iteration, the number of data units (bytes) that can be copied varies according to the space left in the fixed-size cell. For example, assume that the body of the loop construct is capable of copying up to 32 contiguous bytes from a source to a destination on each iteration. On iteration 1, a total of 32 bytes (starting at the first address of the first cell) can be copied from the first cell to the destination packet. However, on iteration 2, only the remaining 16 bytes in the first cell (assuming that the cell has 48 bytes total), starting at address 32 in the first cell, can be copied to the destination packet.
Verifying the constraints mentioned above in software requires a large number of arithmetic/logic operations per iteration of the loop construct. Thus, the overhead for performing SAR operations can be substantial.
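To make this overhead concrete, the following C sketch shows one way a memory-copy-based reassembly loop might look on a conventional instruction set. It is an illustration only: the names `reassemble_by_copy`, `CELL_PAYLOAD`, and `MAX_CHUNK` are hypothetical, the 48-byte cell and 32-byte per-iteration limit are taken from the example above, and representing the cells as an array of pointers is an assumption chosen to model source addresses that do not change linearly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CELL_PAYLOAD 48 /* bytes per fixed-size cell, as in the example above */
#define MAX_CHUNK    32 /* maximum bytes copied per loop iteration */

/*
 * Reassemble packet_len bytes from a sequence of cell payloads into a
 * contiguous packet buffer. Returns the number of loop iterations used.
 */
size_t reassemble_by_copy(const unsigned char *const cells[],
                          size_t packet_len, unsigned char *packet)
{
    assert(cells != NULL && packet != NULL);

    size_t cell = 0;   /* index of the current source cell */
    size_t off = 0;    /* read offset inside the current cell */
    size_t dst = 0;    /* write offset inside the packet (linear) */
    size_t iterations = 0;

    while (dst < packet_len) {
        /* Constraint 2: the copy size depends on the space left in the cell. */
        size_t chunk = MAX_CHUNK;
        size_t left_in_cell = CELL_PAYLOAD - off;
        size_t left_in_packet = packet_len - dst;
        if (chunk > left_in_cell)
            chunk = left_in_cell;
        if (chunk > left_in_packet)
            chunk = left_in_packet;

        memcpy(packet + dst, cells[cell] + off, chunk);

        /* Constraint 1: the destination advances linearly, while the source
         * jumps to a new cell whenever the current one is exhausted. */
        dst += chunk;
        off += chunk;
        if (off == CELL_PAYLOAD) {
            cell++;
            off = 0;
        }
        iterations++;
    }
    return iterations;
}
```

In this sketch, each iteration performs two subtractions, up to three comparisons, and two or three address updates in addition to the copy itself. A 96-byte packet held in two cells is copied in four iterations of 32, 16, 32, and 16 bytes.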