The increased data rate of the fourth generation (4G) of mobile telecommunications technology requires increased digital signal processor (DSP) performance. Examples of operations that a mobile telecommunication DSP is required to perform include any byte to any byte mapping, such as is required for interleaving, de-interleaving, insertion of rank information, etc.
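By way of illustration only, any byte to any byte mapping can be modelled in software as a table lookup, in which each destination byte names the source byte it receives. The following is a minimal sketch in C; the function name `byte_map` and the `map` table layout are hypothetical and are not taken from any particular DSP instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = src[map[i]] for every i in [0, n).
 * Because map[i] may name any source byte, bytes can be reordered,
 * duplicated or dropped. src and dst must not overlap. */
static void byte_map(uint8_t *dst, const uint8_t *src,
                     const uint8_t *map, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(map[i] < n);      /* index must stay inside the source block */
        dst[i] = src[map[i]];
    }
}
```

For example, with a four byte source and the table {3, 3, 0, 1}, the destination receives source byte 3 twice, followed by source bytes 0 and 1.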
Conventionally, in order to implement any byte to any byte mapping a butterfly and reverse butterfly schematic is required. FIG. 1 illustrates an example of a butterfly and reverse butterfly schematic for performing any byte to any byte mapping. The implementation of the butterfly method is complicated due to each stage of the butterfly method being different. The implementation of a 16 byte butterfly can be in one of two options:
        (i) four different instructions (one per stage) requiring 8 cycles to complete; or
        (ii) one instruction to implement all stages (such an instruction requiring 32×2 control bits, which is not possible under common instruction rules, and using 8 multiplexers in a row, that may impact timing).
Furthermore, the definition of the required control settings per the required bit mapping is very complex and typically requires the use of external programming assistance.
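The butterfly and reverse butterfly arrangement described above can be sketched in C as follows. This is a simplified software model under stated assumptions, not the circuit of FIG. 1: each stage is assumed to pair bytes at a fixed distance and to swap a pair when its control bit is set, and the names `butterfly_stage` and `butterfly_network` are hypothetical. With 16 bytes there are 4 butterfly stages and 4 reverse butterfly stages of 8 pairs each, which gives the 32×2 control bits and the 8 multiplexers in a row mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define NBYTES  16
#define NSTAGES 4    /* log2(NBYTES) */

/* One stage: bytes i and i + dist form a pair (8 pairs per stage).
 * Pair k is swapped when bit k of ctrl is set, otherwise passed through. */
static void butterfly_stage(uint8_t v[NBYTES], unsigned dist, uint8_t ctrl)
{
    assert(dist != 0 && dist < NBYTES && (dist & (dist - 1)) == 0);
    unsigned pair = 0;
    for (unsigned i = 0; i < NBYTES; i++) {
        if (i & dist)
            continue;            /* i is the upper element of its pair */
        if ((ctrl >> pair) & 1u) {
            uint8_t t = v[i];
            v[i] = v[i + dist];
            v[i + dist] = t;
        }
        pair++;
    }
}

/* Butterfly (distances 8, 4, 2, 1) followed by reverse butterfly
 * (distances 1, 2, 4, 8). ctrl holds 8 bytes = 64 control bits. */
static void butterfly_network(uint8_t v[NBYTES],
                              const uint8_t ctrl[2 * NSTAGES])
{
    for (unsigned s = 0; s < NSTAGES; s++)
        butterfly_stage(v, (NBYTES / 2) >> s, ctrl[s]);
    for (unsigned s = 0; s < NSTAGES; s++)
        butterfly_stage(v, 1u << s, ctrl[NSTAGES + s]);
}
```

The sketch only applies a given set of control bits. Deriving the 64 control bits that realise a desired mapping is the complex step referred to above and is not attempted here.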
Traditional implementations (based on the first of the above options) suffer from poor performance for the increased data rate required for 4G technology due to the large number of separate pack, insert-extract and permute instructions that are required to be executed within traditional DSPs.
FIG. 2 illustrates an example of performing an LTE (Long Term Evolution) channel interleaving (Qm=6) procedure using conventional pack, insert-extract and permute instructions. An example of the corresponding instructions is provided below:

        ld.2l (r0)+,d0:d1; load 8 bytes (2 longs) from A
        ld.2l (r1)+,d8:d9; load 8 bytes from B
        pack.w.2w d1.h,d8.h,d1; pack d1.h and d8.h to d1
        pack.w.2w d8.l,d9.h,d2
        st.2l d0:d1,(r2)+; store 8 bytes to dest0
        st.l d2,(r2); store 4 bytes to dest0
        ld.l (r0),d2; load 4 bytes from A
        ld.l (r1),d10; load 4 bytes from B
        pack.w.2w d1.l,d2.h,d8
        pack.w.2w d2.l,d9.l,d9
        st.2l d8:d9,(r3)+; store 8 bytes to dest1
        st.l d10,(r3); store 4 bytes to dest1
In this example interleaving procedure, two “insert and pack” operations are required to be performed. The procedure starts with two load instructions for loading data into source registers, as illustrated at 200 and 205. Two separate ‘pack’ instructions are then required: the first pack instruction, illustrated at 210, inserts a first data block into destination registers, whilst the second pack instruction, illustrated at 215, packs a plurality of further data blocks into the destination registers after the first data block. Two store instructions are then executed to store the content of the destination registers, as illustrated at 220 and 225. The second “insert and pack” operation is then performed, starting with two load instructions for loading data into source registers, as illustrated at 230 and 235. Two separate ‘pack’ instructions are again required: the first pack instruction, illustrated at 240, packs an initial plurality of data blocks into destination registers, whilst the second pack instruction, illustrated at 245, inserts a further data block into the destination registers after the initial plurality of data blocks. Two store instructions are then executed to store the content of the destination registers, as illustrated at 250 and 255.
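The net data movement of the two “insert and pack” operations may be modelled in C as shown below. This is one possible reading of the instruction listing and is an assumption rather than a statement of FIG. 2: it supposes that each 12 byte destination receives six consecutive bytes from source A followed by six consecutive bytes from source B, and that the two destinations are laid out back to back. The name `pack_rows` and the constant `QM` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QM 6   /* bytes taken from each source per destination (assumed, from Qm=6) */

/* Assumed model: destination r receives a[r*QM .. r*QM+5] followed by
 * b[r*QM .. r*QM+5]. Destinations are stored contiguously in dest. */
static void pack_rows(uint8_t *dest, const uint8_t *a,
                      const uint8_t *b, size_t rows)
{
    assert(dest != NULL && a != NULL && b != NULL);
    for (size_t r = 0; r < rows; r++) {
        memcpy(dest + r * 2 * QM,      a + r * QM, QM);  /* block from A */
        memcpy(dest + r * 2 * QM + QM, b + r * QM, QM);  /* block from B */
    }
}
```

In this model, calling `pack_rows` with `rows` equal to 2 corresponds to the two destinations dest0 and dest1 of the listing. A scalar loop of this kind hides the cost that the listing makes visible, namely that each destination requires separate load, pack and store instructions on a conventional DSP.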