A prior art Very Long Instruction Word (VLIW) instruction 100 is shown in FIG. 1. The VLIW instruction 100 is located at memory address 101. The VLIW instruction consists of a number of operations: OP0, OP1, OP2, OP3, . . . OP(n−1). These n operations are Reduced Instruction Set Computer (RISC) instructions which are to be run in parallel on a VLIW processor.
In FIG. 2, program counter register (PCR) 120 is used to select a memory address 0xNm from a prior art VLIW memory map 110. The notation 0x is used to indicate a hexadecimal number. Arrow 121 is pointing to the address 0xNm which is located at program memory address 101m. Memory address 0x0 is the lowest program memory address 101A. Program memory address 0x1 is a higher address than address 0x0, and 0xNn is the highest memory address. Each memory address has a corresponding row of operation instructions which represent a given VLIW instruction. For example, memory address 0x0 has operation instructions OP00, OP01, OP02, and OP03 which constitute VLIW instruction 100A. At address 0x1, operation instructions OP10, OP11, OP12, and OP13 constitute VLIW instruction 100B. At address 0xNm, operation instructions OPm0, OPm1, OPm2, and OPm3 constitute VLIW instruction 100m. At address 0xNn, operation instructions OPn0, OPn1, OPn2, and OPn3 constitute VLIW instruction 100n. 
The program counter 120 points (arrow 121) to the address of the VLIW instruction to be executed. As an illustrative example, each VLIW instruction 100A, 100B, . . . 100m, . . . , 100n consists of four operations which are to be executed in parallel.
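The memory map of FIG. 2 can be pictured as a table indexed by the program counter. The following Python fragment is only an illustrative sketch under stated assumptions: the names `program_memory` and `fetch_vliw`, the three-row size, and the third row are hypothetical, while the operation labels of the first two rows follow FIG. 2.

```python
# Illustrative model of a VLIW memory map: each program memory address
# holds one VLIW instruction, i.e. a row of four operation labels.
# The three-row size is an assumption made for this sketch.
program_memory = [
    ("OP00", "OP01", "OP02", "OP03"),  # address 0x0, VLIW instruction 100A
    ("OP10", "OP11", "OP12", "OP13"),  # address 0x1, VLIW instruction 100B
    ("OP20", "OP21", "OP22", "OP23"),  # address 0x2 (hypothetical extra row)
]


def fetch_vliw(memory, program_counter):
    """Return the row of operations stored at the address in the program counter."""
    return memory[program_counter]


program_counter = 0x1  # plays the role of program counter 120
selected = fetch_vliw(program_memory, program_counter)
```

With the program counter holding 0x1, `selected` is the row OP10 through OP13, whose four operations would be issued together.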
FIG. 3 introduces a prior art very long instruction word (VLIW) processor 131 whose execution circuitry 132 has multiple execution units for processing the operation instructions of a given VLIW instruction 100 in parallel. Each VLIW instruction or VLIW word, e.g., VLIW instructions 100A, 100B, 100m, etc., contains multiple reduced instruction set computer (RISC) operation instructions, e.g., operation instructions OP00, OP01, OP02, and OP03 of VLIW instruction 100A. In FIG. 3, the execution circuitry 132 is executing instructions OP0, OP1, OP2, and OP3 in parallel. The execution circuitry 132 is linked to register file 136. Program counter register 120 is contained in register file 136.
Program counter 120 contains a pointer to program memory address 0xNm. In FIG. 3, VLIW instruction fetch 135 fetches VLIW instruction 100m from program memory 111 at the address pointed to (arrow 121) by program counter 120 (FIG. 2) in register file 136. The VLIW instruction is then processed by decoder 134. Next, the operation instructions of the VLIW instruction are executed in parallel by the execution circuitry 132. Program counter 120 is then incremented to point to the next VLIW instruction. The process is repeated for the next VLIW instruction, which is fetched from program memory at the memory address now contained in program counter 120.
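The fetch, decode, execute, and increment cycle described above can be sketched in Python. This is a simplified model under stated assumptions, not a description of the circuitry of FIG. 3: the function names `decode` and `run`, the register names `r0`, `r1`, and `pc`, and the representation of each operation as a function returning a (target register, value) pair are all hypothetical. Parallel execution is modeled by letting every operation of a VLIW word read the same register snapshot before any result is written back.

```python
def decode(vliw_word):
    """Split a VLIW word into its individual operations (here: callables)."""
    return list(vliw_word)


def run(program_memory, registers):
    """Fetch, decode, and execute each VLIW word, then increment the program counter."""
    registers["pc"] = 0  # the program counter lives in the register file
    while registers["pc"] < len(program_memory):
        vliw_word = program_memory[registers["pc"]]  # fetch
        operations = decode(vliw_word)               # decode
        # Execute: every operation reads the same pre-instruction register
        # state, which models the slots of one VLIW word running in parallel.
        snapshot = dict(registers)
        results = [op(snapshot) for op in operations]
        for target, value in results:
            registers[target] = value
        registers["pc"] += 1                         # point to the next VLIW word
    return registers


# Hypothetical two-word program: load two registers, then swap them in one word.
program = [
    (lambda r: ("r0", 1), lambda r: ("r1", 2)),
    (lambda r: ("r0", r["r1"]), lambda r: ("r1", r["r0"])),
]
final = run(program, {"r0": 0, "r1": 0})
```

Because both operations of the second word read the snapshot taken before that word executes, the swap succeeds without a temporary register, which is the behavior one would expect from operations issued together.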
Linked list data structures have been one method of arranging or organizing data stored or mapped in computer memory. Some simple linked list structures of the prior art are the singly-linked list and the doubly-linked list. In FIG. 4, the singly-linked list in block 400 has a start pointer 401 which points to node or element 402. Node 402 contains data L and a pointer 403 to node 404. Node 404 contains data I and pointer 405. Pointer 405 points to node 406. Node 406 contains data S and pointer 407. Pointer 407 points to node 408. Node 408 contains data T and pointer 409. Pointer 409, called the NULL pointer, points to the end node (end of the linked list) called NULL 410. With reference to FIG. 5, the doubly-linked list 500 has a forward start pointer 501 and a reverse start pointer 514. The forward start pointer 501 allows the linked list to be read in the forward direction, and the reverse start pointer 514 allows the list to be read in reverse. Each node 503, 506, 509, and 512 contains a forward pointer, a reverse pointer, and data. The NULL pointers in 513 and 502 define the ends of the linked list.
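The two list structures can be illustrated with a short Python sketch. The class and function names (`Node`, `build_doubly_linked`, `read`) are hypothetical and chosen for this example only; `None` stands in for the NULL pointer, and the data L, I, S, T follow FIG. 4.

```python
class Node:
    """A list node holding data, a forward pointer, and a reverse pointer."""

    def __init__(self, data):
        self.data = data
        self.next = None  # forward pointer; None plays the role of NULL
        self.prev = None  # reverse pointer; unused in a singly-linked list


def build_doubly_linked(items):
    """Link the items in order and return (forward start, reverse start)."""
    head = tail = None
    for item in items:
        node = Node(item)
        if head is None:
            head = node
        else:
            tail.next = node
            node.prev = tail
        tail = node
    return head, tail


def read(start, direction):
    """Follow 'next' or 'prev' pointers from start until NULL (None) is reached."""
    out = []
    node = start
    while node is not None:
        out.append(node.data)
        node = getattr(node, direction)
    return "".join(out)


forward_start, reverse_start = build_doubly_linked("LIST")
forward_text = read(forward_start, "next")  # the only traversal a singly-linked list allows
reverse_text = read(reverse_start, "prev")  # requires the reverse pointers
```

Reading from the forward start yields the data in the order L, I, S, T; reading from the reverse start yields the same data in the opposite order.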
In FIG. 6, a more complicated prior art linked list structure 600 has an arrangement of nodes where some nodes are filled with data such as L, I, N, K, E, D, L, I, S, T, etc., and some nodes, such as node 603, have empty cells. The nodes are connected by pointers, such as pointers 601 and 602. Watermarks can be embedded in the lengths and positions of the pointers as illustrated in 604. Watermarking is a prior art software protection technique. Watermarks can be placed inside computer software using prior art software protection tools. Other software protection techniques include obfuscation, opaque predicates, transformations, and application performance degradation.
In the computer art, a pipeline microprocessor breaks the processing of each instruction down into a series of smaller steps. The pipeline is understood to be a stream of processing stages connected in series such that the output of stage n is connected to the input of stage (n+1). In prior art microprocessors, the pipeline architecture provided a way to increase the number of instructions processed per second.
In FIG. 7, a simple prior art three stage pipeline 700 is shown. The pipeline consists of three stages: instruction fetch 701, instruction decode 702, and instruction execution 703. The instruction fetch stage 701 reads in an instruction from main or program memory. The instruction decode stage 702 “formats” the instruction for execution. In the instruction execute stage 703, data processing occurs. As shown in FIG. 7, the stages 701, 702, and 703 are connected together in series. To complete execution, a machine language instruction must travel sequentially through the stages 701, 702, and 703. The pipeline 700 encounters a difficulty when a branch or jump instruction occurs.
A ‘branch instruction’ in a computer pipeline changes the flow of the instructions. However, before the flow can be changed, the pipeline must first be cleared out because the partially completed instructions contained in stages 701 and 702 are invalid. To clear out the pipeline, no-operations are placed in the pipeline stages 701 and 702. This clearing out process results in an inactive time period in the pipeline which is known as the branch penalty. The no-operations perform no useful computer processing and reduce the performance of the microprocessor.
The time/space diagram for the execution pipeline of FIG. 8 demonstrates what happens to a three stage pipeline such as that of FIG. 7 when a branch instruction occurs. For example, at time=3, the instruction fetch stage is reading the instruction at address 0xff02, the instruction SUB from 0xff01 is being decoded, and ADD from 0xff00 is being executed. At time=5, the branch instruction from 0xff02 is being executed. The problem here is that the instruction branches to address 0x110a, not 0xff03. Looking at the pipeline, we see that the instruction associated with address 0xff03 has been read into the instruction fetch stage 801 at time=4 and decoded at decode stage 802 at time=5. However, as a result of the branching, the next address to be read is 0x110a; thus, instruction fetch stage 801 and decode stage 802 contain the partially completed operations for the wrong instruction, e.g., 0xff03 instead of 0x110a. The penalty for processing the branch instruction (branch penalty) has reduced the amount of processing done.
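The branch behavior described above can be reproduced with a small Python simulation of a three stage pipeline. This is a sketch under stated assumptions and may not match FIG. 8 cell by cell: the function name `simulate`, the dictionary layout of `program`, and the mnemonic `BRANCH` are hypothetical; the branch is assumed to be resolved in the execute stage; and an empty stage (a no-operation) is represented by `None`. The addresses and the ADD and SUB instructions follow the text.

```python
def simulate(program, start, cycles):
    """Return one (time, fetch, decode, execute) row of addresses per clock cycle.

    program maps an address to (mnemonic, branch_target); branch_target is
    None for non-branch instructions. None in a stage means a no-operation.
    """
    rows = []
    next_fetch = start
    fetch = decode = execute = None
    for time in range(1, cycles + 1):
        # Advance every instruction by one stage and fetch a new one.
        execute = decode
        decode = fetch
        fetch = next_fetch
        next_fetch += 1
        rows.append((time, fetch, decode, execute))
        # A branch resolved in the execute stage invalidates the two
        # younger instructions and redirects the next fetch.
        _mnemonic, target = program.get(execute, ("NOP", None))
        if target is not None:
            fetch = decode = None  # clear out the fetch and decode stages
            next_fetch = target
    return rows


program = {
    0xFF00: ("ADD", None),
    0xFF01: ("SUB", None),
    0xFF02: ("BRANCH", 0x110A),
}
rows = simulate(program, start=0xFF00, cycles=8)

# Count cycles, after the pipeline first fills, in which execute does no work.
first_busy = next(i for i, row in enumerate(rows) if row[3] is not None)
branch_penalty = sum(1 for row in rows[first_busy:] if row[3] is None)
```

In this model the execute stage is idle for the two cycles that follow the branch, one for each cleared stage, before the instruction at 0x110a reaches it.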
In prior art microprocessors, branch prediction, branch target prediction, and speculative execution are used to reduce the branch penalty in pipeline microprocessors; however, such techniques typically result in inefficiency. Prior art software protection techniques require processing time and further reduce the performance of prior art microprocessors, VLIW processors, etc.