FIG. 1 is a schematic block diagram illustrating a known processor architecture. The processor may employ a dual path architecture whereby the processor comprises two different separate hardware execution paths: a control execution path in the form of a control unit 2 which is dedicated to processing control code, and a data execution path in the form of a data processing unit 4 which is dedicated to processing data. The distinction between data and control processing will be discussed shortly. An exemplary instance of such an architecture is described for example in international patent application publication number WO 2005/096141.
In the illustrated processor, the control unit 2 is coupled to an instruction memory 6, which may be integrated onto the same chip as the processor or alternatively connected off-chip. Each of the control unit 2 and the data processing unit 4 is coupled to a data memory 8, which again may be integrated onto the same chip as the processor or alternatively be connected off-chip. The control unit 2 comprises a set of control registers 10, an instruction decoder 12, and an address formation unit 14. An exemplary control unit will also have control processing logic, not shown, e.g. for performing branch operations, and may also have scalar data processing logic. The data processing unit 4 comprises a set of data registers 16 and data processing logic 18. A set of registers is sometimes called a register file. The instruction decoder 12 is coupled to the data registers 16 and to the data processing logic 18, as well as being coupled to the instruction memory 6 via fetch circuitry (not shown). The instruction decoder 12 is further coupled to the internal logic of the control unit 2, including being coupled to the control registers 10 and address formation unit 14. Each of the address formation unit 14 and the set of control registers 10 is also coupled to the data memory 8.
In operation, the fetch circuitry (not shown) fetches a sequence of instructions from the instruction memory 6 into the instruction decoder 12. The instruction decoder 12 decodes each instruction in the sequence and, depending on the decoded opcode contained within the instruction, determines which unit is required to execute the instruction. The processor thus executes a mix of three types of instruction, as follows:
(i) control instructions such as branches, which are executed by the control unit 2;
(ii) data processing instructions, which are decoded by the control unit 2 but then passed (as register addresses and opcodes) to the data processing unit 4 for execution; and
(iii) memory access instructions (loads and stores), for which the control unit 2 computes memory addresses of the data memory 8, and the corresponding memory data is then transferred to or from either the control registers 10 or the data registers 16.
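The routing of the three instruction types can be sketched in software as follows. This is only an illustrative model, not the decoder's actual logic: the opcode names, the `OPCODE_KIND` table and the `executing_units` function are hypothetical and do not appear in the described architecture.

```python
# Hypothetical opcode names and grouping, chosen only for illustration.
OPCODE_KIND = {
    "branch": "control",
    "add": "data",
    "not": "data",
    "load": "memory",
    "store": "memory",
}


def executing_units(opcode, uses_data_registers=False):
    """Return which unit(s) act on an instruction once it has been decoded."""
    kind = OPCODE_KIND[opcode]
    if kind == "control":
        # (i) executed within the control unit 2
        return ("control unit",)
    if kind == "data":
        # (ii) decoded by the control unit 2, executed by the data processing unit 4
        return ("data processing unit",)
    # (iii) the control unit 2 always forms the memory address; the data
    # processing unit 4 is involved only when the data registers 16 are used
    if uses_data_registers:
        return ("control unit", "data processing unit")
    return ("control unit",)
```

For instance, `executing_units("load", uses_data_registers=True)` names both units, reflecting that the address is formed on the control path while the data lands in the data registers.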
The term “control” as used herein refers to program flow control, including branching and address generation, and some logic and arithmetic for that purpose. In contrast, the phrase “data processing” or similar as used herein refers to other arithmetic and logical operations to be performed on data that is the subject of the program, i.e. data representing something other than the control flow of the program itself. Data processing in this sense does not include flow control (but may generate results which are used to make control flow decisions, e.g. branch conditions). For example, in the case of a software modem for wireless communication, the data may represent signals received or to be transmitted over an air interface, and the data processing operations may comprise signal processing operations. The results of such signal processing may be supplied to the control unit to allow it to make control flow decisions (e.g. as to what further signal processing is necessary), but the control flow itself (including the sequence of program branches and memory addresses) is effected by the control unit. As another example, the data could represent information from a peripheral device, or information to be output to manipulate a peripheral device. Typically the distinction between the control and data paths is manifested in that control unit 2 uses only scalar processing whereas the data processing unit 4 is configured to use vector processing (see below). In some applications some data processing in fact may be executed on the control path, although control flow code would not be executed on the data processing path.
Memory access instructions such as loads and stores may be considered a third type of instruction, in addition to control instructions and data processing instructions, which can act on the control unit 2 or both the control unit 2 and data processing unit 4.
As illustrated schematically in FIGS. 2a and 2b, each instruction comprises an opcode 28 and one or more associated operands 30 (or in the case of a few kinds of instruction it is possible that no operand is required). The opcode is a sequence of bits which when decoded by the instruction decoder 12 indicates the kind of operation to be performed. The one or more associated operands are sequences of bits which when decoded by the instruction decoder 12 indicate the data to be operated on by that operation, usually by specifying a register and/or memory location where the subject data is currently being held, and depending on the kind of instruction a register and/or memory location for storing the result of the operation.
Data is loaded from the data memory 8 into the control registers 10 or data registers 16 by means of one or more load instructions (a type of memory access instruction). A load instruction 24 is illustrated schematically in FIG. 2a. It comprises an opcode 28 that when decoded indicates a load operation, and two operand fields 30. The first operand field comprises one or more destination operands and the second operand field comprises one or more source operands. The source operand field specifies a memory location from which to take data, and the destination operand field specifies a register into which to place that data. The source memory location is usually indicated by two register addresses, the registers providing a base and an offset which when added together point to the memory location; performing this addition is the purpose of the address formation unit 14. Sometimes the offset is an immediate value instead of a register address.
When loading to the control registers 10, load instructions act on only the control unit 2. The address formation unit 14 computes the relevant memory address from the source operand(s) and causes the data from that address within the memory 8 to be loaded into one of the control registers 10 specified by the destination operand. When loading to the data registers 16, load instructions act on both the control unit 2 and data processing unit 4. The address formation unit 14 computes the relevant memory address from the source operand(s) and causes data from that address within the memory 8 to be loaded into one of the data registers 16 specified by the destination operand.
As a simple example, consider two load instructions:
    Load $r1, A1
    Load $r2, A2
The first of these load instructions has one destination operand $r1, and one source operand field A1 (typically specified by $base+$offset). When executed it loads a word of data from memory address location A1 into register $r1. The second of these load instructions has one destination operand $r2 and one source operand field A2. When executed it loads a word of data from memory address location A2 into register $r2.
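The two loads, together with the base-plus-offset address formation described earlier, might be modelled as in the sketch below. It makes several simplifying assumptions: a single dictionary stands in for all registers (no distinction is drawn between the control registers 10 and data registers 16), memory is a dictionary keyed by address, and the register contents, addresses and function names are hypothetical.

```python
def form_address(regs, base, offset):
    """Add base and offset; offset is a register name or an immediate integer."""
    offset_value = regs[offset] if isinstance(offset, str) else offset
    return regs[base] + offset_value


def load(memory, regs, dest, base, offset):
    """Load one word from memory[base + offset] into the register named dest."""
    regs[dest] = memory[form_address(regs, base, offset)]


# Hypothetical register contents and memory image.
regs = {"$base": 0x100, "$offset": 4, "$r1": 0, "$r2": 0}
memory = {0x104: 7, 0x108: 9}

load(memory, regs, "$r1", "$base", "$offset")  # Load $r1, A1 with A1 = $base+$offset
load(memory, regs, "$r2", "$base", 8)          # Load $r2, A2 with an immediate offset
```

After both calls, `$r1` holds the word at address 0x104 and `$r2` holds the word at address 0x108.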
Once data is loaded into registers 10 or 16, then operations can be performed using the contents of those registers. If the instruction decoder 12 encounters a control instruction then it retains the instruction on the control path by executing it internally within the control unit 2 using the control unit's own internal logic and values in the control registers 10. If on the other hand the instruction decoder 12 encounters a data processing instruction, it diverts the instruction onto the data processing path by supplying the decoded opcode to the data processing logic 18 of the data processing unit 4 and supplying the decoded operand or operands in the form of one or more operand register addresses to the set of data registers 16. Alternatively one or more of the operand(s) 30 may be immediate (literal) values. A data processing instruction 26 is illustrated schematically in FIG. 2b. 
Referring to the example above, supposing $r1 and $r2 are data registers in the data register set 16, then data processing instructions can operate on them. For illustrative purposes, some simple examples would be:
    Not $d1, $r1
    Add $d2, $r1, $r2
The first of these data processing instructions has one source operand $r1 and one destination operand $d1. When executed it takes the bitwise complement of the value in register $r1 and places the result in a destination register $d1 of the data register set 16. The second of these data processing instructions has two source operands $r1 and $r2, and one destination operand $d2. When executed it adds the contents of registers $r1 and $r2 and places the result in a destination register $d2 of the data register set 16.
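The effect of these two instructions on the register contents can be illustrated with a short sketch. The 32-bit word width, the wrap-around behaviour of the addition, the function names and the initial register values are all assumptions made for the example; the text above does not specify them.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit register width (not stated in the text)


def op_not(regs, dest, src):
    """Not dest, src: place the bitwise complement of src in dest."""
    regs[dest] = ~regs[src] & WORD_MASK


def op_add(regs, dest, src_a, src_b):
    """Add dest, src_a, src_b: place the sum in dest (assumed to wrap at word width)."""
    regs[dest] = (regs[src_a] + regs[src_b]) & WORD_MASK


# Hypothetical initial register contents.
regs = {"$r1": 0x0000000F, "$r2": 0x00000001, "$d1": 0, "$d2": 0}

op_not(regs, "$d1", "$r1")         # Not $d1, $r1
op_add(regs, "$d2", "$r1", "$r2")  # Add $d2, $r1, $r2
```

Note that the source registers are left unchanged; only the destination registers `$d1` and `$d2` are written.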
The result of a data processing operation can be stored from the destination within data register set 16 into the data memory 8 by means of store instructions, and/or operated on again by means of further data processing instructions. Ultimately the results of such data processing will be output from registers 16 and/or data memory 8 to an external device, e.g. to output a decoded audio or visual signal to a speaker or screen in cases such as the processing of incoming signals received over a wireless communication system, or to output an encoded signal for transmission over a wireless communication system, or to manipulate a radio-frequency (RF) front end for transmitting such wireless signals.
The control and data paths may have instruction set architectures with asymmetric instruction widths, and may have asymmetric register and processing path widths. The rationale is that control code favours shorter, simpler instructions; whereas data processing code favours a larger, more specialised instruction set and vector data values requiring wider data registers.
To improve the amount of data processed per unit time, the processor may be arranged with some degree of parallelism.
Referring to FIG. 2c, one example of parallelism is “long instruction word” (LIW) type processing. For instance in the illustrated processor, the fetch circuitry of the control unit 2 may fetch multiple instructions at a time in the form of instruction packets, each packet comprising a plurality of constituent instructions, and each instruction comprising its own respective opcode 28 and associated operand(s) 30 for performing its own respective operation. A suitable program compiler can identify (for example) pairs of instructions which can be executed in parallel, and arrange such pairs into packets in instruction memory for atomic execution. The compiler guarantees that there are no data dependencies between the instructions within a packet, so the machine need not check for such dependencies and can execute the constituent instructions simultaneously or in any order with respect to each other, provided execution is ordered with respect to other packets. Such packets may be referred to as a long instruction word (sometimes also called a “Very Long Instruction Word”, VLIW, especially if there are more than two instructions in each atomic packet). So in the illustrated processor, if the packet comprises a control instruction 32 and a data processing instruction 26, then the instruction decoder 12 directs them in parallel to the control unit 2 and data processing unit 4 respectively for parallel execution by those respective units (although if for example the packet comprises only control instructions then these may have to be executed sequentially).
Note therefore that it is not strictly accurate to refer to an “LIW instruction”, but rather an LIW packet. Each LIW packet in fact comprises multiple instructions, in the sense of an instruction being a discrete unit of code comprising a single opcode and any associated respective operands.
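The property that the constituent instructions of a packet may be executed in any order with respect to each other can be illustrated as follows. This is a software sketch only: the tuple encoding of instructions, the register names `$c0` and `$c1`, and the functions `execute` and `run_packet` are hypothetical, and sequential execution in two different orders merely stands in for true parallel execution.

```python
def execute(regs, instr):
    """Execute one (opcode, dest, src_a, src_b) instruction on the register state."""
    op, dest, src_a, src_b = instr
    if op == "add":
        regs[dest] = regs[src_a] + regs[src_b]
    else:
        raise ValueError(f"unknown opcode: {op}")


def run_packet(regs, packet, reverse=False):
    """Run every instruction of a packet on a copy of regs, in either order."""
    state = dict(regs)
    order = reversed(packet) if reverse else packet
    for instr in order:
        execute(state, instr)
    return state


# Hypothetical packet pairing a control-side add with a data-side add.
regs = {"$c0": 1, "$c1": 0, "$r1": 5, "$r2": 6, "$d2": 0}
packet = [("add", "$c1", "$c0", "$c0"), ("add", "$d2", "$r1", "$r2")]

forward = run_packet(regs, packet)
backward = run_packet(regs, packet, reverse=True)
```

Because neither instruction reads or writes a register that the other writes, `forward` and `backward` are identical, which is the guarantee the compiler provides when forming the packet.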
Referring to FIG. 2d, another example of parallelism is a type of vector processing referred to as SIMD (single instruction, multiple data) processing. According to a SIMD arrangement, rather than a single data element, each data register in the set 16 is able to hold a vector comprising a plurality of constituent data elements. The data processing logic 18 and memory load/store pathways operate on each element of the short vector operands substantially in parallel, in response to a single shared opcode. That is, a single load instruction can load a whole vector into a vector register of the set 16, and a single data processing operation (with a single opcode) can cause the data processing logic 18 to perform the same operation substantially simultaneously on each element of the loaded vector. For example, as shown schematically in FIG. 2d, if a first source register s holds a vector (s1, s2, s3, s4) and a second source register t holds a vector (t1, t2, t3, t4), then a single add instruction (comprising a single opcode and specifying only the two source registers s, t and a single destination register d) will operate to add the individual elements of the two source vectors and store the elements of the resulting vector (s1+t1, s2+t2, s3+t3, s4+t4) to respective elements of the destination register d, i.e. to (d1, d2, d3, d4) respectively.
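The element-wise behaviour of such a vector add can be sketched as below. The function name `simd_add` and the sample values are hypothetical, and the Python loop only models the result; in hardware the element additions would take place substantially in parallel.

```python
def simd_add(s, t):
    """Return the element-wise sum of two equal-length vectors."""
    if len(s) != len(t):
        raise ValueError("source vectors must have the same number of elements")
    return tuple(a + b for a, b in zip(s, t))


# Hypothetical four-element source vectors held in registers s and t.
s = (1, 2, 3, 4)
t = (10, 20, 30, 40)
d = simd_add(s, t)  # one "instruction" produces all four result elements
```

Here `d` holds (s1+t1, s2+t2, s3+t3, s4+t4), that is, (11, 22, 33, 44).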
Other forms of parallelism are also known, e.g. by means of superscalar processors. These are similar to LIW type processors in that they execute multiple instructions in parallel, except that they comprise additional hardware to detect and avoid dependency conflicts between the parallel instructions (whereas LIW processors require dependency conflicts to be avoided in advance by the compiler).
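The kind of dependency check involved can be sketched in software as follows. This is a simplified illustration under stated assumptions, not a description of any particular superscalar design: instructions are encoded as hypothetical (opcode, destination, sources...) tuples, only register dependencies are considered, and the function name `has_conflict` is invented for the example. The three conditions tested are the conventional read-after-write, write-after-write and write-after-read hazards.

```python
def has_conflict(first, second):
    """Return True if second cannot safely execute in parallel with first."""
    _, dest1, *srcs1 = first
    _, dest2, *srcs2 = second
    read_after_write = dest1 in srcs2   # second reads what first writes
    write_after_write = dest1 == dest2  # both write the same register
    write_after_read = dest2 in srcs1   # second overwrites what first reads
    return read_after_write or write_after_write or write_after_read


# Hypothetical instruction pairs using the register names from the examples above.
independent = has_conflict(("not", "$d1", "$r1"), ("add", "$d2", "$r1", "$r2"))
dependent = has_conflict(("add", "$d2", "$r1", "$r2"), ("not", "$d1", "$d2"))
```

In this sketch `independent` is False and `dependent` is True. A compiler for an LIW processor could apply a comparable test ahead of time when deciding which instructions to place in the same packet.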