The present invention concerns parallel data processing in a single processor system.
In general, single processor systems sequentially perform operations on two operands. For example, in a 32-bit computer, each integer operand is 32 bits. In a 64-bit computer, each integer operand is 64 bits. Thus an integer "add" instruction, in a 64-bit computer, adds two 64-bit integer operands to produce a 64-bit integer result. In most pipelined 64-bit processors, a 64-bit add instruction takes one cycle of execution time.
In many instances the pertinent range of operands is 16 bits or less. In current 32-bit and 64-bit computers, however, it still takes a full instruction to perform an operation on a pair of 16-bit operands. Thus the number of execution cycles required to perform an operation on two 16-bit operands is the same as the number of execution cycles required to perform the operation on two 32-bit operands in a 32-bit computer or two 64-bit operands in a 64-bit computer.
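The cost described above can be illustrated with a minimal sketch. It assumes a simple software model of a 64-bit machine in which every add, regardless of operand range, consumes one full-width instruction. The names `add64`, `add_pairs_sequentially`, and `MASK64` are hypothetical and are used for illustration only; they do not come from the prior art or from any particular instruction set.

```python
# Illustrative model only: each call to add64 stands for one full-width
# 64-bit integer add instruction on a hypothetical 64-bit machine.
MASK64 = (1 << 64) - 1


def add64(a, b):
    """Model one 64-bit add; the result wraps modulo 2**64."""
    return (a + b) & MASK64


def add_pairs_sequentially(xs, ys):
    """Add each operand pair with its own full-width add.

    Returns the list of sums and the number of add instructions issued.
    """
    results = []
    instruction_count = 0
    for x, y in zip(xs, ys):
        results.append(add64(x, y))  # one full instruction per pair
        instruction_count += 1
    return results, instruction_count


# Four pairs of operands that each fit in 16 bits still cost four adds.
sums, count = add_pairs_sequentially([1000, 2000, 3000, 4000],
                                     [10, 20, 30, 40])
```

In this model, four pairs of 16-bit operands require four add instructions, the same number that four pairs of 64-bit operands would require, even though each operand occupies only a quarter of the 64-bit word.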
In the prior art, parallel data processing required replicating functional units, with each functional unit able to handle full word length data. See, for example, Michael Flynn, Very High-Speed Computing Systems, Proceedings of the IEEE, Vol. 54, No. 12, December 1966, pp. 1901-1909. However, such implementations of parallel processing are significantly costly, both in terms of the hardware required and in the complexity of the design.