This application is being filed concurrently with related U.S. patent applications: U.S. patent application Ser. No. 09/802,017 filed on Mar. 8, 2001, entitled “VLIW Computer Processing Architecture with On-chip DRAM Usable as Physical Memory or Cache Memory”; U.S. patent application Ser. No. 09/802,289 filed on Mar. 8, 2001, entitled “VLIW Computer Processing Architecture Having a Scalable Number of Register Files”; U.S. patent application Ser. No. 09/802,108 filed on Mar. 8, 2001, entitled “Computer Processing Architecture Having a Scalable Number of Processing Paths and Pipelines”; U.S. patent application Ser. No. 09/802,324 filed on Mar. 8, 2001 (now U.S. Pat. No. 6,631,439, issued on Oct. 7, 2003), entitled “VLIW Computer Processing Architecture with On-chip Dynamic RAM”; U.S. patent application Ser. No. 09/802,120 filed on Mar. 8, 2001, entitled “VLIW Computer Processing Architecture Having the Program Counter Stored in a Register File Register”; U.S. patent application Ser. No. 09/801,564 filed on Mar. 8, 2001, entitled “Processing Architecture Having Parallel Arithmetic Capability”; U.S. patent application Ser. No. 09/802,196 filed on Mar. 8, 2001, entitled “Processing Architecture Having an Array Bounds Check Capability”; U.S. patent application Ser. No. 09/802,020 filed on Mar. 8, 2001, entitled “Processing Architecture Having a Matrix-Transpose Capability”; and, U.S. patent application Ser. No. 09/802,291 filed on Mar. 8, 2001, entitled “Processing Architecture Having a Compare Capability”; all of which are incorporated herein by reference.
The present invention relates generally to an improved computer processing instruction set, and more particularly to an instruction set having a byte swapping function.
Computer architecture designers are constantly trying to increase the speed and efficiency of computer processors. For example, computer architecture designers have attempted to increase processing speeds by increasing clock speeds and by employing latency-hiding techniques, such as data prefetching and cache memories. In addition, other techniques, such as instruction-level parallelism using VLIW, multiple-issue superscalar, speculative execution, scoreboarding, and pipelining, are used to further enhance performance and increase the number of instructions issued per clock cycle (IPC).
Architectures that attain their performance through instruction-level parallelism seem to be the growing trend in the computer architecture field. Examples of architectures utilizing instruction-level parallelism include single instruction multiple data (SIMD) architecture, multiple instruction multiple data (MIMD) architecture, vector or array processing, and very long instruction word (VLIW) techniques. Of these, VLIW appears to be the most suitable for general purpose computing. However, there is a need to further achieve instruction-level parallelism through other techniques.
The present invention swaps bytes in a manner that allows selection of which field of the source register each field of the destination register receives. In one embodiment, a processing core that includes a first source register, a second source register, a multiplexer, a destination register, and an operand processor is disclosed. The first source register includes a plurality of source fields. The second source register includes a number of result field select values and a number of operation fields. The multiplexer is coupled to at least one of the source fields. The destination register includes a plurality of result fields. The operand processor and multiplexer operate upon at least one of the source fields.
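By way of illustration only, the field selection summarized above may be modeled in software roughly as follows. This is a minimal sketch under stated assumptions, not the disclosed hardware: the 64-bit register width, the eight 8-bit fields, the placement of each select value in the low three bits of the corresponding byte of the second source register, and the function name byte_select are all hypothetical. The operation fields and the operand processor are omitted because this summary does not specify the operations they perform.

```c
#include <stdint.h>

/*
 * Software model of the field-select step (illustrative assumptions only).
 *
 * src : first source register, treated as eight 8-bit source fields,
 *       with field 0 in the least significant byte.
 * sel : second source register; the low 3 bits of byte i are assumed to
 *       hold the index of the source field routed to result field i.
 *       Any operation fields are ignored in this sketch.
 */
static uint64_t byte_select(uint64_t src, uint64_t sel)
{
    uint64_t dst = 0;

    for (unsigned i = 0; i < 8; i++) {
        /* Select value for result field i. */
        unsigned idx = (unsigned)((sel >> (8 * i)) & 0x7u);

        /* Multiplexer: pick the chosen source field. */
        uint64_t field = (src >> (8 * idx)) & 0xFFu;

        /* Place it in result field i of the destination. */
        dst |= field << (8 * i);
    }
    return dst;
}
```

Under these assumptions, a select register of 0x0001020304050607 reverses the byte order of the source (an endian swap), 0x0706050403020100 copies the source unchanged, and a select register of zero replicates source field 0 into every result field.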
A more complete understanding of the present invention may be derived by referring to the detailed description of preferred embodiments and claims when considered in connection with the figures, wherein like reference numbers refer to similar items throughout the figures.