1. Field of the Disclosure
The present disclosure is generally directed to a processor and, more particularly, to a processor with a reconfigurable floating point unit.
2. Description of the Related Art
As is well known, a floating point unit (FPU) is a part of a processor system that is specially designed to carry out operations on floating point numbers. Typical floating point operations include addition, subtraction, multiplication, division, and square root. Some processor systems may also perform various transcendental functions, such as exponential or trigonometric calculations, though in most modern processors these are done with software library routines. In most modern general purpose computer architectures, an FPU includes multiple execution units. In these architectures, floating point operations are usually performed independently of integer operations and are generally pipelined. Execution units of an FPU may be specialized, and divided between simpler operations (e.g., addition and multiplication) and more complicated operations (e.g., division). In some cases, only the simple operations are implemented in hardware, while the more complex operations are emulated.
As is well known, an instruction set defines instructions that a processor can execute. Instructions include arithmetic instructions (e.g., add and subtract), logic instructions (e.g., AND, OR, and NOT instructions), and data instructions (e.g., move, input, output, load, and store instructions). An instruction set, or instruction set architecture (ISA), is the part of the processor architecture related to programming, including native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external input/output (I/O). An ISA includes a specification of a set of opcodes (operational codes), i.e., native commands implemented by a particular central processing unit (CPU) architecture. As is well known, an opcode is the portion of a machine language instruction that specifies the operation to be performed. A complete machine language instruction contains an opcode and, usually, specifies one or more operands, i.e., data for the operation to act upon. The operands upon which opcodes operate may, depending on the CPU architecture, consist of registers, values in memory, values stored in a stack, I/O ports, a data bus, etc.
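By way of illustration only, the relationship between an opcode and its operands may be sketched as follows. The 16-bit instruction format, the opcode values, and the names `OPCODES`, `decode`, and `execute` are hypothetical and are invented for this example; they do not correspond to any particular ISA.

```python
# Hypothetical 16-bit instruction word, invented for illustration only:
#   bits 15..8 = opcode, bits 7..4 = destination register, bits 3..0 = source register
OPCODES = {0x01: "ADD", 0x02: "SUB", 0x03: "AND", 0x04: "MOV"}


def decode(word):
    """Split an instruction word into its opcode mnemonic and two register operands."""
    opcode = (word >> 8) & 0xFF
    dst = (word >> 4) & 0xF
    src = word & 0xF
    return OPCODES[opcode], dst, src


def execute(word, regs):
    """Apply one decoded instruction to a list of 16 register values."""
    op, dst, src = decode(word)
    if op == "ADD":        # arithmetic instruction
        regs[dst] = regs[dst] + regs[src]
    elif op == "SUB":      # arithmetic instruction
        regs[dst] = regs[dst] - regs[src]
    elif op == "AND":      # logic instruction
        regs[dst] = regs[dst] & regs[src]
    elif op == "MOV":      # data instruction
        regs[dst] = regs[src]
    return regs
```

In this sketch the opcode selects the operation and the two register fields are the operands upon which the operation acts; operands held in memory, on a stack, or at I/O ports are not modeled.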
As is well known, computers with different microarchitectures can share a common instruction set. For example, processors from different manufacturers may implement nearly identical versions of an instruction set, e.g., an x86 instruction set, but have substantially different internal designs. Typical complex instruction set computers (CISCs) have instructions that combine one or two basic operations (such as “add” and “multiply”) with implicit instructions for accessing memory, incrementing registers upon use, or de-referencing locations stored in memory or registers. Reduced instruction set computers (RISCs) accept lower code density (that is, more program memory space to implement a given task) in exchange for simpler and faster instruction set implementations. RISC instructions typically implement only a single operation, such as an “add” of two registers or the “load” of a memory location into a register.
A number of different instruction sets have been employed in x86 type processors over the years. For example, the matrix math extension (MMX) instruction set was introduced in 1997. Because the MMX instruction set was designed to re-use the existing floating point registers of prior CPU designs, a CPU executing an MMX instruction could not work on floating point and single-instruction multiple-data (SIMD) type data at the same time. Furthermore, the MMX instruction set was designed to work only on integers. The streaming SIMD extension (SSE) instruction set was introduced in 1999 to add to the functionality of the MMX instruction set. The SSE instruction set added eight new 128-bit registers, referred to as XMM0 through XMM7. Each 128-bit register packs together four 32-bit single-precision floating point numbers. In the SSE instruction set, the 128-bit registers are disabled by default until an operating system explicitly enables them, and they represent additional program state that the operating system is required to preserve across task switches. Due to the addition of floating point support, the SSE instruction set (and later versions of the SSE instruction set) is more widely used than the MMX instruction set.
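By way of illustration only, the packing of four 32-bit single-precision floating point numbers into one 128-bit register may be modeled in software as follows. The names `pack_ps`, `unpack_ps`, and `add_ps` are hypothetical helpers introduced for this sketch; the code models the data layout and a lane-by-lane addition and is not a description of any hardware implementation.

```python
import struct


def pack_ps(a, b, c, d):
    """Pack four values as 32-bit floats into 16 bytes (128 bits), little-endian."""
    return struct.pack("<4f", a, b, c, d)


def unpack_ps(reg):
    """Recover the four single-precision lanes from a 16-byte value."""
    return struct.unpack("<4f", reg)


def add_ps(x, y):
    """Add two packed values lane by lane, as a SIMD packed add would.

    Each sum is computed in Python's double precision and then rounded
    to single precision when it is packed again.
    """
    return pack_ps(*(p + q for p, q in zip(unpack_ps(x), unpack_ps(y))))
```

For example, `add_ps(pack_ps(1.0, 2.0, 3.0, 4.0), pack_ps(0.5, 0.5, 0.5, 0.5))` produces a 16-byte value whose four lanes hold the four sums, so that a single operation acts on four data elements.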
The SSE2 instruction set added new math instructions for double-precision (64-bit) floating point and 8/16/32-bit integer data types, all operating on the same 128-bit XMM vector register file previously introduced with SSE. The SSE3 instruction set added a handful of digital signal processor (DSP) oriented mathematical instructions and some process (thread) management instructions to the SSE2 instruction set. The SSSE3 instruction set added sixteen new opcodes to the SSE3 instruction set, including instructions for permuting bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and accumulating within a word. The SSE4 instruction set added a dot product instruction, additional integer instructions, etc.
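By way of illustration only, the manner in which the same 128 bits may be interpreted as different data types, and the arithmetic performed by a dot product, may be sketched as follows. The names `LANE_FORMATS`, `view`, and `dot_ps` are hypothetical and are introduced for this sketch; the `dot_ps` helper shows only the arithmetic of a four-element dot product and does not model the operand encoding of any actual instruction.

```python
import struct

# Interpretations of one 16-byte (128-bit) value; the layout names are illustrative.
LANE_FORMATS = {
    "4 x float32": "<4f",
    "2 x float64": "<2d",
    "4 x int32": "<4i",
    "8 x int16": "<8h",
    "16 x int8": "<16b",
}


def view(reg, layout):
    """Interpret the same 128 bits under the named lane layout."""
    if len(reg) != 16:
        raise ValueError("a 128-bit register value must be exactly 16 bytes")
    return struct.unpack(LANE_FORMATS[layout], reg)


def dot_ps(x, y):
    """Sum of lane-wise products of two values holding four single-precision lanes."""
    lanes_x = view(x, "4 x float32")
    lanes_y = view(y, "4 x float32")
    return sum(p * q for p, q in zip(lanes_x, lanes_y))
```

In this sketch, a value packed as two double-precision numbers can equally be viewed as sixteen 8-bit integers, which mirrors the sharing of one register file among the floating point and integer data types.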
The use of the same reference symbols in different drawings indicates similar or identical items.