This invention relates generally to a coprocessor and more particularly to a coprocessor resident in the memory map of a processor, the coprocessor being able to perform floating point operations on scalars and vectors for the processor.
A Weitek 1167 coprocessor has been used in the past to perform floating point operations for an Intel 80386 processor, because the 1167 is much faster at floating point operations than the 80386. However, data transfers between the 80386 and the 1167 are still limited by the speed of the 80386. As a result, any mechanism that increases the speed of the interface between the 80386 and the 1167 will increase the overall speed at which the 80386 performs floating point work.
As shown in FIG. 1, the 1167 is a plug-in circuit board containing a Weitek 1163 controller, a Weitek 1164 multiplier, a Weitek 1165 arithmetic logic unit (ALU), and other logic circuitry. The 1167 can handle various types of floating point operations for the 80386 such as add, subtract, multiply, and divide. The 1167 can perform these floating point operations on scalars (32 bits), single precision vectors (each vector element contains 32 bits), and double precision vectors (each vector element contains 64 bits).
The 1167 resides in the memory map of the 80386. Instructions are passed from the 80386 to the 1167 by accessing particular addresses in the memory map of the 80386, and data is passed between the two by reading from and writing to particular addresses in that memory map. However, only 32 bits of data (one half of a double precision vector element) can be transferred at a time.
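Because of the 32-bit transfer limit, every double precision element crosses the interface in two pieces. The following Python sketch, assuming IEEE 754 binary64 doubles, shows how one element splits into two 32-bit words and how it is reassembled. The names split_double and join_double are hypothetical helpers for illustration, not part of any 1167 or 80386 interface.

```python
import struct


def split_double(value: float) -> tuple[int, int]:
    """Hypothetical helper: split an IEEE 754 double into two 32-bit words.

    Returns (least significant half, most significant half). Each half is
    one 32-bit transfer, so a double precision element needs two transfers.
    """
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    low = bits & 0xFFFFFFFF   # least significant 32 bits
    high = bits >> 32         # most significant 32 bits (sign, exponent, top of mantissa)
    return low, high


def join_double(low: int, high: int) -> float:
    """Hypothetical helper: reassemble a double from its two 32-bit halves."""
    bits = (high << 32) | low
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


low, high = split_double(1.0)
print(hex(high), hex(low))  # 0x3ff00000 0x0
```

Only once both halves have arrived does the receiver hold a complete element.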
The 1167 uses the Motorola format for storing double precision vectors within its registers. Each vector element is split into two halves, a most significant half and a least significant half. The least significant half is stored one register higher than the most significant half. For example, if the most significant half is stored at register R6, the least significant half is stored at register R7. In addition, the 1167 always stores the most significant half in an even register and the least significant half in an odd register. The Intel 80386 uses the Intel format for storing double precision data in its memory. The Intel format is the reverse of the Motorola format: if the least significant half is stored at address M, the most significant half is stored at address M+1.
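The mismatch between the two formats can be sketched in Python. The function below is a hypothetical model, not the 1167's actual loading mechanism: it takes 32-bit words in Intel memory order (least significant half at M, most significant half at M+1) and returns the register contents the 1167 expects, with each element's halves swapped so the most significant half lands in the even register.

```python
def intel_words_to_registers(words: list[int], base_reg: int) -> dict[int, int]:
    """Hypothetical model: place Intel-ordered double precision words into
    1167-style registers.

    words holds 32-bit words in memory order; for element k, words[2k] is the
    least significant half (address M) and words[2k + 1] is the most
    significant half (address M + 1).
    Returns {register number: 32-bit value}.
    """
    if len(words) % 2 != 0:
        raise ValueError("each double precision element occupies two words")
    registers = {}
    for k in range(len(words) // 2):
        least = words[2 * k]
        most = words[2 * k + 1]
        registers[base_reg + 2 * k] = most       # Motorola: most significant half first
        registers[base_reg + 2 * k + 1] = least  # least significant half one register higher
    return registers


# 1.0 is 0x3FF00000_00000000; in Intel order the low word comes first.
print(intel_words_to_registers([0x00000000, 0x3FF00000], base_reg=6))
# {6: 1072693248, 7: 0}
```

Here 1072693248 is 0x3FF00000, the most significant half of 1.0, now in R6.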
As a result of this format for storing double precision vectors, a double precision command with an odd destination register is an invalid 1167 instruction. Such a command would point to the middle of a double precision element, resulting in a meaningless operation.
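The even-register rule can be expressed as a simple check. In this hypothetical Python sketch, register pairs (R0, R1), (R2, R3), and so on each hold one element; an odd destination such as R7 would straddle the least significant half of the (R6, R7) element and the most significant half of the (R8, R9) element. The function names are illustrative only.

```python
def validate_double_destination(dest_reg: int) -> None:
    """Hypothetical check: reject odd destinations for double precision commands.

    An odd register holds the least significant half of an element, so using
    it as the start of a double would point into the middle of that element.
    """
    if dest_reg % 2 != 0:
        raise ValueError(f"R{dest_reg}: odd destination is invalid for double precision")


def element_pair(dest_reg: int) -> tuple[int, int]:
    """Return the (most significant, least significant) register pair for a
    valid double precision destination."""
    validate_double_destination(dest_reg)
    return dest_reg, dest_reg + 1


print(element_pair(6))  # (6, 7)
```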
The 80386 can use block move instructions to command the 1167 to perform floating point operations on single precision vectors. The 80386 converts a block move instruction, written in assembly language, into a series of repetitive commands with incrementing source and destination addresses, one command per single precision vector element. Each command transfers its vector element between the 80386 and the 1167 and performs an operation on that element, such as a load, multiply, add, or store, from the source address to the destination address. The 80386, like other processors, converts block move instructions into 1167 commands faster than it converts other, nonrepetitive move instructions. Hence, block move instructions increase the speed of the 80386 to 1167 interface, yielding higher computational rates and greater efficiency.
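The expansion of a block move into repetitive commands can be modeled as follows. This Python sketch is an assumption-laden illustration: the Command record and expand_block_move function are hypothetical, and the default stride of 4 assumes byte addressing with one 32-bit single precision element per step, as with a repeated doubleword string move on the 80386. It does not reproduce the 1167's actual address encoding.

```python
from dataclasses import dataclass


@dataclass
class Command:
    # Hypothetical representation of one command produced by a block move.
    operation: str
    source: int
    destination: int


def expand_block_move(operation: str, source: int, destination: int,
                      count: int, stride: int = 4) -> list[Command]:
    """Expand a block move into one command per single precision element.

    Source and destination addresses both advance by `stride` bytes per
    repetition, so command i handles element i of the vector.
    """
    return [Command(operation, source + i * stride, destination + i * stride)
            for i in range(count)]


for cmd in expand_block_move("add", 0x1000, 0x2000, count=3):
    print(cmd.operation, hex(cmd.source), hex(cmd.destination))
```

Every command is identical except for its addresses, which is what makes the repetitive form cheap to generate.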
However, the 80386 cannot use block move instructions to command the 1167 to perform floating point operations on double precision vectors, for three reasons. First, a double precision vector can be transferred only one half of a vector element at a time. A complete vector element must be transferred before a vector operation can validly be performed on it, whereas every command created by a block move instruction performs a vector operation on its corresponding half element. Second, the 80386 and the 1167 use different formats for storing double precision vectors, so the two halves of each vector element must be swapped as the vector is transferred. Third, a double precision command must not have an odd destination register. Because of these restrictions, the greater efficiency of block move instructions cannot be exploited for double precision floating point operations.
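The first two restrictions can be seen in a small hypothetical Python model. A single precision style block move issues one command per 32-bit word, copying words into consecutive registers in memory order. Applied to a double precision element, it issues two commands (one per half) and leaves the halves in the wrong registers. Both function names are illustrative, not part of any real interface.

```python
def naive_block_move(words: list[int], base_reg: int) -> list[tuple[int, int]]:
    """Hypothetical model: one (register, word) command per 32-bit word,
    as a block move would issue for single precision data."""
    return [(base_reg + i, w) for i, w in enumerate(words)]


def required_registers(words: list[int], base_reg: int) -> dict[int, int]:
    """Register contents the 1167 expects: the two halves of each element
    swapped, with the most significant half in the even register."""
    if len(words) % 2 != 0:
        raise ValueError("each double precision element occupies two words")
    regs = {}
    for k in range(0, len(words), 2):
        regs[base_reg + k] = words[k + 1]  # most significant half (address M + 1)
        regs[base_reg + k + 1] = words[k]  # least significant half (address M)
    return regs


words = [0x00000000, 0x3FF00000]  # 1.0 in Intel order
commands = naive_block_move(words, base_reg=6)
print(len(commands))                                    # 2 commands for 1 element
print(dict(commands) == required_registers(words, 6))   # False: halves land swapped
```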