An important consideration in a high-performance data processing system is the speed of execution of instructions, such as arithmetic operations, which are distributed between processing entities or units. In order to provide a significant performance improvement when performing certain types of mathematical operations, such as single or double precision floating point operations, it is known to provide a special purpose arithmetic unit (AU) which is coupled to a central processing unit (CPU), the arithmetic unit executing arithmetic operations under control of the CPU. For some applications, such as COBOL environments, fast binary coded decimal (BCD) calculations and string-related operations are important requirements. However, many conventional AU devices have limited capability for handling string operands and BCD numbers. Furthermore, for many conventional AU devices the coupling between the AU and the CPU is less tight than is optimal, resulting in significant latency and "dead" cycles when synchronizing the operation of the AU to that of the CPU.
The overall processing efficiency in such a CPU/AU system is related to a number of factors. The signal coupling and timing between the CPU and the AU are two such factors. For example, it is desirable for some types of calculations that the CPU and AU operate asynchronously to one another while the AU is performing a calculation, and that the CPU and AU be rapidly resynchronized to one another when a result is generated by the AU. Such asynchronous operation provides a processing concurrency which increases overall CPU throughput. Furthermore, rapid resynchronization is important in that a next instruction in the CPU may require the result of the instruction being executed in the AU, for example as expressed by condition codes which the next CPU instruction tests to determine a branch path.
Another important factor is the nature of the coupling between the AU and a memory unit wherein operands and results are stored. For example, it is important that the AU fetch operands from a memory and store results back into the memory in a manner which consumes a minimum amount of memory bus bandwidth. Also, it is important that the AU be able to fetch and store data which may not be aligned on an even memory word boundary, or which crosses a memory word boundary or boundaries. This latter requirement is made even more demanding if the CPU is responsible for addressing and operating the memory simultaneously with an AU read of operands or an AU storage of result data. Such a shared memory access capability increases the CPU/AU interface complexity.
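The cost of unaligned operands may be illustrated by counting the memory words which a single operand touches. The following is a minimal sketch only, and is not drawn from any particular CPU/AU system. It assumes a byte-addressed memory with a four-byte word; the function name `words_spanned` and the parameter `word_bytes` are hypothetical and are introduced solely for illustration.

```python
def words_spanned(byte_addr: int, length: int, word_bytes: int = 4) -> int:
    """Count the memory words touched by an operand.

    Assumptions (illustrative only): memory is byte addressed and a
    word is `word_bytes` bytes wide, with words starting at addresses
    that are multiples of `word_bytes`.
    """
    if length <= 0:
        return 0
    # Index of the word holding the first byte of the operand.
    first_word = byte_addr // word_bytes
    # Index of the word holding the last byte of the operand.
    last_word = (byte_addr + length - 1) // word_bytes
    return last_word - first_word + 1
```

Under these assumptions, a four-byte operand at address 0 occupies one word, whereas the same operand at address 2 crosses a word boundary and occupies two words. Each additional word implies an additional memory access, which is one reason unaligned operands bear upon memory bus bandwidth.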