1. Field of the Invention
The present invention relates to arithmetic integer division and particularly to arithmetic integer division implemented on data flow processors.
2. Description of the Relevant Art
The conventional computer system, formulated by John von Neumann, consists of a central processing unit (CPU) that sequentially operates on data and instructions held in memory. An instruction is read from memory into the CPU, decoded, and then executed. Data is read from memory and operated upon. New data is generated and stored in memory. This sequence of steps is performed repeatedly in the conventional von Neumann computer architecture. On average, every other step involves a memory access. This memory access path has been characterized as the "von Neumann bottleneck." The condition becomes critical when more than one processor needs to access the same memory at the same time, because additional hardware and software overhead is required to control and synchronize the processors.
The entire computational system of a von Neumann-based computer relies on a clock for sequential operation of the processor. A von Neumann processor is a sequential machine; that is, each step in the computation has a predefined place in time. These processors are deterministic in nature, as demonstrated by the fact that almost all software is executed in an in-line manner. One can trace the execution of this software (code), and the CPU is expected to process each line of code sequentially.
An alternative computer architecture called "data flow" resolves some of the inefficiencies of the conventional computer. In data flow processors, like the NEC µPD7281, program control is decentralized by allowing data to independently govern their own operations within the computer system.
In a traditional CPU, each step in moving information into and out of the CPU is centrally controlled. In a data flow processor, each piece of data knows where it is going. A transportation analogy can be made as follows: imagine that every automobile in the country had to be controlled by a central sequencer, directing the flow of every car at every step of the way from start to finish. This is exactly the condition that exists in a conventional von Neumann CPU. However, what allows so much automobile traffic to flow as smoothly as it does, in general, is that each automobile knows where to go and how to get there. The latter corresponds to data control in a data flow architecture.
There is no predetermined sequence of instructions in a data flow machine. Each data element is made autonomous by associating with it a label or tag that tells the data element where to go and what to do when it gets there. The basic element in data flow architecture consists of the data element together with its label and is called a "token". Tokens flow along links to the functions that will operate upon them. A token waits at its function until all other tokens containing the required operands arrive. Only when a matched set of tokens is provided will the function be executed. The operation results in the generation of new tokens, which independently flow to their assigned functions. In other words, data flow execution is triggered by having enough information at a particular point to be processed. If not enough data is available, no execution takes place; the data flow processor executes an operation only once its set of operands is complete.
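The firing rule described above can be illustrated with a minimal single-threaded sketch. It is only a simulation under stated assumptions, not the mechanism of the µPD7281 or of any particular data flow processor: the names `FunctionNode`, `run_graph`, the port numbers, and the `"out"` destination are all hypothetical and chosen for illustration. The graph computes (2 + 3) * 4, and the tokens are deliberately injected out of order to show that a function executes only when its matched set of operands has arrived.

```python
from collections import deque
import operator


class FunctionNode:
    """A function that fires only when a token has arrived on every input port."""

    def __init__(self, op, arity, destination):
        self.op = op                    # operation applied to the matched operands
        self.arity = arity              # number of operand tokens required
        self.destination = destination  # label for the result token (hypothetical format)
        self.waiting = {}               # port number -> operand value waiting to be matched

    def receive(self, port, value):
        """Accept one token; return a new token if the operand set is complete."""
        self.waiting[port] = value
        if len(self.waiting) < self.arity:
            return None                 # not enough data: no execution takes place
        operands = [self.waiting[p] for p in range(self.arity)]
        self.waiting.clear()
        return (self.destination, self.op(*operands))


def run_graph(nodes, tokens):
    """Move tokens, each a (destination, value) pair, until none remain in flight."""
    in_flight = deque(tokens)           # the queue holds tokens in transit
    results = []
    while in_flight:
        destination, value = in_flight.popleft()
        if destination == "out":        # assumed label for a final result
            results.append(value)
            continue
        name, port = destination        # the label says where the token goes
        new_token = nodes[name].receive(port, value)
        if new_token is not None:
            in_flight.append(new_token)
    return results


# Graph for (a + b) * c: the result of "add" flows to port 0 of "mul".
nodes = {
    "add": FunctionNode(operator.add, 2, ("mul", 0)),
    "mul": FunctionNode(operator.mul, 2, "out"),
}
# The token for c arrives first and simply waits at "mul" until its partner arrives.
tokens = [(("mul", 1), 4), (("add", 0), 2), (("add", 1), 3)]
result = run_graph(nodes, tokens)
print(result)
```

No program counter appears in this sketch; the order of execution is determined entirely by when matching tokens become available.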
It might appear that the flow of data in a data flow machine is disorganized, but data flow software and hardware work together to keep the flow of tokens organized and to prevent information traffic jams from occurring. For example, queues can serve as traffic lights to hold data tokens in transit when their destination functions are busy. A data flow program is written as a description of a graph, which is basically a map of a city of functions and interconnecting links that the tokens are to traverse during the operation of the program.
In contrast to traditional von Neumann architectures, data flow architecture allows operations to be performed essentially in parallel. Each token moves independently of the others, and each function is evaluated independently of the others as soon as a sufficient data token set arrives. If a token is not destined for a particular processor, it just moves on. In addition, a scalable increase in performance is achieved by connecting multiple data flow processors together. This is not the case with traditional processors, which have a finite upper limit of performance when operated in parallel.
As a result of the inherently parallel nature of a data flow machine and the autonomous nature of the token in data flow architecture, the time required for accessing memory for instructions or data is eliminated. While some tokens wait to be matched, other matched tokens are being processed, instead of waiting their turn in memory. This allows for a more efficient execution of programming instructions than in the traditional von Neumann machine.
Arithmetic division is the most cumbersome of the elementary arithmetic operations to implement on a computer. The conventional method of performing arithmetic division on a computer is to repeatedly subtract the denominator from the numerator until a carry condition occurs (that is, until the remainder is smaller than the denominator). For comparably sized operands this is efficient; however, when the denominator is much smaller than the numerator, a relatively large number of subtractions must be performed to arrive at an answer. For example, in performing the division 65,536/1, the conventional method requires 65,536 subtractions to arrive at the result.
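The conventional subtract-until-carry method can be sketched as follows. This is an illustrative sketch only, assuming non-negative integer operands and a positive denominator; the function name `divide_by_repeated_subtraction` is hypothetical, and the carry condition is modeled as a simple comparison of the remainder against the denominator rather than as a hardware flag.

```python
def divide_by_repeated_subtraction(numerator, denominator):
    """Return (quotient, remainder, subtractions) for non-negative integers."""
    if numerator < 0 or denominator <= 0:
        raise ValueError("numerator must be >= 0 and denominator must be > 0")
    quotient = 0
    subtractions = 0
    remainder = numerator
    # Stop once the remainder is smaller than the denominator.
    while remainder >= denominator:
        remainder -= denominator
        subtractions += 1
        quotient += 1   # each successful subtraction adds one to the quotient
    return quotient, remainder, subtractions


print(divide_by_repeated_subtraction(7, 2))       # (3, 1, 3)
print(divide_by_repeated_subtraction(65536, 1))   # (65536, 0, 65536)
```

The number of subtractions equals the quotient, so the cost of the method grows in direct proportion to the ratio of the numerator to the denominator.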
The conventional subtract-until-carry method is inherently serial and is widely used in classic von Neumann serial computers. Any adaptation of this method to data flow computing would still inherit the serial inefficiency of numerous repetitive subtractions. Data flow processing can streamline the conventional subtract-until-carry method somewhat by updating counters or indices while simultaneously performing subtractions; however, the same number of subtractions would still have to be performed.