Sophisticated high-performance systems often include specialized hardware that either performs a desired task directly or that assists the processor in an embedded system. An example is the graphics system of a modern computer workstation. There one often finds entire hardware subsystems arranged in a serial pipeline to successively perform the various tasks that need to be accomplished. In a computer graphics environment there can be hardware that clips primitive shapes, such as lines, planes and triangles, against other planes and the surfaces of rectangular solids. Additional hardware, for example, handles the task of polygon rendering, and so on. Despite such solutions, there is an irresistible urge for things to run faster, and not just in computer graphics systems, either.
We may generalize a bit and say that one approach for speeding things up (for a suitable class of problems) is to adopt a parallel architecture. Let us say that a single instance of a hardware set that solves a problem is a "channel". A parallel solution would then involve a plurality of identical (or nearly so) channels. Once this arrangement is adopted it becomes necessary to decide which channel is to be selected to operate on the next set of incoming data. There are many ways this might be done, but assuming that each channel is otherwise equally appropriate to perform the operation, and that perhaps there is some input buffering for each channel, a good way is often to assign a channel number k according to the relationship k = (some parameter in the data) mod n, where n is the number of channels. What the parameter is does not really matter, as long as it is suitable. For example, in a computer graphics system the parameter might be an address that some earlier subsystem assigned to a primitive that is to be processed by the graphics system. The idea is to get an integer k in the range 0 ≤ k < n. This may be done by dividing the suitable parameter by n and taking k as the remainder. This brings us to the question of how to perform such divisions quickly and without undue expense. We note that the usual division mechanisms are all completely general in that they expect to accommodate arbitrarily variable divisors. We have in mind division by a fixed divisor, or a constant, or, perhaps, by a rather limited number of divisors. It would be desirable if there were a way to take advantage of this limitation on the divisor to reduce the complexity of the division, and at the same time speed it up.
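The channel selection rule k = (parameter) mod n may be illustrated in software by the following minimal sketch. The function name `assign_channel` and the choice of a non-negative integer address as the parameter are assumptions made for illustration only; the general-purpose `%` operator stands in for whatever division mechanism is actually used.

```python
def assign_channel(parameter: int, n_channels: int) -> int:
    """Return a channel number k with 0 <= k < n_channels.

    `parameter` is a hypothetical non-negative integer taken from the
    incoming data (for example, an address assigned to a primitive).
    """
    if n_channels <= 0:
        raise ValueError("n_channels must be positive")
    if parameter < 0:
        raise ValueError("parameter must be non-negative")
    # k is the remainder of dividing the parameter by the channel count.
    return parameter % n_channels


if __name__ == "__main__":
    # Consecutive parameter values are spread round-robin over the channels.
    for address in range(8):
        print(address, "-> channel", assign_channel(address, 3))
```

Because n is fixed once the number of channels is chosen, every call divides by the same constant, which is the restricted case of interest here.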
One prior art solution to this more restricted class of division problem has been to multiply the parameter by the reciprocal of the desired divisor. This provides the quotient, but not the integer remainder. Instead, you get the sum of the quotient plus a decimal, octal or binary (depending upon the radix in use) Arabic fraction. To recover the remainder (i.e., the integer numerator of an indicated division whose denominator is also our divisor) it is necessary to subtract out the quotient and exchange the remaining Arabic fraction for an integer (as by a look-up table). It would be desirable if a genuine division mechanism could instead be simplified to take advantage of the fixed divisor.
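By way of illustration only, the reciprocal approach just described might be modeled in binary fixed point as sketched below. The name `make_reciprocal_divider`, the bit widths, and the rule used to fill the look-up table are all assumptions chosen so that the sketch is exact for parameters of a stated width; they are not taken from any particular prior art implementation.

```python
def make_reciprocal_divider(n: int, width: int):
    """Build a divide-by-constant function for 0 <= x < 2**width.

    Hypothetical sketch: multiply by a fixed-point reciprocal of n, take
    the integer part as the quotient, and exchange the leading bits of
    the fractional part for the integer remainder via a look-up table.
    """
    if n <= 0 or width <= 0:
        raise ValueError("n and width must be positive")
    nbits = n.bit_length()
    # Assumed precision: enough fraction bits that the rounding error of
    # the reciprocal stays below one quarter of a remainder step.
    frac_bits = width + nbits + 2
    recip = -(-(1 << frac_bits) // n)          # ceil(2**frac_bits / n)
    # The table is indexed by the top index_bits bits of the fraction.
    index_bits = nbits + 1
    shift = frac_bits - index_bits
    # Entry i holds the integer nearest to (i + 0.5) * n / 2**index_bits.
    table = [((2 * i + 1) * n + (1 << index_bits)) >> (index_bits + 1)
             for i in range(1 << index_bits)]
    mask = (1 << frac_bits) - 1

    def divide(x: int):
        if not 0 <= x < (1 << width):
            raise ValueError("x out of range for the chosen width")
        product = x * recip                    # quotient plus a fraction
        quotient = product >> frac_bits        # integer part
        fraction = product & mask              # fractional part
        remainder = table[fraction >> shift]   # fraction -> integer
        return quotient, remainder

    return divide


if __name__ == "__main__":
    divide_by_3 = make_reciprocal_divider(3, 8)
    for x in (0, 5, 100, 255):
        print(x, divide_by_3(x), divmod(x, 3))
```

The sketch makes the cost of the approach visible: a multiplier wide enough to hold the product, extra guard bits in the reciprocal, and a separate table whose only job is to turn the fraction back into the remainder.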
It should also be kept in mind that channel assignment in a parallel architecture is but one possible use for a mechanism that can divide an integer (or, perhaps, a fixed point number) by a constant.