The present invention relates to the field of computer science and can be used for neural network emulation and real-time digital signal processing.
Network Algorithms Implementation / Y. P. Ivanov et al. (Theses of reports of the Second Russian Conference «Neural Computers and Their Application», Moscow, Feb. 14, 1996) // Neurocomputer. 1996. No. 1, 2. pp. 47-49], comprising an input data register and four neural units, each of which consists of a shift register, a weight coefficient register, eight multipliers, a multi-operand summation circuit and a block for calculating the threshold function.
Such a neural processor performs, in each clock cycle, weighted summation of a fixed number of input data for a fixed number of neurons, irrespective of the actual range of the input data values and their weight coefficients. Every input datum and every weight coefficient is represented as an operand of fixed word length, determined by the bit length of the neural processor's hardware units.
The closest prior art is the neural processor [U.S. Pat. No. 5,278,945, U.S. Cl. 395/27, 1994], comprising three registers, a multiplexer, a FIFO, a calculation unit that computes the dot product of two vectors of programmable-word-length data and adds the accumulated result, and a nonlinear unit.
Vectors of input data and their weight coefficients are applied to the inputs of this neural processor. In each clock cycle it performs weighted summation of several input data for one neuron by calculating the dot product of the input data vector and the weight coefficient vector. The processor also supports vectors whose element word length is selected by the program from a set of fixed values. As the word length of the input data and weight coefficients decreases, more of them fit into each vector, and the processor's performance improves accordingly. However, the word length of the results is fixed and determined by the bit length of the neural processor's hardware units.
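A minimal Python sketch of such a packed dot product follows, assuming two's-complement fields of equal width packed least-significant field first into a 64-bit word (the 64-bit word size and the function names are hypothetical, chosen only for illustration). Halving the field width doubles the number of products formed from one pair of words, which is the performance effect described above.

```python
def pack_signed(values: list[int], width: int) -> int:
    """Pack signed integers into one word, least-significant field first."""
    word = 0
    for k, v in enumerate(values):
        word |= (v & ((1 << width) - 1)) << (k * width)
    return word


def unpack_signed(word: int, width: int, total_bits: int = 64) -> list[int]:
    """Split a total_bits-wide word into signed two's-complement fields."""
    fields = []
    for k in range(total_bits // width):
        v = (word >> (k * width)) & ((1 << width) - 1)
        if v >= 1 << (width - 1):  # sign bit set: map to a negative value
            v -= 1 << width
        fields.append(v)
    return fields


def packed_dot(data_word: int, weight_word: int, width: int, total_bits: int = 64) -> int:
    """Dot product of two packed vectors: one neuron's weighted sum per call."""
    xs = unpack_signed(data_word, width, total_bits)
    ws = unpack_signed(weight_word, width, total_bits)
    return sum(x * w for x, w in zip(xs, ws))


# 16-bit fields give 4 products per 64-bit word; 8-bit fields would give 8.
print(packed_dot(pack_signed([1, -2, 3, -4], 16), pack_signed([5, 6, -7, 8], 16), 16))  # -60
```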
A digital saturation unit whose saturation region is determined by the absolute value of a number is known [SU, No. 690477, Int. Cl. G06F 7/38, 1979], comprising three registers, an adder, two code converters, two sign-analyzing blocks, a correction block, two groups of AND gates and a group of OR gates. This unit calculates the saturation function for a vector of N input operands in 2N clock cycles.
The closest prior art is the saturation unit [U.S. Pat. No. 5,644,519, U.S. Cl. 364/736.02, 1997], comprising a multiplexer, a comparator and two saturation indicators. This unit calculates the saturation function for a vector of N input operands in N cycles.
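For reference, the saturation function itself can be sketched in a few lines of Python, assuming the saturation region is the range of a signed two's-complement word of `bits` bits (an assumption; the cited units define the region in their own ways). Processing the list one element at a time mirrors the one-operand-per-cycle rate of the closest prior-art unit.

```python
def saturate(value: int, bits: int) -> int:
    """Clamp value to the range of a bits-wide two's-complement word."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def saturate_vector(values: list[int], bits: int) -> list[int]:
    """Saturate N operands one after another, i.e. one operand per simulated cycle."""
    return [saturate(v, bits) for v in values]


print(saturate_vector([200, -200, 5], 8))  # [127, -128, 5]
```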
A calculation unit is known [U.S. Pat. No. 5,278,945, U.S. Cl. 395/27, 1994], comprising multipliers, adders, registers, a multiplexer and a FIFO. This unit calculates the dot product of two vectors of M operands each in one clock cycle, and multiplies a matrix of N×M operands by a vector of M operands in N cycles.
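The N-cycle figure follows from treating the matrix-vector product as N row-by-vector dot products, one per cycle. A small Python sketch of that schedule (the function name and the cycle counter are illustrative only):

```python
def matvec_by_dot_products(matrix: list[list[int]], vector: list[int]) -> tuple[list[int], int]:
    """Multiply an N x M matrix by an M-vector as N dot products, one per simulated cycle."""
    results, cycles = [], 0
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("row length must equal vector length")
        results.append(sum(a * b for a, b in zip(row, vector)))  # one M-operand dot product
        cycles += 1
    return results, cycles


print(matvec_by_dot_products([[1, 2], [3, 4], [5, 6]], [1, -1]))  # ([-1, -1, -1], 3)
```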
The closest prior art is the calculation unit [U.S. Pat. No. 4,825,401, U.S. Cl. 364/760, 1989], comprising 3N/2 AND gates; N/2 decoders that decode the multiplier according to Booth's algorithm; a multiplication array of N columns by N/2 cells, each cell consisting of a circuit that generates one bit of a Booth partial product and a one-bit adder; a 2N-bit adder; N/2 multiplexers; N/2 additional circuits that generate one bit of a Booth partial product; and N/2 implicators. This unit multiplies two N-bit operands, or multiplies element by element two vectors of two (N/2)-bit operands each, in one clock cycle.
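The radix-4 recoding performed by a Booth decoder can be sketched in Python as follows. This is the textbook modified Booth algorithm, offered as an assumption about what the N/2 decoders compute rather than a description of the cited circuit: each of the N/2 digits inspects two multiplier bits plus the adjacent lower bit, so N/2 partial products suffice for an N-bit multiplier.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier (n even) into n/2 digits in {-2, ..., 2}."""
    u = y & ((1 << n) - 1)

    def bit(k: int) -> int:
        return (u >> k) & 1 if k >= 0 else 0  # implicit 0 below the LSB

    # digit i looks at bits 2i+1, 2i and the overlap bit 2i-1
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum of n/2 partial products d_i * x, weighted by 4**i."""
    return sum(d * x * 4**i for i, d in enumerate(booth_radix4_digits(y, n)))


print(booth_radix4_digits(-6, 4), booth_multiply(7, -6, 4))  # [-2, -1] -42
```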
A unit for summing vectors of programmable-word-length operands is known [U.S. Pat. No. 5,047,975, U.S. Cl. 364/786, 1991], comprising adders and AND gates with inverted input.
The closest prior art is the adder [U.S. Pat. No. 4,675,837, U.S. Cl. 364/788, 1987], comprising carry logic and, in each bit, a half-adder and an EXCLUSIVE OR gate. This adder adds two vectors of N operands each in N cycles.
The neural processor comprises first, second, third, fourth, fifth and sixth registers, a shift register, an AND gate, first and second FIFOs, first and second saturation units, a calculation unit, an adder circuit, a 3-to-2 switch and a multiplexer. The calculation unit has inputs for the bits of the first, second and third operand vectors; data-boundary setting inputs for the first operand vectors and result vectors, for the second operand vectors and for the third operand vectors; first and second load control inputs for loading third operand vectors into the first memory block; a reload control input for transferring the third operand matrix from the first memory block to the second memory block; and outputs for the bits of the first and second summand vectors of the result, namely the sum of the first operand vector and the product of the second operand vector by the third operand matrix stored in the second memory block.
The first data inputs of the bits of the 3-to-2 switch, the data inputs of the first FIFO and of the first, second, third and fourth registers, and the parallel data inputs of the shift register are coupled bit by bit and connected to the respective bits of the first input bus of the neural processor. Each bit of the second input bus of the neural processor is connected to the second data input of the respective bit of the 3-to-2 switch. The first output of each bit of the switch is connected to the input of the respective bit of the input operand vector of the first saturation unit, whose control input for each bit is connected to the output of the corresponding bit of the second register. The second output of each bit of the switch is connected to the input of the respective bit of the input operand vector of the second saturation unit, whose control input for each bit is connected to the output of the respective bit of the third register. The output of each bit of the first register is connected to the first data input of the respective bit of the multiplexer, whose second data input for each bit is connected to the output of the respective bit of the result vector of the first saturation unit.
The output of each bit of the multiplexer is connected to the input of the respective bit of the first operand vector of the calculation unit, and the input of each bit of the second operand vector of the calculation unit is connected to the output of the respective bit of the result vector of the second saturation unit. The data outputs of the first FIFO are connected to the inputs of the respective bits of the third operand vector of the calculation unit. The output of each bit of the first summand vector of the calculation unit's result is connected to the input of the respective bit of the first summand vector of the adder circuit, and the input of each bit of the adder circuit's second summand vector is connected to the output of the respective bit of the second summand vector of the calculation unit's result. Each data-boundary setting input of the calculation unit for first operand vectors and result vectors is connected to the output of the respective bit of the fifth register and to the respective data-boundary setting input of the adder circuit for summand vectors and sum vectors. The output of each bit of the adder circuit's sum vector is connected to the respective data input of the second FIFO, each data output of which is connected to the respective bit of the output bus of the neural processor and to the third input of the respective bit of the 3-to-2 switch.
The output of each bit of the fourth register is connected to the data input of the respective bit of the fifth register and to the respective data-boundary setting input of the calculation unit for third operand vectors. Each data-boundary setting input of the calculation unit for second operand vectors is connected to the output of the respective bit of the sixth register, whose data input for each bit is connected to the output of the respective bit of the shift register. The sequential data input and the sequential output of the shift register are coupled and connected to the first load control input of the calculation unit and to the first input of the AND gate, whose output is connected to the read control input of the first FIFO. The second input of the AND gate, the shift control input of the shift register and the second load control input of the calculation unit are coupled and connected to the respective control input of the neural processor. The reload control input of the calculation unit and the control inputs of the fifth and sixth registers are coupled and connected to the respective control input of the neural processor. The control inputs of the 3-to-2 switch, of the multiplexer and of the first, second, third and fourth registers, the write control inputs of the shift register and of the first FIFO, and the read and write control inputs of the second FIFO are respective control inputs of the neural processor. The state outputs of the first and second FIFOs are state outputs of the neural processor.
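At the level of element values rather than bits, one pass through this data path can be summarized as R = X + Y·Z: the second operand vector Y arrives through the second saturation unit, the matrix Z has been loaded from the first FIFO, and the two summand vectors produced by the calculation unit are merged by the adder circuit before the result enters the second FIFO, from which it can return through the 3-to-2 switch. The Python sketch below is a hypothetical behavioral model under those assumptions; it ignores bit packing, the split of the result into two summand vectors and all control timing, does not saturate the first operand, and `processor_pass`, `in_bits` and `out_bits` are illustrative names only.

```python
def saturate(v: int, bits: int) -> int:
    """Clamp v to a bits-wide two's-complement range (stands in for a saturation unit)."""
    return max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, v))


def wrap(v: int, bits: int) -> int:
    """Reduce v to a bits-wide two's-complement field (carry out of the field is lost)."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >= 1 << (bits - 1) else v


def processor_pass(x: list[int], y: list[int], z: list[list[int]],
                   in_bits: int, out_bits: int) -> list[int]:
    """Hypothetical element-level model of one pass: R = X + sat(Y) * Z.

    z has one row per element of y (the weight vectors held in the calculation
    unit's memory); x is the first operand, e.g. a bias or a partial sum.
    """
    y_sat = [saturate(v, in_bits) for v in y]  # second saturation unit
    return [wrap(x[j] + sum(y_sat[k] * z[k][j] for k in range(len(y_sat))), out_bits)
            for j in range(len(x))]


layer1 = processor_pass([0, 0], [3, -2], [[1, 2], [3, -1]], in_bits=8, out_bits=16)
# Results fed back through the switch are saturated to 4 bits for the next layer.
layer2 = processor_pass([0], layer1, [[1], [1]], in_bits=4, out_bits=16)
print(layer1, layer2)  # [-3, 8] [4]
```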
The neural processor may include a calculation unit comprising: a shift register that performs an arithmetic shift of J bits to the left on all N-bit vector operands stored in it, where J is the minimal value that is an aliquot part (exact divisor) of the data word lengths in the second operand vectors of the calculation unit; a delay element; a first memory block containing a sequential input port and N/J cells for storing N-bit data; a second memory block containing N/J cells for storing N-bit data; N/J multiplier blocks, each of which multiplies an N-bit vector of programmable-word-length data by a J-bit multiplier; and a vector adding circuit that generates the partial result of summing N/J+1 vectors of programmable-word-length data.
The inputs of the third operand vector bits of the calculation unit are connected to the data inputs of the shift register, whose outputs are connected to the data inputs of the first memory block. The outputs of each cell of the first memory block are connected to the data inputs of the respective cell of the second memory block, and the outputs of each cell of the second memory block are connected to the multiplicand vector bit inputs of the respective multiplier block, whose multiplier bit inputs are connected to the inputs of the respective J-bit group of the second operand vector bits of the calculation unit. The outputs of each multiplier block are connected to the bit inputs of the respective summand vector of the vector adding circuit, whose bit inputs for the (N/J+1)-th summand vector are connected to the inputs of the first operand vector bits of the calculation unit. The data-boundary setting inputs of the calculation unit for third operand vectors are connected to the respective data-boundary setting inputs for operand vectors of the shift register, whose mode select input is connected to the first load control input of the calculation unit (loading of third operand vectors into the first memory block). The second load control input of the calculation unit is connected to the clock input of the shift register and to the input of the delay element, whose output is connected to the write control input of the first memory block. The write control input of the second memory block is connected to the reload control input of the calculation unit (transfer of the third operand matrix from the first memory block to the second). Each data-boundary setting input of the calculation unit for second operand vectors is connected to the sign correction input of the respective multiplier block. The data-boundary setting inputs of the calculation unit for first operand vectors and result vectors are connected to the data-boundary setting inputs for multiplicand vectors and result vectors of each multiplier block and to the data-boundary setting inputs for summand vectors and result vectors of the vector adding circuit, whose outputs for the bits of the first and second summand vectors of the result are the respective outputs of the calculation unit.
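The following Python sketch illustrates, under stated assumptions, why N/J multiplier blocks and a J-bit left shift fit together: a signed second-operand element of width w (a multiple of J) splits into w/J groups of J bits, and its product with a third-operand value equals the sum, over groups k, of each group's value times that third-operand value shifted left by J·k bits. Treating only the top group as signed is our reading of the blocks' sign correction inputs; `j_bit_digits` and `multiply_by_groups` are hypothetical names, and the sketch works on single integers rather than packed N-bit vectors.

```python
def j_bit_digits(y: int, width: int, j: int) -> list[int]:
    """Split a signed width-bit multiplier into width // j groups of j bits.

    Lower groups are unsigned; the top group is read as signed, which is the
    role assumed here for a multiplier block's sign correction input.
    """
    if width % j:
        raise ValueError("width must be a multiple of j")
    u = y & ((1 << width) - 1)
    groups = width // j
    digits = []
    for k in range(groups):
        d = (u >> (j * k)) & ((1 << j) - 1)
        if k == groups - 1 and d >= 1 << (j - 1):  # sign-corrected top group
            d -= 1 << j
        digits.append(d)
    return digits


def multiply_by_groups(z: int, y: int, width: int, j: int) -> int:
    """Block k multiplies a copy of z shifted left by j*k bits by one j-bit digit."""
    return sum(d * (z << (j * k)) for k, d in enumerate(j_bit_digits(y, width, j)))


print(j_bit_digits(-6, 6, 2), multiply_by_groups(5, -6, 6, 2))  # [2, 2, -1] -30
```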
In the neural processor described above, each saturation unit may comprise an input data register whose data inputs are the inputs of the respective bits of the unit's input operand vector; the calculation unit may comprise an input data register whose data inputs are the inputs of the respective bits of its first and second operand vectors; and the adder circuit may comprise an input data register whose data inputs are the respective inputs of the adder circuit.
The saturation unit comprises a carry propagation circuit and a carry look-ahead circuit, and each of its N bits comprises first and second multiplexers, an EXCLUSIVE OR gate, an EQUIVALENCE gate, a NAND gate and an AND gate with inverted input. In each bit of the unit, the second data inputs of the first and second multiplexers and the first input of the EXCLUSIVE OR gate are coupled and connected to the input of the respective bit of the input operand vector, and the output of each bit of the result vector is connected to the output of the first multiplexer of the respective bit. In each bit, the non-inverted input of the AND gate with inverted input and the first inputs of the NAND gate and the EQUIVALENCE gate are coupled and connected to the respective control input of the unit. The first input of the EXCLUSIVE OR gate and the non-inverted input of the AND gate with inverted input of the q-th bit are connected, respectively, to the second input of the EXCLUSIVE OR gate and to the inverted input of the AND gate with inverted input of the (q−1)-th bit, and the first data input of the second multiplexer of the (q−1)-th bit is connected to the output of the carry to the (N−q+2)-th bit of the carry propagation circuit (where q = 2, 3, ..., N).
The output of the NAND gate of the n-th bit is connected to the carry propagation input for the (N−n+1)-th bit of the carry look-ahead circuit, whose output of the carry to the (N−n+2)-th bit is connected to the control input of the first multiplexer of the n-th bit. The output of the AND gate with inverted input of the n-th bit is connected to the control input of the second multiplexer of the same bit, to the carry generation input of the (N−n+1)-th bit of the carry look-ahead circuit and to the inverted carry propagation input for the (N−n+1)-th bit of the carry propagation circuit, whose carry input from the (N−n+1)-th bit is connected to the output of the second multiplexer of the n-th bit (where n = 1, 2, ..., N). The initial carry inputs of the carry propagation circuit and of the carry look-ahead circuit, as well as the second input of the EXCLUSIVE OR gate, the inverted input of the AND gate with inverted input and the first data input of the second multiplexer of the N-th bit, are coupled and connected to logic 0. In each bit of the unit, the output of the second multiplexer is connected to the second input of the EQUIVALENCE gate, whose output is connected to the first data input of the first multiplexer, and the output of the EXCLUSIVE OR gate is connected to the second input of the NAND gate of the same bit.
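To check one reading of this wiring, the saturation unit can be simulated gate by gate. The Python sketch below rests on assumptions the text does not state outright: bit N is the most significant bit; a multiplexer whose control input is 1 selects its second data input; "the carry to the (N−n+2)-th bit" is the carry leaving look-ahead stage N−n+1, i.e. the carry computed after unit bit n; and the carry propagation circuit is plain wiring, as in the reduced variant described next. Under these assumptions, control bits set to 1 on the top w−k+1 bits of a w-bit field (and 0 on its remaining k−1 bits, with k ≥ 2) make the unit clamp that field to the range of a k-bit signed number, sign-extended across the field. The helper names are hypothetical.

```python
def saturation_unit(x_bits: list[int], u_bits: list[int]) -> list[int]:
    """Gate-level model; index 0 holds bit 1 (LSB), index N-1 holds bit N (MSB)."""
    size = len(x_bits)
    y = [0] * size
    mux2_above = 0  # first data input of the second multiplexer of bit N is tied to 0
    carry = 0       # initial carry of the look-ahead circuit is tied to 0
    for n in range(size - 1, -1, -1):  # carries travel from bit N down to bit 1
        x_above = x_bits[n + 1] if n + 1 < size else 0  # XOR input tied to 0 at bit N
        u_above = u_bits[n + 1] if n + 1 < size else 0  # inverted AND input tied to 0 at bit N
        gen = u_bits[n] & (1 - u_above)                 # AND gate with inverted input
        prop = 1 - (u_bits[n] & (x_bits[n] ^ x_above))  # EXCLUSIVE OR gate, then NAND gate
        mux2 = x_bits[n] if gen else mux2_above         # second multiplexer passes the field's sign down
        carry = gen | (prop & carry)                    # look-ahead stage for this bit
        eqv = 1 - (u_bits[n] ^ mux2)                    # EQUIVALENCE gate
        y[n] = x_bits[n] if carry else eqv              # first multiplexer
        mux2_above = mux2
    return y


def to_bits(v: int, w: int) -> list[int]:
    return [(v >> i) & 1 for i in range(w)]


def from_bits(bits: list[int]) -> int:
    v = sum(b << i for i, b in enumerate(bits))
    return v - (1 << len(bits)) if bits[-1] else v  # two's complement


def control_bits(w: int, k: int) -> list[int]:
    """Hypothetical control pattern to keep k of w bits (k >= 2): 1 on the top w-k+1 bits."""
    return [0] * (k - 1) + [1] * (w - k + 1)


# Two 4-bit fields, LSB field first, each saturated to 3 bits.
y = saturation_unit(to_bits(5, 4) + to_bits(-6, 4), control_bits(4, 3) * 2)
print(from_bits(y[:4]), from_bits(y[4:]))  # 3 -4
```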
In particular applications of the saturation unit where hardware cost must be kept to a minimum, the output of the carry to the q-th bit of the carry propagation circuit is connected to its carry input from the (q−1)-th bit (where q = 1, 2, ..., N), and the carry look-ahead circuit comprises N AND gates and N OR gates. Each carry propagation input for a bit of the carry look-ahead circuit is connected to the first input of the respective AND gate, whose output is connected to the first input of the respective OR gate; the second input and the output of that OR gate are connected, respectively, to the carry generation input of the same bit of the carry look-ahead circuit and to the output of the carry to that bit. The second input of the first AND gate is the initial carry input of the carry look-ahead circuit, and the second input of the q-th AND gate is connected to the output of the (q−1)-th OR gate (where q = 2, 3, ..., N).
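The recurrence realized by this reduced look-ahead circuit is a ripple chain: each AND gate combines a bit's propagate signal with the previous OR output, and each OR gate adds the bit's generate signal. A small Python sketch of that chain follows; the `ripple_add` wrapper, which reads each OR output as the carry handed to the next bit, is only a hypothetical harness for checking the chain on ordinary addition.

```python
def carry_chain(p: list[int], g: list[int], c0: int = 0) -> list[int]:
    """Reduced look-ahead circuit: out[q] = g[q] OR (p[q] AND out[q-1]), with out[-1] = c0."""
    out, c = [], c0
    for pq, gq in zip(p, g):
        c = gq | (pq & c)  # q-th AND gate feeds the q-th OR gate
        out.append(c)
    return out


def ripple_add(a: int, b: int, n: int) -> int:
    """Use the chain as an ordinary n-bit adder: p = a XOR b, g = a AND b."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    c = carry_chain(p, g)
    s = [p[i] ^ (c[i - 1] if i else 0) for i in range(n)]  # carry into bit i is out[i-1]
    return sum(bit << i for i, bit in enumerate(s)) + (c[-1] << n)


print(ripple_add(11, 6, 4))  # 17
```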
The calculation unit comprises N/2 multiplier bit decoders, N/2 AND gates with inverted input, a delay element, an N-bit shift register, each bit of which consists of an AND gate with inverted inputs, a multiplexer and a flip-flop, and a multiplier array of N columns by N/2 cells, each of which consists of an AND gate with inverted input, a one-bit partial product generation circuit, a one-bit adder, a multiplexer, and first and second flip-flops that serve as memory cells of the first and second memory blocks of the unit, respectively.
The input of each bit of the first operand vector of the unit is connected to the second input of the one-bit adder of the first cell of the respective column of the multiplier array. In each cell of the multiplier array, the first input of the one-bit adder is connected to the output of the one-bit partial product generation circuit of the same cell. The control inputs of the multiplexers and the inverted inputs of the AND gates with inverted input of all cells of each column are coupled and connected to the respective data-boundary setting input of the unit for first operand vectors and result vectors. Each data-boundary setting input of the unit for second operand vectors is connected to the inverted input of the respective AND gate with inverted input, whose output is connected to the first input of the respective multiplier bit decoder. The respective control inputs of the one-bit partial product generation circuits of the i-th cells of all columns are coupled and connected to the respective outputs of the i-th multiplier bit decoder, whose second and third inputs are connected to the inputs of the (2i−1)-th and (2i)-th bits, respectively, of the second operand vector of the unit (where i = 1, 2, ..., N/2). The non-inverted input of the j-th AND gate with inverted input is connected to the third input of the (j−1)-th multiplier bit decoder (where j = 2, 3, ..., N/2).
The input of each bit of the third operand vector of the unit is connected to the second data input of the multiplexer of the respective shift register bit; the first data input of that multiplexer is connected to the output of the AND gate with inverted inputs of the same bit, whose first inverted input is connected to the respective data-boundary setting input of the unit for third operand vectors. The second inverted input of the AND gate with inverted inputs of the q-th shift register bit is connected to the first inverted input of the AND gate with inverted inputs of the (q−1)-th bit (where q = 2, 3, ..., N). The non-inverted input of the AND gate with inverted inputs of the r-th shift register bit is connected to the flip-flop output of the (r−2)-th bit (where r = 3, 4, ..., N). The control inputs of the multiplexers of all shift register bits are coupled and connected to the first load control input of the unit (loading of third operand vectors into the first memory block), and the clock inputs of the flip-flops of all shift register bits and the input of the delay element are coupled and connected to the second load control input. The output of the multiplexer of each shift register bit is connected to the data input of the flip-flop of the same bit, whose output is connected to the data input of the first flip-flop of the last cell of the respective column of the multiplier array. The output of the first flip-flop of the j-th cell of each column is connected to the data input of the first flip-flop of the (j−1)-th cell of the same column (where j = 2, 3, ..., N/2).
The clock inputs of the first flip-flops of all multiplier array cells are coupled and connected to the output of the delay element, and the clock inputs of the second flip-flops of all cells are coupled and connected to the reload control input (transfer of the third operand matrix from the first memory block to the second). The second data input of the one-bit partial product generation circuit of the i-th cell of the q-th column is connected to the output of the AND gate with inverted input of the i-th cell of the (q−1)-th column (where i = 1, 2, ..., N/2 and q = 2, 3, ..., N). The second input of the one-bit adder of the j-th cell of each column is connected to the sum output of the one-bit adder of the (j−1)-th cell of the same column (where j = 2, 3, ..., N/2). The third input of the one-bit adder of the j-th cell of the q-th column is connected to the output of the multiplexer of the (j−1)-th cell of the (q−1)-th column (where j = 2, 3, ..., N/2 and q = 2, 3, ..., N). The third input of the one-bit adder of the j-th cell of the first column is connected to the third output of the (j−1)-th multiplier bit decoder (where j = 2, 3, ..., N/2). The sum output of the one-bit adder of the last cell of each column is the output of the respective bit of the first summand vector of the unit's result. The output of the multiplexer of the last cell of the (q−1)-th column is the output of the q-th bit of the second summand vector of the unit's result (where q = 2, 3, ..., N), and the first bit of the second summand vector is connected to the third output of the (N/2)-th multiplier bit decoder.
The second inverted input and the non-inverted input of the AND gate with inverted inputs of the first shift register bit, the non-inverted input of the AND gate with inverted inputs of the second shift register bit, the second data inputs of the one-bit partial product generation circuits of all cells of the first column, the third inputs of the one-bit adders of the first cells of all columns, and the non-inverted input of the first AND gate with inverted input are coupled and connected to logic 0. In each multiplier array cell, the output of the first flip-flop is connected to the data input of the second flip-flop, whose output is connected to the non-inverted input of the AND gate with inverted input and to the first data input of the one-bit partial product generation circuit; the third control input of that circuit is connected to the second data input of the multiplexer, whose first data input is connected to the carry output of the one-bit adder of the same cell.
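The role of the N/2 AND gates with inverted input in front of the decoders can be illustrated in Python. Assuming the decoders perform standard radix-4 Booth recoding (decoder i uses bits 2i−1 and 2i plus the overlap bit passed on from decoder i−1), a boundary input of 1 zeroes that overlap bit at the lowest group of each packed element, so elements of different even widths are recoded independently within one word. The helper names and the LSB-first packing are assumptions for illustration.

```python
def segmented_booth_digits(y_word: int, n: int, boundary: list[int]) -> list[int]:
    """Radix-4 Booth digits for a packed n-bit word of second-operand elements.

    boundary[i] == 1 marks decoder i+1 as the lowest group of an element; the
    AND gate with inverted input then masks the overlap bit from the element below.
    """
    def bit(k: int) -> int:
        return (y_word >> k) & 1

    digits = []
    for i in range(n // 2):                   # 0-based decoder index
        low = bit(2 * i - 1) if i > 0 else 0  # first AND gate input is tied to 0
        low &= 1 - boundary[i]                # AND gate with inverted input
        digits.append(-2 * bit(2 * i + 1) + bit(2 * i) + low)
    return digits


def pack(values: list[int], widths: list[int]) -> tuple[int, int, list[int]]:
    """Pack signed elements of even widths, LSB element first, and build the boundary mask."""
    word, pos, boundary = 0, 0, []
    for v, w in zip(values, widths):
        word |= (v & ((1 << w) - 1)) << pos
        boundary += [1] + [0] * (w // 2 - 1)
        pos += w
    return word, pos, boundary


def element_values(digits: list[int], widths: list[int]) -> list[int]:
    """Recombine each element's digits with weights 4**k to check the recoding."""
    out, g = [], 0
    for w in widths:
        out.append(sum(d * 4**k for k, d in enumerate(digits[g:g + w // 2])))
        g += w // 2
    return out


word, n, boundary = pack([-3, 100, -128, 5], [4, 8, 8, 4])
print(element_values(segmented_booth_digits(word, n, boundary), [4, 8, 8, 4]))  # [-3, 100, -128, 5]
```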
The adder circuit comprises a carry look-ahead circuit and, in each of its N bits, a half-adder, an EXCLUSIVE OR gate, and first and second AND gates with inverted input. The input of each bit of the first summand vector and the input of the respective bit of the second summand vector are connected, respectively, to the first and second inputs of the half-adder of the respective bit. In each bit, the inverted inputs of the first and second AND gates with inverted input are coupled and connected to the respective data-boundary setting input for summand vectors and sum vectors, and the output of the EXCLUSIVE OR gate of each bit is the output of the respective bit of the sum vector. The output of the first AND gate with inverted input of each bit is connected to the carry propagation input for the respective bit of the carry look-ahead circuit, whose carry generation input for each bit is connected to the output of the second AND gate with inverted input of the respective bit of the adder circuit. The second input of the EXCLUSIVE OR gate of the q-th bit is connected to the output of the carry to the q-th bit of the carry look-ahead circuit (where q = 2, 3, ..., N); the initial carry input of the carry look-ahead circuit and the second input of the EXCLUSIVE OR gate of the first bit are connected to logic 0. In each bit, the sum output of the half-adder is connected to the first input of the EXCLUSIVE OR gate and to the non-inverted input of the first AND gate with inverted input, and the carry output of the half-adder is connected to the non-inverted input of the second AND gate with inverted input of the same bit.
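In effect, the adder circuit is a carry look-ahead adder whose propagate and generate signals are gated by the data-boundary inputs, so a carry cannot cross from one packed field into the next. The Python sketch below models it gate by gate, with the look-ahead circuit reduced to its ripple recurrence; it assumes that a boundary bit of 1 marks the most significant bit of a field (the bit whose outgoing carry must be blocked), and `partitioned_add` and `msb_boundary` are hypothetical names.

```python
def partitioned_add(a: int, b: int, boundary: list[int]) -> int:
    """Gate-level model of the adder circuit; boundary[q] == 1 blocks the carry out of bit q."""
    s, carry = 0, 0                   # initial carry is tied to 0
    for q, d in enumerate(boundary):  # q = 0 is the first (least significant) bit
        x, y = (a >> q) & 1, (b >> q) & 1
        hs, hc = x ^ y, x & y         # half-adder sum and carry
        p = hs & (1 - d)              # first AND gate with inverted input
        g = hc & (1 - d)              # second AND gate with inverted input
        s |= (hs ^ carry) << q        # EXCLUSIVE OR gate gives sum bit q
        carry = g | (p & carry)       # carry passed to bit q+1
    return s


def msb_boundary(widths: list[int]) -> list[int]:
    """Mark the most significant bit of each field, LSB field first."""
    mask = []
    for w in widths:
        mask += [0] * (w - 1) + [1]
    return mask


# Fields (LSB first): 4 bits, 4 bits, 8 bits; each wraps independently.
a = 7 | 15 << 4 | 200 << 8
b = 9 | 1 << 4 | 100 << 8
r = partitioned_add(a, b, msb_boundary([4, 4, 8]))
print(r & 0xF, (r >> 4) & 0xF, r >> 8)  # 0 0 44
```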