The present invention is directed to a unit for processing numeric and logic operations.
German Published Patent Application No. DE 44 16 881 A1 describes a method of processing data that uses homogeneously arranged cells whose function and interconnection can be configured freely.
Independently of the above-mentioned publication, FPGA (field programmable gate array) units are being used to an increasing extent to assemble arithmetic and logic units and data processing systems from a plurality of logic cells.
Another known method is to assemble data processing systems from fixed program-controlled arithmetic and logic units with largely fixed interconnections, referred to as systolic processors.
Units according to the method described in DE 44 16 881 A1 (referred to below as VPUs) are very complicated to configure owing to the large number of logic cells. To control one logic cell, several control bits must be specified in a static memory (SRAM). There is one SRAM address for each logic cell. The number of SRAM cells to be configured is very large, so configuring and reconfiguring such a unit requires a great deal of space and time. The space requirement is especially problematic because the processing power of a VPU increases with the number of cells, whereas the usable area of a unit is limited by chip manufacturing technologies. The price of a chip increases approximately in proportion to the square of the chip area. Because of the repeated next-neighbor interconnection architecture, it is impossible to broadcast data to multiple receivers simultaneously. If VPUs are to be reconfigured on site, short reconfiguration times are essential, but the large volume of configuration data required to reconfigure a chip stands in the way of this. There is no possibility of disconnecting cells from the power supply or of clocking them more slowly to minimize the power loss.
FPGAs used for processing numeric and logic operations are built from multiplexer or look-up table (LUT) architectures, which are implemented with SRAM cells. Because of the large number of small SRAM cells, FPGAs are very complicated to configure. Large volumes of data are required, necessitating a correspondingly large amount of time for configuration and reconfiguration. SRAM cells take up a great deal of space, whereas the usable area of a unit is limited by chip manufacturing technologies. Here again, the price increases approximately in proportion to the square of the chip area. SRAM-based technology is slower than directly integrated logic because of the SRAM access time. Although many FPGAs are based on bus architectures, there is no possibility of broadcasting for rapid and effective transmission of data to multiple receivers simultaneously. If FPGAs are to be reconfigured on site, short configuration times are essential, but the large volume of configuration data required stands in the way. FPGAs offer no support for practical on-site reconfiguration. The programmer must ensure that the process takes place properly without interfering with data and surrounding logic. There is no intelligent logic to minimize power loss. There are no special function units to report the internal operating states back to the logic controlling the FPGA.
Reconfiguration is completely eliminated with systolic processors, but these processors are not flexible because of their rigid internal architecture. Commands are decoded anew in each cycle. As already described in the previous sections, there are no functions which include broadcasting or efficient minimization of power loss.
The present invention comprises a cascadable ALU which is configurable in function and interconnection. No decoding of commands is needed during execution of the algorithm. The present invention can be reconfigured on site without any effect on surrounding ALUs, processing units, or data streams. The volume of configuration data is very small, which reduces the space required and increases the configuration speed. Broadcasting is supported through the internal bus systems in order to distribute large volumes of data rapidly and efficiently. The ALU is equipped with a power-saving mode that shuts off its power consumption completely. There is also a clock rate divider which makes it possible to operate the ALU at a slower clock rate. Special mechanisms are available for reporting the internal states back to the external controllers.
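The difference between decoding commands anew in each cycle and fixing the function once at configuration time can be sketched in software. The following is only an illustration under stated assumptions, not the circuit of the invention: the opcode table `OPS` and the functions `run_with_decode` and `configure_alu` are hypothetical names introduced here.

```python
import operator

# Hypothetical opcode table; the encoding is an assumption for illustration.
OPS = {0: operator.add, 1: operator.sub, 2: operator.and_, 3: operator.or_}


def run_with_decode(opcode, operand_pairs):
    """Program-controlled style: the opcode is decoded anew for every data item."""
    results = []
    for a, b in operand_pairs:
        operation = OPS[opcode]  # decode step repeated in each cycle
        results.append(operation(a, b))
    return results


def configure_alu(opcode):
    """Configurable style: decode once at configuration time, then only process data."""
    operation = OPS[opcode]  # single decode during configuration

    def alu(operand_pairs):
        # No opcode lookup happens here; the function is already fixed.
        return [operation(a, b) for a, b in operand_pairs]

    return alu


data = [(1, 2), (5, 3), (7, 7)]
adder = configure_alu(0)
assert adder(data) == run_with_decode(0, data)
```

Both variants compute the same results; the configured variant simply moves the lookup out of the data path, which is the property the text attributes to the configurable ALU.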
The present invention describes the architecture of a cell in the sense of German Patent DE 44 16 881 A1 or of known FPGA cells. An expanded arithmetic and logic unit (EALU) with special extra functions is integrated into this cell to perform the data processing. The EALU is configured by a function register, which greatly reduces the volume of data required for configuration. The cell can be cascaded freely via a bus system, the EALU being decoupled from the bus system by input and output registers. The output registers are connected to the input of the EALU to permit serial operations. A bus control unit establishes the connection to the bus as specified by the bus register. The unit is designed so that distribution of data to multiple receivers (broadcasting) is possible. A synchronization circuit controls the data exchange between multiple cells over the bus system. The EALU, the synchronization circuit, the bus control unit and the registers are designed so that a cell can be reconfigured on site independently of the cells surrounding it. A power-saving mode which shuts down the cell can be configured through the function register; clock rate dividers which reduce the working frequency can also be set.
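A behavioral sketch may help to picture how a single function register can select the operation, the feedback of the output register, the power-saving mode, and the clock divider. Everything specific in it is an assumption made for illustration: the bit layout of the function register, the 16-bit word width, the four operations, and the names `Cell`, `configure`, `clock`, and `broadcast` do not come from the text. The synchronization circuit and the bus register are not modeled.

```python
from dataclasses import dataclass
import operator

# Hypothetical function-register layout (an assumption, not taken from the text):
#   bits 0-1: operation select
#   bit  2  : use the output register as operand A (serial operation)
#   bit  3  : power-saving mode (cell shut down)
#   bits 4-5: clock divider exponent n (cell works on every 2**n-th clock)
OPERATIONS = (operator.add, operator.sub, operator.and_, operator.or_)
FEEDBACK = 0b000100
SLEEP = 0b001000
WORD_MASK = 0xFFFF  # assumed 16-bit data path


@dataclass
class Cell:
    function_register: int = 0
    input_a: int = 0   # input register, operand A
    input_b: int = 0   # input register, operand B
    output: int = 0    # output register
    ticks: int = 0     # system clocks seen since the last configuration

    def configure(self, value):
        """Reconfigure this cell only; no other cell is touched."""
        self.function_register = value
        self.ticks = 0

    def clock(self):
        """Advance one system clock; return True if the EALU produced a result."""
        reg = self.function_register
        if reg & SLEEP:
            return False  # power-saving mode: the cell does nothing
        self.ticks += 1
        divider = 1 << ((reg >> 4) & 0b11)
        if self.ticks % divider:
            return False  # divided clock: skip this system clock
        a = self.output if reg & FEEDBACK else self.input_a
        self.output = OPERATIONS[reg & 0b11](a, self.input_b) & WORD_MASK
        return True


def broadcast(sender, receivers):
    """Copy the sender's output register into operand A of every receiver."""
    for cell in receivers:
        cell.input_a = sender.output


# Serial accumulation: add operand B to the output register on every clock.
accumulator = Cell(input_b=3)
accumulator.configure(FEEDBACK)  # operation 0 (add) with feedback enabled
for _ in range(3):
    accumulator.clock()

# Distribute the accumulated value to two other cells at once.
first, second = Cell(), Cell()
broadcast(accumulator, [first, second])
```

In this sketch one small integer per cell carries the whole configuration, which mirrors the stated aim of keeping the volume of configuration data low; a real cell would add the bus register and the synchronization signals.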