Field of the Invention
The invention relates to large-scale integrated circuits which contain a plurality of individual modules with a wide variety of functions and complexity. The modules can be flexibly configured or programmed depending on the use of the system. On the one hand, the individual modules can be supplied with programs using the lowest possible instruction bandwidth, and, on the other hand, can be coordinated and synchronized. A synchronous response of the individual modules can be achieved very easily, by way of example, by permanently assigning individual bits in very long instruction words (VLIW) to the respective individual modules.
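By way of illustration only, the permanent assignment of bits in an instruction word to individual modules can be sketched as follows. The module names, bit offsets and field widths are hypothetical and are not taken from the disclosure; the sketch merely shows that each module always reads its instruction from the same bit positions of every word, so that all modules respond in the same cycle.

```python
# Hypothetical field layout: module name -> (bit offset, width in bits).
# These names and sizes are assumptions made for this sketch only.
FIELDS = {
    "read_addr_gen": (0, 8),
    "alu": (8, 16),
    "write_addr_gen": (24, 8),
}


def pack_vliw(instructions):
    """Combine per-module instructions into one long instruction word."""
    word = 0
    for name, (offset, width) in FIELDS.items():
        value = instructions.get(name, 0)  # 0 stands in for "no operation"
        if value >> width:
            raise ValueError(f"instruction for {name} does not fit in {width} bits")
        word |= value << offset
    return word


def unpack_vliw(word):
    """Give each module the bits permanently assigned to it."""
    return {
        name: (word >> offset) & ((1 << width) - 1)
        for name, (offset, width) in FIELDS.items()
    }


word = pack_vliw({"read_addr_gen": 0x56, "alu": 0x1234})
per_module = unpack_vliw(word)
```

In this sketch a module that has nothing to do still occupies its field in every word, which is the source of the wasted bandwidth discussed further below.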
If such a large-scale integrated system is used, by way of example, for processing a two-dimensional image whose pixels are read in sections from an image memory, are processed pixel-by-pixel in an arithmetic and logic unit and are finally written back to the image memory, then address generators for reading from and writing to the image memory are required in addition to the arithmetic and logic unit. The address generators have to take account of the two-dimensional nature of the object to be processed during address calculation, for example using two nested program loops. The arithmetic and logic unit, on the other hand, need know only the total number of pixels to be processed, but not their arrangement in the memory. A single program loop is therefore sufficient for the arithmetic and logic unit. If the instructions for the address generators and for the arithmetic and logic unit are combined into an instruction word VLIW of corresponding length, however, then the resulting program unnecessarily also has two nested loops for the portion of the program that belongs to the arithmetic and logic unit.
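The difference in loop structure described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the circuit itself: the function names, the row-major memory layout and the parameters base, stride, x0, y0, width and height are all hypothetical. The address generator needs two loops because it must know the line structure of the image; the arithmetic and logic unit sees only a flat stream of pixels and needs one.

```python
def section_addresses(base, stride, x0, y0, width, height):
    """Address generator: two loops, one over lines and one over pixels in a line.

    Assumes a row-major image memory in which consecutive lines are
    `stride` addresses apart (an assumption made for this sketch).
    """
    for row in range(y0, y0 + height):
        for col in range(x0, x0 + width):
            yield base + row * stride + col


def alu_process(pixels, op):
    """Arithmetic and logic unit: a single loop over the total pixel count."""
    return [op(p) for p in pixels]


# A 4 x 4 image stored linearly; process the 2 x 2 section starting at (1, 1).
memory = list(range(16))
addresses = list(section_addresses(base=0, stride=4, x0=1, y0=1, width=2, height=2))
results = alu_process([memory[a] for a in addresses], lambda p: 255 - p)
for address, value in zip(addresses, results):
    memory[address] = value  # write back through the same address sequence
```

Here the same address sequence is reused for reading and writing for brevity; separate read and write address generators would each run their own pair of loops.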
In the operation of such systems, the situation frequently arises that not all the modules are active at the same time. This means that transmission bandwidth to the instruction memory, which is generally located outside the integrated circuit, is wasted, since a relatively large number of so-called NOP (no operation) instructions are also transmitted for the modules not currently needed. One possibility of saving bandwidth to the instruction memory involves the instruction words being stored in the instruction memory in compressed form, that is to say with the NOP instructions largely removed, and the missing NOP instructions being added again only on the path to the individual modules.
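One conceivable way of storing instruction words with the NOP instructions removed and restoring them on the path to the modules is sketched below. The encoding shown, a presence mask followed by only the non-NOP instructions, is an assumption chosen for illustration and is not stated in the text; the NOP code of 0 and the function names are likewise hypothetical.

```python
NOP = 0  # assumed encoding of "no operation" for this sketch


def compress(word):
    """Drop NOPs from a list of per-module instructions.

    Returns a bit mask marking which modules have a real instruction,
    together with those instructions in module order.
    """
    mask = 0
    ops = []
    for index, instruction in enumerate(word):
        if instruction != NOP:
            mask |= 1 << index
            ops.append(instruction)
    return mask, ops


def expand(mask, ops, n_modules):
    """Re-insert the missing NOPs on the way to the individual modules."""
    remaining = iter(ops)
    return [
        next(remaining) if (mask >> index) & 1 else NOP
        for index in range(n_modules)
    ]


full_word = [NOP, 7, NOP, NOP, 3]       # only modules 1 and 4 are active
mask, ops = compress(full_word)
restored = expand(mask, ops, n_modules=len(full_word))
```

In this example five instruction slots shrink to a mask and two instructions in the instruction memory, while each module still receives a complete, synchronously delivered word after expansion.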
U.S. Pat. No. 5,774,737 (see European patent application EP 0 768 602 A2) discloses an apparatus for the hierarchical and distributed control of programmable modules. There, a multiplicity of control modules interchange control information with a superordinate control unit and use control lines to drive processing modules permanently assigned to the respective control modules.
The publication by Bernhard K. Gunther, "Multithreading with Distributed Functional Units," IEEE Transactions on Computers, Vol. 46, No. 4, April 1997, pages 399-411, discloses a synchronization unit for an apparatus having a multiplicity of control and processing modules.