The invention concerns a digital processing device P, particularly for processing of digital data and signal structures, wherein the data and signal structures comprise repeated sequences and/or nested patterns, and wherein the processing device P generally is configured as a regular tree with n+1 levels S0, S1, . . . Sn and of degree k.
Repeated or recursive operations on very large data volumes can, even when restricted in number, often be a bottleneck on conventional microprocessors. Such processing is thus amenable to massively parallel solutions wherein a very large number of processing elements simultaneously execute different operations in parallel on one large data stream, or possibly parallel operations on several data streams. If such large data volumes appear in the form of data or signal structures with repeated sequences and/or nested patterns, the processing can be made more effective by being realized in parallel on the same or several different levels.
From U.S. Pat. No. 4,860,201 (Stolfo et al.) there is known a parallel processing device structured as a binary tree, wherein a very large number of processors, each with its own I/O unit, are used. Generally Stolfo et al. disclose a computer with a very large number of processors connected in a binary tree structure such that each processor, apart from those which constitute the root and the leaves of the tree, has a single parent processor and two child processors. The processors typically work synchronously on data which are transmitted to them from the parent processor, and communicate the results on to the nearest succeeding processors, that is, the children of the processor in question. Simultaneously the child processors and the parent processor may also communicate with each other. According to Stolfo et al. each node constitutes a processing element which comprises a processor in the proper sense, a read/write memory or a random access memory, and an I/O device. The I/O device provides interfaces between each processing element and its parent and child processing elements such that a substantial improvement is obtained in the speed with which data are sent through the binary tree structure. As the binary tree structure has a processing element in every single node, a tree with n levels will generally comprise 2^n − 1 processing elements, that is 1023 processing elements if the binary tree is realized with 10 levels. In a preferred embodiment the known parallel processing device has a clock frequency of 12 MHz, which in the case of a tree with 1023 processors, each with an average instruction cycle time of 1.8 μs, provides a processing performance of about 570 million instructions per second.
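The figures quoted for the known device can be rechecked with a short calculation. The sketch below is only illustrative: the function names are hypothetical and it assumes that all processors complete instructions concurrently at the stated average cycle time.

```python
def binary_tree_elements(levels: int) -> int:
    """Number of nodes in a complete binary tree with the given number of levels."""
    return 2 ** levels - 1


def aggregate_mips(processors: int, cycle_time_us: float) -> float:
    """Aggregate rate in million instructions per second.

    Assumes every processor completes one instruction per cycle_time_us
    microseconds; instructions per microsecond equals millions per second.
    """
    return processors / cycle_time_us


elements = binary_tree_elements(10)
rate = aggregate_mips(elements, 1.8)
print(elements, round(rate))  # prints: 1023 568
```

The result of roughly 568 million instructions per second agrees with the figure of about 570 million stated above.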
A binary parallel processor of this kind may for instance be well suited for handling decomposable or partitionable data processing problems, for instance searching in large information volumes. A partitionable search problem can be defined as a problem where a query about a relation between an object x and an object set F corresponds to a repeated use of a commutative and associative binary operator b which has an identity, and of a primitive query q which is applied between a new object x and each element f in the set F. One then has a partitionable search problem when the logic function OR is combined with the primitive query "is x = f" applied between the object x and each element f in F. As mentioned by Stolfo et al., a problem which consists of answering a query about the set F may be answered by combining the answers to the queries applied to arbitrary subsets of F. The problem is in other words partitionable or decomposable and well suited for rapid execution by means of parallel processing. The set F is partitioned into a number of arbitrary subsets equal to the number of available processors. The primitive query q is then applied in parallel in each processor between the unknown x, which is communicated to all processors, and the locally stored element f in the set F. The results are then combined in parallel by log2 N repetitions of the operator b, as a number of computations is first executed on N/2 adjacent pairs of processors and then a corresponding number of computations on N/4 pairs of processors with the results from the first computations. The operations hence move during the process to overlying levels in the binary tree, in other words from child processors to the parent processor and so on, and are repeated in parallel on each level.
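The partitionable membership query described above can be sketched as follows. This is a sequential illustration under stated assumptions, not the implementation of the cited patent: the names partition, tree_reduce and is_member are hypothetical, the round-robin partitioning is an arbitrary choice, and each loop iteration stands in for work that the tree machine would perform in parallel.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def partition(items: Sequence[T], parts: int) -> List[List[T]]:
    """Split items into `parts` subsets (round-robin; any partition would do)."""
    return [list(items[i::parts]) for i in range(parts)]


def tree_reduce(values: Sequence[bool],
                op: Callable[[bool, bool], bool],
                identity: bool) -> bool:
    """Combine values pairwise, level by level, as in a binary tree.

    `op` must be commutative and associative with the given identity.
    """
    level = list(values)
    if not level:
        return identity
    while len(level) > 1:
        if len(level) % 2:
            level.append(identity)  # pad an odd level with the identity
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def is_member(x: T, F: Sequence[T], processors: int) -> bool:
    """Answer the query "is x in F" by OR-combining local answers."""
    subsets = partition(F, processors)
    # Primitive query "is x = f", applied locally on each subset.
    local = [any(x == f for f in subset) for subset in subsets]
    return tree_reduce(local, lambda a, b: a or b, False)


print(is_member(7, [1, 3, 7, 9, 11], processors=4))  # prints: True
print(is_member(8, [1, 3, 7, 9, 11], processors=4))  # prints: False
```

With N local answers the while loop runs about log2 N times, which mirrors the N/2, N/4, and so on pairwise combinations moving up the tree.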
There are however a number of data processing problems wherein the data and signal structures comprise repeated sequences and/or nested patterns of such a kind that a processing device as disclosed in U.S. Pat. No. 4,860,201 does not provide the desired flexibility or may not be suited at all for handling the problem. A binary tree structure as disclosed therein presupposes in principle that the problem can be binary partitioned and that operations take place in parallel on the same level. However, there may be problems which demand another degree of decomposition and where processing must be able to take place in parallel, but on different levels in the tree structure. The problems can also be partitioned such that a larger partitioning capacity is desirable on one and the same level in some of the subtrees in the tree structure. This in practice requires solutions which take as their starting point a general tree structure which not only has an arbitrary number of levels, but also an arbitrary degree, while nodes in subtrees are not only connected with the parent node of the tree in question, but may for instance be connected to a node on the same or underlying levels in neighbour trees. An increased degree of connectivity in a tree structure with a desired number of levels and of arbitrary degree will hence make it possible to reconfigure the original tree structure, either in the form of reduced trees or in the form of simple or complex graphs. Simultaneously one or more of the leaf nodes can be combined and take over the function of the parent node in question.
The object of the present invention is thus to provide a processing device which is particularly suited for processing large data volumes in massive parallelism and on different levels in a general tree structure, but which simultaneously also can be configured arbitrarily as nested circuits on different levels, and preferably under determined conditions such that a selected configuration on a given level is generated recursively by a configuration on an underlying level. Particularly it is the object that the processing device according to the invention shall be able to realize an MIMD processing device, that is a processing device which works with multiple instructions and multiple data.
The above-mentioned and other objects are obtained according to the invention with a digital processing device which is characterized in that the processing device P is provided in the form of a circuit Pn on the level Sn and forms the root node of the tree, that the nearest level Sn−1 is provided nested in the circuit Pn and comprises k circuits Pn−1 which form the child nodes of the root node, that generally an underlying level Sn−q in the circuit Pn, where q ∈ {1, 2, . . . n−1}, comprises k^q circuits Pn−q provided nested in the k^(q−1) circuits Pn−q+1 on the overlying level Sn−q+1, each circuit Pn−q+1 on this level comprising k circuits Pn−q, that a defined zeroth level Sn−q = S0 in the circuit Pn for q = n comprises from k^(n−1)+1 to k^n circuits P0 which constitute kernel processors in the processing device P and on this level S0 form leaf nodes in the tree, the kernel processors P0 being provided nested in a number of 1 to k in each of the k^(n−1) circuits P1 on the level S1, that each of the circuits P1, P2 . . . Pn on respective levels S1, S2 . . . Sn comprises a logic unit E which generally is connected with those circuits P0, P1 . . . Pn−1 on the respective nearest underlying level S0, S1 . . . Sn−1 provided nested in the respective circuits P1, P2, . . . Pn and according to choice configures a network of the former circuits in the respective circuits P1, P2, . . . Pn, and that each of the circuits P0, P1 . . . Pn has identical interfaces I.
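The nesting of circuits described above can be modelled, for the unreduced case, as a simple recursive data structure. The sketch below is an assumption-laden illustration: the class Circuit and the helper functions are hypothetical names, the logic unit E and the interfaces I are not modelled, and only a complete tree with k nested circuits per node is built.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Circuit:
    """A circuit on a given level; level 0 is a kernel processor P0."""
    level: int
    children: List["Circuit"] = field(default_factory=list)


def build_complete(n: int, k: int) -> Circuit:
    """Build the root circuit Pn with k nested circuits on every level."""
    if n == 0:
        return Circuit(level=0)
    return Circuit(level=n, children=[build_complete(n - 1, k) for _ in range(k)])


def circuits_per_level(root: Circuit) -> Dict[int, int]:
    """Count the circuits found on each level of the nested structure."""
    counts: Dict[int, int] = {}
    stack = [root]
    while stack:
        circuit = stack.pop()
        counts[circuit.level] = counts.get(circuit.level, 0) + 1
        stack.extend(circuit.children)
    return counts


n, k = 3, 4
counts = circuits_per_level(build_complete(n, k))
for q in range(n + 1):
    # Level n - q holds k**q circuits in the complete tree.
    print(f"S{n - q}: {counts[n - q]} circuits")
```

For n = 3 and k = 4 the counts are 1, 4, 16 and 64 from the root level S3 down to S0, that is k^q circuits on the level Sn−q. A reduced tree would nest fewer than k kernel processors in some circuits P1.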
Advantageously a first embodiment of the processing device according to the invention is characterized in that the zeroth level S0 comprises k^n kernel processors P0, that a kernel processor P0 recursively maps a circuit P1 on the overlying level with a mapping factor r = k, such that the tree is an unreduced or complete tree, and that generally a circuit Pn−q on the level Sn−q maps a circuit Pn−q+1 on the overlying level Sn−q+1 recursively with the factor r = k.
Further a second embodiment of the processing device according to the invention is advantageously characterized in that the zeroth level S0 comprises r·k^(n−1) kernel processors P0, 1 < r < k, that a kernel processor P0 maps a circuit P1 on the overlying level S1 with the mapping factor r, 1 < r < k, such that the tree is a symmetrically reduced or balanced tree, and that generally a circuit Pn−q on all levels from the level S1 maps a circuit Pn−q+1 on the overlying level Sn−q+1 recursively with the mapping factor r = k.
Finally a third embodiment of the processing device according to the invention is advantageously characterized in that respectively from 1 to k kernel processors are provided nested in each circuit P1 on the level S1, that at least one of the circuits P1 comprises at least 2 and at most k−1 kernel processors P0, such that the total number of kernel processors P0 on the level S0 is at least k^(n−1)+1 and at most k^n−1 and the tree becomes an asymmetrically reduced or unbalanced tree, and that generally a circuit Pn−q on the level Sn−q is mapped by the circuits Pn−q−1 nested in the respective circuit Pn−q.
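The number of kernel processors P0 on the level S0 in the complete, balanced and unbalanced trees can be compared with a small calculation. The function names below are hypothetical, and the list passed to leaves_unbalanced is an arbitrary example of how many kernel processors each circuit P1 might contain.

```python
from typing import Sequence


def leaves_complete(n: int, k: int) -> int:
    """Complete tree: every circuit P1 nests k kernel processors."""
    return k ** n


def leaves_balanced(n: int, k: int, r: int) -> int:
    """Balanced tree: every circuit P1 nests r kernel processors, 1 < r < k."""
    if not 1 < r < k:
        raise ValueError("a balanced reduction requires 1 < r < k")
    return r * k ** (n - 1)


def leaves_unbalanced(per_p1: Sequence[int], n: int, k: int) -> int:
    """Unbalanced tree: per_p1[i] kernel processors in the i-th circuit P1."""
    if len(per_p1) != k ** (n - 1):
        raise ValueError("expected one entry per circuit P1")
    if any(not 1 <= count <= k for count in per_p1):
        raise ValueError("each circuit P1 nests from 1 to k kernel processors")
    if not any(2 <= count <= k - 1 for count in per_p1):
        raise ValueError("at least one circuit P1 must nest 2 to k-1 processors")
    return sum(per_p1)


n, k = 2, 4
print(leaves_complete(n, k))                  # prints: 16
print(leaves_balanced(n, k, r=3))             # prints: 12
print(leaves_unbalanced([4, 1, 3, 2], n, k))  # prints: 10
```

For n = 2 and k = 4 the unbalanced total of 10 lies between the stated bounds k^(n−1)+1 = 5 and k^n−1 = 15.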
According to the invention the kernel processor P0 advantageously comprises at least one combinatorial unit C and a memory unit M connected with the at least one combinatorial unit C, at least a part of the memory unit M preferably being configured as a register unit R. In the latter case the at least one combinatorial unit C and the register unit R can preferably be configured as an arithmetic logic unit ALU. According to the invention the logic unit E advantageously comprises at least one combinatorial unit C and a register unit R connected with the at least one combinatorial unit, the at least one combinatorial unit C preferably being a multiplexer. It is then advantageous that the logic unit E in a circuit Pn−q is adapted to be connected with the logic unit E in a corresponding circuit Pn−q on the same level Sn−q in a neighbour tree.
It is also advantageous that the logic unit E in a circuit Pn−q is adapted to be connected with the logic unit E in circuits Pn−q−1, Pn−q−2, . . . P1 on respective underlying levels Sn−q−1, Sn−q−2, . . . S1 in a neighbour tree.
Finally it is also advantageous that the logic unit E in a circuit Pn−q is adapted to be connected with one or more kernel processors P0 in a neighbour tree, either directly or via the logic unit E in the circuit P1 in which the kernel processor or kernel processors P0 in question are nested.