1. Field of the Invention
The present invention relates to a two-dimensional PE (processing element) array, a CAM (content addressable memory), a data transfer method, and a mathematical morphology processing method, which are used in image processing, audio and speech processing, and information processing.
2. Description of Related Art
Demand for high-performance image processing, audio and speech processing, and information processing is increasing as network services become more visual and add more value. Such processing, however, is difficult to implement with conventional microprocessors or digital signal processors based on the von Neumann architecture, because it requires extremely high processing performance.
A two-dimensional PE (processing element) array is known as an apparatus that performs such processing effectively. The two-dimensional PE array is provided with a very large number of PEs for carrying out various types of logical and arithmetic operations, and a controller that drives the PEs in SIMD (single instruction stream, multiple data stream) fashion. With these circuits and this configuration, the two-dimensional PE array provides a mechanism for parallel operation of the PEs and a mechanism for data transfer between adjacent PEs in the two-dimensional array.
Several theories are known that allow various types of processing to be mapped effectively onto the two-dimensional PE array, including the cellular automaton and the cellular neural network (CNN), which are disclosed in Chua, L. O. et al., "Cellular Neural Networks: Theory", IEEE Trans. on Circuits and Systems, Vol. 35, No. 10, October 1988.
FIG. 28 is a block diagram showing a conventional two-dimensional PE array PEA11.
In this conventional two-dimensional PE array, PEs (processing elements) 202, each composed of integrated elements including a microprocessor or an arithmetic unit, are arranged two-dimensionally as an X×Y array, where X is the number of PEs in the vertical direction and Y is the number of PEs in the horizontal direction, and are connected through data transfer paths 203 laid out two-dimensionally.
The conventional PE array, however, has a problem in that the number of the data transfer paths 203 sharply increases with the number of PEs, thereby increasing the amount of hardware of the two-dimensional PE array.
The amount of hardware is further increased by the fact that it is generally difficult to increase the integration density of the PEs 202 when they are arrayed two-dimensionally. Moreover, the data bit width at the data input/output ports 201 increases with the number of PEs 202, which makes data exchange with the outside difficult.
In this case, external data exchange can be facilitated by adding a mechanism for compressing the data bit width at the data input/output ports 201, but this gives rise to another problem: it becomes difficult for the PE array to remain flexible, for example to allow the number of PEs 202 to be changed.
FIG. 29 is a block diagram showing another conventional two-dimensional PE array PEA12.
In this conventional two-dimensional PE array, X×Y PEs (processing elements) 212, each composed of integrated elements including a microprocessor or an arithmetic unit, are arranged one-dimensionally and are connected through data transfer paths 213 as shown in FIG. 29. The one-dimensional chain of PEs 212 is folded in a zigzag every X PEs, so that a pseudo-two-dimensional PE array with X×Y PEs is implemented.
In this system, the one-dimensional data transfer paths 213 are used to transfer data between PEs 212 that are adjacent in the two-dimensional array, with the intervening PEs 212 acting as intermediaries.
The conventional PE array, however, presents a problem in that the transfer time increases because the data must pass through X PEs to be transferred between horizontally adjacent PEs in FIG. 29, thereby resulting in an enormous increase in the total transfer time.
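The hop count described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the chain is folded every X PEs and that every column is filled in the same direction (column-major order); the function names `chain_index` and `hops` are hypothetical and do not appear in the cited prior art.

```python
def chain_index(row, col, X):
    """Position of PE (row, col) on the one-dimensional chain.

    Assumption: the chain is folded every X PEs and each column of X PEs
    is filled in the same direction (column-major order).
    """
    return col * X + row


def hops(a, b, X):
    """Number of chain links between two PEs given as (row, col) pairs."""
    return abs(chain_index(a[0], a[1], X) - chain_index(b[0], b[1], X))


X, Y = 4, 3  # X PEs vertically, Y PEs horizontally

# Vertically adjacent PEs are neighbours on the chain.
vertical = hops((1, 1), (2, 1), X)

# Horizontally adjacent PEs are separated by one full fold of X PEs.
horizontal = hops((1, 1), (1, 2), X)

print(vertical, horizontal)
```

Under this assumed layout a vertical transfer takes one step while a horizontal transfer takes X steps, so the cost of horizontal transfers grows with the height of the array.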
To shorten the transfer time between horizontally adjacent PEs, dedicated paths can be provided. This, however, presents another problem in that the amount of hardware increases because of the larger number of data transfer paths, as in the foregoing example employing the two-dimensional data transfer paths 203.
FIG. 30 is a block diagram showing a conventional CAM (content addressable memory) M11.
The conventional CAM M11 comprises, as shown in FIG. 30, words 224_(1)-(w), a mask register 222, an address decoder 225, and a hit-flag register 227 that can be utilized as a one-dimensional data transfer path between the words 224_(1)-(w). The CAM M11 is disclosed in T. Ogura, et al., "A 20-kbit Associative Memory LSI for Artificial Intelligence Machines", IEEE J. Solid-State Circuits, Vol. 24, No. 4, pp. 1014-1020, August 1989.
The CAM M11 can read data from or write data to any of the words 224_(1)-(w) when an address is supplied to an address input port 223, as in a common memory. In addition, it has a maskable search function and a partial and parallel write function, and by using these functions it can perform various logical and arithmetic operations simultaneously on all the words. Accordingly, applying the CAM to a two-dimensional PE array can implement a highly parallel computer with a very large number of PEs.
The hit-flag register 227 that can be utilized as a one-dimensional data transfer path between the words 224_(1)-(w), however, has only a unidirectional shift mode, in which the data can be shifted only up or only down. This presents a problem in that data transfer between the words 224_(1)-(w) can be carried out effectively in only one specific direction. In addition, since the conventional system has no mode that performs a data read or write and a shift in the hit-flag register simultaneously, it cannot achieve efficient data transfer processing.
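The behaviour of such a CAM can be sketched with a toy software model. This is an illustrative assumption, not the circuit of the cited LSI: the class `ToyCAM`, its method names, the convention that a set mask bit means "compare or write this bit", and the choice of shift direction (toward higher addresses) are all hypothetical.

```python
class ToyCAM:
    """Toy model of a CAM with a hit-flag register (illustrative only)."""

    def __init__(self, num_words, width):
        self.width = width
        self.words = [0] * num_words
        self.hit = [False] * num_words  # one hit flag per word

    def write(self, address, value):
        """Ordinary addressed write, as in a common memory."""
        self.words[address] = value & ((1 << self.width) - 1)

    def read(self, address):
        """Ordinary addressed read."""
        return self.words[address]

    def search(self, key, mask):
        """Maskable search: compare only the bits set in mask, on all words."""
        self.hit = [((w ^ key) & mask) == 0 for w in self.words]

    def parallel_write(self, value, mask):
        """Partial parallel write: update the masked bits of every hit word."""
        self.words = [
            (w & ~mask) | (value & mask) if h else w
            for w, h in zip(self.words, self.hit)
        ]

    def shift_hits(self):
        """Unidirectional shift: move each hit flag to the next higher address."""
        self.hit = [False] + self.hit[:-1]


cam = ToyCAM(num_words=4, width=8)
for address, value in enumerate([0b0001, 0b0011, 0b0100, 0b0111]):
    cam.write(address, value)

cam.search(key=0b0001, mask=0b0001)            # words whose bit 0 is 1
cam.parallel_write(value=0b1000, mask=0b1000)  # set bit 3 in those words
cam.shift_hits()                               # pass the flags to the next word
cam.parallel_write(value=0b10000, mask=0b10000)
print([bin(w) for w in cam.words])
```

In this model one bit moves between neighbouring words per search, shift, and write sequence, and only toward higher addresses, which reflects the one-directional limitation described above.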
Therefore, the two-dimensional PE array structured by using the foregoing CAMs has a problem in that its data transfer time is rather long.
In image processing algorithms, it is usually effective to adopt a one-to-one correspondence between the pixels of an image and the PEs of the two-dimensional array, in which case many PEs are needed, for example 65,536 PEs for 256×256 pixels. Thus, a two-dimensional PE array is needed that can accommodate a great number of PEs. If such a two-dimensional PE array is built from multiple boards, however, it becomes very expensive. To avoid this, the two-dimensional PE array including many PEs must be implemented with a hardware amount of about a single board.
In addition, since real time processing is usually required in the image processing, a two-dimensional PE array is required which can implement the real time processing in various types of image processing by suppressing not only the operation time in each PE, but also the data transfer time between the adjacent PEs in the two-dimensional array as much as possible.
Parallelism in image processing, audio and speech processing, and information processing takes diverse forms depending on the type of processing, and hence the PE arrangement required of the two-dimensional PE array is also diverse. In view of this, a highly flexible two-dimensional PE array is desired in which the PE arrangement can be changed freely.
Mathematical morphology processing is a theoretical system that provides a consistent method for transforming an object image by operations based on set theory. It is widely used for feature extraction, shape representation, and shape recognition of binary and gray-scale images. Details of mathematical morphology processing are disclosed, for example, in P. Maragos, "Tutorial on advances in morphological image processing and analysis", Optical Engineering, Vol. 26, No. 7, 1987. A conventional mathematical morphology processor is disclosed in M. Hassoun, et al., "A VLSI gray-scale morphology processor for real-time NDE image processing applications", SPIE, Vol. 1350, Image Algebra and Morphological Image Processing, 1990.
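As a minimal sketch of the set-theoretic operations involved, the two basic binary morphological operations, dilation and erosion, can be written directly on sets of pixel coordinates. The representation (a binary image as the set of its foreground (row, column) pairs, a structuring element as a set of offsets) and the function names are assumptions made for illustration.

```python
def dilate(image, se):
    """Binary dilation: union of the image translated by every offset in se."""
    return {(r + dr, c + dc) for (r, c) in image for (dr, dc) in se}


def erode(image, se):
    """Binary erosion: pixels p such that p + s is in the image for every s in se."""
    dr0, dc0 = next(iter(se))
    # Any pixel of the result must map onto the image through the offset
    # (dr0, dc0), which gives a finite candidate set to check.
    candidates = {(r - dr0, c - dc0) for (r, c) in image}
    return {
        (r, c)
        for (r, c) in candidates
        if all((r + dr, c + dc) in image for (dr, dc) in se)
    }


# A 3x3 square of foreground pixels and a cross-shaped structuring element.
square = {(r, c) for r in range(3) for c in range(3)}
cross = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

eroded = erode(square, cross)   # only the centre pixel fits the whole cross
opened = dilate(eroded, cross)  # erosion followed by dilation (opening)
print(sorted(eroded), sorted(opened))
```

Here the erosion keeps only the centre pixel, and the subsequent dilation restores a cross-shaped region contained in the original square.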
FIG. 31 is a block diagram showing a conventional mathematical morphology processor MS0.
The conventional mathematical morphology processor MS0 comprises a 5×5 PE array 83, an exclusive OR 81 and a comparator 82. It performs the mathematical morphology processing by carrying out arithmetic and comparison operations while scanning an original image with the PE array 83.
The conventional mathematical morphology processor MS0, however, has a problem in that it cannot handle a structuring element larger than 5×5, which is the size of the PE array 83. In addition, since its processing time is proportional to the size of the original image, the mathematical morphology processing time increases with the size of the original image. Furthermore, its hardware amount increases, because the number of PEs in the PE array 83 must be increased to handle a large structuring element, and the amount of wiring between adjacent PEs increases with the number of PEs in the PE array 83.
To enable mathematical morphology processing to be applied to various types of image processing, a mathematical morphology processor is required that can process a large original image with a large structuring element in real time (at a video rate).
One of the features of the mathematical morphology is the very high parallelism resulting from the fact that an original image can be handled by operations only between the nearest neighbors. Accordingly, to implement a high performance mathematical morphology processor, it is necessary to take the maximum advantage of this feature and to implement the mathematical morphology processor including PEs having one-to-one correspondence with the pixels.
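The nearest-neighbor property can be illustrated with a small sketch for one particular case, chosen here as an assumption: gray-scale dilation with a flat square structuring element, which is a windowed maximum. Applying a 3×3 nearest-neighbor maximum twice gives the same result as a single 5×5 maximum, so a larger structuring element is reached by repeating a purely local step. The function name `window_max` and the sample image are hypothetical.

```python
def window_max(img, radius):
    """Max over the (2*radius+1) x (2*radius+1) window around each pixel.

    Windows are clipped at the image border. With radius=1 each pixel only
    reads itself and its 8 nearest neighbours.
    """
    h, w = len(img), len(img[0])
    return [
        [
            max(
                img[rr][cc]
                for rr in range(max(r - radius, 0), min(r + radius + 1, h))
                for cc in range(max(c - radius, 0), min(c + radius + 1, w))
            )
            for c in range(w)
        ]
        for r in range(h)
    ]


# Small deterministic test image (values chosen arbitrarily).
img = [[(7 * r + 3 * c) % 11 for c in range(6)] for r in range(6)]

two_local_steps = window_max(window_max(img, 1), 1)  # two nearest-neighbour passes
one_large_step = window_max(img, 2)                  # direct 5x5 window

print(two_local_steps == one_large_step)
```

Each local step could be executed by all pixels at once if every pixel had its own PE, which is the form of parallelism the passage above refers to.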
In this case, however, about 260,000 PEs are needed to process a 512×512 image, for example. Thus, a mathematical morphology processor is required that can accommodate an enormous number of PEs.
This also requires multiple boards, thereby sharply increasing its cost. Accordingly, it is preferable that the mathematical morphology processor including a large number of PEs be realized with a hardware amount of about a single board to suppress its cost.