This invention relates to a parallel computer having a plurality of processing elements, and more particularly to a parallel computer suitable for image processing.
Conventionally, two constructions of the parallel computer are known: the local memory type and the shared memory type.
In the parallel computer of local memory type, a local memory is provided for each processing element. Therefore, each of the processing elements can make access to the local memory which belongs thereto independently from the other processing elements. However, the parallel computer has a disadvantage that each processing element cannot make direct access to the local memory which belongs to the other processing element.
In the parallel computer of shared memory type, all of the processing elements share a memory. Therefore, each of the processing elements can make direct access to the shared memory. However, the parallel computer has a disadvantage that memory access contention between a plurality of processing elements occurs and the parallel operation will be easily disturbed.
As another type of parallel computer for solving the above problems, there is provided a parallel computer having a local cache memory for each processing element and a main memory shared by all of the processing elements. Further, as still another type of parallel computer, there is provided a parallel computer which shares information between processing elements by use of crossbar switches. However, these types of parallel computers are complicated in construction: the hardware amount increases and the control operation becomes difficult.
Image processing is one of the application fields of the parallel computer.
For example, where image processing is performed by a parallel computer whose processing elements are connected in a matrix form, one approach is to assign portions of an image to the respective processing elements and cause the processing elements to process their assigned partial images in parallel, thereby enhancing the speed of the image processing operation. In image processing, most memory accesses are localized to relatively nearby memory areas, so parallel computation is considered an effective way to attain high processing speed.
However, the conventional parallel computer of local memory type is insufficient for enhancing the image processing speed. The reason is that, when image processing such as filtering is effected, each processing element must use the partial image assigned to the adjacent processing element in order to compute the end portion (boundary) of the partial image assigned to itself. Because access from a processing element to a memory which belongs to the adjacent processing element requires communication between the elements via that adjacent processing element, the access speed becomes low.
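The boundary dependence described above can be illustrated with a small sketch. It assumes, purely for illustration, a square image divided into equal square partial images and a 3×3 filter window; the function name `foreign_pixels` and all parameters are hypothetical and do not come from this description. The function counts how many input pixels one partial image needs that are stored with other processing elements.

```python
def foreign_pixels(tile_row, tile_col, tile_size, tiles_per_side, radius=1):
    """Count input pixels a partial image needs that belong to other tiles.

    Assumption (not from the source): a square image made of
    tiles_per_side x tiles_per_side tiles and a square filter window of
    side 2*radius + 1. Positions outside the image are ignored.
    """
    image_size = tile_size * tiles_per_side
    row0, col0 = tile_row * tile_size, tile_col * tile_size

    # Pixels stored with this processing element.
    own = {(r, c)
           for r in range(row0, row0 + tile_size)
           for c in range(col0, col0 + tile_size)}

    # Every pixel touched by the filter window centred on an own pixel.
    needed = set()
    for r, c in own:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < image_size and 0 <= cc < image_size:
                    needed.add((rr, cc))

    return len(needed - own)


# Interior and corner tiles of a 3 x 3 arrangement of 4 x 4 partial images.
print(foreign_pixels(1, 1, tile_size=4, tiles_per_side=3))
print(foreign_pixels(0, 0, tile_size=4, tiles_per_side=3))
```

In a local memory type computer, each of the counted pixels would have to be fetched through inter-element communication, which is the cost the paragraph above refers to.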
The shared memory type parallel computer is also insufficient. Because memory accesses occur simultaneously and cause memory access contention, the parallel operation cannot be performed effectively and a practically high operation speed cannot be attained.
Further, the parallel computer using the cache memory is not effective since the image data size is large and the hit ratio is low. In addition, the parallel computer using the crossbar switch is not effective since the hardware becomes excessively complicated.
As described above, in the conventional parallel computer of local memory type, memory access takes a long time, and in the shared memory type, memory access contention occurs and a satisfactory parallel operation cannot be effected. Further, the parallel computer using the cache memory or crossbar switch has problems in the hardware amount and the control operation.
This invention has been made in order to solve the above problems, and an object of this invention is to provide a parallel computer capable of effecting the parallel processing operation at higher speed and making efficient memory access without increasing the hardware amount or complicating the control operation.
That is, in order to attain the above object, a parallel computer according to a first aspect of this invention comprises a plurality of memory elements which are logically arranged in a first arrangement pattern and store data; a plurality of processing elements which are logically arranged in a second arrangement pattern corresponding to the first arrangement pattern and process the data of the memory elements; and a connecting system which logically connects each of the processing elements to associated memory elements included among the memory elements.
Preferably, the processing elements are logically arranged in a matrix as the first arrangement pattern and the memory elements are logically arranged in a matrix as the second arrangement pattern, and the connecting system includes a connection section which connects each of the processing elements to those memory elements which are logically arranged around that processing element.
Preferably, the processing elements and the memory elements are equal in number to each other and are alternately arranged to form a logical matrix array pattern, and the connecting system includes a connection section which connects peripheral processing elements included among the processing elements and logically arranged in a periphery of the matrix array pattern to associated peripheral memory elements included in the memory elements to form a logical closed loop in an array of the processing elements and the memory elements.
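One possible reading of the closed loop can be sketched as follows. The sketch assumes that the loop is realized by wrap-around (modular) indexing, so that processing elements on one edge of the matrix connect to memory elements on the opposite edge; this indexing scheme and the names `wrapped_memory_neighbors` and `sharing_counts` are assumptions made for illustration and are not stated in this description.

```python
from collections import Counter


def wrapped_memory_neighbors(i, j, rows, cols):
    """Memory elements reached by processing element (i, j).

    Assumption: rows x cols processing elements, rows x cols memory
    elements, and wrap-around at the periphery of the matrix.
    """
    return [((i + di) % rows, (j + dj) % cols)
            for di in (0, 1) for dj in (0, 1)]


def sharing_counts(rows, cols):
    """Count how many processing elements reach each memory element."""
    counts = Counter()
    for i in range(rows):
        for j in range(cols):
            for position in wrapped_memory_neighbors(i, j, rows, cols):
                counts[position] += 1
    return counts


counts = sharing_counts(3, 4)
print(len(counts), set(counts.values()))
```

Under this assumption every memory element is shared by the same number of processing elements, so the peripheral elements are treated no differently from the interior ones.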
Preferably, each of the processing elements has a function capable of performing a direct access only to the associated memory elements which are connected by the connecting system.
Preferably, each of the memory elements has a function capable of being directly accessed only by those of the processing elements that are connected thereto by the connecting system.
Preferably, the connecting system includes a plurality of processing element buses respectively and exclusively provided for the processing elements, a plurality of memory element buses respectively and exclusively provided for the memory elements, and a plurality of switching elements connected between each of the processing element buses and associated memory element buses included among the memory element buses.
Preferably, only one of the switching elements is selectively made conductive.
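The selective conduction of the switching elements can be modeled with a minimal sketch. It assumes that the restriction applies per processing element bus, that is, among the switching elements attached to one processing element bus at most one is conductive at a time; that reading, the class name `ProcessingElementPort` and its methods are hypothetical.

```python
class ProcessingElementPort:
    """Hypothetical model of one processing element bus and its switches."""

    def __init__(self, memory_ids):
        # One switching element per associated memory element bus.
        self.memory_ids = list(memory_ids)
        self.conductive = None  # memory element currently connected, if any

    def select(self, memory_id):
        """Make exactly one switching element conductive."""
        if memory_id not in self.memory_ids:
            raise ValueError("no switching element to that memory element")
        # Recording a single id implicitly opens the previously closed switch.
        self.conductive = memory_id

    def switch_states(self):
        """Return one boolean per switching element (True = conductive)."""
        return [mid == self.conductive for mid in self.memory_ids]


port = ProcessingElementPort([(0, 0), (0, 1), (1, 0), (1, 1)])
port.select((0, 1))
print(port.switch_states())
```

Storing a single selected identifier, rather than one flag per switch, makes it impossible in this model for two switches on the same processing element bus to be conductive together.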
Preferably, the memory elements are connected in a matrix via the processing elements to construct an image frame buffer.
Preferably, the parallel computer further comprises an access system which accesses each of the memory elements from an exterior.
Preferably, the parallel computer further comprises local memories respectively and exclusively provided for the processing elements.
A parallel computer according to a second aspect of this invention comprises n×m (n and m are integral numbers) processing elements which process data; n×m processing element buses respectively provided for the processing elements; (n+1)×(m+1) memory element buses respectively provided for (n+1)×(m+1) memory elements to be accessed; and a plurality of switching elements which connect one of the processing element buses which is connected to one of the processing elements which corresponds to a logical position (i, j) (i is an integral number from 0 to (n−1) and j is an integral number from 0 to (m−1)) to the memory element buses connected to those of the memory elements which correspond to a plurality of logical positions (i, j), (i, j+1), (i+1, j) and (i+1, j+1).
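The index relationship in the second aspect can be sketched in a few lines. The function names `memory_neighbors`, `shared_memory` and `all_memory_positions` are hypothetical helpers introduced only for illustration; the positions themselves follow the logical positions (i, j), (i, j+1), (i+1, j) and (i+1, j+1) given above.

```python
def memory_neighbors(i, j):
    """Logical positions of the memory elements connected to PE (i, j)."""
    return [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]


def shared_memory(pe_a, pe_b):
    """Memory elements that two processing elements can both access."""
    return sorted(set(memory_neighbors(*pe_a)) & set(memory_neighbors(*pe_b)))


def all_memory_positions(n, m):
    """All memory positions reached by an n x m array of processing elements."""
    return {position
            for i in range(n)
            for j in range(m)
            for position in memory_neighbors(i, j)}


print(len(all_memory_positions(3, 4)))    # number of memory elements reached
print(shared_memory((0, 0), (0, 1)))      # horizontally adjacent PEs
print(shared_memory((0, 0), (1, 1)))      # diagonally adjacent PEs
print(shared_memory((0, 0), (0, 2)))      # non-adjacent PEs
```

The sketch shows that an n×m array of processing elements reaches exactly (n+1)×(m+1) memory positions, and that only logically adjacent processing elements have memory elements in common.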
Preferably, in the parallel computer the processing elements are equal to (n+1)×(m+1) in number, and the parallel computer further comprises other switching elements for connecting predetermined processing elements included in the processing element buses and associated memory buses included in the memory element buses to form a logical closed loop of the processing element buses and the memory element buses.
Preferably, each of the processing elements has a section which directly accesses only the memory elements which are connected thereto by the switching elements.
Preferably, only one of the switching elements is selectively made conductive.
Preferably, an image memory is constructed by the (n+1)×(m+1) memory elements.
Preferably, the parallel computer further comprises other switching elements for connecting the memory elements to an exterior.
Preferably, the parallel computer further comprises inherent local memories respectively provided for the processing elements and other switching elements which connect the processing element buses and the inherent local memories.
A parallel computer for image processing according to a third aspect of this invention comprises n×m (n and m are integral numbers) processing elements which perform an image processing in a distributed and cooperative manner; an image memory having (n+1)×(m+1) memory elements which store partial image data, respectively, to store image data; a plurality of processing element buses respectively and independently provided for the processing elements; a plurality of memory element buses respectively and independently provided for the memory elements; a plurality of internal switching elements which selectively connect one of the processing element buses which is connected to that of the processing elements which corresponds to a logical position (i, j) (i is an integral number from 0 to (n−1) and j is an integral number from 0 to (m−1)) to the memory element buses connected to those of the memory elements which correspond to a plurality of logical positions (i, j), (i, j+1), (i+1, j) and (i+1, j+1); and a plurality of external switching elements respectively connected to the memory element buses, for inputting/outputting partial image data between an external device and the memory elements.
Preferably, in the parallel computer the processing elements are equal to (n+1)×(m+1) in number, and the parallel computer further comprises other internal switching elements for connecting predetermined processing elements included in the processing element buses and associated memory buses included in the memory element buses to form a logical closed loop of the processing element buses and the memory element buses.
Preferably, each of the processing elements has a function for effecting the image processing in the distributed and cooperative manner based on partial images stored only in associated ones of the memory elements which are directly accessed by that processing element via corresponding ones of the internal switching elements.
Preferably, the parallel computer further comprises a global processor which accesses the image data stored in the image memory via the external switching elements and calculates a global feature based on the image data.
According to this invention, a plurality of processing elements locally share a plurality of memory elements so that efficient memory access can be made and the parallel processing operation can be effected at higher speed without increasing the hardware amount or complicating the control operation.
Further, according to this invention, when this invention is applied to image processing, each of the processing elements can make access to an area which is close to the processing element and is half the storage area of partial image data managed by the adjacent processing element.
In addition, according to this invention, it is possible to realize a parallel computer in a relatively simple hardware construction suitable for the characteristic of image processing or the like.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.