When a hardware device for executing stream data processing, such as image processing, is implemented, it is required to have not only excellent processing performance but also the flexibility to handle various algorithms. To satisfy these requirements, a multi-core structure having multiple processing elements (PEs) has been proposed.
For example, in non-patent literature No. 1, the PEs are connected to each other via a write-only ring bus, and the memory corresponding to each PE is a dual-port memory. Thus, an access to a memory from its own PE and an access from another PE via the ring bus do not collide with each other. Access via the ring bus to the memory corresponding to another PE is limited to write operations only, which simplifies the circuit configuration.
In patent literature No. 1, each core processor includes multiple PEs, and the core processor corresponds to the PE in non-patent literature No. 1. Each core processor contains a frame memory serving as a common memory and multiple parallel memories, each having a smaller capacity than the frame memory. Since multiple memories are provided, the memory bandwidth, which determines the performance of the stream data processing, is improved.
In the stream data processing described in patent literature No. 1 and non-patent literature No. 1, a processing unit is assigned to each PE, and data is transferred sequentially from one processing unit to the next, so that pipeline processing is performed. If, in non-patent literature No. 1, each PE is given a multi-core structure in order to improve processing performance, the access bandwidth of the dual-port memory may become insufficient. Further, because the PEs are connected to each other via the ring bus, the degree of freedom in designing the pipeline structure may be restricted. In particular, when data is transferred from one PE to a PE other than an adjacent PE, the transfer passes through the ring bus sections between the intermediate PEs, so the access bandwidth of the ring bus as a whole may be restricted.
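To make the bandwidth argument concrete, the following Python sketch counts how many stage-to-stage transfers occupy each section (segment) of a ring bus. It is an illustrative model only, not the structure of either cited literature: it assumes a unidirectional ring in which segment k links PE k to PE (k + 1) mod N, and that every transfer moves one unit of data. The function name `ring_segment_load` is hypothetical.

```python
from collections import Counter


def ring_segment_load(num_pes, transfers):
    """Return, for each ring segment, the number of transfers that use it.

    Assumption (illustrative only): the ring is unidirectional, and segment k
    carries data from PE k to PE (k + 1) % num_pes. A transfer from src to dst
    therefore traverses (dst - src) % num_pes consecutive segments.
    """
    load = Counter()
    for src, dst in transfers:
        hops = (dst - src) % num_pes
        for h in range(hops):
            load[(src + h) % num_pes] += 1
    return [load[k] for k in range(num_pes)]


# Four pipeline stages mapped onto adjacent PEs: 0 -> 1 -> 2 -> 3.
adjacent = [(0, 1), (1, 2), (2, 3)]
# The same four stages mapped onto non-adjacent PEs: 0 -> 2 -> 1 -> 3.
non_adjacent = [(0, 2), (2, 1), (1, 3)]

for name, transfers in (("adjacent", adjacent), ("non-adjacent", non_adjacent)):
    seg_load = ring_segment_load(4, transfers)
    print(name, "segment load:", seg_load, "max:", max(seg_load))
```

Under these assumptions the adjacent mapping uses each segment at most once (loads `[1, 1, 1, 0]`), while the non-adjacent mapping loads three segments twice (loads `[2, 2, 2, 1]`). If each segment can carry only one transfer per cycle, the most heavily loaded segment roughly bounds the sustainable pipeline throughput, which is why transfers between non-adjacent PEs restrict the bandwidth available to the ring bus as a whole.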
In patent literature No. 1, since the core processors are connected to each other via a dedicated bus, the degree of freedom in designing the pipeline structure is comparatively high. However, the access bandwidth is small, and scalability is limited. Further, when multiple memories are arranged in each core processor, the hardware structure becomes complicated and the circuit size increases. Furthermore, the operating speed decreases, and software design becomes more difficult.