The present invention relates to a semiconductor device, and particularly to a semiconductor device which includes a highly-integrated memory and multiple processing elements formed on a single chip and is suitable for data processing.
Semiconductor devices which include a processor and memory mounted on a circuit board and are used as a specialized processing system for high-speed processing, such as image processing, of a vast amount of data are known in the prior art. In this prior art, the processor and memory are connected through buses, which necessitates bus arbitration. Thus, when a series of read, compute and write operations is repeated for a vast amount of data, a significant amount of time is expended on the read and write operations and on switching between them, and data processing is inefficient.
An improved semiconductor device which includes multiple processing elements and a memory cell array mounted on a single chip, and which operates to read out the data of the memory cells on a word line of the memory cell array and process the data in parallel, is also known in the prior art. A semiconductor device of this type is described, for example, in the publication: Y. Aimoto, et al., "Memory Array Circuits of Integrated Memory Array Processor (IMAP) LSI", Proceeding of the 1994 IEICE Spring Conference, 5-261 C-693.
This prior art device includes 64 processing elements and 2 Mb of SRAM integrated on a chip, and is designed to operate the processing elements in parallel in response to an instruction based on the SIMD (Single Instruction Stream Multiple Data Stream) scheme. In image data processing, the individual computations are not very intricate, but the same computation is repeated a great number of times for a vast amount of data.
When the above-mentioned semiconductor device having multiple processing elements and a memory cell array is used for image data processing, the operations of reading out data from memory cells, performing a certain computation on the data with the SIMD-based processing circuit and writing the computation result to memory cells are repeated. One series of read, compute and write operations takes an amount of time which is the sum of the read time tr, computation time tc and write time tw, and m repetitions of this series of operations take a total time of m(tr+tc+tw).
The computation time tc may be reduced in the future as the processing circuit is further sped up through the more advanced scaling made possible by progress in semiconductor fabrication technology.
However, in the above-mentioned prior art semiconductor device having multiple processing elements and a memory cell array integrated on a chip, it will be difficult to increase the amount of signal obtained from the memory cells as the memory cell array is scaled down further, in contrast to the speeding up of the processing circuit. Therefore, the data read time tr and write time tw will not be reduced significantly. Accordingly, the speed of repetitive image data processing, in which data are read out of memory cells, the data are computed and the computation result is written back to the same memory cells, will be unfavorably dominated by the data read time tr and write time tw.
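The timing argument above can be illustrated with a short numerical sketch. The values of m, tr, tc and tw used below are hypothetical figures chosen only for illustration and do not come from the prior art device; the function names are likewise assumptions made for this example.

```python
def total_time(m, tr, tc, tw):
    """Total time for m repetitions of the read, compute, write series: m(tr+tc+tw)."""
    return m * (tr + tc + tw)


def memory_fraction(tr, tc, tw):
    """Fraction of one read-compute-write series spent on reading and writing."""
    return (tr + tw) / (tr + tc + tw)


# Hypothetical values (arbitrary time units), assumed for illustration only.
m = 1000
tr, tw = 10, 10          # read and write times, assumed not to improve
tc_now, tc_future = 10, 2  # computation time before and after a faster processing circuit

t_now = total_time(m, tr, tc_now, tw)
t_future = total_time(m, tr, tc_future, tw)

print("total time now:   ", t_now)
print("total time future:", t_future)
print("read/write share now:   ", memory_fraction(tr, tc_now, tw))
print("read/write share future:", memory_fraction(tr, tc_future, tw))
```

Under these assumed figures, reducing tc to one fifth shortens the total time by well under half, and the share of time spent on reading and writing grows, which is the sense in which tr and tw come to dominate.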