The present invention relates to a parallel operation device and a microcomputer used for calculating a histogram, and to technology that can be effectively applied to a microcomputer such as an image processor, a digital signal processor, or an audio processor.
In data processing that handles a large amount of data, such as image processing or audio processing, a plurality of processors is in many cases operated in parallel to improve the efficiency of data processing. However, some types of data processing cannot take sufficient advantage of the parallelism of the processors, for example, the calculation for generating a histogram. A histogram indicates the frequency distribution (frequencies of appearance) of data and is used very often in image data processing, which, for example, begins with acquiring a histogram of the entire image and uses histograms to describe local features of the image.
As a technique for high speed calculation of a histogram, Patent Document 1 (Japanese Patent Laid-Open No. 1986-153771) describes an apparatus which acquires a histogram from data input in a single system. Patent Document 2 (Japanese Patent Laid-Open No. 1989-166174) also describes an apparatus which acquires a histogram from data input in a single system as with Patent Document 1.
Patent Document 3 (Japanese Patent Laid-Open No. 2002-109535) discloses a circuit which calculates a histogram that is largely independent of the number of pixels of the input image data, by allowing larger numerical values to be represented with a memory means having a small word length.
Patent Document 4 (Japanese Patent Laid-Open No. 1998-105702) describes a histogram acquisition apparatus which omits the points where the accumulation value of the histogram is zero. According to the document, memory areas for storing histogram values whose frequency is zero become unnecessary, and thus the number of memory areas (also simply referred to as bins) for storing histogram values can be reduced. This in turn lightens the transfer processing, or shortens the transfer time, of frequency data in a gradation frequency memory forming a plurality of bins.
According to Japanese Patent Laid-Open No. 1988-98078 (Patent Document 5), each processor is provided with sub-histograms having the same capacity as a histogram desired to be finally acquired, and sub-histograms are calculated for each processor. After the calculation, the sub-histograms are added for each bin to acquire the desired histogram.
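The sub-histogram approach described above can be illustrated with a minimal software sketch. It is not taken from Patent Document 5; the function names sub_histograms and merge, the sequential loop standing in for processors running in parallel, and the sample data are all assumptions made for illustration.

```python
def sub_histograms(chunks, num_bins):
    """Build one private sub-histogram per (hypothetical) processor.

    Each element of `chunks` stands for the data handled by one processor.
    The loop runs sequentially here; in hardware the chunks would be
    processed in parallel.
    """
    subs = []
    for chunk in chunks:
        hist = [0] * num_bins          # same capacity as the final histogram
        for value in chunk:
            hist[value] += 1
        subs.append(hist)
    return subs


def merge(subs):
    """Add the sub-histograms bin by bin to obtain the desired histogram."""
    return [sum(column) for column in zip(*subs)]


# Illustrative data: 3 processors, 4 bins.
chunks = [[0, 1, 1], [1, 3, 3], [0, 0, 2]]
subs = sub_histograms(chunks, num_bins=4)
total = merge(subs)
```

The sketch makes the two costs of this approach visible: every processor holds a full-size histogram, and a separate bin-by-bin addition pass is needed after the counting has finished.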
Non-Patent Document 1 (“Histogram calculation in CUDA”, URL: http://developer.download.nvidia.com/compute/cuda/1_1/Website/projects/histogram256/doc/histogram.pdf) illustrates a configuration in which a histogram is generated by a multiprocessor system that allows a plurality of processors to access bins in the same histogram. In this case, updates of the bins by the processors are processed in order.
The inventors have considered calculating histograms at high speed by receiving in parallel, and processing in parallel, data which has been processed in parallel by a plurality of processors.
However, none of Patent Documents 1 to 4 can handle the process of generating a histogram for parallel-input data.
In the case of Patent Document 5, although it can perform the process of generating a histogram for parallel-input data, a memory area for sub-histograms must be prepared for each processor. Letting N be the number of processors and M the number of the bins, and assuming that each bin requires 32 bits to store the maximum frequency of occurrences, a memory capacity of N×M×32 bits is required in the memory area for sub-histograms. In addition, data of sub-histograms accumulated in the memory area for each sub-histogram must be added, and thus the addition process may cause the total processing time to increase.
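As a worked instance of the N×M×32-bit estimate above, the following sketch compares the memory needed for per-processor sub-histograms with that of a single shared histogram. The values N = 8 and M = 256 and the function names are assumptions chosen only for illustration; they do not appear in the cited documents.

```python
def sub_histogram_memory_bits(num_processors, num_bins, bits_per_bin=32):
    """Memory for one full-size sub-histogram per processor: N x M x bits."""
    return num_processors * num_bins * bits_per_bin


def single_histogram_memory_bits(num_bins, bits_per_bin=32):
    """Memory for a single histogram of M bins, for comparison."""
    return num_bins * bits_per_bin


# Assumed example: 8 processors, 256 bins (e.g. 8-bit brightness values).
N, M = 8, 256
sub_bits = sub_histogram_memory_bits(N, M)        # 8 * 256 * 32 bits
single_bits = single_histogram_memory_bits(M)     # 256 * 32 bits
ratio = sub_bits // single_bits                   # grows linearly with N
```

Under these assumed values the sub-histograms occupy 65,536 bits (8 KiB) against 8,192 bits (1 KiB) for a single histogram, so the overhead scales directly with the number of processors.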
In the case of Non-Patent Document 1, the processing time depends on the pattern of the input data. For example, when acquiring a histogram of the brightness values of an image, the worst case for processing time is an image whose brightness values are all the same. Assume that each processor can update the frequency of one bin of the histogram in a single clock and that the number of processors is N. Because all N processors then update the same bin and the updates are processed in order, a processor must wait as many as N clocks until its turn to update the bin comes around, which results in a long processing time.
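The dependence on the input pattern can be illustrated with a simplified model. The model is an assumption made for this sketch and is not taken from Non-Patent Document 1: N processors each present one value per batch, each bin accepts one update per clock, and updates to different bins proceed in the same clock. The function name clocks_for_batch is hypothetical.

```python
from collections import Counter


def clocks_for_batch(values):
    """Clocks needed to retire one batch of parallel inputs.

    Simplified model: updates to the same bin are serialized (one per
    clock), so the batch takes as long as its most contended bin.
    """
    return max(Counter(values).values())


N = 8  # assumed number of processors

# Best case: every processor targets a different bin.
best = clocks_for_batch(list(range(N)))

# Worst case: all brightness values are identical, so all N updates
# queue on a single bin.
worst = clocks_for_batch([5] * N)

# Intermediate case: two of four inputs collide on bin 1.
mixed = clocks_for_batch([1, 1, 2, 3])
```

In this model the batch time ranges from 1 clock to N clocks depending only on how the input values are distributed, which is the data-dependent behavior described above.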
The present invention has been made in view of the above circumstances and provides a parallel operation device and a microcomputer which can handle parallel-input data to generate frequency data of a histogram, with the processing time for generating frequency data of the histogram not depending on the distribution of histogram values in its input data, and can further reduce the memory area used for accumulating frequency data of the histogram.
Other objects and novel features of the present invention will become clear from the description of the present specification and the accompanying drawings.