Signal processing apparatuses which employ a neural network are widely used in pattern recognition apparatuses, prediction systems, and control apparatuses. Generally, the neural network is realized by software which runs on a microprocessor and is provided to a personal computer or a workstation as application software. On the other hand, there is a technique which realizes the neural network by analog hardware or digital hardware for application to a high-speed processing apparatus which processes large-scale data such as image data. For example, Japanese Patent Application Laid-Open No. 2-236659 discusses a technique which realizes a general multi-layer perceptron neural network by digital hardware.
Among the neural networks, an operation method referred to as a convolutional neural network (hereinafter referred to as CNN) realizes pattern recognition which is robust against changes in recognition targets. Japanese Patent Application Laid-Open No. 10-021406 discusses a technique for performing face recognition using image data by applying such a method.
An example of the CNN operation will be described below. FIG. 13 illustrates a network configuration of an example of the CNN operation. Referring to FIG. 13, an input layer 301 is the image data of a predetermined size which is raster-scanned when performing the CNN operation on the image data. Feature planes 303a, 303b, and 303c are feature planes of a first layer 308. The feature plane is a data plane which indicates a detection result of a predetermined feature extraction filter (which performs the convolution operation and non-linear processing). For example, when face detection is performed, the feature plane is the data plane indicating the detection results of eyes, a mouth, and a nose.
Since the feature plane is the detection result of the raster-scanned image data, the detection result is also expressed as a plane. The feature planes 303a, 303b, and 303c are generated by performing the convolution operation and the non-linear processing on the input layer 301. For example, the feature plane 303a is acquired by performing the convolution filtering operation on the input layer 301 using a filter kernel 3021a and performing the non-linear transformation on the operation result. Further, filter kernels 3021b and 3021c are used to generate the feature planes 303b and 303c respectively.
FIG. 10 illustrates an example of the convolution filter. Referring to FIG. 10, data 41 indicates a raster-scanned reference pixel, and a filter kernel 42 is an example of the filter kernel with respect to the reference pixel. The example illustrated in FIG. 10 is equivalent to performing a finite impulse response (FIR) filter operation in which the kernel size is 11 by 11. The FIR filter is processed by a product-sum operation illustrated below.
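$$\mathrm{output}(x, y) = \sum_{\mathrm{row}=0}^{\mathrm{rowSize}} \; \sum_{\mathrm{column}=0}^{\mathrm{columnSize}} \mathrm{input}(x + \mathrm{column},\; y + \mathrm{row}) \times \mathrm{weight}(\mathrm{column}, \mathrm{row})$$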
In the above equation, “input (x, y)” indicates a reference pixel value at the x-y coordinate and “output (x, y)” indicates the FIR filter operation result at the x-y coordinate. Further, “weight (column, row)” indicates the FIR filter coefficient applied to the reference pixel at the coordinate (x+column, y+row), and “columnSize” and “rowSize” indicate the filter kernel size.
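The product-sum operation described above can be sketched in Python as follows. This is only an illustrative sketch: the function name `fir_output`, the nested-list data layout, and the sample values are assumptions that do not appear in the original description. The sketch also assumes that the loops visit rowSize by columnSize kernel taps, with indices running from 0 to size minus 1.

```python
def fir_output(input_px, weight, x, y):
    """Product-sum at (x, y).

    input_px[y][x] holds the reference pixel value at coordinate (x, y);
    weight[row][column] holds the filter coefficient weight(column, row).
    (Hypothetical layout chosen for this sketch.)
    """
    row_size = len(weight)        # kernel height (assumed tap count)
    column_size = len(weight[0])  # kernel width (assumed tap count)
    total = 0
    for row in range(row_size):
        for column in range(column_size):
            # input(x + column, y + row) x weight(column, row)
            total += input_px[y + row][x + column] * weight[row][column]
    return total


# Illustrative data only: a 4-by-3 image and a 2-by-2 kernel.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
kernel = [
    [1, 2],
    [3, 4],
]
result = fir_output(image, kernel, 0, 0)  # 1*1 + 2*2 + 5*3 + 6*4
```

An 11 by 11 kernel such as the one illustrated in FIG. 10 would require 121 multiplications and additions for each output value in such a sketch.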
When the feature plane 303a illustrated in FIG. 13 is to be calculated, the data matrix 41 corresponds to the input layer 301, and the filter kernel 42 corresponds to the filter kernel 3021a. In the CNN operation, the product-sum operation is repeated while the filter kernel is scanned pixel by pixel, and the feature plane is generated by performing the non-linear conversion on the final product-sum result. Further, since the number of connections with the previous layer is one, one filter kernel is used in calculating the feature plane 303a. 
The operation for generating a feature plane 305a of a second layer 309 illustrated in FIG. 13 will be described below. FIG. 15 illustrates the operation for generating the feature plane 305a. Referring to FIG. 13, the feature plane 305a is connected with the feature planes 303a, 303b, and 303c of the previous first layer 308. A filter operation is thus performed on the feature plane 303a to calculate the data of the feature plane 305a by using the kernel which is schematically illustrated as the filter kernel 3041a. The result is then stored in a cumulative adder 501.
The filter operations using the filter kernels 3042a and 3043a are similarly performed on the feature planes 303b and 303c respectively, and the results are accumulated in the cumulative adder 501. After the three filter operations are completed, a non-linear conversion 502 is performed using a logistic function or a hyperbolic tangent function (tanh function). The feature plane 305a is thus generated by performing the above-described process while scanning the entire image pixel by pixel.
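The generation of one feature plane from several planes of the previous layer can be sketched as follows. This is a minimal sketch under stated assumptions: the names `product_sum` and `feature_plane` are hypothetical, the tanh function is chosen as the non-linear conversion, and the output is computed only where the kernel fits entirely inside the previous planes, since border handling is not specified in the description above.

```python
import math


def product_sum(plane, kernel, x, y):
    """Filter operation at (x, y) for one previous-layer plane."""
    return sum(
        plane[y + r][x + c] * kernel[r][c]
        for r in range(len(kernel))
        for c in range(len(kernel[0]))
    )


def feature_plane(prev_planes, kernels):
    """Accumulate one filter result per connected plane, then apply tanh.

    prev_planes and kernels are parallel lists (one kernel per connection).
    Assumption: all planes share one size and all kernels share one size.
    """
    k_rows, k_cols = len(kernels[0]), len(kernels[0][0])
    out_h = len(prev_planes[0]) - k_rows + 1
    out_w = len(prev_planes[0][0]) - k_cols + 1
    out = []
    for y in range(out_h):          # scan pixel by pixel
        out_row = []
        for x in range(out_w):
            acc = 0.0               # plays the role of the cumulative adder
            for plane, kernel in zip(prev_planes, kernels):
                acc += product_sum(plane, kernel, x, y)
            out_row.append(math.tanh(acc))  # non-linear conversion
        out.append(out_row)
    return out


# Illustrative data only: three 3-by-3 planes of ones, three 2-by-2 kernels.
ones = [[1.0] * 3 for _ in range(3)]
kernels = [
    [[0.1, 0.0], [0.0, 0.0]],
    [[0.0, 0.1], [0.0, 0.0]],
    [[0.0, 0.0], [0.1, 0.0]],
]
plane_out = feature_plane([ones, ones, ones], kernels)
```

In this sketch the accumulator is reset for every output pixel, and the non-linear conversion is applied only after the results for all connected planes have been added.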
Similarly, three convolution filter operations are performed on the feature planes 303a, 303b, and 303c of the previous first layer 308 using the filter kernels 3041b, 3042b, and 3043b respectively to generate the feature plane 305b. Further, two convolution filter operations are performed on the feature planes 305a and 305b of the previous second layer 309 using the filter kernels 3061 and 3062 to generate a feature plane 307 of a third layer 310.
Each filter coefficient is determined in advance by learning using a general method such as perceptron learning or back propagation learning. Large filter kernels of 10 by 10 or larger are often used in object detection and recognition.
As described above, since a plurality of filters having a large kernel size is hierarchically used in the CNN operation, it is necessary to perform a great number of convolution operations. Therefore, if the CNN operation is to be implemented by software, expensive high-end processors become necessary.
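The scale of the computation can be estimated with a short calculation. The following sketch is illustrative only: the function name `macs_per_frame`, the 320 by 240 image size, and the use of an 11 by 11 kernel for every connection are assumptions, and every feature plane is assumed to have the same size as the input image. The layer structure follows the example network described above (three planes with one connection each, two planes with three connections each, and one plane with two connections).

```python
def macs_per_frame(width, height, kernel_size, layers):
    """Estimate the product-sum (multiply-accumulate) count for one image.

    layers is a list of (feature_planes, connections_per_plane) pairs.
    Assumption: every plane is width x height and every kernel is square.
    """
    kernels = sum(planes * connections for planes, connections in layers)
    return kernels * kernel_size * kernel_size * width * height


# Example network: 3 + 6 + 2 = 11 filter kernels in total.
example_layers = [(3, 1), (2, 3), (1, 2)]
total_macs = macs_per_frame(320, 240, 11, example_layers)
```

Under these assumptions even this small network needs on the order of one hundred million product-sum operations per image, and the count grows in proportion to the number of connections and to the square of the kernel size.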
Further, if the CNN operation is to be implemented by hardware, an apparatus of sufficient performance cannot be realized by a serial process circuit formed of one operation unit as discussed in Japanese Patent Application Laid-Open No. 2-236659. Japanese Patent Application Laid-Open No. 2-236659 also discusses a method for realizing high-speed processing by combining a plurality of serial processing circuits. However, it is difficult to realize high-performance hardware which adapts to arbitrary networks by employing the same circuit. Further, Japanese Patent Application Laid-Open No. 2-236659 discusses a configuration of a plurality of product-sum operation units. Since different weight coefficients are applied to each of the product-sum operation units which operate concurrently, the circuit size increases when implementing high speed convolution operation such as the CNN operation using a plurality of large-size kernels.
Furthermore, Japanese Patent Application Laid-Open No. 2004-128975 discusses an image processing apparatus which performs a high-speed parallel convolution operation by setting common weight coefficients to the product-sum operation units and extracting the input data in parallel while shifting the input data. The circuit included in the image processing apparatus uses a multiport memory in which the number of ports is equal to the number of computing units. Therefore, if the image processing apparatus is applied to a general single-port memory system, the inputting of the data may become a bottleneck, and performance commensurate with the degree of parallelism of the computing units cannot be achieved.
Moreover, a plurality of weight coefficients of a large filter kernel size may be employed in the CNN operation, and the process may be performed by selecting the plurality of weight coefficients for each product-sum operation. In such a case, the setting of the weight coefficients may become a bottleneck, and performance commensurate with the degree of parallelism cannot be achieved.
The above-described conventional techniques are originally directed at realizing a general multi-layer perceptron neural network or a general FIR filter. Therefore, it is difficult to perform a complex and hierarchical convolution operation such as the CNN operation by a simple and flexible configuration.