There is an increasing demand for high-level recognition and matching processing, such as image recognition by surveillance cameras and biometric authentication using fingerprints or irises. Such processing matches a large number of previously registered data against to-be-matched data given as input, to find the registered data closest to (that is, having the highest likelihood of matching) the to-be-matched data. More advanced and larger-scale systems are expected not only to reduce the time for processing one item of to-be-matched data but also to improve throughput when processing a large number of to-be-matched data. From the viewpoint of responding to improvements in recognition algorithms and maintaining flexibility in system architecture, it seems favorable that such large-scale matching systems be achieved by software on general-purpose processors.
On the other hand, from the point of view of semiconductor devices, improvement in processor operating clock frequency has recently been slowing down, and performance is increasingly being improved through parallel processing using multiple processor cores. For example, some CPUs (Central Processing Units) for general-purpose processing devices, such as personal computers and servers, have a multicore configuration with about two to eight cores. In addition, some GPUs (Graphics Processing Units) for image processing and scientific computing have a many-core configuration with several hundred simple cores.
Large-scale matching systems require not only matching processing but also versatile processing such as registration data management and input/output control. Accordingly, it is favorable to construct large-scale matching systems from a combination of a general-purpose host processor and a matching-directed many-core coprocessor. This requires a parallel processing technique for performing high-level and large-scale matching processing through appropriate sharing of work between the host processor and the coprocessor.
The following are parallel processing techniques associated with matching and recognition.
Patent Literature 1 discloses a technique in which, in matching processing of a three-dimensional object, a data region is divided so that an amount of data to be processed per thread is equal and is below a predetermined amount, and data is input to a GPU to cause the GPU to perform parallel processing.
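The kind of data division described for Patent Literature 1 can be illustrated with a minimal sketch. It is not taken from Patent Literature 1: the function name `split_evenly`, the parameter `max_per_thread`, and the choice to treat the predetermined amount as an inclusive upper bound are all assumptions made for illustration. The sketch picks the smallest number of threads that keeps every share within the bound and then spreads the data so that shares differ by at most one item.

```python
import math


def split_evenly(total_items, max_per_thread):
    """Split total_items into contiguous (start, end) ranges, one per thread.

    Hypothetical helper: every range holds at most max_per_thread items,
    and range sizes differ from each other by at most one item.
    """
    if total_items < 0 or max_per_thread <= 0:
        raise ValueError("total_items must be >= 0 and max_per_thread > 0")
    if total_items == 0:
        return []

    # Smallest thread count that keeps every share within the bound.
    n_threads = math.ceil(total_items / max_per_thread)

    # Distribute the remainder over the first `extra` threads.
    base, extra = divmod(total_items, n_threads)

    ranges = []
    start = 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # 10 items with at most 4 per thread gives shares of 4, 3 and 3 items.
    print(split_evenly(10, 4))
```

Each returned range could then be handed to one thread of a parallel device; how the ranges are actually transferred to a GPU is outside the scope of this sketch.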
Patent Literature 2 discloses a technique in which, in pattern recognition processing, matching with dictionary data is performed in parallel by a plurality of processor cores in the same number as dictionary patterns.
Patent Literature 3 discloses an information recording device that compares video/audio contents in units of the chapter data forming the contents in order to detect predetermined chapter data. Patent Literature 3 does not particularly consider applying any parallel processing technique to the information recording device.
Patent Literature 4 discloses a data processing device that performs pattern matching by calculating similarities between an input pattern and a template pattern. In calculating the similarities, the data processing device of Patent Literature 4 receives the input data serially, bit by bit. The data processing device does not perform the calculation using a parallel processing technique.
Patent Literature 5 discloses a display control system that prefetches image data expected to be read-accessed in the future to store it into cache memory.
An OpenCL (Open Computing Language; registered trademark) technique described in Non Patent Literature 1 is a general-purpose technique for using a coprocessor (typically, a GPU) from a host processor. Use of the OpenCL technique allows programming that is less dependent on a specific coprocessor product. Specifically, a user of the OpenCL technique determines, in addition to a central algorithm for the processing to be executed (hereinafter referred to as “target processing”), a method for dividing the target processing into pieces of unit processing that can be performed in parallel (hereinafter referred to as “parallel division”). Then, the user issues instructions for communication between the host processor and the coprocessor according to the determined method, in the format defined by OpenCL. These operations by the user allow a parallel processing system using the coprocessor to be achieved.
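The notion of parallel division can be illustrated with a minimal sketch. It does not use the OpenCL API: a Python thread pool merely stands in for the coprocessor, and the names `hamming`, `best_in_chunk`, and `match`, the use of Hamming distance between integers as the similarity measure, and the fixed chunk size are all assumptions made for illustration. The host side divides the registered data into chunks, each piece of unit processing finds the closest entry within one chunk, and the host then combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def best_in_chunk(query, registered, start, end):
    """Unit processing: closest entry within registered[start:end].

    Returns (distance, index); ties resolve to the lower index.
    """
    return min((hamming(query, registered[i]), i) for i in range(start, end))


def match(query, registered, chunk_size=2, workers=4):
    """Host-side flow: divide, dispatch unit processing, combine results."""
    if not registered:
        raise ValueError("registered data must not be empty")

    # Parallel division: contiguous chunks of at most chunk_size entries.
    bounds = [
        (s, min(s + chunk_size, len(registered)))
        for s in range(0, len(registered), chunk_size)
    ]

    # The thread pool is only a stand-in for a coprocessor.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda b: best_in_chunk(query, registered, *b), bounds)
        )

    # Reduction on the host: the overall closest entry.
    return min(partials)


if __name__ == "__main__":
    registered = [0b0000, 0b1111, 0b1010, 0b0110, 0b1000]
    distance, index = match(0b1011, registered)
    print(distance, index)
```

In an actual OpenCL program the unit processing would be written as a kernel and the dispatch would go through the OpenCL host API; the sketch only shows how the target processing, the parallel division, and the final combination relate to one another.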