The present invention relates to a computing control device, a computing control method, and a computer readable medium.
OpenVX is one of the international standard Application Programming Interfaces (APIs) for image recognition currently being prepared by The Khronos Group (hereinafter referred to as “Khronos”). In OpenVX, a graph manager running on a computing control device, which is the target device, interprets a user application described in graph form and performs efficient processing using the operators (accelerators) of the computing control device (Khronos Group, “OpenVX”, [online], [searched on Mar. 26, 2015], Internet <see URL: https://www.khronos.org/openvx/>).
FIG. 11 is a diagram showing an example of OpenVX code and a graph created from this code. The graph is shown in the frame on the upper right side of FIG. 11.
In the OpenVX code, a graph structure is first defined by the vx**Node( ) function group (e.g., the processing function vxThresholdNode(graph, in, thres, thout)). The graph structure is then analyzed by the vxVerifyGraph( ) function to determine the parallelism and the order of the processing expressed in the graph. Finally, the processing related to the graph is executed by the vxProcessGraph( ) function.
In the graph shown in FIG. 11, binarization processing is performed on an input image “in” at the vxThreshold node, subtraction and addition are performed at the vxSubtract node and the first vxAdd node, respectively, and the results of the operations are added at the second vxAdd node. While the vxThreshold node needs to be processed first in this example, either one of the vxSubtract node and the first vxAdd node may be processed first or they may be processed in parallel.
The term “graph” here refers to a directed acyclic graph (DAG).
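Because the graph is a DAG, the nodes can be grouped into levels such that every node in a level depends only on nodes in earlier levels; nodes within one level may run in any order or in parallel. The sketch below applies Kahn's topological-sort algorithm to a hypothetical reconstruction of the FIG. 11 graph. The node labels and edges are assumptions inferred from the description above, not taken from the figure itself.

```python
from collections import defaultdict

def parallel_levels(edges):
    """Group the nodes of a DAG into levels using Kahn's algorithm.

    Nodes in the same level depend only on nodes in earlier levels, so they
    may be executed in any order or in parallel. Raises ValueError on a cycle.
    """
    succ = defaultdict(list)
    indeg = {}
    for src, dst in edges:
        succ[src].append(dst)
        indeg.setdefault(src, 0)
        indeg[dst] = indeg.get(dst, 0) + 1

    ready = sorted(n for n, d in indeg.items() if d == 0)
    levels, seen = [], 0
    while ready:
        levels.append(ready)
        seen += len(ready)
        nxt = []
        for n in ready:
            for m in succ[n]:
                indeg[m] -= 1
                if indeg[m] == 0:  # all predecessors of m are scheduled
                    nxt.append(m)
        ready = sorted(nxt)
    if seen != len(indeg):
        raise ValueError("graph contains a cycle")
    return levels

# Hypothetical reconstruction of the FIG. 11 graph (labels are illustrative).
FIG11_EDGES = [
    ("vxThreshold", "vxSubtract"),
    ("vxThreshold", "vxAdd#1"),
    ("vxSubtract", "vxAdd#2"),
    ("vxAdd#1", "vxAdd#2"),
]

print(parallel_levels(FIG11_EDGES))
# [['vxThreshold'], ['vxAdd#1', 'vxSubtract'], ['vxAdd#2']]
```

The middle level contains both vxSubtract and the first vxAdd, matching the observation that either may be processed first or both may be processed in parallel.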
For the nodes used in the graph (Base Nodes), the specifications of each node, such as its required accuracy and behavior, are strictly defined by Khronos in order to maintain the compatibility of OpenVX code. For example, the vxPhase node, which calculates the edge direction for each pixel, is defined to output values with 8-bit accuracy, from 0 to 255. If only about eight or nine edge directions need to be distinguished, relatively lightweight processing such as If-Then-Else branching with about 3 or 4 bits of accuracy (about 20 cycles) is generally sufficient. However, the 8-bit accuracy defined by Khronos requires computationally expensive processing such as arctan (about 1150 cycles), which increases the processing time.
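The contrast between the two approaches can be sketched as follows, under stated assumptions. `phase_u8` computes an 8-bit direction with arctan, assuming a linear mapping of the angle range [0, 2π) onto 0..255 (an assumption; consult the OpenVX specification for the exact vxPhase definition). `octant` is a hypothetical 3-bit alternative that classifies the gradient into eight 45-degree bins using only sign tests and comparisons, in the spirit of the If-Then-Else processing mentioned above. Neither function is taken from an OpenVX implementation, and no cycle counts are implied.

```python
import math

def phase_u8(gx, gy):
    """8-bit edge direction via arctan.

    Assumed mapping: angle in [0, 2*pi) -> 0..255 (illustrative only).
    """
    theta = math.atan2(gy, gx)
    if theta < 0:
        theta += 2 * math.pi
    return min(int(theta / (2 * math.pi) * 256), 255)

def octant(gx, gy):
    """3-bit edge direction using only sign tests and comparisons.

    Bin k covers angles in [45*k, 45*(k+1)) degrees; a zero gradient maps to 0.
    """
    if gx == 0 and gy == 0:
        return 0
    base = 0
    if gy < 0 or (gy == 0 and gx < 0):
        # Angle lies in [180, 360): rotate the vector by 180 degrees.
        gx, gy = -gx, -gy
        base = 4
    if gx > 0:
        # Angle in [0, 90): below or on/above the 45-degree diagonal.
        return base + (0 if gy < gx else 1)
    # Here gx <= 0 and gy > 0, so the angle is in [90, 180).
    return base + (2 if gy > -gx else 3)

# One sample gradient per 45-degree sector.
samples = [(3, 1), (1, 3), (-1, 3), (-3, 1), (-3, -1), (-1, -3), (1, -3), (3, -1)]
for gx, gy in samples:
    # The top 3 bits of the 8-bit phase select the same sector as octant().
    print(octant(gx, gy), phase_u8(gx, gy) >> 5)
```

For these samples both columns agree, which illustrates that when only eight directions are needed, the low 5 bits of the arctan result carry information that is never used.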
FIG. 12 is a diagram showing another example of the OpenVX code and a graph created from this code. In FIG. 12 as well, the graph is shown in the frame on the upper right side of FIG. 12.
In this example, the vxPhase node calculates the edge direction from the outputs sobelx and sobely of a vxSobel3x3 node, which calculates the edge components in the X direction and the Y direction. A vxHistogram node is then called with the calculated edge direction phase as its input.
For example, in the histogram calculation processing (vxHistogram node) of a Histogram of Oriented Gradients (HOG) application, which is an image recognition application, an accuracy of 3 or 4 bits, corresponding to about eight or nine directions, is generally sufficient.
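The FIG. 12 pipeline (Sobel, then phase, then histogram) can be sketched in plain Python as below. This is a hypothetical illustration, not OpenVX code: images are small 2-D lists, border pixels are skipped, the 8-bit phase uses an assumed linear mapping of [0, 2π) onto 0..255, and the histogram is assumed to have eight equal bins. The point it shows is that an 8-bin histogram indexes only the top 3 bits of each 8-bit phase value.

```python
import math

def sobel3x3(img):
    """Return (gx, gy) for the interior pixels of a 2-D list image."""
    h, w = len(img), len(img[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy[y][x] = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return gx, gy

def phase_u8(gx, gy):
    """8-bit direction; assumed mapping [0, 2*pi) -> 0..255."""
    theta = math.atan2(gy, gx)
    if theta < 0:
        theta += 2 * math.pi
    return min(int(theta / (2 * math.pi) * 256), 255)

def direction_histogram(img):
    """8-bin direction histogram over interior pixels.

    Only the top 3 bits of each 8-bit phase select a bin (p >> 5),
    so the low 5 bits computed by arctan are discarded.
    """
    gx, gy = sobel3x3(img)
    hist = [0] * 8
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            p = phase_u8(gx[y][x], gy[y][x])
            hist[p >> 5] += 1
    return hist

# Horizontal ramp: intensity grows with x, so every gradient points along +X.
ramp_x = [[x for x in range(6)] for _ in range(5)]
print(direction_histogram(ramp_x))  # [12, 0, 0, 0, 0, 0, 0, 0]
```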
Consequently, when a user connects the vxHistogram node after the vxPhase node to implement the HOG application, the vxPhase node performs an operation with 8-bit accuracy even though an accuracy of about 3 bits is sufficient for the vxHistogram node. As a result, obtaining the same processing results takes several times longer than when an edge direction calculation node with 3-bit accuracy is used.
Further, in typical C or C++ code, it is difficult to express the parallelism of processing. For example, when functions B and C both use the output of a function A, B and C can be processed in parallel as long as there is no dependence relationship between them. However, when the program is written sequentially in typical C, this parallelism does not appear explicitly.
FIG. 13 is a diagram showing an example of the program described in the C language.
For example, when the program is written as shown in the left side of FIG. 13, the function C is executed after the function B is executed and when the program is written as shown in the right side of FIG. 13, the function B is executed after the function C is executed. That is, whether the function B and the function C can be executed in parallel is not expressed.
Therefore, on a multi-core processor including a plurality of computing resources, the user needs to explicitly describe the fork and join of functions. However, since the optimal allocation of functions may vary from one target device to another, it is difficult to write a program that is optimal for all devices while keeping the code compatible.
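An explicit fork and join of the functions B and C can be sketched with Python's `concurrent.futures`. The functions `func_a`, `func_b`, and `func_c` are hypothetical placeholders. The sketch only illustrates the programming burden: the split into two concurrent tasks and the worker count are hard-coded by the programmer, whereas a different target device might call for a different allocation. (In CPython, threads do not speed up CPU-bound Python code because of the global interpreter lock; the structure, not the speed-up, is the point here.)

```python
from concurrent.futures import ThreadPoolExecutor

def func_a(x):
    return x + 1          # hypothetical producer

def func_b(a_out):
    return a_out * 2      # reads only A's output

def func_c(a_out):
    return a_out * 3      # reads only A's output, independent of B

def run_with_explicit_fork_join(x, executor):
    a_out = func_a(x)                        # A must finish first
    fut_b = executor.submit(func_b, a_out)   # fork: B and C may run concurrently
    fut_c = executor.submit(func_c, a_out)
    return fut_b.result(), fut_c.result()    # join: wait for both results

# The worker count is a device-specific choice fixed in the source code.
with ThreadPoolExecutor(max_workers=2) as pool:
    print(run_with_explicit_fork_join(1, pool))  # (4, 6)
```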
In OpenVX, the graph structure is defined in the C program, and the graph manager included in an OpenVX-compliant device can interpret that program, analyze the graph structure defined in it, and extract the parallelism. Therefore, even when a user writes OpenVX code without knowing the details of the target device, functions can be optimally allocated on the target device via the graph manager.
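The role of the graph manager can be illustrated with a much-simplified, hypothetical model. `ToyGraph` is not the OpenVX API: each node declares the data objects it reads and the one it writes, `verify` infers dependencies from which node produces each input and groups nodes into parallel levels, and `process` runs each level concurrently. The user-side graph definition stays the same while the scheduling choice (`max_workers`) is left to the manager, which is the separation described above. The node functions operate on scalars as stand-ins for images.

```python
from concurrent.futures import ThreadPoolExecutor

class ToyGraph:
    """Hypothetical, simplified graph manager (illustrative, not OpenVX)."""

    def __init__(self):
        self.nodes = []  # (name, func, inputs, output)

    def add_node(self, name, func, inputs, output):
        self.nodes.append((name, func, tuple(inputs), output))

    def verify(self):
        """Infer dependencies from data producers; return parallel levels."""
        producer = {out: name for name, _, _, out in self.nodes}
        deps = {name: {producer[i] for i in ins if i in producer}
                for name, _, ins, _ in self.nodes}
        levels, done = [], set()
        while len(done) < len(deps):
            ready = sorted(n for n in deps if n not in done and deps[n] <= done)
            if not ready:
                raise ValueError("graph contains a cycle")
            levels.append(ready)
            done.update(ready)
        return levels

    def process(self, data, max_workers=4):
        """Execute level by level; nodes within a level run concurrently."""
        by_name = {node[0]: node for node in self.nodes}
        data = dict(data)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for level in self.verify():
                futures = {}
                for name in level:
                    _, func, ins, out = by_name[name]
                    futures[out] = pool.submit(func, *(data[i] for i in ins))
                for out, fut in futures.items():
                    data[out] = fut.result()
        return data

# A FIG. 11-like graph on scalar stand-ins (wiring is illustrative).
g = ToyGraph()
g.add_node("threshold", lambda v: 1 if v > 10 else 0, ["in"], "bin")
g.add_node("subtract", lambda a, b: a - b, ["in", "bin"], "d")
g.add_node("add1", lambda a, b: a + b, ["in", "bin"], "s")
g.add_node("add2", lambda a, b: a + b, ["d", "s"], "out")
print(g.verify())                    # [['threshold'], ['add1', 'subtract'], ['add2']]
print(g.process({"in": 20})["out"])  # 40
```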