The present invention relates to increasing processing efficiency and, more specifically, to the optimization and implementation of graph analytics applications by providing a graph analytics run time that serves as a platform for automatically determining an optimal implementation of a graph API (Application Program Interface) operator called from a developer's graph analytics application. The optimal implementation is determined by comparing the operating time costs of alternative processing scenarios that differ in, for example, graph representation format, operator, and machine configuration.
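The selection step described above can be illustrated with a minimal Python sketch. It assumes that each alternative processing scenario is a combination of a representation format, an operator variant, and a machine configuration, and that an estimated operating time cost is already available for each scenario. The `Scenario` class, the `select_implementation` function, the scenario labels, and the cost figures are all hypothetical; this is not the disclosed run time, only an illustration of choosing the lowest-cost alternative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Scenario:
    """One hypothetical way to execute a graph API operator."""
    representation: str  # graph representation format, e.g. "csr"
    operator: str        # operator implementation variant, e.g. "bfs_pull"
    machine: str         # machine configuration, e.g. "cpu_gpu"


def select_implementation(candidates: Iterable[Scenario],
                          cost_fn: Callable[[Scenario], float]) -> Scenario:
    """Return the candidate with the lowest estimated operating time cost."""
    return min(candidates, key=cost_fn)


# Hypothetical cost estimates in seconds. A real run time would derive such
# figures from application metadata, system metadata, and operator cost models.
ESTIMATES = {
    Scenario("adjacency_list", "bfs_push", "smp"): 4.2,
    Scenario("csr", "bfs_push", "smp"): 2.9,
    Scenario("csr", "bfs_pull", "cpu_gpu"): 1.7,
}

# Iterating the dict yields its Scenario keys; the cost function looks up each estimate.
best = select_implementation(ESTIMATES, ESTIMATES.__getitem__)
print(best)
# → Scenario(representation='csr', operator='bfs_pull', machine='cpu_gpu')
```

Passing the cost model as a function keeps the selection logic independent of how costs are obtained, whether from measurements, analytical models, or a lookup table as here.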
Graph analytics is an important component of cognitive computing. Much of the information in big data, a subject commanding great attention today, is graph structured. Analyzing it requires large graphs to be sub-graphed, analogous to the select and project operations of SQL (Structured Query Language), and the resulting subgraphs then to be analyzed for various properties.
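The SQL analogy can be made concrete with a small Python sketch over a graph held as a node-attribute dictionary plus an edge list. "Select" keeps the nodes that satisfy a predicate together with the edges among them, and "project" keeps only chosen node attributes. The function names, the attribute keys, and the data layout are illustrative assumptions, not part of the disclosure.

```python
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, object]
Nodes = Dict[int, Attrs]
Edges = List[Tuple[int, int]]


def select_subgraph(nodes: Nodes, edges: Edges,
                    predicate: Callable[[Attrs], bool]) -> Tuple[Nodes, Edges]:
    """'Select': keep nodes satisfying the predicate and the edges between kept nodes."""
    kept = {n: a for n, a in nodes.items() if predicate(a)}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


def project_attributes(nodes: Nodes, fields: List[str]) -> Nodes:
    """'Project': keep only the named attributes on each node."""
    return {n: {f: a[f] for f in fields if f in a} for n, a in nodes.items()}


# Hypothetical example graph: two people and one event.
nodes = {
    1: {"kind": "person", "name": "Alice", "age": 34},
    2: {"kind": "person", "name": "Bob", "age": 29},
    3: {"kind": "event", "name": "Conference"},
}
edges = [(1, 2), (1, 3), (2, 3)]

sub_nodes, sub_edges = select_subgraph(nodes, edges, lambda a: a["kind"] == "person")
print(project_attributes(sub_nodes, ["name"]), sub_edges)
# → {1: {'name': 'Alice'}, 2: {'name': 'Bob'}} [(1, 2)]
```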
For example, as shown in FIG. 1, filters 101 can be used to construct graphs related to incoming or stored data. Incoming data might, for example, be stored as a graph 102 that includes a node identifying a specific individual or event, with other nodes storing additional incoming data related to that individual or event. When this data is analyzed using a graph analytics program 103, the graphs are iteratively broken down 104 into subgraphs and other objects of interest. However, because the graph data is typically stored in memory as nodes linked to other nodes in a random order, processing the graph data in the graph analytics program can take a long time, owing to the cost of accessing data stored in such a randomly linked order.
The efficiency of processing graph data depends not only on the efficiency of the software operators used in the processing but also on computer architecture features such as cache sizes. Because graph analysis requires sizable computational resources, developers of graph analysis applications are often faced with the tasks of optimizing their applications and selecting a hardware system that executes them efficiently. Candidate systems include, for example, large SMPs (Symmetric MultiProcessors), which are based on multi-core/multi-threaded general-purpose processors; distributed memory systems; and accelerated systems in which the CPUs (Central Processing Units) are augmented with GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays). Traditionally, achieving good performance on each of these systems has required the basic operators of the graph algorithm to be coded differently.
Thus, as shown exemplarily in FIG. 1, an application developer attempting to use graph analytics processing 105 would have to consider application metadata 106, the representation format 107 of the graph data, and the details of operator execution 108. The developer would additionally have to consider system metadata 109 for the computing system 110 intended to execute the graph analytics program 103.
Developers often have to rewrite programs for each system, and for each anticipated combination of application metadata 106 and system metadata 109, in order to incorporate system-specific optimizations and graph characteristics. This lack of compatibility makes it difficult for application developers to optimize graph analytics processing for all dynamic situations, and it prevents portability and reuse of code.
Thus, developers of large-scale graph analytics applications face two key challenges: 1) programming for optimum performance, which requires significant effort; and 2) maintaining performance while preserving portability, as the chosen system evolves over time or when the user wishes to move from one type of system to another. These challenges arise because attaining optimum performance requires detailed knowledge of the design of the processors and of the systems built from them, including their cache/memory hierarchy. This knowledge is needed to adapt the analytics algorithms to the underlying system, in particular to exploit parallelism or concurrency at the chip, node, or system level. To date, exploiting concurrency and parallelism has been a manual task requiring considerable skill.
The performance and portability challenges are not easily addressed by compilers, because compilers cannot examine large instruction windows and because, in the conventional representation of graph algorithms, the control flow is often data dependent. In particular, factors such as the number of nodes and edges in the graph being analyzed, the sparsity of the graph data, and whether the sparse entries follow a regular pattern are not known to the compiler, since they depend on the input graph rather than on the program text.
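Although a compiler cannot see these data-dependent factors, they are inexpensive to measure once the actual graph is available at run time. The following minimal Python sketch computes them for an edge list. The `GraphProfile`, `profile_graph`, and `looks_banded` names are hypothetical, and using matrix bandwidth (the largest |u - v| over all edges) as a stand-in for "a regular pattern", along with the threshold in `looks_banded`, is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GraphProfile:
    num_nodes: int
    num_edges: int
    density: float  # fraction of the num_nodes**2 possible adjacency entries present
    bandwidth: int  # max |u - v| over edges; small values suggest a banded pattern


def profile_graph(num_nodes: int, edges: List[Tuple[int, int]]) -> GraphProfile:
    """Measure size, sparsity, and a simple regularity indicator of a graph."""
    num_edges = len(edges)
    possible = num_nodes * num_nodes
    density = num_edges / possible if possible else 0.0
    bandwidth = max((abs(u - v) for u, v in edges), default=0)
    return GraphProfile(num_nodes, num_edges, density, bandwidth)


def looks_banded(profile: GraphProfile, max_bandwidth: int = 2) -> bool:
    """Hypothetical heuristic: treat entries near the diagonal as a regular pattern."""
    return profile.bandwidth <= max_bandwidth


# A 5-node chain: very sparse, and every entry lies next to the diagonal.
chain = profile_graph(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(chain, looks_banded(chain))
```

A run time could feed such a profile into its cost model, for instance to favor a banded or compressed representation when the measured pattern is regular.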
Recognizing that developers implementing graph analysis are often faced not only with selecting a hardware system on which to execute their applications optimally but also with reworking code to accommodate each graph analysis scenario, the present inventors have recognized a need for a mechanism that provides good performance in graph analysis processing while keeping the application portable across platforms.