Directed graphs are used to model and solve a number of problems in computer software. For example, compilers often create call graphs in which each node represents a function and each edge represents a call from one function to another. Directed graphs (and in particular directed acyclic graphs) are also often used for Bayesian networks, which support decision making under uncertainty based on a set of known probabilities.
Strongly connected components (SCCs) in a directed graph are those sub-graphs that are maximally strongly connected. A graph or sub-graph is strongly connected if there is a path from each node in the graph to each other node. That is, for two nodes to be part of a strongly connected graph, there has to be a path between the two nodes in each direction. A sub-graph is maximally strongly connected if no further node of the graph can be added to it without breaking this property. Identifying strongly connected components in a directed graph is useful in many graph-based applications, and several efficient algorithms have been developed, such as Tarjan's Algorithm. Tarjan's Algorithm performs a depth-first search from a start node. Nodes are placed on a stack as they are encountered. Each time the search returns from a sub-tree, a test is performed on the root node of that sub-tree to determine whether that node and the nodes placed on the stack after it form a strongly connected component. If so, then those nodes are removed from the stack and identified as a strongly connected component.
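As a rough illustration of the depth-first search, the stack, and the test performed on return from a sub-tree, the following Python sketch shows one common textbook formulation of Tarjan's Algorithm. The function name tarjan_scc, the adjacency-dictionary representation, and the small example call graph are assumptions made for illustration only. The recursive form shown here is also limited by Python's recursion depth on large graphs.

```python
from typing import Dict, List


def tarjan_scc(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return the strongly connected components of a directed graph.

    `graph` maps each node to the list of nodes it has edges to
    (a hypothetical representation chosen for this sketch).
    """
    index: Dict[str, int] = {}    # discovery order of each visited node
    lowlink: Dict[str, int] = {}  # smallest discovery index known to be reachable
    stack: List[str] = []         # visited nodes not yet assigned to a component
    on_stack = set()
    components: List[List[str]] = []

    def visit(node: str) -> None:
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)

        for succ in graph.get(node, []):
            if succ not in index:
                # Unvisited successor: search its sub-tree first.
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                # Edge back to a node still on the stack.
                lowlink[node] = min(lowlink[node], index[succ])

        # Test performed when the search returns from this sub-tree:
        # if the node cannot reach anything discovered earlier, it is
        # the root of a strongly connected component.
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


# Hypothetical call graph: a and b call each other, so they form one component.
call_graph = {"main": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}
print(tarjan_scc(call_graph))
```

In this example the functions a and b are mutually recursive and end up in the same component, while main and c each form a component of their own.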
Finding the entries and exits of strongly connected components is valuable in a number of applications. For example, a compiler may produce different code if a developer adds a call from one function to another that introduces recursion in a software application. The newly introduced edge in the directed graph that holds the functions of the software application can change the traversal order used to identify recursive edges, and may cause an edge previously identified as non-recursive to be identified as recursive. Many compilers do not inline functions across a recursive edge, so what seemed to the developer to be a small change might reduce inlining and dramatically impact the size and execution time of the binary code for the software application. A node is an entry if there is an edge to that node from some node outside of its strongly connected component; and a node is an exit if there is an edge from that node to some node outside of its strongly connected component. Using current technology, identifying entries and exits is inefficient and typically involves a second search through the graph after the strongly connected components have been identified. When used in a compiler on software code with many functions, this can dramatically increase compile time and resource usage.
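The second search described above might look roughly like the following Python sketch, which assumes the strongly connected components have already been identified. The function name find_entries_and_exits, the adjacency-dictionary representation, and the example data are hypothetical and chosen only for illustration. The sketch applies the entry and exit definitions directly and revisits every edge of the graph, which is the additional cost referred to above.

```python
from typing import Dict, List, Set, Tuple


def find_entries_and_exits(
    graph: Dict[str, List[str]],
    components: List[List[str]],
) -> Tuple[Set[str], Set[str]]:
    """Second pass over the graph, run after the components are known.

    Assumes every node of `graph` appears in exactly one component.
    """
    # Map each node to the identifier of the component that contains it.
    component_of: Dict[str, int] = {}
    for component_id, component in enumerate(components):
        for node in component:
            component_of[node] = component_id

    entries: Set[str] = set()
    exits: Set[str] = set()

    # Revisit every edge; an edge that crosses a component boundary makes
    # its source an exit and its target an entry.
    for source, successors in graph.items():
        for target in successors:
            if component_of[source] != component_of[target]:
                exits.add(source)
                entries.add(target)
    return entries, exits


# Hypothetical call graph and its previously computed components.
call_graph = {"main": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}
components = [["c"], ["b", "a"], ["main"]]
entries, exits = find_entries_and_exits(call_graph, components)
```

Here a is an entry of the component containing a and b because main calls it, and b is an exit of that component because it calls c. Single-node components such as main and c are treated the same way as larger ones in this sketch.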