The present disclosure generally relates to topological sorting. The disclosed embodiments relate more specifically to a system, apparatus, method, and computer program product for performing a topological sort of a directed graph that comprises a cyclic component.
Projects that require certain tasks to be performed before others must be scheduled properly to minimize the consumption of resources and to prevent bottlenecks from occurring. When building a house, for example, the house generally must be framed before the external walls can be put in place. Accordingly, a general contractor must schedule carpenters to frame the house before masons arrive to brick up the walls so that the masons can begin work as soon as they arrive. Otherwise, the masons will have to sit by idly until the house is sufficiently framed. The same problem may arise if the general contractor has not ordered bricks and mortar by the time the masons arrive to brick up the walls of the house. Thus, it is important to schedule both tasks and resources to ensure that projects are completed in an efficient and effective manner.
A similar analogy may be applied to projects that rely on information technology (IT) to complete certain tasks. For example, poor scheduling can leave an expensive machine sitting idle while a task upon which it depends is being performed by another machine. To avoid such processing bottlenecks, parallel processing systems may utilize an algorithm to map different tasks to different processors and to schedule the order in which those tasks are performed by those processors. Those algorithms generally perform a topological sort of those tasks based on their respective dependencies. The results of such a sort may be represented graphically by a directed graph.
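By way of illustration only, a topological sort of dependent tasks may be sketched as follows. The sketch uses Kahn's algorithm, which is one conventional approach and is not taken from the present disclosure. The function name `topological_order` and the task names, which borrow from the house-building example, are hypothetical.

```python
from collections import deque


def topological_order(depends_on):
    """Return tasks ordered so that every task follows its prerequisites.

    depends_on maps each task to the set of tasks it depends upon.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Collect every task, including ones that appear only as prerequisites.
    tasks = set(depends_on)
    for prereqs in depends_on.values():
        tasks.update(prereqs)

    # Count unmet prerequisites and record which tasks each task unblocks.
    unmet = {task: 0 for task in tasks}
    unblocks = {task: [] for task in tasks}
    for task, prereqs in depends_on.items():
        for prereq in set(prereqs):
            unmet[task] += 1
            unblocks[prereq].append(task)

    # Start with the tasks that have no prerequisites (sorted for determinism).
    ready = deque(sorted(task for task in tasks if unmet[task] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for dependent in sorted(unblocks[task]):
            unmet[dependent] -= 1
            if unmet[dependent] == 0:
                ready.append(dependent)

    # Any task left unscheduled is part of, or blocked by, a cycle.
    if len(order) != len(tasks):
        raise ValueError("dependencies contain a cycle")
    return order


# Hypothetical tasks based on the house-building example.
house = {
    "frame house": set(),
    "order bricks": set(),
    "brick walls": {"frame house", "order bricks"},
}
schedule = topological_order(house)
```

In this sketch, "brick walls" is scheduled only after both of its prerequisites, and a set of dependencies that contains a cycle causes the function to raise an error rather than return an order.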
A directed graph, or digraph, comprises a set of nodes, or vertices, that are connected by directional lines, or edges. Such a graph is considered “directed,” rather than “undirected,” because the edges have a direction associated with them that specifies the dependency of one vertex on another. For example, a first vertex that depends on a second vertex may be graphically represented in a directed graph by an edge pointing from the first vertex to the second vertex. And for the purpose of scheduling tasks, each such vertex may correspond to a particular task. For example, the first vertex may correspond to a first task that cannot begin until a second task, represented by the second vertex, is completed.
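As a minimal sketch of the foregoing representation, a directed graph may be stored as a mapping from each vertex to the set of vertices to which its edges point, that is, the vertices upon which it depends. The helper names `build_digraph` and `can_start` and the vertex labels are assumptions made for illustration.

```python
def build_digraph(edges):
    """Build an adjacency mapping from (dependent, dependency) pairs.

    Each pair is one directed edge that points from the dependent vertex
    to the vertex upon which it depends.
    """
    graph = {}
    for dependent, dependency in edges:
        graph.setdefault(dependent, set()).add(dependency)
        # Make sure the edge's target is present even if it has no edges.
        graph.setdefault(dependency, set())
    return graph


def can_start(task, completed, graph):
    """A task may begin once every task it depends upon is completed."""
    return graph[task] <= completed


# The first vertex depends on the second vertex: first -> second.
graph = build_digraph([("first", "second")])

blocked = can_start("first", set(), graph)         # False: second not done
unblocked = can_start("first", {"second"}, graph)  # True: second is done
```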
Directed graphs may be in the form of a “tree” that comprises at least one “root” and at least one “leaf.” A root is a vertex that has no incoming edges, and every other vertex in the tree may be reached by a unique path starting at the root. By contrast, a leaf is a vertex that has no outgoing edges. In other words, a root is a vertex that depends on no other vertex while all other vertices depend on the root, and a leaf is a vertex upon which no other vertex depends but that depends upon at least one other vertex. Nevertheless, a directed graph need not have a root node, and there may be several or no paths from any one vertex to another.
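The edge-based definitions above (a root has no incoming edges and a leaf has no outgoing edges) may be illustrated with the following sketch. The function name `roots_and_leaves` and the vertex labels A through D are hypothetical, and the graph is assumed to be stored as a mapping from each vertex to the vertices to which its edges point.

```python
def roots_and_leaves(graph):
    """Return (roots, leaves) for a mapping of vertex -> edge targets.

    A root is a vertex with no incoming edges; a leaf is a vertex with
    no outgoing edges.
    """
    # Every vertex that appears as an edge target has an incoming edge.
    has_incoming = set()
    for targets in graph.values():
        has_incoming.update(targets)

    vertices = set(graph) | has_incoming
    roots = vertices - has_incoming
    leaves = {v for v in vertices if not graph.get(v)}
    return roots, leaves


# A small tree: A -> B, A -> C, B -> D.
tree = {"A": ["B", "C"], "B": ["D"]}
roots, leaves = roots_and_leaves(tree)
```

For this tree the sketch reports A as the only root and C and D as the leaves. Applied to a graph in which every vertex lies on a cycle, it would report no roots at all, consistent with the observation that a directed graph need not have a root.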
Regardless of whether it is in the form of a tree or not, a directed graph may either be acyclic or cyclic. An acyclic directed graph comprises a plurality of vertices that are connected to each other in a sequence in which no vertex depends upon any of the vertices that depend upon that vertex, either directly or indirectly. For example, a first vertex V1 may depend upon a second vertex V2 and a third vertex V3, and the third vertex V3 may depend upon a fourth vertex V4 (e.g., V1→V2, V1→V3, V3→V4). By contrast, a cyclic directed graph comprises a plurality of vertices that are connected to each other in a sequence in which at least one vertex depends upon one or more other vertices that depend upon that vertex, either directly or indirectly. For example, a first vertex V1 may depend upon a second vertex V2, the second vertex V2 may depend upon a third vertex V3, and the third vertex V3 may depend upon the first vertex V1 (e.g., V1→V2, V2→V3, V3→V1).
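The distinction between acyclic and cyclic directed graphs may be checked programmatically. The following is a minimal sketch that uses a depth-first search with three vertex states, a conventional technique that is not drawn from the present disclosure. The function name `has_cycle` is hypothetical, and the two example graphs follow the V1 through V4 examples above.

```python
def has_cycle(graph):
    """Return True if the directed graph contains a cycle.

    graph maps each vertex to the vertices it depends upon. An edge that
    leads back to a vertex still being explored closes a cycle.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(vertex):
        state[vertex] = IN_PROGRESS
        for target in graph.get(vertex, ()):
            target_state = state.get(target, UNVISITED)
            if target_state == IN_PROGRESS:
                return True  # edge back into the current path
            if target_state == UNVISITED and visit(target):
                return True
        state[vertex] = DONE
        return False

    return any(
        state.get(vertex, UNVISITED) == UNVISITED and visit(vertex)
        for vertex in list(graph)
    )


# Acyclic example: V1 depends on V2 and V3; V3 depends on V4.
acyclic = {"V1": ["V2", "V3"], "V3": ["V4"]}
# Cyclic example: V1 depends on V2, V2 on V3, and V3 on V1.
cyclic = {"V1": ["V2"], "V2": ["V3"], "V3": ["V1"]}

acyclic_result = has_cycle(acyclic)  # False
cyclic_result = has_cycle(cyclic)    # True
```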
Both acyclic and cyclic directed graphs may comprise one or more connected components. A connected component comprises a plurality of vertices that are connected to each other by edges but that are not connected to one or more other vertices in the graph. For example, a directed graph may comprise a first component that comprises a first vertex V1, a second vertex V2, a third vertex V3, and a fourth vertex V4 and a second component that comprises a fifth vertex V5 and a sixth vertex V6. Such connected components also may comprise “strong” subcomponents in which there is a path from each vertex in the subcomponent to every other vertex in that subcomponent. For example, the foregoing first component may comprise a strong subcomponent in which the second vertex V2 depends upon the third vertex V3, the third vertex V3 depends upon the fourth vertex V4, and the fourth vertex V4 depends upon the second vertex V2 (e.g., V2→V3, V3→V4, V4→V2). A strong subcomponent also may comprise a single vertex that is connected only to strong components. However, when a directed graph comprises even a single component with a cyclical, strong subcomponent, conventional algorithms for performing a topological sort of that directed graph will fail.
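The foregoing example may be sketched as follows. The sketch finds connected components by ignoring edge direction, finds strong subcomponents with Tarjan's algorithm (a conventional technique, not the disclosed method), and shows one conventional sort, `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later), rejecting the graph. The function names are hypothetical, and the edges V1→V2 and V5→V6 are assumptions, since the example above does not state how V1 joins the first component or how V5 and V6 are connected.

```python
from graphlib import CycleError, TopologicalSorter


def connected_components(graph):
    """Group vertices that are linked by edges, ignoring edge direction."""
    neighbors = {}
    for vertex, targets in graph.items():
        neighbors.setdefault(vertex, set())
        for target in targets:
            neighbors[vertex].add(target)
            neighbors.setdefault(target, set()).add(vertex)
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        component, stack = set(), [start]
        while stack:
            vertex = stack.pop()
            component.add(vertex)
            for other in neighbors[vertex] - seen:
                seen.add(other)
                stack.append(other)
        components.append(component)
    return components


def strong_subcomponents(graph):
    """Tarjan's algorithm: return the strongly connected vertex sets."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(vertex):
        index[vertex] = low[vertex] = len(index)
        stack.append(vertex)
        on_stack.add(vertex)
        for target in graph.get(vertex, ()):
            if target not in index:
                visit(target)
                low[vertex] = min(low[vertex], low[target])
            elif target in on_stack:
                low[vertex] = min(low[vertex], index[target])
        if low[vertex] == index[vertex]:
            # vertex is the first-visited member of its strong subcomponent.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == vertex:
                    break
            result.append(component)

    for vertex in list(graph):
        if vertex not in index:
            visit(vertex)
    return result


# Each vertex maps to the vertices it depends upon.
# V1 -> V2 and V5 -> V6 are assumed edges (see the lead-in).
graph = {
    "V1": ["V2"], "V2": ["V3"], "V3": ["V4"], "V4": ["V2"],
    "V5": ["V6"], "V6": [],
}
components = connected_components(graph)
# Subcomponents with more than one vertex contain a cycle.
cyclic_parts = [c for c in strong_subcomponents(graph) if len(c) > 1]

try:
    order = list(TopologicalSorter(graph).static_order())
except CycleError:
    order = None  # the standard-library sort rejects the cyclic graph
```

Here the graph splits into two connected components, the first of which contains the strong subcomponent formed by V2, V3, and V4, and the standard-library sort raises `CycleError` instead of producing an order. The length test used to pick out cyclic subcomponents does not account for a single vertex with an edge to itself.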