Path analysis is the process of analyzing how a sequence of nodes forms a path. Path analysis is employed in various fields including, but not limited to, internet website traffic analytics, protein sequencing, virus and malicious software detection algorithms, and text analysis algorithms. A path analysis system may accept a sequence or combination of nodes and identify paths that include the sequence or combination of nodes. For example, an internet website traffic analysis system may accept a source web page and a destination web page as input, and identify the various navigation patterns followed by visitors of the website to navigate from the source web page to the destination web page.
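The website example above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the logged data has already been grouped into visitor sessions, each an ordered list of page names; the function name `paths_between`, the page names, and the sample sessions are hypothetical and do not describe any particular system.

```python
from collections import Counter

def paths_between(sessions, source, destination):
    """Count the page sequences that lead from source to destination.

    `sessions` is an assumed input format: a list of sessions, each an
    ordered list of page names visited by one visitor.
    """
    counts = Counter()
    for pages in sessions:
        if source not in pages:
            continue
        start = pages.index(source)
        try:
            # First visit to the destination after the first visit to the source.
            end = pages.index(destination, start + 1)
        except ValueError:
            continue  # This session never reached the destination.
        counts[tuple(pages[start:end + 1])] += 1
    return counts

# Hypothetical sample data.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["home", "search", "product", "cart", "checkout"],
    ["home", "about"],
]
patterns = paths_between(sessions, "home", "cart")
for path, count in patterns.most_common():
    print(count, " > ".join(path))
```

Note that this sketch scans every session in the input, so its cost grows with the amount of logged data.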
Path analysis typically involves analyzing large amounts of logged data. Such logged data includes, without limitation, web page access requests, protein molecules, virus signatures, linguistic constructs, and so forth. Indexing the logged data may not be feasible because of its volume. Therefore, current path analysis systems may process the entire body of logged data to identify pertinent paths based on the input node sequences or node combinations. This may require a significant amount of processing power.
In Internet website analytics, path analysis is a process of determining the sequence of pages visited in a visitor session prior to some desired event, such as the visitor purchasing an item or requesting a newsletter. The precise order of pages visited may or may not be important and may or may not be specified. In practice, this analysis is done in aggregate, ranking the paths (sequences of pages) visited prior to the desired event by descending frequency of use. The idea is to determine which features of the website encourage the desired result. “Fallout analysis,” a subset of path analysis, looks at “black holes” on the site: the paths that most frequently lead to a dead end, and the paths or features that confuse or lose potential customers.
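The aggregate ranking and the fallout view described above might be sketched as follows. This is an illustration under stated assumptions, not a description of any existing system: sessions are assumed to be ordered lists of page names, the desired event is assumed to appear as a page name (here the hypothetical `"purchase"`), and a "dead end" is approximated as the last page of a session that never reached the event.

```python
from collections import Counter

def rank_paths_before_event(sessions, event):
    """Rank the page sequences visited before `event`, most frequent first."""
    counts = Counter()
    for pages in sessions:
        if event in pages:
            # Keep only the pages visited before the first occurrence of the event.
            counts[tuple(pages[:pages.index(event)])] += 1
    return counts.most_common()

def fallout_pages(sessions, event):
    """Rank the last pages of sessions that never reached `event`.

    Treating the final page as the dead end is a simplifying assumption.
    """
    counts = Counter(
        pages[-1] for pages in sessions if pages and event not in pages
    )
    return counts.most_common()

# Hypothetical sample data.
sessions = [
    ["home", "search", "product", "purchase"],
    ["home", "product", "purchase"],
    ["home", "search", "product", "purchase"],
    ["home", "search"],
    ["home", "search"],
    ["home", "about"],
]
ranked = rank_paths_before_event(sessions, "purchase")
fallout = fallout_pages(sessions, "purchase")
```

Here the order of pages is treated as significant; if order did not matter, the tuple key could be replaced with a `frozenset` of the same pages.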
Some known implementations of path analysis systems utilize a distributed computing architecture. Such an architecture may provide the required amount of processing power. However, as the amount of logged data increases over time, proportional increases in processing power may be required. This may incur additional costs for upgrading the distributed computing architecture.
Further, processing of the large amount of logged data may require that a significant amount of data be transferred back and forth between compute nodes within the distributed computing architecture. Again, as the amount of logged data increases, the data transfer links may need to be upgraded over time.