Most information structures impose a cost on data entry, along with semantic constraints and assumptions on the data they hold. In a hierarchy, information is referenced, and often cross-referenced, in more than one place. In one example, a hierarchy of information is created for “toys” and “robots.” In this example, “toys” can be categorized under “robots,” or vice versa. Supporting both arrangements creates a redundant reference in which the same information is stored twice: once for “toys” under “robots” and once for “robots” under “toys.”
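The redundancy described above can be illustrated with a minimal Python sketch. The record, its fields, and the nested-dictionary layout are hypothetical and chosen only for illustration; they are not taken from the disclosure.

```python
# Hypothetical record; the name and price are illustrative assumptions.
robot_toy = {"name": "wind-up robot", "price": 12}

# A hierarchy that supports both orderings keeps one copy per category path.
hierarchy = {
    "toys": {"robots": [dict(robot_toy)]},
    "robots": {"toys": [dict(robot_toy)]},
}

copy_a = hierarchy["toys"]["robots"][0]
copy_b = hierarchy["robots"]["toys"][0]

# The copies are separate objects, so an update to one is not seen by the other.
copy_a["price"] = 15
copies_diverged = copy_a != copy_b
```

Because each path holds its own copy, the two entries can drift apart after an update, which is one practical cost of storing the same information twice.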
The present disclosure has considered several intermediate representation (IR) techniques that can be utilized to construct a data structure from the input data. Example IR structures and techniques include a prefix tree or trie; classical string compression techniques such as Lempel-Ziv-Welch (LZW); genomic approaches such as the so-called Basic Local Alignment Search Tool (BLAST); and approaches based on dynamic time warping and longest common subsequence. However, the present disclosure recognizes and appreciates that conventional IR techniques can produce undesirable cross-references that cause the same information to be stored multiple times.
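As one illustration of how a conventional IR can store the same information more than once, consider a minimal prefix tree sketch in Python. The class and function names (`TrieNode`, `insert`, `count_nodes`) and the sample streams are assumptions made for this example. A trie shares common prefixes, but a substring that appears at a different position, such as “robot” inside “toyrobot,” is stored again.

```python
class TrieNode:
    """One node of a prefix tree; children are keyed by symbol."""

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a stream ends at this node


def insert(root, symbols):
    """Insert a stream of symbols, reusing any existing prefix."""
    node = root
    for s in symbols:
        node = node.children.setdefault(s, TrieNode())
    node.terminal = True


def count_nodes(node):
    """Count this node and all of its descendants."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


root = TrieNode()
streams = ["robot", "robots", "toyrobot"]  # illustrative input only
for stream in streams:
    insert(root, stream)

total_symbols = sum(len(s) for s in streams)
stored_symbols = count_nodes(root) - 1  # exclude the empty root node
```

In this sketch, “robot” and “robots” share five nodes because they share a prefix, while “toyrobot” adds eight new nodes even though its last five symbols repeat “robot.”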
The present disclosure contemplates that a graph is far more powerful than a hierarchy, tree, or list, and can provide much better efficiency and flexibility. A graph imposes no burden on data entry, and offers “pivoting” or “tree shaking.” A graph also allows the data to be characterized by statistics and probabilities, for example, the “similarity” measure of one “clique” to another, or the “likelihood” of one “path” leading to another. Hierarchies, trees, and lists can naturally and easily be “embedded” within a graph. A graph can serve as an ideal structure to store a set of arbitrary symbol streams (e.g., voice patterns) in a highly compressed way for later searching and retrieving. The present disclosure explores mechanisms and methods for indexing and organizing streams within a graph-like data structure.
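A minimal Python sketch of one way symbol streams might be placed in a graph is shown below. It is a simplified illustration under stated assumptions, not the mechanism of the present disclosure: the `StreamGraph` class, its methods, and the sample streams are hypothetical, each distinct symbol becomes a single node, and each directed edge carries a count of how often one symbol directly followed another. The counts give a simple estimate of the “likelihood” of one step leading to another. This particular layout records only adjacent-symbol transitions, so it does not by itself reconstruct the original streams.

```python
from collections import defaultdict


class StreamGraph:
    """Hypothetical graph in which each distinct symbol is stored once."""

    def __init__(self):
        # edge_counts[a][b] = number of times symbol b directly followed a
        self.edge_counts = defaultdict(lambda: defaultdict(int))
        self.symbols = set()

    def add_stream(self, symbols):
        """Embed a stream as a path, reusing nodes for repeated symbols."""
        symbols = list(symbols)
        self.symbols.update(symbols)
        for a, b in zip(symbols, symbols[1:]):
            self.edge_counts[a][b] += 1

    def likelihood(self, a, b):
        """Estimate how likely symbol a is to be followed by symbol b."""
        followers = self.edge_counts.get(a, {})
        total = sum(followers.values())
        return followers.get(b, 0) / total if total else 0.0


graph = StreamGraph()
for stream in (["a", "b", "c"], ["a", "b", "d"], ["a", "c"]):
    graph.add_stream(stream)

p_ab = graph.likelihood("a", "b")  # "a" is followed by "b" in 2 of 3 cases
```

Here three streams containing eight symbols in total are held in four nodes, and a list or tree can be embedded in the same structure as just another path.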