The discovery of patterns in a dataset facilitates a number of technical applications such as the discovery of changes in discrete attribute values in first entities between different classes (e.g., diseased state, non-diseased state, disease stage, etc.). For instance, in the biological arts, advances in RNA-extraction protocols and associated methodologies has led to the ability to perform whole transcriptome shotgun sequencing that quantifies gene expression in biological samples in counts of transcript reads mapped to genes. This has given rise to high throughput transcript identification and the quantification of gene expression for hundreds or even thousands of individual cells in a single dataset. Thus, in the art, datasets containing discrete attribute values (e.g., count of transcript reads mapped to individual genes in a particular cell) for each first entity in a plurality of first entities for each respective second entity in a plurality of second entities have been generated. While this is a significant advancement in the art, a number of technical problems need to be addressed to make such data more useful.
One drawback with such advances in the art is that the datasets tend to be large and thus are not easily loaded in their entirety into non-persistent memory (e.g., random access memory) of conventional computers used by workers in the field when visualizing the data. And, even if such datasets were loaded into non-persistent memory, the processing time needed to discern patterns in such datasets is unsatisfactory. Another drawback is that experiments are not performed in a high replicate manner, thereby impairing the ability to use simplistic statistical methods to account for experimental design and to therefore appropriately account for stochastic variation in the data (e.g., stochastic variation in the counts of transcript reads mapped to genes arising from the experimental design). Moreover, yet another drawback with such advances in the art are the unsatisfactory way in which conventional methods find patterns in such datasets. For instance, such patterns may relate to the discovery of unknown classes among the members of the dataset. For example, the discovery that a dataset of what was thought to be homogenous cells turns out to include cells of two different classes. Such patterns may also relate to the discovery of variables that are statistically associated with known classes. For instance, the discovery that the transcript abundance of a subset of mRNA mapping to a core set of genes discriminates between cells that are in a diseased state versus cells that are not in a diseased state. The discovery of such patterns (e.g., the discovery of genes whose mRNA expression discriminates classes or that define classes) in datasets that are very large, are not amendable to classical statistics because of limited replicate information, and for which such patterns in many instances relate to biological processes that are not well understood remains a technical challenge for which improved tools are needed in the art in order to adequately address such drawbacks.