Cellular constituent abundance data from microarrays and, more generally, from functional genomics have become an important tool in the life sciences as well as in medical research. Cellular constituents are individual genes, proteins, mRNA expressing genes, and/or any other variable cellular component or protein activity, such as the degree of protein modification (e.g., phosphorylation), that is typically measured in biological experiments (e.g., by microarray) by those skilled in the art. Significant discoveries relating to the complex networks of biochemical processes underlying living systems, common human diseases, and gene discovery and structure determination can now be attributed to the application of cellular constituent abundance data as part of the research process. See, for example, Hughes et al., 2000, Cell 102, 109; Karp et al., 2000, Nat. Immunol. 1, 221; Schadt et al., 2003, Nature 422, 297; Eaves et al., 2002, Genome Res. 12, 232; and Shoemaker et al., 2001, Nature 409, 922, each of which is hereby incorporated by reference herein in its entirety. Cellular constituent abundance data have also helped to identify biomarkers, discriminate disease subtypes, and identify mechanisms of toxicity. See, for example, DePrimo et al., 2003, BMC Cancer 3, 3; van de Vijver et al., 2002, N. Engl. J. Med. 347, 1999; van't Veer et al., 2002, Nature 415, 530; and Waring et al., 2002, Toxicology 181-182, 537, each of which is hereby incorporated by reference herein in its entirety.
The use of cellular constituent abundance data from sources such as microarrays as a tool to identify genes responsible for traits, including common human diseases, remains difficult. Identifying hundreds or even thousands of genes whose expression changes are associated with a disease state does not lead directly to the identification of the key drivers involved in the disease processes. Subsequent validation of candidate genes identified from gene expression experiments is presently a hit-or-miss and time-consuming process. This validation typically involves gene knockouts/knock-ins, transgenic construction, siRNA studies, drug treatments targeting candidate genes, time series experiments, and/or the development of specific assays intended to test hypotheses generated from gene expression experiments. These validation methods do not lend themselves easily to high-throughput processes and can often take as long as eighteen months to complete. Developing methods that allow for the objective, data-driven identification of the key drivers of common human diseases would significantly enhance the utility of cellular constituent abundance measurement experiments in the target discovery process. More generally, such methods would also provide a framework for elucidating genetic networks.
Cellular constituent abundance data have recently been combined with other experimental data to allow for the more immediate identification of key drivers for complex disease traits. See, for example, Schadt et al., 2003, Nature 422, 297; Brem et al., 2002, Science 296, 752; and Klose et al., 2002, Nat. Genet. 30, 385, each of which is hereby incorporated by reference herein in its entirety. One such technique involves treating cellular constituent abundance data (e.g., gene expression data) as a quantitative trait in segregating populations. In such a method, chromosomal regions controlling the level of expression of a particular gene are mapped as abundance quantitative trait loci (eQTL). Abundance QTL that contain the gene encoding the mRNA (cis-acting eQTL) are distinguished from the other (trans-acting) eQTL, and those cis-acting eQTL that co-localize with chromosomal regions controlling a disease (clinical) trait (cQTL) are identified. The identification of a common chromosomal location for both a cis-acting eQTL and a cQTL is used to nominate susceptibility loci for the disease trait. See, for example, Karp et al., 2000, Nat. Immunol. 1, 221; Schadt et al., 2003, Nature 422, 297; and Eaves et al., 2002, Genome Res. 12, 232, each of which is hereby incorporated by reference herein in its entirety.
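By way of illustration only, the cis/trans distinction and the co-localization step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited reference: the names Interval, EQTL, is_cis, and nominate_candidates are hypothetical, co-localization is assumed to mean simple overlap of chromosomal intervals, and an eQTL is assumed to be cis-acting when its mapped region overlaps the locus of the gene whose abundance it controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A chromosomal region (hypothetical representation, inclusive coordinates)."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        # Two regions overlap only if they share a chromosome and their ranges intersect.
        return (
            self.chrom == other.chrom
            and self.start <= other.end
            and other.start <= self.end
        )


@dataclass(frozen=True)
class EQTL:
    """An abundance QTL mapped for one gene (hypothetical representation)."""
    gene: str
    gene_locus: Interval   # where the gene itself lies
    qtl_region: Interval   # region found to control the gene's abundance


def is_cis(eqtl: EQTL) -> bool:
    # Assumption: cis-acting means the QTL region contains or overlaps the gene's own locus.
    return eqtl.qtl_region.overlaps(eqtl.gene_locus)


def nominate_candidates(eqtls, cqtls):
    """Return genes whose cis-acting eQTL co-localizes with at least one cQTL."""
    return sorted(
        {
            e.gene
            for e in eqtls
            if is_cis(e) and any(e.qtl_region.overlaps(c) for c in cqtls)
        }
    )


# Illustrative data only; gene names and coordinates are invented.
example_eqtls = [
    EQTL("geneA", Interval("chr1", 1000, 2000), Interval("chr1", 500, 3000)),  # cis, overlaps cQTL
    EQTL("geneB", Interval("chr2", 1000, 2000), Interval("chr1", 500, 3000)),  # trans
    EQTL("geneC", Interval("chr3", 100, 200), Interval("chr3", 50, 400)),      # cis, no cQTL nearby
]
example_cqtls = [Interval("chr1", 2500, 6000)]

candidates = nominate_candidates(example_eqtls, example_cqtls)
```

In this sketch only the gene with a cis-acting eQTL that overlaps the clinical trait region is nominated; in practice, mapped regions carry statistical support (e.g., a confidence interval) that a plain overlap test does not capture.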
Overall, the development and widespread use of microarray technology in various scientific and medical disciplines has generated large amounts of abundance data for cellular constituents from numerous organisms. Also, the genome sequences of humans and several model organisms have established a nearly complete list of the genes required to enact cellular, developmental, and behavioral processes in these organisms (Goffeau et al., 1996, Science 274, 546; Myers et al., 2000, Science 287, 2196; Lander et al., 2001, Nature 409, 860; and Venter et al., 2001, Science 291, 1304, each of which is hereby incorporated herein by reference in its entirety). The next major challenge in genomic research is to elucidate the functions of the large fraction of cellular constituents whose functions are currently unknown and to discover how these cellular constituents interact to perform specific biological processes. DNA microarray experiments provide a first step towards the goal of uncovering the function of cellular constituents on a global scale. Cellular constituents are often co-regulated. For example, genes that encode proteins that participate in the same pathway or are part of the same protein complex often exhibit similar transcription or expression patterns. Clusters of cellular constituents with related functions also often exhibit expression patterns that are correlated under large numbers of diverse conditions in DNA microarray experiments (Eisen et al., 1998, Proc. Natl. Acad. Sci. U.S.A. 95, 14863; Hughes et al., 2000, Cell 102, 109; Kim et al., 2001, Science 293, 2087; and Segal et al., 2003, Nature Genet. 34, 166, each of which is hereby incorporated by reference herein in its entirety). Such co-regulations are often broadly regarded as interactions. Identifying the interactions between cellular constituents is critical to understanding the biology behind such data. See, for example, Stuart et al., 2003, Science 302, 249, which is hereby incorporated by reference herein in its entirety. These interactions, especially those that are highly conserved across experimental conditions and genetic backgrounds, provide critical information for revealing the most highly conserved and functionally important cellular constituents that may serve as drug targets.
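By way of illustration only, one simple way to flag cellular constituents whose abundance patterns are correlated across conditions is sketched below. The text above does not specify a correlation measure; the use of the Pearson correlation coefficient, the 0.9 threshold, and the names pearson and coexpressed_pairs are all assumptions made for this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length abundance profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return (name_a, name_b, r) for every pair whose correlation meets the threshold.

    `profiles` maps a constituent name to its abundance values, one per condition.
    The threshold is an arbitrary illustrative choice.
    """
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs


# Illustrative data only: three invented constituents measured under four conditions.
example_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
example_pairs = coexpressed_pairs(example_profiles)
```

A pairwise scan of this kind grows quadratically with the number of constituents and says nothing about whether a correlation is conserved across experiments or genetic backgrounds, which is part of why more efficient and interpretable approaches are of interest.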
Existing methods in the art for identifying the critical interactions between cellular constituents are often highly convoluted and inefficient. Such methods are also often impractical to apply to different systems. The results from the existing methods are often difficult to interpret. What is needed in the art are efficient and effective systems and methods for processing the available abundance data of cellular constituents.