Predicting phenotype from genotype, in terms of the various molecular processes that govern the behavior of the cell, is one of the central goals of biology. Gene expression levels are a particularly useful intermediate molecular phenotype. In recent years, parallel high-throughput genotyping and mRNA expression profiling have enabled researchers to begin asking quantitative questions about the genetics of gene expression in a variety of organisms, and have revealed that steady-state mRNA abundance is highly heritable for the majority of genes.
Genetic mapping has been applied to high-throughput expression data to identify expression quantitative trait loci (eQTLs) that influence the expression levels of individual genes. Sequence polymorphisms in cis-regulatory regions account for part of the genetic variation in expression, and some of it is due to trans-acting polymorphisms at distal loci. The influence of such loci on the transcription rate and/or mRNA half-life of individual genes can be quite indirect, but nucleic-acid-binding, trans-acting factors play a mediating role.
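To make the mapping step concrete, the sketch below shows one simple way to test a single gene for linkage in a population of segregants: at each marker, split the segregants by the parental allele they inherited and compare mean expression with a Welch t statistic. This is an illustrative assumption, not the mapping procedure of any particular study; the function names, the 0/1 genotype encoding, and the toy data are hypothetical. A real eQTL scan would also need a significance threshold corrected for testing many genes against many markers.

```python
from math import sqrt
from statistics import mean, variance


def marker_t_statistic(expression, genotypes):
    """Welch t statistic for one gene at one marker (hypothetical helper).

    expression: expression level of the gene in each segregant.
    genotypes:  parental allele (0 or 1) each segregant inherited at the marker.
    Each allele group needs at least two segregants for a sample variance.
    """
    group0 = [e for e, g in zip(expression, genotypes) if g == 0]
    group1 = [e for e, g in zip(expression, genotypes) if g == 1]
    se = sqrt(variance(group0) / len(group0) + variance(group1) / len(group1))
    return (mean(group1) - mean(group0)) / se


def scan_gene(expression, marker_genotypes):
    """Test every marker; return (index, t) of the marker with the largest |t|."""
    stats = [marker_t_statistic(expression, g) for g in marker_genotypes]
    best = max(range(len(stats)), key=lambda i: abs(stats[i]))
    return best, stats[best]


# Hypothetical toy data: one gene measured in six segregants, two markers.
expression = [1.0, 1.2, 0.9, 3.1, 2.8, 3.0]
markers = [
    [0, 0, 0, 1, 1, 1],  # marker 0: allele tracks expression (candidate eQTL)
    [0, 1, 0, 1, 0, 1],  # marker 1: allele unrelated to expression
]
best, t = scan_gene(expression, markers)
print(best, round(t, 1))  # best marker is 0
```

Whether a linked marker acts in cis or in trans depends on its position relative to the gene, which this sketch does not model.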
There remains a need in the art for methods that account for the existence of gene regulatory networks and thereby allow more sensitive detection of cis- and trans-acting polymorphisms. Approaches that identify subsets of co-expressed genes have been used, but they are generally most useful when only a relatively small number of cell state parameters are perturbed and the expression of large subsets of genes changes coherently. Such methods are less well suited to analyzing natural genetic variation in gene expression, where the segregation of alleles in the cross independently perturbs a large number of cell state parameters.
Approaches based on linear decomposition of the genes-by-segregants expression matrix have also been explored, such as Principal Component Analysis (PCA), which exploits the correlation structure of the gene expression matrix, and Network Component Analysis, which extends PCA by incorporating qualitative information about regulatory network topology. Although these methods increase statistical power compared with single-gene approaches, there remains a need in the art for methods that more fully account for the heritable variation in gene expression.
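As a rough illustration of how PCA uses the correlation structure of a genes-by-segregants matrix, the following sketch centers each gene's expression across segregants, forms the gene-by-gene covariance matrix, and extracts only the first principal component by power iteration. The helper names and the toy matrix are hypothetical, and this is not presented as the procedure of any cited study; a practical analysis would typically compute all components with a singular value decomposition from a numerical library. Power iteration assumes the leading eigenvalue is strictly larger than the others and that the all-ones starting vector is not orthogonal to the leading eigenvector.

```python
from math import sqrt


def center_rows(matrix):
    """Subtract each gene's mean expression across segregants."""
    centered = []
    for row in matrix:
        m = sum(row) / len(row)
        centered.append([x - m for x in row])
    return centered


def covariance(centered):
    """Gene-by-gene sample covariance over segregants (rows already centered)."""
    n = len(centered[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in centered]
            for ri in centered]


def leading_component(cov, iterations=200):
    """Power iteration for the dominant eigenvector of a covariance matrix."""
    v = [1.0] * len(cov)  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def first_pc(matrix):
    """Return gene loadings on PC1 and each segregant's PC1 score."""
    centered = center_rows(matrix)
    loadings = leading_component(covariance(centered))
    n_genes, n_segregants = len(centered), len(centered[0])
    scores = [sum(loadings[g] * centered[g][s] for g in range(n_genes))
              for s in range(n_segregants)]
    return loadings, scores


# Hypothetical toy matrix: rows are genes, columns are six segregants.
toy_expression = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],     # gene A
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # gene B, co-varies with gene A
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # gene C, largely unrelated
]
loadings, scores = first_pc(toy_expression)
print([round(x, 2) for x in loadings])
```

In the toy matrix, genes A and B rise together across segregants, as might happen if both respond to a shared trans-acting factor, so PC1 loads mainly on them; each segregant's score summarizes that shared axis of variation in a single number.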