This invention relates generally to genome-wide analysis and, more specifically, to a method of determining the function of a gene.
The Human Genome Project, by cataloging the sequences of the estimated 100,000 human genes, provides a first step in understanding humans at the molecular level. However, with the completion of the sequencing phase of the project, many questions remain unanswered, including what roles most of these genes play in cells and how the genes work together to perform functions in cells. The answers to these questions will lead to important advances and developments in both research and medicine.
Exemplified by genome sequencing projects, discovery science enumerates all the genes or encoded products of a genome without concern for their functional characteristics and cellular roles. The Human Genome Project and other large-scale sequencing projects have fueled technological advances in discovery science. Large-scale gene sequencing, gene expression analysis methods such as DNA microarrays, and proteomics methods have facilitated the accumulation of an enormous amount of data describing the sequences and expression levels of virtually every gene in organisms such as the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, and the worm Caenorhabditis elegans, as well as more complex organisms such as humans. Volumes of sequence and expression data can be obtained from virtually any cell or organism. However, standing alone, these volumes of sequence and expression data are difficult to interpret and to apply to accurately predicting the cellular functions of genes and their products, their interplay within a cell, or their dynamics in response to change.
Over the past several years, researchers have attempted to understand and characterize the functions of the many newly identified genes having unknown cellular roles by testing experimental hypotheses. Such a hypothesis-driven approach to determining the function of an uncharacterized gene, or its encoded product, typically involves formulating a working hypothesis based on empirical observations provided by sequence comparisons and experimental data. The working hypothesis is then tested experimentally to determine whether a proposed function is correct. The hypothesis is revised and the process repeated until experimental results are consistent with the working hypothesis of the proposed cellular function. Such an approach is labor-intensive, time-consuming, and constrained by available functional information.
One reason for the difficulties in determining functions of uncharacterized genes and their products using a hypothesis-driven research approach is that the observations that form the foundation of the working hypothesis, and the investigated genes themselves, are viewed in an isolated or static manner. These views can result either from a lack of available information or from practical considerations that preclude analysis of the dynamic interplay of the numerous other genes and molecules in the cell. Absent such knowledge or assessment of the various relationships, the reference point or context in which to interpret experimental results can be misconstrued, viewed too narrowly or, perhaps, too broadly.
Thus, there exists a need for methods which assimilate biological information to predict gene function. The present invention satisfies this need and provides related advantages as well.