This invention is related to bioinformatics and biological data analysis. Specifically, this invention provides methods, computer software products and systems for the analysis of biological data.
Many biological functions are carried out by regulating the expression levels of various genes, either through changes in the copy number of the genetic DNA, through changes in levels of transcription (e.g. through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes, or through changes in protein synthesis. For example, control of the cell cycle and cell differentiation, as well as diseases, are characterized by the variations in the transcription levels of a group of genes.
Recently, massive parallel gene expression monitoring methods have been developed to monitor the expression of a large number of genes using nucleic acid array technology, which is described in detail in, for example, U.S. Pat. No. 5,871,928; de Saizieu, et al., 1998, Bacteria Transcript Imaging by Hybridization of total RNA to Oligonucleotide Arrays, NATURE BIOTECHNOLOGY, 16:45-48; Wodicka et al., 1997, Genome-wide Expression Monitoring in Saccharomyces cerevisiae, NATURE BIOTECHNOLOGY, 15:1359-1367; Lockhart et al., 1996, Expression Monitoring by Hybridization to High Density Oligonucleotide Arrays, NATURE BIOTECHNOLOGY, 14:1675-1680; Lander, 1999, Array of Hope, NATURE GENETICS, 21(suppl.), at 3.
Massive parallel gene expression monitoring experiments generate unprecedented amounts of information. For example, a commercially available GeneChip® array set is capable of monitoring the expression levels of approximately 6,500 murine genes and expressed sequence tags (ESTs) (Affymetrix, Inc., Santa Clara, Calif., USA). Array sets for approximately 60,000 human genes and EST clusters, 24,000 rat transcripts and EST clusters, and arrays for other organisms are also available from Affymetrix. Effective analysis of the large amount of data may lead to the development of new drugs and new diagnostic tools. Therefore, there is a great demand in the art for methods for organizing, accessing and analyzing the vast amount of information collected using massive parallel gene expression monitoring methods.
The current invention provides methods, systems and computer software products suitable for analyzing data from gene expression monitoring experiments that employ multiple probes against a single target.
Computer-implemented methods for determining hybridization between a plurality of nucleic acid probes and a nucleic acid target are provided. The methods are useful for analyzing any hybridization between multiple probes and a target nucleic acid. They are particularly useful for analyzing gene expression experiments where a single transcript is measured using multiple probes.
In some embodiments, the methods include the steps of inputting a plurality of hybridization intensities, each of which reflects the hybridization between one of the plurality of probes and the nucleic acid target; adjusting the hybridization intensities for the hybridization affinities of the probes to obtain a plurality of adjusted hybridization intensities; finding the minimal adjusted hybridization intensity among the adjusted hybridization intensities; and indicating the minimal adjusted hybridization intensity as a measurement of the hybridization. The hybridization affinities of the probes may be predicted based upon the sequence of the probes. Alternatively, the hybridization affinities may be inputted from a database where experimentally determined hybridization affinities are stored. The adjusted hybridization intensity is calculated according to:

$$\text{Adjusted hybridization intensity} = \frac{I}{\Gamma},$$

where $I$ is the hybridization intensity and $\Gamma$ is the hybridization affinity.
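The steps recited above can be illustrated with a minimal Python sketch. It is offered only as one possible reading of the method, not as the implementation of the invention: the function names `adjusted_intensities` and `hybridization_measure`, the use of plain lists, the input validation, and the example values are all assumptions introduced for illustration.

```python
from typing import List, Sequence


def adjusted_intensities(intensities: Sequence[float],
                         affinities: Sequence[float]) -> List[float]:
    """Divide each probe intensity I by that probe's affinity (I / Gamma)."""
    if len(intensities) != len(affinities):
        raise ValueError("one affinity is required per probe intensity")
    if not intensities:
        raise ValueError("at least one probe is required")
    # Assumption: affinities are strictly positive, so the division is defined.
    if any(g <= 0 for g in affinities):
        raise ValueError("hybridization affinities must be positive")
    return [i / g for i, g in zip(intensities, affinities)]


def hybridization_measure(intensities: Sequence[float],
                          affinities: Sequence[float]) -> float:
    """Return the minimal adjusted intensity as the hybridization measurement."""
    return min(adjusted_intensities(intensities, affinities))


# Hypothetical values for three probes against a single target.
example_intensities = [1200.0, 900.0, 1500.0]
example_affinities = [1.5, 0.75, 2.0]
example_measure = hybridization_measure(example_intensities, example_affinities)
```

With these illustrative values the adjusted intensities are 800.0, 1200.0 and 750.0, so the reported measurement is 750.0, the value obtained from the third probe.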
In another aspect of the invention, computer software products are provided for determining hybridization between nucleic acid probes and a nucleic acid target. A software product may include a computer-readable medium having computer-executable instructions for performing the method of the invention.
In some embodiments, the software products may include computer program code for inputting a plurality of hybridization intensities, each of which reflects the hybridization between one of the plurality of probes and the nucleic acid target; computer program code for adjusting the hybridization intensities for the hybridization affinities of the probes to obtain a plurality of adjusted hybridization intensities; computer program code for finding the minimal adjusted hybridization intensity among the adjusted hybridization intensities; computer program code for indicating the minimal adjusted hybridization intensity as a measurement of said hybridization; and a computer-readable medium for storing the code.
The hybridization affinities of the probes may be predicted based upon the sequence of said probes, and the software products contain code for performing the prediction. Alternatively, the predicted hybridization affinities may be inputted. In some embodiments, the hybridization affinities are inputted from a database. Hybridization affinities may also be measured experimentally. In preferred embodiments, the adjusted hybridization intensity may be calculated according to:

$$\text{Adjusted hybridization intensity} = \frac{I}{\Gamma},$$

where $I$ is the hybridization intensity and $\Gamma$ is the hybridization affinity.
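For the embodiments in which hybridization affinities are inputted from a database, the following Python sketch shows one way such a lookup might be arranged. It uses an in-memory SQLite database from the Python standard library purely as a stand-in. The table name `probe_affinity`, its columns, the probe identifiers, and the helper names `demo_connection`, `load_affinities` and `measure_from_database` are hypothetical and are not taken from the specification.

```python
import sqlite3
from typing import Dict, Sequence


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory database of illustrative probe affinities."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE probe_affinity ("
        "probe_id TEXT PRIMARY KEY, affinity REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO probe_affinity VALUES (?, ?)",
        [("p1", 1.5), ("p2", 0.75), ("p3", 2.0)],  # hypothetical values
    )
    return conn


def load_affinities(conn: sqlite3.Connection,
                    probe_ids: Sequence[str]) -> Dict[str, float]:
    """Fetch the stored affinity for each requested probe."""
    placeholders = ",".join("?" for _ in probe_ids)
    rows = conn.execute(
        "SELECT probe_id, affinity FROM probe_affinity "
        f"WHERE probe_id IN ({placeholders})",
        list(probe_ids),
    ).fetchall()
    found = dict(rows)
    missing = [p for p in probe_ids if p not in found]
    if missing:
        raise KeyError(f"no stored affinity for probes: {missing}")
    return found


def measure_from_database(conn: sqlite3.Connection,
                          intensities: Dict[str, float]) -> float:
    """Adjust each intensity by its stored affinity and return the minimum."""
    if not intensities:
        raise ValueError("at least one probe intensity is required")
    affinities = load_affinities(conn, list(intensities))
    return min(intensities[p] / affinities[p] for p in intensities)
```

A call such as `measure_from_database(demo_connection(), {"p1": 1200.0, "p2": 900.0, "p3": 1500.0})` adjusts each intensity by its stored affinity and returns the smallest result. A probe with no stored affinity raises an error rather than being silently skipped, which is a design choice of this sketch and not a requirement stated in the text.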
In yet another aspect of the invention, systems for analyzing nucleic acid hybridization are provided. In some embodiments, the system may include a processor and a memory coupled to the processor, the memory storing a plurality of machine instructions that cause the processor to perform the method of the invention.