The field of the invention is DNA sequence classification, identification or determination, and quantification; more particularly it is the quantitative classification, comparison of expression, or identification of preferably all DNA sequences or genes in a sample without performing any associated sequencing.
As research in molecular biology and genetics has advanced, it has become increasingly clear that the temporal and spatial expression of genes plays a vital role in processes occurring in both health and disease. Moreover, the field of biology has progressed from an understanding of how single genetic defects cause the traditionally recognized hereditary disorders (e.g., the thalassemias), to a realization of the importance of the interaction of multiple genetic defects in concert with various environmental factors in the etiology of the majority of the more complex disorders, such as neoplasia.
For example, in the case of neoplasia, recent experimental evidence has demonstrated the key causative roles of multiple defects in several pivotal genes causing their altered expression. Other complex diseases have been shown to have a similar etiology. Therefore, the more complete and reliable the correlation which can be established between gene expression and disease states, the better diseases can be recognized, diagnosed and treated. This important correlation may be established by the quantitative determination and classification of DNA expression in tissue samples.
Genomic DNA ("gDNA") sequences are those naturally occurring DNA sequences constituting the genome of a cell. The overall state of gene expression within gDNA at any given time is represented by the composition of cellular messenger RNA ("mRNA"), which is synthesized by the regulated transcription of gDNA. Complementary DNA ("cDNA") sequences may be synthesized by the process of reverse transcription of mRNA by use of viral reverse transcriptase. cDNA derived from cellular mRNA also represents, albeit approximately, gDNA expression within a cell at a given time. Accordingly, a methodology which would allow the rapid, economical and highly quantitative detection of all the DNA sequences within particular cDNA or gDNA samples is extremely desirable.
Heretofore, gene-specific DNA analysis methodologies have not been directed to the determination or classification of substantially all genes within a DNA sample representing the total transcribed cellular mRNA population and have universally required some degree of nucleic acid sequencing to be performed. As a result, existing cDNA and gDNA analysis techniques have been directed to the determination and analysis of only one or two known or unknown genetic sequences at a single time. These techniques have typically utilized probes which are synthesized to specifically recognize (by the process of hybridization) only one particular DNA sequence or gene. See e.g., Watson, J. 1992. Recombinant DNA, chap. 7 (W. H. Freeman, New York). Furthermore, the adaptation of these methods to the recognition of all sequences within a sample would be, at best, highly cumbersome and uneconomical.
One existing method for detecting, isolating and sequencing unknown genes utilizes an arrayed cDNA library. From a particular tissue or specimen, mRNA is isolated and cloned into an appropriate vector, which is introduced into bacteria (e.g., E. coli) through the process of transformation. The transformed bacteria are then plated in a manner such that the progeny of individual vectors bearing the clone of a single cDNA sequence can be separately identified. A filter "replica" of such a plate is then probed (often with a labeled DNA oligomer selected to hybridize with the cDNA representing the gene of interest) and those bacterial colonies bearing the cDNA of interest are identified and isolated. The cDNA is then extracted and the inserts contained therein are subjected to sequencing via protocols which include, but are not limited to, the dideoxynucleotide chain termination method. See Sanger, F., et al. 1977. DNA Sequencing with Chain Terminating Inhibitors. Proc. Natl. Acad. Sci. USA 74(12):5463-5467.
The oligonucleotide probes utilized in colony selection protocols for unknown gene(s) are synthesized to hybridize, preferably, only with the cDNA for the gene of interest. One method of achieving this specificity is to start with the protein product of the gene of interest. If a partial sequence (i.e., from a peptide fragment containing 5 to 10 amino acid residues) from an active region of the protein of interest can be determined, a corresponding 15 to 30 nucleotide (nt.) degenerate oligonucleotide can be synthesized which would code for this peptide fragment. Thus, a collection of degenerate oligonucleotides will typically be sufficient to uniquely identify the corresponding gene. Similarly, any information leading to 15-30 nt. subsequences can be used to create a single gene probe.
Another existing method, which searches for a known gene in cDNA or gDNA prepared from a tissue sample, also uses single-gene or single-sequence oligonucleotide probes which are complementary to unique subsequences of the already known gene sequences. For example, the expression of a particular oncogene in a sample can be determined by probing tissue-derived cDNA with a probe which is derived from a subsequence of the oncogene's expressed sequence tag. The presence of a rare or difficult to culture pathogen (e.g., the TB bacillus) can also be determined by probing gDNA with a hybridization probe specific to a gene possessed by the pathogen. Similarly, the heterozygous presence of a mutant allele in a phenotypically normal individual, or its homozygous presence in a fetus, may be determined by the utilization of an allele-specific probe which is complementary only to the mutant allele. See e.g., Guo, N.C., et al. 1994. Nucleic Acid Research 22:5456-5465.
Currently, all of the existing methodologies which utilize single-gene probes, if applied to determine all of the genes expressed within a given tissue sample, would require many thousands to tens-of-thousands of individual probes. It has been estimated that a single human cell typically expresses approximately 5,000 to 15,000 genes simultaneously, and that the most complex types of tissues (e.g., brain tissue) can express up to one-half of the total genes contained within the human genome. See Liang, et al. 1992. Differential Display of Eukaryotic Messenger RNA by Means of the Polymerase Chain Reaction. Science 257:967-971. A screening methodology which requires such a large number of probes is clearly far too cumbersome to be economical or even practical.
In contrast, another class of existing methods, known as sequencing-by-hybridization ("SBH"), utilizes combinatorial probes which are not gene specific. See e.g., Drmanac, et al. 1993. Science 260:1649-1652; U.S. Pat. No. 5,202,231 to Drmanac, et al. An exemplar implementation of SBH for the determination of an unknown gene requires that a single cDNA clone be probed with all DNA oligomers of a given length, for example, all 6 nt. oligomers. A set of oligomers of a given length which are synthesized without any type of selection is called a combinatorial probe library. A partial DNA sequence for the cDNA clone can be reconstructed by algorithmic manipulations from the hybridization results for a given combinatorial library (e.g., the hybridization results for the 4096 oligomer probes having a length of 6 nt.). However, complete nucleotide sequences are not determinable, because the repeated subsequences cannot be fully ascertained in a quantitative manner.
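By way of illustration only, the size of a combinatorial probe library and the loss of information on repeated subsequences can be sketched as follows. The clone sequence and the function names are hypothetical, and hybridization is idealized as a perfect match on one strand; this is a simplified model, not a description of any actual SBH protocol.

```python
from itertools import product

def combinatorial_library(k):
    """Return every DNA oligomer of length k (4**k sequences in total)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def hybridization_hits(clone, k):
    """Idealized SBH readout: the set of k-mers present in the clone.

    Only presence or absence is recorded, so a k-mer that occurs twice
    is indistinguishable from one that occurs once.
    """
    return {clone[i:i + k] for i in range(len(clone) - k + 1)}

library = combinatorial_library(6)
clone = "ATGCGTATGCGTAA"           # hypothetical clone containing a repeat
windows = len(clone) - 6 + 1       # number of 6-nt windows in the clone
hits = hybridization_hits(clone, 6)

print(len(library), windows, len(hits))
```

In this sketch the clone contains more 6 nt. windows than distinct hits, because the repeated subsequence hybridizes to the same probes each time it occurs; this is the ambiguity that prevents full reconstruction of the sequence.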
SBH which is adapted to the identification of known genes is called oligomer sequence signatures ("OSS"). See e.g., Lennon, et al. 1991. Trends In Genetics 7(10):314-317. OSS classifies a single clone based upon the pattern of probe "hits" (i.e., hybridizations) against an entire combinatorial library, or a significant sub-library. This methodology requires that the tissue sample library be arrayed into clones, wherein each clone comprises only a single sequence from the library. This technique cannot be applied to mixtures of sequences.
These previous, exemplar methodologies are all directed to finding one sequence in an array of clones, with each clone expressing a single sequence from a given tissue sample. Accordingly, they are not directed to rapid, economical, quantitative, and precise characterization of all the DNA sequences in a mixture of sequences, such as a particular total cellular cDNA or gDNA sample, and their adaptation to such a task would be prohibitive. Determination by sequencing the DNA of a clone, much less an entire sample of thousands of genomic sequences, is not rapid or inexpensive enough for economical and useful diagnostics. As previously discussed, existing probe-based techniques of gene determination or classification, whether the genes are known or unknown, require many thousands of probes, each specific to one possible gene to be observed, or at least thousands or even tens of thousands of probes in a combinatorial library. Further, all of these aforementioned methods require that the sample be arrayed into clones each expressing a single gene of the sample.
In contrast to the prior exemplar gene determination and classification techniques, another methodology, known as differential display, attempts to "fingerprint" a mixture of expressed genes, as is found in a pooled cDNA library. This "fingerprint," however, seeks merely to establish whether two samples are the same or different. No attempt is made to determine the quantitative, or even qualitative, expression of particular genes. See e.g., Liang, et al. 1995. Curr. Opin. Immunol. 7:274-280; Liang, et al. 1992. Science 257:967-971; Welsh, et al. 1992. Nuc. Acid Res. 20:4965-4970; McClelland, et al. 1993. Exs. 67:103-115 and Lisitsyn, 1993. Science 259:946-950. Differential display uses the polymerase chain reaction ("PCR") to amplify DNA subsequences of various lengths, which are then defined by their being between the annealing sites of arbitrarily selected primers. Polymerase chain reaction method and apparatus are well known. See, e.g., U.S. Pat. Nos. 4,683,202; 4,683,195; 4,965,188; 5,333,675; each herein fully incorporated by reference. Ideally, the pattern of the lengths observed is characteristic of the specific tissue from which the library was originally prepared. Typically, one of the primers utilized in differential display is oligo(dT) and the other is one or more arbitrary oligonucleotides which are designed to hybridize within a few hundred base pairs (bp.) of the homopolymeric poly-dA tail of a cDNA within the library. Thereby, upon electrophoretic separation, the amplified fragments of lengths up to a few hundred base pairs should generate bands which are characteristic and distinctive of the sample. In addition, changes in gene expression within the tissue may be observed as changes in one or more of the cDNA bands.
In the differential display methodology, although characteristic electrophoretic banding patterns develop, no attempt is made to quantitatively "link" these patterns to the expression of particular genes. Similarly, the second arbitrary primer also cannot be traced to a particular gene due to the following reasons. First, the PCR process is less than ideally specific. One to several base pair mismatches are permitted by the lower stringency annealing step which is typically utilized in this methodology and are generally tolerated well enough so that a new chain can actually be initiated by the Taq polymerase often used in PCR reactions. Second, the location of a single subsequence (or its absence) is insufficient to distinguish all expressed genes. Third, the resultant bp.-length information (i.e., from the arbitrary primer to the poly-dA tail) is generally not found to be characteristic of a sequence due to: (i) variations in the processing of the 3'-untranslated regions of genes, (ii) variation in the poly-adenylation process and (iii) variability in priming to the repetitive sequence at a precise point. Therefore, even the bands which are produced often are smeared by numerous, non-specific background sequences.
Moreover, known PCR biases towards nucleic acid sequences containing high G+C content and short sequences further limit the specificity of this methodology. Accordingly, this technique is generally limited to the "fingerprinting" of samples for a similarity or dissimilarity determination and is precluded from use in quantitative determination of the differential expression of identifiable genes.
Thus, in conclusion, the existing methodologies utilized for gene or DNA sequence classification and determination are in need of improvement with respect to their ability to perform a highly specific quantitative determination of the components of a cDNA mixture prepared from a tissue sample in a rapid, economical and reproducible manner.
The preferred embodiment of the present invention discloses a methodology which is directed to providing positive confirmation that nucleic acid fragments, possessing putatively identified sequences which have been predicted to generate observed GeneCalling™ (see infra, p.9) signals, are actually present within the sample generating the signal. This methodology, hereinafter known as "oligo-poisoning," confirms the presence of a specific, defined flanking nucleic acid subsequence which is adjacent to the "target" subsequence of interest recognized by the probing means within a nucleic acid-containing sample. Oligo-poisoning proceeds by initially performing PCR amplification of, for example, GeneCalling™ reaction products, so as to produce results which indicate whether a nucleic acid fragment contained within the GeneCalling™ reaction either possesses or lacks the putatively identified subsequence. In the preferred embodiment, this is achieved by adding a molar excess of a "poisoning" primer designed to amplify only those nucleic acid fragments having the putatively identified subsequence. The "poisoning" primer may, preferably, be unlabeled or it may be labeled so as to allow it to be differentiated from any other type of label utilized in the PCR amplification reaction. Following PCR amplification, the resulting reaction products are then separated by electrophoresis. As the amplification products of those nucleic acid fragments containing the putatively identified subsequence will be, preferably, unlabeled, they will not generate a detectable signal.
Importantly, oligo-poisoning is also equally applicable to confirming putative sequence identifications in any sample of nucleic acid fragments which possess a certain generic sequence structure or motif. This generic structure requires only that the fragments have known terminal subsequences capable of acting as PCR primer sites. Several methods are known in the art for producing samples with such a generic structure.
The present invention provides a methodology for confirming a putatively identified sequence of a nucleic acid fragment in a sample of nucleic acids wherein each nucleic acid fragment within said sample possesses known 3'- and 5'-terminal subsequences, said methodology comprising: contacting said nucleic acid fragments in said sample in amplifying conditions with (i) a nucleic acid polymerase; (ii) "regular" primer oligonucleotides having sequences comprising hybridizable portions of said known terminal subsequences; and (iii) a "poisoning" oligonucleotide primer, said "poisoning" primer having a sequence comprising a first subsequence that is a portion of the sequence of one of said known terminal subsequences and a second subsequence that is a hybridizable portion of said putatively identified sequence which is adjacent to said one known terminal subsequence, wherein nucleic acids amplified with said "poisoning" primer are distinguishable upon detection from nucleic acids amplified only with said regular primers; separating the products of the contacting step; and detecting the separated products, wherein said putatively identified sequence is confirmed if the nucleic acids amplified with said "poisoning" primer are detected.
The present invention further provides that: (i) the regular PCR primers are labeled and, preferably, said "poisoning" primer is unlabeled; (ii) the regular PCR primers are labeled and the "poisoning" primer is labeled in a detectably different manner so as to allow its differentiation from any other label utilized in the amplification reaction; or (iii) the regular PCR primers are unlabeled and the poisoning primer is labeled and, optionally, wherein the step of detecting said separated products further comprises confirming said putatively identified sequence if said nucleic acid fragment with a putatively identified sequence is not detected. In the preferred embodiment of the present invention the regular PCR primers are labeled and, preferably, said "poisoning" primer is unlabeled.
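The logic of the preferred labeling scheme (labeled regular primers, unlabeled poisoning primer) can be sketched, under simplifying assumptions, as a string-matching model. All sequences, lengths, and function names below are hypothetical; the model assumes that the molar excess of the poisoning primer fully suppresses the labeled product of a matching fragment, and it ignores mismatch tolerance and strand orientation.

```python
def poisoning_primer(terminal, adjacent, n_terminal=4, n_adjacent=4):
    """Join the end of a known terminal subsequence to the start of the
    putatively identified adjacent sequence."""
    return terminal[-n_terminal:] + adjacent[:n_adjacent]

def is_poisoned(fragment, terminal, poison, n_terminal=4):
    """True if the poisoning primer matches the fragment across the
    junction between the terminal subsequence and the adjacent sequence."""
    start = len(terminal) - n_terminal
    return (fragment.startswith(terminal)
            and fragment[start:start + len(poison)] == poison)

def labeled_band_lengths(fragments, terminal, poison=None):
    """Lengths of fragments that still yield a labeled, detectable band."""
    return [len(f) for f in fragments
            if f.startswith(terminal)
            and not (poison and is_poisoned(f, terminal, poison))]

terminal = "GGATCCAG"                         # hypothetical known terminal subsequence
fragments = [terminal + "TTCA" + "GGGCC",     # 17 nt, carries the putative sequence
             terminal + "ACGT" + "TTTAAACC"]  # 20 nt, does not
poison = poisoning_primer(terminal, "TTCAGG")

before = labeled_band_lengths(fragments, terminal)
after = labeled_band_lengths(fragments, terminal, poison)
confirmed = 17 in before and 17 not in after  # the 17 nt band disappears
print(poison, before, after, confirmed)
```

In this sketch the putative identification is supported when the band of the expected length is present without the poisoning primer and absent with it, while the unrelated band is unaffected.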
It is an object of this invention to provide a methodology for the rapid, economical, quantitative, and highly specific determination or classification of DNA sequences, in particular genomic DNA (gDNA) or complementary DNA (cDNA) sequences, in either arrays of single sequence clones or mixtures of sequences such as can be derived from tissue samples, without actually sequencing the DNA. Thereby, the aforementioned deficiencies within the background arts are greatly mitigated. This objective is realized by generating a plurality of distinctive and detectable signals from the DNA sequences in the sample being analyzed. Preferably, all the resultant signals taken together have sufficient discrimination and resolution so that each particular DNA sequence contained within a sample may be individually classified by the particular signals it generates, and with reference to a database of all DNA sequences possible in the sample, individually determined. The intensity of the signals indicative of a particular DNA sequence depends, preferably, on the amount of that DNA present. Alternatively, the signals together can classify a predominant fraction of the DNA sequences into a plurality of sets of approximately no more than two to four individual sequences.
It is a further object that the numerous signals be generated from measurements of the results of as few a number of recognition reactions as possible, preferably no more than approximately 5-400 reactions, and most preferably no more than approximately 20-200 reactions. It should be noted that rapid and economical determinations would not be achieved if each DNA sequence in a sample containing a complex mixture required a separate reaction with a unique probe. Preferably, each recognition reaction generates a large number of or a distinctive pattern of distinguishable signals, which are quantitatively proportional to the amount of the particular DNA sequences present. Further, the signals are preferably detected and measured with a minimum number of observations, which are preferably capable of being simultaneously performed.
The signals are preferably optical in nature (e.g., generated by fluorochrome labels) and are, preferably, detected by automated optical detection technologies. Using these methods, multiple individually labeled moieties can be discriminated even though they are spatially located within the same "spot" on a hybridization membrane or electrophoretic gel band. Therefore, this level of discrimination permits multiplexing reactions and parallelizing signal detection. Alternatively, the invention is easily adaptable to other labeling systems (e.g., silver staining of gels). In particular, any single molecule detection system, whether optical or by some other technology (such as scanning or tunneling microscopy), would be highly advantageous for utilization according to this invention, as it would greatly improve the quantitative characteristics.
Signals (also referred to herein as "hits") are generated by detecting the presence or absence of short DNA subsequences (hereinafter called "target" subsequences) within a nucleic acid sequence of the sample to be subsequently analyzed. The presence or absence of a given subsequence is detected by use of recognition means (i.e., probes) for the subsequence. The subsequence(s) are recognized by various recognition means, including but not limited to restriction endonucleases ("REs"), DNA oligomers, and PNA oligomers. REs recognize their specific subsequences by cleavage thereof; DNA and PNA oligomers recognize their specific subsequences by hybridization methods. The preferred embodiment detects not only the presence of pairs of hits in a sample sequence but also includes a representation of the length in base pairs between adjacent hits. This length representation may be corrected to true physical length in base pairs upon the removal of experimental biases and errors inherent in the length separation and detection means. An alternative embodiment detects only the pattern of hits in an array of clones, each containing a single sequence ("single sequence clones"). This may be accomplished by knowing the sequence of each clone and/or by determining the length (either measured or physical) of the recognized sequences.
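As a minimal sketch of such signals, the following computes, for a single sequence, each pair of adjacent hits together with the distance between them. The sequence and function names are hypothetical; the two target subsequences are the recognition sites of EcoRI (GAATTC) and BamHI (GGATCC). Lengths are taken here as the distance between the start positions of the hits, which is an assumption: a measured fragment length would also depend on the cleavage position within each site and on any ligated adapters.

```python
import re

def find_hits(sequence, targets):
    """Return a sorted list of (position, target) for every occurrence
    of each target subsequence, including overlapping occurrences."""
    hits = []
    for target in targets:
        # A lookahead makes finditer report overlapping matches too.
        for match in re.finditer(f"(?={re.escape(target)})", sequence):
            hits.append((match.start(), target))
    return sorted(hits)

def adjacent_hit_signals(sequence, targets):
    """Return (first target, second target, length) for adjacent hits."""
    hits = find_hits(sequence, targets)
    return [(a, b, pos_b - pos_a)
            for (pos_a, a), (pos_b, b) in zip(hits, hits[1:])]

sequence = "AAGAATTCTTTTGGATCCAAAGAATTCAA"   # hypothetical sample sequence
signals = adjacent_hit_signals(sequence, ["GAATTC", "GGATCC"])
print(signals)
```

Each tuple corresponds to one signal: the identities of two adjacent target subsequences and a representation of the length between them.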
The generated signals are then analyzed together with DNA sequence information stored within sequence databases utilizing computer implemented experimental analysis methods to: (i) identify individual genes and (ii) establish their quantitative presence within the sample. The target subsequences are chosen by further computer implemented experimental design methods of the present invention such that their presence or absence, as well as their relative distances when present, yield a maximum amount of information for classifying or determining the DNA sequences to be analyzed.
By use of this methodology, it is possible to have orders of magnitude fewer probes than there are DNA sequences to be analyzed, and it is further possible to have considerably fewer probes than would be present in combinatorial libraries of the same length as the probes used in this invention. The target subsequences have a preferred probability of occurrence in a sequence (typically between 5% and 50%). In the preferred embodiment, it is preferred that the presence of one probe in a DNA sequence to be analyzed is independent of the presence of any other probe. Preferably, target subsequences are chosen based on information in relevant DNA sequence databases that characterize the sample. A minimum number of target subsequences may be chosen to determine the expression of all genes in a tissue sample (hereinafter "tissue mode"). Alternatively, a smaller number of target subsequences may be chosen to quantitatively classify or determine only one or a few sequences of genes of interest, for example oncogenes, tumor suppressor genes, growth factors, cell cycle genes, cytoskeletal genes, and the like (hereinafter "query mode").
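A simple way to picture the preferred probability of occurrence is to screen candidate target subsequences against a sequence database and retain those found in between 5% and 50% of the sequences. The sketch below does only that; the toy database, the candidates, and the function names are hypothetical, and it does not attempt the independence criterion or the optimization performed by the computer implemented experimental design methods.

```python
def occurrence_fraction(target, database):
    """Fraction of database sequences containing the target at least once."""
    return sum(target in seq for seq in database) / len(database)

def select_targets(candidates, database, low=0.05, high=0.50):
    """Keep candidates whose occurrence fraction lies within [low, high]."""
    return [t for t in candidates
            if low <= occurrence_fraction(t, database) <= high]

database = ["ATGAATTCGT", "ATGCCCGTAA", "TTTACGCGAT", "CCATAGGCTA"]  # toy data
candidates = ["GAATTC", "AT", "GGGGGG"]

for t in candidates:
    print(t, occurrence_fraction(t, database))
print(select_targets(candidates, database))
```

A target present in every sequence, or in none, carries no information for distinguishing sequences, which is why both extremes are excluded in this sketch.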
The preferred embodiment of this detection methodology, quantitative expression analysis (hereinafter referred to as "GeneCalling™"), generates signals which comprise both the target subsequence presence and a representation of the length in base pairs between adjacent target subsequences via the measurement of the results of recognition reactions on cDNA (or gDNA) mixtures. A detailed disclosure of the GeneCalling™ methodology may be found in PCT/US96/17159, published as WO97/15690, herein incorporated by reference, which is entitled "METHOD AND APPARATUS FOR IDENTIFYING, QUANTIFYING, AND CONFIRMING DNA SEQUENCES IN A SAMPLE WITHOUT SEQUENCING." Most importantly, this methodology does not require the insertion of the cDNA into a vector so as to create individual clones in a library. It is well known within the relevant fields that the creation of these cDNA libraries is time consuming, costly, and introduces bias into the process, as it requires the cDNA in the vector to be transformed into bacteria, the bacteria arrayed as clonal colonies, and finally the growth of the individual transformed colonies.
As is disclosed in WO97/15690, three exemplar experimental methodologies may be utilized for GeneCalling™: (i) a preferred Polymerase Chain Reaction (PCR) based method; (ii) an RE/ligase/amplification procedure and (iii) a method utilizing a removal means, preferably biotin, for removal of unwanted DNA fragments. However, only the preferred PCR-based experimental methodology will be disclosed herein as it serves to generate precise, reproducible, noise free signatures for determining individual gene expression from DNA in mixtures or libraries and is uniquely adaptable to automation, as it does not require intermediate extractions or buffer exchanges. A computer implemented gene calling step uses the hit and length information measured in conjunction with a database of DNA sequences to determine which genes are present in the sample and the relative levels of expression. Signal intensities are used to determine relative amounts of sequences in the sample, whereas computer-implemented design methods optimize the choice of the target subsequences.
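A greatly simplified sketch of such a database lookup is given below. It is not the gene calling procedure of WO97/15690; the database, gene names, tolerance parameter, and function names are hypothetical. Each database sequence is scanned for adjacent pairs of target subsequences, and an observed signal (first target, second target, length) is matched against the predicted signals within a length tolerance.

```python
def predicted_signals(sequence, targets):
    """Set of (first target, second target, length) for adjacent hits."""
    hits = sorted((i, t) for t in targets
                  for i in range(len(sequence)) if sequence.startswith(t, i))
    return {(a, b, pos_b - pos_a)
            for (pos_a, a), (pos_b, b) in zip(hits, hits[1:])}

def call_genes(observed, database, targets, tolerance=0):
    """Names of database sequences predicted to produce the observed signal."""
    first, second, length = observed
    calls = []
    for name, seq in database.items():
        for a, b, predicted_length in predicted_signals(seq, targets):
            if (a, b) == (first, second) and abs(predicted_length - length) <= tolerance:
                calls.append(name)
                break
    return sorted(calls)

targets = ["GAATTC", "GGATCC"]
database = {                      # hypothetical sequence database
    "geneA": "AAGAATTCTTTTGGATCCAA",
    "geneB": "GAATTCAGGATCCTT",
    "geneC": "TTTTGGATCCTTTT",
}
observed = ("GAATTC", "GGATCC", 10)

exact = call_genes(observed, database, targets)
loose = call_genes(observed, database, targets, tolerance=3)
print(exact, loose)
```

With an exact length the signal is attributed to a single database entry, while a looser tolerance returns a small set of candidates, which illustrates why precise length measurement improves discrimination.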
As previously discussed, the PCR-based GeneCalling™ methodology disclosed herein preferably generates measurements that are precise, reproducible, and free of noise. Measurement noise in GeneCalling™ is typically created by generation or amplification of unwanted DNA fragments, and special steps are preferably taken to avoid any such unwanted fragments. This embodiment of the invention facilitates efficient analysis by permitting multiple recognition means to be tested in one reaction and by utilizing multiple, distinguishable labeling of the recognition means, so that signals may be simultaneously detected and measured. Preferably, for GeneCalling™, labeling is accomplished by use of multiple fluorochrome moieties. An increase in sensitivity as well as an increase in the number of resolvable fluorescent labels can be achieved by the use of fluorescent, energy transfer, dye-labeled primers. Other detection methods, preferable when the genes being identified will be physically isolated from the gel for later sequencing or use as experimental probes, include the use of silver staining gels or of radioactive labeling. Since these methods do not allow for multiple samples to be run in a single lane, they are less preferable when high throughput is needed.
Due to the fact that the confirmation of GeneCalling™ by the oligo-poisoning methodology achieves rapid and economical quantitative determination and confirmation of differential gene expression in tissue or other samples, it has considerable medical and research utility. For example, in clinical medicine, as more and more diseases are recognized to have important genetic components to their etiology and development, it is becoming increasingly useful to be able to assay the genetic makeup and expression of a tissue sample. More specifically, the presence and expression of certain genes or their particular alleles are prognostic or risk factors for disease (including disorders). Several examples of such diseases are found among the neurodegenerative diseases, such as Huntington's disease and ataxia-telangiectasia. Several cancers (e.g., neuroblastoma) can now be quantitatively linked to specific genetic defects. Finally, gene expression can also determine the presence and classification of those foreign pathogens which are difficult or impossible to culture in vitro but which nevertheless express their own unique genes.
Similarly, disease progression is reflected in changes in genetic expression of an affected tissue. For example, expression of particular tumor promoter genes and lack of expression of particular tumor suppressor genes is now known to correlate with the progression of certain tumors from normal tissue, to hyperplasia, to cancer in situ, and finally, to metastatic cancer. The return of a cell population to a "normal" pattern of gene expression (e.g., through the use of anti-sense oligonucleotide technology) can correlate with tumor regression. The quantification of gene expression in a cancerous tissue can assist in staging and classifying this disease, as well as providing a basis to choose and guide therapy. Accurate disease classification and staging or grading using gene expression information can assist in choosing initial therapies that are increasingly more precisely tailored to the precise disease process occurring in the particular patient. Gene expression information can then track disease progression or regression, and such information can assist in monitoring the success or changing the course of an initial therapy. A favored therapy is one which results in a regression of an abnormal pattern of gene expression in an individual towards "normality," while a therapy which has little effect on gene expression (i.e., its abnormal progression) may be modified or discontinued. Such monitoring of gene expression is now useful for cancers and will become useful for an increasing number of other diseases, such as diabetes and obesity.
In order to facilitate the utilization of the present invention for the quantitative detection, confirmation and monitoring of such differential gene expression in patients with the aforementioned diseases, it is envisioned that the GeneCalling™/oligo-poisoning methodologies will be incorporated into a unitized "kit" form. This will enable the researcher or health care provider to rapidly and accurately assess such differential gene expression in the most efficacious manner possible. For example, the kit may utilize non-radioactive labeling of the PCR amplification probes and "pre-cast" electrophoresis gels to ameliorate some of the difficulties inherent in these methodologies, thus markedly increasing the potential for the acceptance and wide-spread use of such a kit in less sophisticated settings (e.g., a physician's office).
Furthermore, in the case of direct gene therapy, expression analysis directly monitors the success of treatment. In biological research, rapid and economical assay for gene expression in tissue or other samples has numerous applications. Such applications include, but are not limited to, examining tissue specific genetic response to disease in pathology, determining developmental changes in gene expression in embryology, and assessing direct and indirect effects of drugs on gene expression in pharmacology. In these applications, this invention can be applied, for example, to in vitro cell populations or cell lines, to in vivo animal models of disease or other processes, to human samples, to purified cell populations perhaps drawn from actual wild-type occurrences, and to tissue samples containing mixed cell populations. The cell or tissue sources can advantageously be plant, single celled animal, multicellular animal, bacterial, viral, fungal, yeast, or the like. The animal can advantageously be laboratory animals used in research, such as mice engineered or bred to have certain genomes or disease conditions or tendencies. The in vitro cell populations or cell lines can be exposed to various exogenous factors to determine the effect of such factors on gene expression. Further, since an unknown signal pattern is indicative of an as yet unknown gene, this invention has important use for the discovery of new genes. In medical research, by way of further example, use of the methods of this invention allows correlating gene expression with the presence and progress of a disease and thereby provides new methods of diagnosis and new avenues of therapy which seek to directly alter gene expression.
Finally, gene expression analysis can also be utilized for pharmacogenomic analysis of drug action and efficacy. For example, the present invention may be used to quantitatively ascertain the mechanism of a specific drug's biological activity and why the drug(s) may fail to work as predicted. This application has utility, for example, in stratifying patient populations.