The recent development of complementary DNA (cDNA) micro-array technology provides a powerful analytical tool for human genetic research (M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, "Quantitative monitoring of gene expression patterns with a complementary DNA microarray," Science, 270(5235), 467-70, 1995). One of its basic applications is the quantitative analysis of fluorescence signals that represent the relative abundance of mRNA from two distinct tissue samples. cDNA micro-arrays are prepared by automatically printing thousands of cDNAs, which serve as gene-specific hybridization targets, in an array format on glass microscope slides. Two different mRNA samples can be labeled with different fluors and then co-hybridized onto each arrayed gene. Ratios of gene-expression levels between the samples are calculated and used to detect meaningfully different expression levels between the samples for a given gene.
Biological Background and cDNA Micro-Array Technology
A cell relies on its protein components for a wide variety of its functions. The production of energy, the biosynthesis of all component macromolecules, the maintenance of cellular architecture, and the ability to act upon intra- and extracellular stimuli are all protein dependent. Each cell within an organism contains the information necessary to produce the entire repertoire of proteins that the organism can specify. This information is stored as genes within the organism's DNA genome. The number of human genes is estimated to be 30,000 to 100,000. Within any individual cell, only a portion of the possible gene set is present as protein. Some of the proteins present in a cell are likely to be present in all cells. These proteins serve functions required in every type of cell, and can be thought of as "housekeeping" proteins. Other proteins serve specialized functions required only in particular cell types. For example, muscle cells contain specialized proteins that form the dense contractile fibers of a muscle. Given that a large part of a cell's specific functionality is determined by the genes it is expressing, it is logical that transcription, the first step in the process of converting the genetic information stored in an organism's genome into protein, would be highly regulated by the control network that coordinates and directs cellular activity.
Regulation is readily observed in studies that scrutinize activities evident in cells configuring themselves for a particular function (specialization into a muscle cell) or state (active multiplication or quiescence). As cells alter their status, coordinate transcription of the protein sets required for this state can be observed. As a window both on cell status and on the system controlling the cell, detailed, global knowledge of the transcriptional state could provide a broad spectrum of information useful to biologists. Knowledge of when and in what types of cell the protein product of a gene of unknown function is expressed would provide useful clues as to the likely function of that gene. Determination of gene-expression patterns in normal cells could provide detailed knowledge of the way in which the control system achieves the highly coordinated activation and deactivation required for development and differentiation of a mature organism from a single fertilized egg. Comparison of gene expression patterns in normal and pathological cells could provide useful diagnostic "fingerprints" and help identify aberrant functions which would be reasonable targets for therapeutic intervention.
Studies that determine the transcriptional state of a large number of genes have, until recently, been severely constrained by the difficulty of surveying cells for the presence and abundance of a large number of gene transcripts in a single experiment. A primary limitation has been the small number of identified genes. In the case of humans, only a few thousand of the complete set (30,000 to 100,000 genes) have been physically purified and characterized to any extent. Another significant limitation has been the cumbersome nature of transcription analysis. Even a large experiment on human cells would track expression of only a dozen genes, clearly an inadequate sampling for inference about so complex a control system.
Two recent technological advances have provided the means to overcome some of these limitations to examining the patterns and relationships in gene transcription. The cloning of molecules derived from mRNA transcripts in particular tissues, followed by application of high-throughput sequencing to the DNA ends of the members of these libraries, has yielded a catalog of expressed sequence tags (ESTs) (M. S. Boguski and G. D. Schuler, "ESTablishing a human transcript map," Nature Genetics, 10(4), 369-71, 1995). These signature sequences provide unambiguous identifiers for a large cohort of genes. At present, approximately 40,000 human genes have been "tagged" by this route, and many have been mapped to their genomic location (G. D. Schuler, M. S. Boguski, et al., "A gene map of the human genome," Science, 274(5287), 540-6, 1996).
Additionally, the clones from which these sequences were derived provide analytical reagents that can be used in the quantitation of transcripts from biological samples. The nucleic acid polymers, DNA and RNA, are biologically synthesized in a copying reaction in which one polymer serves as a template for the synthesis of an opposing strand, which is termed its complement. Even after separation from each other, these strands can be induced to pair quite specifically with each other to form a very tight molecular complex, a process called hybridization. This specific binding is the basis of most analytical procedures for quantitating the presence of a particular species of nucleic acid, such as the mRNA specifying a particular protein gene product. Micro-array technology, a recent hybridization-based process that allows simultaneous quantitation of many nucleic acid species, has been described (M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, "Quantitative monitoring of gene expression patterns with a complementary DNA microarray," Science, 270(5235), 467-70, 1995; J. DeRisi, L. Penland, P. O. Brown, M. L. Bittner, P. S. Meltzer, M. Ray, Y. Chen, Y. A. Su, and J. M. Trent, "Use of a cDNA microarray to analyse gene expression patterns in human cancer," Nature Genetics, 14(4), 457-60 ("DeRisi"), 1996; M. Schena, D. Shalon, R. Heller, A. Chai, P. O. Brown, and R. W. Davis, "Parallel human genome analysis: microarray-based expression monitoring of 1000 genes," Proc. Natl. Acad. Sci. USA, 93(20), 10614-9, 1996). This technique combines robotic spotting of small amounts of individual, pure nucleic acid species on a glass surface, hybridization to this array with multiple fluorescently labeled nucleic acids, and detection and quantitation of the resulting fluor-tagged hybrids with a scanning confocal microscope.
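The strand complementarity that underlies hybridization can be illustrated with a minimal Python sketch. It is an idealization under stated assumptions, not part of the cited methods: the function names `reverse_complement` and `can_hybridize` are hypothetical, only the four DNA bases are handled, and pairing is treated as exact, whereas real hybridization tolerates some mismatches and depends on conditions such as temperature and salt concentration.

```python
# Watson-Crick base pairing for DNA (A-T, C-G); illustrative only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposing strand of `seq`, written 5' to 3'."""
    # Complement each base, then reverse, because paired strands are antiparallel.
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_hybridize(probe: str, target: str) -> bool:
    """Idealized test: True only if `target` is the exact complement of `probe`."""
    # Assumption: exact matching stands in for the specific pairing described above.
    return reverse_complement(probe) == target.upper()
```

Under this idealization, an immobilized sequence pairs only with a labeled molecule carrying its exact complement, which is why fluorescence at a given array position can be attributed to one particular nucleic acid species.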
When this technique is used to detect transcripts, a particular RNA transcript (an mRNA) is copied into DNA (a cDNA), and this copied form of the transcript is immobilized on a glass surface. The entire complement of transcript mRNAs present in a particular cell type is extracted from cells, and a fluor-tagged cDNA representation of the extracted mRNAs is then made in vitro by an enzymatic reaction termed reverse transcription. Fluor-tagged representations of mRNA from several cell types, each tagged with a fluor emitting a different color of light, are hybridized to the array of cDNAs, and the fluorescence at the site of each immobilized cDNA is then quantitated.
The various characteristics of this analytic scheme make it particularly useful for directly comparing the abundance of mRNAs present in two cell types. Visual inspection of such a comparison is sufficient to find genes where there is a very large differential rate of expression. A more thorough study of the changes in expression requires the ability to discern more subtle changes in expression level and the ability to determine whether observed differences are the result of random variation or whether they are likely to be meaningful changes.
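The ratio-based comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis method of the cited work: the function names, the single scalar `background` value, the median-based normalization, and the two-fold cutoff are all assumptions chosen for the example. In particular, a fixed fold-change cutoff does not by itself distinguish random variation from meaningful change, which is the problem raised in the preceding paragraph.

```python
import math
from statistics import median


def expression_ratios(red, green, background=0.0):
    """Per-spot ratio of the two fluor intensities after background subtraction.

    Spots whose net signal is not positive in both channels yield None.
    """
    ratios = []
    for r, g in zip(red, green):
        r_net, g_net = r - background, g - background
        ratios.append(r_net / g_net if r_net > 0 and g_net > 0 else None)
    return ratios


def normalize_by_median(ratios):
    """Rescale ratios so the median valid ratio is 1.

    Assumption: most genes are expressed at similar levels in both samples.
    """
    m = median(x for x in ratios if x is not None)
    return [x / m if x is not None else None for x in ratios]


def flag_differences(ratios, fold=2.0):
    """Flag spots whose ratio differs from 1 by at least `fold` in either direction."""
    limit = math.log2(fold)  # compare on a log scale so up and down are symmetric
    return [x is not None and abs(math.log2(x)) >= limit for x in ratios]
```

A typical call chain would be `flag_differences(normalize_by_median(expression_ratios(red, green)))`, where `red` and `green` are equal-length lists of per-spot intensities for the two fluors. Working on a log scale treats a two-fold increase and a two-fold decrease symmetrically, which a raw ratio does not.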