Methods for assaying gene expression can be classified into two major types: open methods, which do not require prior knowledge of the genes being measured, and closed methods, which measure expression levels of already collected clones or sequences. Some expression analysis techniques can only measure on a gene-by-gene basis, while others can assay multiple genes simultaneously. Finally, some methods directly measure differential expression between two samples, whereas others examine expression levels from one sample at a time, followed by computation-based comparisons. Understanding the differences between these methods is essential for choosing the best technology for a given application. Regardless of the method chosen, researchers must identify, or access through databases, vast quantities of expression information to find the actual causes and effects of changes in gene expression.
The history of gene expression analysis began when laboratory methods were developed to examine the expression of individual known genes. The northern blot technique, introduced in 1977, hybridizes labeled DNA or RNA of known genes to RNA blots. The resulting expression patterns of mRNA transcripts are then read. This technique is still widely used to confirm the results of other types of gene expression studies. Also in 1977, another method was published that protects a labeled DNA probe against degradation by the single-strand-specific nuclease S1 if the probe is annealed to an RNA. Ten years later, RNase protection assays were developed to detect the expression of specific, previously characterized RNAs and to compare their levels of expression. With this technique, a specific labeled cDNA forms a hybrid with its corresponding mRNA. When exposed to a single-strand-specific nuclease, the hybrids resist degradation and can be detected using gel electrophoresis. A later approach, differential plaque-filter hybridization, can detect differences in the expression of cloned cDNA between two samples.
In 1993, subtractive hybridization techniques became available for constructing subtractive cDNA libraries. This methodology hybridizes cDNA from one pool to mRNA from another. cDNA libraries are then constructed from the transcripts that do not hybridize, and these libraries are used to identify specific mRNAs. A modification of this technique, representational difference analysis (RDA), also uses preferential amplification of non-subtracted fragments. In RDA, “representations” or simplified versions of the genomes being studied (amplicons) are created using restriction digestion. This method was first developed to examine the differences between genomes, but has proven useful for cloning differentially expressed genes. From this method, suppressive subtractive hybridization (SSH) was derived, which enables further suppression amplification of non-subtracted fragments. SSH combines normalization (equalizing the abundance of cDNAs within the target population) and subtraction (excluding the common sequences between the target and driver populations) in a single procedure. Results from both RDA and SSH should be validated using other methods.
Early gene expression methods, such as those already mentioned, are relatively small-scale techniques. They either focus on measuring mRNA expression levels for individual well-characterized genes, or use in vitro nuclear “run-on” transcription assays to determine the transcriptional profiles of several active genes simultaneously. They are therefore inadequate for conducting large-scale screening and developing expression profile patterns for tissues or cells (the basic requirements for efficient pharmaceutical research). Thus, several newer methods for high-throughput screening (HTS) have been developed over the past decade, including differential display, expressed sequence tag (EST) methodology and many array techniques. Collectively, they have made it possible to identify the expression levels of novel genes and characterize them, correlate mRNA expression patterns in many tissue types with disease states, identify side effects of current and experimental treatments, and determine the effects of compounds on non-target tissues.
Differential display of eukaryotic mRNA, first reported in 1992, was a major advance in the comparison of gene expression differences between cells or tissues. Encompassing the use of either arbitrarily or specifically primed PCR, it is perhaps the most widely used method involving gel electrophoresis for comparing gene expression. Both methods amplify partial cDNAs from subsets of mRNA samples by using reverse transcription and PCR. These short cDNA fragments are then typically displayed on polyacrylamide gels. Differential display can simultaneously measure both up- and down-regulation across tens of samples.
Originally, this method used an oligo(dT) primer with an anchor of one or two bases at the 3′ terminus. Reverse transcription and denaturation were followed by arbitrary priming on the resulting first strand of cDNA. A series of products were then derived from the 3′ end of the mRNAs by using PCR with the original primer (a radiolabeled nucleotide) and a set of short, random decamer primers. Each random primer annealed to the mRNA at a different position relative to the anchor primer. Products showing significant differential expression were sequenced after size fractionation of the PCR sample using denaturing gel electrophoresis, generally after overnight autoradiographic exposure.
EST methodology can determine the expression profile of an entire cell or tissue under analysis. During the 1990s, EST methodology played the largest role in increasing the catalog of known genes. Using this approach, cDNA clones are randomly picked and a single pass of sequencing is performed from one or both ends of each clone. Subsequent comparison with existing sequence databases immediately identifies novel sequences. Measuring how often a given sequence appears in a (representational) library enables the estimation of expression levels for each gene.
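The frequency-based estimate described above can be sketched in a few lines of Python. This is only an illustration: it assumes each sequenced clone has already been assigned to a gene by database comparison, and the function name `est_expression_estimates` and the gene labels are hypothetical.

```python
from collections import Counter


def est_expression_estimates(clone_gene_ids):
    """Estimate relative expression as the fraction of sampled clones per gene.

    clone_gene_ids: one gene label per randomly picked, sequenced clone
    (assumed to come from a representational library).
    """
    counts = Counter(clone_gene_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Hypothetical labels for ten randomly picked clones.
sampled = ["geneA", "geneB", "geneA", "geneC", "geneA",
           "geneB", "geneA", "geneD", "geneA", "geneB"]
estimates = est_expression_estimates(sampled)

for gene, fraction in sorted(estimates.items()):
    print(f"{gene}: {fraction:.2f}")
```

With so few clones the estimates are coarse; the fractions only become meaningful as the number of sequenced clones grows relative to the number of distinct transcripts.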
Although this method can accurately identify the presence of a proportion of genes, relatively low sampling (typically 5,000 to 10,000 sequences are generated from a tissue containing >20,000 distinct transcript types) makes it difficult to measure abundance of expression or to identify differentially expressed genes, except where genes are highly up- or down-regulated.
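The effect of low sampling can be made concrete with a small calculation. The sketch below assumes that clones are picked independently and at random, so that a transcript making up a fraction f of the library is seen at least once among N sequences with probability 1 - (1 - f)^N. The abundance value used here is hypothetical and chosen only for illustration.

```python
def detection_probability(fraction, n_sequences):
    """Probability of sampling a transcript at least once.

    fraction: assumed share of the library made up by this transcript.
    n_sequences: number of clones sequenced (independent random picks assumed).
    """
    return 1.0 - (1.0 - fraction) ** n_sequences


# Hypothetical rare transcript: one clone in 100,000.
rare_fraction = 1e-5
p_5000 = detection_probability(rare_fraction, 5_000)
p_10000 = detection_probability(rare_fraction, 10_000)

print(f"5,000 sequences:  {p_5000:.3f}")
print(f"10,000 sequences: {p_10000:.3f}")
```

Under these assumptions, a transcript this rare is more likely to be missed than seen even at the upper end of the sampling range, which is consistent with the difficulty of measuring low-abundance genes by EST counts.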
Serial analysis of gene expression (SAGE) can potentially tag and analyze all transcripts in a given cell population or tissue. It has been used to successfully compare expression profiles between normal and cancerous cells, and to detect p53 levels prior to apoptosis. In theory, SAGE is an “open” system. However, in practice, the short length of the tags means that it is most useful for expression profiling of fully sequenced genes. Thus, the value of this technique might increase as the Human Genome Project progresses.
This method uses two samples that are ligated and tagged with separate primers and then amplified. Subsequently, the primers are removed, revealing sticky ends that form concatemers. The concatemers are both cloned into a vector, with sequence information for the two different cDNA tags contained between anchoring sites. This cloning and sequencing process is time-consuming, as it must be performed for each sample and followed by extensive computational analysis.
The public EST efforts, spearheaded by sequencing work at Washington University (St. Louis, Mo., USA) and the arraying efforts of the IMAGE Consortium (founded by researchers at the Lawrence Livermore National Laboratory, Columbia University, National Institutes of Health and Centre National de la Recherche Scientifique), have made sequences and clones for more than one million cDNA clones publicly available. A network of five distributors across the globe supplies researchers with clones and related research services, such as sets of sequence-verified cDNA clones spotted onto nylon membranes. As standard laboratory protocols can be used and the filters are commercially available at a relatively modest cost, they are a popular forerunner to microarrays. Hybridization of radioactively labeled complex RNA to these membranes yields signals for moderately and abundantly expressed genes and, depending on several factors, some of the less abundant transcripts. Thus, differential expression is best measured using genes that are moderately expressed in at least one of the two (or more) states under study.
DNA microarrays measure expression by using templates containing hundreds or thousands of probes that are exposed simultaneously to a target sample. They make it possible to systematically survey DNA and RNA variation for the first time and are becoming a standard tool for drug discovery and evaluation. Microarray techniques are so powerful that their uses are often limited largely by the challenge of managing and analyzing the data they generate. DNA microarray technology evolved from a paper published in 1975 by E. M. Southern (the originator of the Southern blot), who showed how a solid support could be used to examine nucleic acids. This was advanced by the development of non-porous solid supports, leading to miniaturization and the use of fluorescence-based detection methods. The two main types of templates are long DNA fragments (over 100 base pairs) and oligonucleotides (generally 18 to 25 mers). Microarrays are expensive, although efficiencies should improve and costs should drop dramatically in the next couple of years, enabling these tools to become accessible to most research laboratories. Besides cost, microarrays are limited by the fact that they can only probe genes for which clones or sequences are already available. Furthermore, their accuracy can be limited by the purity of the RNA and the quantity of RNA for each hybridization.
By understanding gene expression patterns, researchers can gain information that links sites of expression, biochemical pathways, and normal or pathological functions in organs and whole organisms. Because of their speed and breadth, microarrays should impact genetic profiling in several ways: accelerating the understanding of the molecular basis of disease or environmental stresses; improving knowledge of model systems; exploring pathogens and pathogenic or environmental (microgravity) reactions in terms of gene expression; pinpointing new molecular-level explanations for environmental effects; and examining efficacy and toxicity responses to environmental or other external stimuli.
Microarrays have already determined how several important genes are abnormally regulated in disease. For example, a microarray of approximately 100 genes that have a role in inflammation was used to examine rheumatoid tissue. This revealed upregulation of the genes encoding interleukin-6 and several matrix metalloproteinases. In another instance, a novel gene involved in promoting tumors was discovered by using a 1000-element microarray of unknown cDNAs to examine how treatment with phorbol esters affects expression levels. Microarrays should provide more detailed knowledge about pathogens by systematically examining every gene in a microbe to uncover the overall expression pattern. In addition, microarrays will continue to contribute to the understanding of responses to drug treatments. For example, a recent study used microarrays to measure the effects of kinase inhibitors on the entire yeast genome by measuring changes in mRNA levels before and after treatment. In another example, microarray studies of yeast cells showed that the immunosuppressive drug FK506 had the same effect on gene expression level patterns as ablation of the gene that FK506 suppresses. Furthermore, this study showed that, in the absence of this gene, FK506 affected expression levels in other ways. This suggests that the drug might have more than one target. Microarrays are also proving useful in the determination of drug toxicity.
Expression profiling using cDNA microarrays begins by arraying many gene-specific amplicons derived from the cDNA clones onto a single matrix. Using two-color hybridization, cDNA representations of total RNA pools are created from test and reference cells, fluorescently tagged with two different colors, then mixed together before being hybridized to the matrix. For each transcript, the resulting fluorescence signals reflect the difference in abundance between the two samples. Two-color hybridizations provide rapid comparisons between the two samples, but they do not measure the absolute levels of gene expression for either sample. By contrast, one-color hybridization is slightly slower, as hybridizations of the two samples must be performed separately to reach meaningful comparisons. However, each one-color hybridization measures absolute levels of gene expression rather than comparative levels. After these actual levels are recorded in databases, they can be compared with levels from other samples without the need to perform comparative experiments. Although performing 1000 two-color hybridizations results in 1000 pair-wise comparisons, conducting 1000 one-color hybridizations yields almost half a million (499,500) pair-wise comparisons, as the absolute values of one-color hybridizations can be evaluated against each other.
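The arithmetic behind the comparison counts can be checked with a short Python sketch. Each two-color hybridization gives one test-versus-reference comparison, whereas n one-color hybridizations can be compared in every possible pair, that is n(n - 1)/2 ways. The `log2_ratio` helper is an added assumption, not something specified in the text: expressing a two-color result as a base-2 log ratio is one common convention, and the signal values are hypothetical.

```python
import math


def two_color_comparisons(n_hybridizations):
    """Each two-color hybridization yields one pair-wise comparison."""
    return n_hybridizations


def one_color_comparisons(n_hybridizations):
    """Absolute values can be compared in every unordered pair: n(n - 1)/2."""
    return math.comb(n_hybridizations, 2)


def log2_ratio(test_signal, reference_signal):
    """Relative abundance of one transcript as a base-2 log ratio (assumed convention)."""
    return math.log2(test_signal / reference_signal)


print(two_color_comparisons(1000))   # 1000
print(one_color_comparisons(1000))   # 499500
print(log2_ratio(800.0, 200.0))      # 2.0, a four-fold higher test signal
```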
With either the one- or two-color method, microarray experiments must be performed repeatedly to ensure accuracy of the data. However, computational averaging of the signals of one-color hybridizations from multiple independent samples is more straightforward. The choice between one-color and two-color methods depends on several factors, including the number of transcripts under examination, the need for speedy results and cost differences. Hence, one-color hybridizations are often more useful for surveying a large number of genes, while two-color hybridizations can be preferable for more sampling.
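As a minimal sketch of the computational averaging mentioned above, the following Python example takes per-gene signals from several independent one-color hybridizations and averages them. It assumes the signals are already on a comparable scale (no normalization step is shown), and the function name, gene labels and values are hypothetical.

```python
from statistics import mean


def average_replicates(replicate_signals):
    """Average per-gene signals across independent one-color hybridizations.

    replicate_signals: list of dicts mapping gene label -> absolute signal,
    one dict per hybridization; every dict is assumed to hold the same genes.
    """
    genes = replicate_signals[0].keys()
    return {gene: mean(rep[gene] for rep in replicate_signals) for gene in genes}


# Hypothetical signals from three independent hybridizations of one sample.
replicates = [
    {"geneA": 100.0, "geneB": 10.0},
    {"geneA": 120.0, "geneB": 14.0},
    {"geneA": 110.0, "geneB": 12.0},
]
averaged = average_replicates(replicates)
print(averaged)
```

Because each one-color value is an absolute measurement, the averaged profile of one sample can then be compared directly with the averaged profile of any other sample.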