The wealth of information generated by the Human Genome Project and other genome projects has spurred research in many traditional disciplines such as cell biology and has given birth to entirely new disciplines such as bioinformatics and proteomics. The functional analysis of the nucleotide information provided by the Human Genome Project will fuel research questions over the next several decades, and the complete sequence of the human genome should be publicly available by 2003. This first step in the characterization of the human genome presents tremendous opportunities to understand the function of the genes it contains.
An important extension of the various genome sequencing projects has been the sequencing of short stretches of nucleotides at the 5′ and 3′ ends of cDNA clones and the generation of expressed sequence tag (EST) sequences for comparison with the sequences obtained from genomic DNA (Gill and Sanseau, 2000). The presence of a sequence within an EST database demonstrates that some portion of the gene is transcribed into mRNA in a particular cell and at some relative level of abundance. The sequencing of ESTs has provided substantial insight into the tissue-specific and pathological regulation of gene expression. For many individual biomedical researchers, the partial characterization of ESTs has greatly facilitated the cloning and expression of genes of interest, since many of the ESTs are readily available from public or commercial sources.
A number of techniques currently under development to understand the regulation of gene expression take advantage of the large genomic databases and the availability of ESTs. One such major new technology is the use of DNA microarrays to study regulation of gene transcription by quantifying gene expression (Bittner et al., 1999; Graves, 1999; Watson and Akil, 1999; Brown and Botstein, 1999; Duggan et al., 1999; Young, 2000). In this approach, very small amounts of DNA are applied to the surface of glass microscope slides (Schena et al. (1995) Science 270: 467-470). Typically, the DNA sample is a short PCR-amplified fragment corresponding to a known gene or EST sequence. Approximately 100 nanoliters of DNA solution containing 10 ng of DNA is applied and fixed to the glass slide. The application of DNA can be automated and robotic devices can spot 10,000 individual DNA samples onto a single microscope slide in arrays of easily identifiable patterns. Since the entire process is robotic, it is possible to make tens or hundreds of replicates of such slides. For the analysis of gene expression, the slides are hybridized with fluorescently labeled cDNA derived from mRNA preparations obtained from various samples. After washing, the amount of fluorescent DNA hybridized to the glass slide is indicative of the amount of mRNA complementary to the individual PCR fragment. The fluorescence intensity is quantitated using an array scanner to determine the fluorescence signal at the wavelengths of the fluorophores used to label the cDNA.
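The quantitation step described above can be illustrated with a minimal sketch. It assumes a two-fluorophore experiment in which one labeled cDNA preparation serves as the sample and the other as a reference, and it assumes that a local background reading is available for each channel; neither assumption is stated in the text above. The function name, the parameter names, and the intensity values are hypothetical and chosen only for illustration.

```python
import math


def log2_ratio(sample_signal, reference_signal,
               sample_background=0.0, reference_background=0.0,
               floor=1.0):
    """Return the log2 ratio of background-corrected intensities for one spot.

    All names and the `floor` default are illustrative assumptions, not part
    of any particular scanner's software.
    """
    # Subtract the local background from each channel, then clamp to a small
    # positive floor so the logarithm is always defined.
    sample = max(sample_signal - sample_background, floor)
    reference = max(reference_signal - reference_background, floor)
    return math.log2(sample / reference)


# Hypothetical scanner readings for three spots:
# (sample signal, reference signal, sample background, reference background)
spots = {
    "spot_A": (4200.0, 1100.0, 200.0, 100.0),
    "spot_B": (600.0, 2100.0, 100.0, 100.0),
    "spot_C": (1100.0, 1100.0, 100.0, 100.0),
}

ratios = {name: log2_ratio(*values) for name, values in spots.items()}

for name, value in ratios.items():
    print(f"{name}: log2 ratio = {value:+.2f}")
```

Under these assumptions, a positive value suggests that the mRNA complementary to the spotted fragment is more abundant in the sample than in the reference, a negative value suggests the opposite, and a value near zero suggests little difference. The logarithm is used so that increases and decreases of the same fold are symmetric about zero.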
This technique has been applied to the characterization of the transcriptional response of 8,600 individual genes in fibroblasts following serum stimulation (Iyer et al., 1999), and to the effect of viral infection, ionizing radiation, and cancer chemotherapeutic agents on transcriptional regulation (Brown and Botstein, 1999; Zhu, Cong et al., 1998; Amundson, Bittner et al., 1999; Huang, Adelman et al., 1999).
Despite the wealth of information that can potentially be generated using arrayed DNA sequences, that information is limited to detecting nucleic acid sequences which are already present within a cell. Thus, DNA microarrays are currently used to determine gene expression. Once changes in transcription have been characterized, information about the relevant EST sequences is often limited to what can be found by searching for homology to other known genes; even if such homology exists, the function of the proteins encoded by the sequences is not known and can only be inferred. Thus, current methodologies are limited, as they do not provide any insight into the function of a particular gene, particularly a gene encoding a protein which does not show significant homology to the products of known genes. Determining protein function, particularly for uncharacterized genes, requires expression of the protein and its characterization. An even greater limitation of the current techniques which employ microarrayed DNA is that major aspects of cellular regulation cannot be determined using such techniques, since most regulation of cell function occurs by modification of existing protein structure rather than by regulation of gene transcription.
What is needed is a high-throughput screening assay for the functional characterization of gene products; preferably, such a technique would also take advantage of the advances in DNA microarray technology.