2.1. Microarray Technology
Although global methods for genomic analysis, such as karyotyping, determination of ploidy, and more recently comparative genomic hybridization (CGH) (Feder et al., 1998, Cancer Genet. Cytogenet. 102:25-31; Gebhart et al., 1998, Int. J. Oncol. 12:1151-1155; Larramendy et al., 1997, Am. J. Pathol. 151:1153-1161; Lu et al., 1997, Genes Chromosomes Cancer 20:275-281, all of which are incorporated herein by reference) have provided useful insights into the pathophysiology of cancer and other diseases or conditions with a genetic component, and in some instances have aided diagnosis, prognosis and selection of treatment, current methods do not afford a level of resolution greater than that achievable by standard microscopy, or about 5-10 megabases. Moreover, while many particular genes that are prone to mutation can be used as probes to interrogate the genome in very specific ways (Ford et al., 1998, Am. J. Hum. Genet. 62:676-689; Gebhart et al., 1998, Int. J. Oncol. 12:1151-1155; Hacia et al., 1996, Nat. Genet. 14:441-447, all of which are incorporated herein by reference), this one-by-one query is an inefficient and incomplete method for genetically typing cells.
With the advent of microarray, or “chip,” technology, it is now possible to contemplate obtaining a high resolution global image of genetic changes in cells. Two general approaches can be conceived. One is to profile the expression pattern of the cell using microarrays of cDNA probes (DeRisi et al., 1996, Nat. Genet. 14:457-460). This method is very likely to yield useful information about cancer, but suffers from several limitations. First, the interpretation of the data obtained and its correlation with the disease process is likely to be a complex and difficult problem: multiple changes in gene expression will be observed that are not relevant to the disease of interest. Second, our present cDNA collections are not complete, and any chip is likely to be obsolete in the near future. Third, while a picture of the current state of the cell might be obtained, there would be little direct information about how the cell arrived at that state. Lastly, obtaining reliable mRNA from biopsies is likely to be a difficult problem, because RNA is very unstable and undergoes rapid degradation due to the presence of ubiquitous RNases.
The second approach is to examine changes in the cancer genome itself. DNA is more stable than RNA, and can be obtained from poorly handled tissues, and even from fixed and archived biopsies. The genetic changes that occur in the cancer cell, if their cytogenetic location can be sufficiently resolved, can be correlated with known genes as the databases of positionally mapped cDNAs mature. Thus, the information derived from such an analysis is not likely to become obsolete. The nature and number of genetic changes can provide clues to the history of the cancer cell. Finally, a high resolution genomic analysis may lead to the discovery of new genes involved in the etiology of the disease or disorder of interest.
Microarrays typically have many different DNA molecules, often referred to as probes, fixed at defined coordinates, or addresses, on a flat, usually glass, support. Each address contains either many copies of a single DNA probe, or a mixture of different DNA probes, and each DNA molecule is usually 2000 nucleotides or less in length. The DNAs can be from many sources, including genomic DNA or cDNA, or can be synthesized oligonucleotides. For clarity and brevity, we refer to chips with genomic or cDNA-derived probes as DNA chips and to chips with synthesized oligonucleotide probes as oligo chips. Chips are typically hybridized to samples, which are applied as single-stranded nucleic acids in solution.
The extent of hybridization with samples at a given address is determined by many factors, including the concentration of complementary sequences in the sample, the probe concentration, and the volume of sample from which each address is able to capture complementary sequences by hybridization. We refer to this volume as the diffusion volume. Because the diffusion volume, and hence the potential hybridization signal, may vary from address to address in the hybridization chamber, the probe array is most accurate as a comparator, measuring the ratio of hybridization between two differently labeled specimens (the sample) that are thoroughly mixed and therefore share the same hybridization conditions, including the same diffusion volume. Typically, the two specimens will be from diseased and disease-free cells.
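The comparator argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the signal at an address is proportional to the product of the concentration of complementary sequence and the diffusion volume; the function names and numerical values are hypothetical and are not taken from any cited method.

```python
def hybridization_signal(concentration, diffusion_volume, capture_efficiency=1.0):
    """Toy model (an assumption): signal is proportional to the amount of
    complementary sequence available within the diffusion volume."""
    return capture_efficiency * concentration * diffusion_volume


def two_color_ratio(conc_test, conc_reference, diffusion_volume):
    """Ratio of signals from two differently labeled specimens that are mixed
    and therefore share one diffusion volume at a given address."""
    test = hybridization_signal(conc_test, diffusion_volume)
    reference = hybridization_signal(conc_reference, diffusion_volume)
    return test / reference


# Three addresses carrying the same probe but with different diffusion volumes.
volumes = (0.5, 1.0, 4.0)

# The absolute signal of a single specimen varies with the diffusion volume...
signals = [hybridization_signal(6.0, v) for v in volumes]

# ...whereas the ratio between the two specimens does not.
ratios = [two_color_ratio(6.0, 3.0, v) for v in volumes]

print(signals)
print(ratios)
```

Under this simplified model the diffusion volume cancels in the ratio, which is why the array is described as most accurate when used as a comparator.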
We distinguish between compound and simple DNA probe arrays based on the nucleotide complexity of the probes at each address. When this nucleotide complexity is less than or equal to about 1.2 kb per address, we speak of simple DNA probe arrays. When it exceeds 1.2 kb per address, we speak of compound probe arrays. Simple probe arrays are currently able to detect cDNA species that are present at 2 to 10 copies of mRNA per cell when contacted with a solution containing a total cDNA concentration of 1 mg/ml. The threshold of detection of a given species is estimated to be in the range of 4 to 20 ng/ml. Because a simple probe array is generally able to capture only a single species of DNA from the sample, this detection threshold poses a problem for the use of simple DNA probe arrays for analysis of genomic DNA. The concentration of a unique 700 bp fragment of human genomic DNA (which has a total complexity of about 3000 Mb) in a solution of total genomic DNA dissolved at its maximum concentration of 8 mg/ml would be about 2 ng/ml, which is below the lower estimate of the threshold of detection. Hence, in its unaltered format, the simple DNA probe chip would not suffice for the robust detection of genomic sequences.
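The concentration estimate in the preceding paragraph can be reproduced with a short calculation. The sketch below uses only the figures stated above (a 700 bp fragment, a genome complexity of about 3000 Mb, total DNA at 8 mg/ml, and a lower detection threshold of 4 ng/ml); the function and constant names are hypothetical.

```python
GENOME_COMPLEXITY_BP = 3_000_000_000  # about 3000 Mb, as stated in the text
NG_PER_MG = 1_000_000


def unique_fragment_conc_ng_per_ml(fragment_bp, total_dna_mg_per_ml,
                                   genome_bp=GENOME_COMPLEXITY_BP):
    """Concentration of one unique fragment, taking its share of the total
    DNA mass to equal its share of the genome length."""
    mass_fraction = fragment_bp / genome_bp
    return mass_fraction * total_dna_mg_per_ml * NG_PER_MG


THRESHOLD_LOW_NG_PER_ML = 4.0  # lower estimate of the detection threshold

conc = unique_fragment_conc_ng_per_ml(fragment_bp=700, total_dna_mg_per_ml=8.0)
detectable = conc >= THRESHOLD_LOW_NG_PER_ML

print(f"{conc:.2f} ng/ml, detectable at lower threshold: {detectable}")
```

The result is roughly 1.9 ng/ml, consistent with the figure of about 2 ng/ml given above and below the 4 ng/ml lower threshold.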
The compound chip partially addresses this problem by increasing the nucleotide complexity of different probes at a given address, allowing for the capture of several species of DNA fragments at a single address. The signals of the different captured species combine to yield a detectable level of hybridization from genomic DNA. Present forms of compound probe arrays place the insert found in a single clone of a megacloning vector, such as a BAC, at each address. Because each address contains fragments derived from the entire BAC clone, several problems are created. The presence of repeat elements in the genomic inserts requires quenching with cold unlabeled DNA. Also, the great size of the megacloning vector inserts limits the positional resolution. For example, in the case of a compound probe array made of BACs, hybridization to a particular address reveals only to which BAC the hybridizing sequence is complementary, and does not reveal the specific complementary gene or sequence within that BAC. Another drawback is the presence of DNA derived from the megacloning vector and host sequences. The steps of excising and purifying the genomic DNA inserts from the vector and host sequences complicate and hinder rapid fabrication of microarrays.
2.2. Problems Associated with Genetic Analysis
Analysis of the genetic changes in human tumors is often problematic because of the presence of normal stroma. Samples of tumor tissue are often contaminated with non-cancerous cells, making isolation and study of tumor cell DNA difficult. While either microdissection or flow cytometry can produce small samples highly enriched for tumor cells or nuclei, the amount of extracted DNA recoverable from such enriched samples is insufficient for most uses.
One technique which can be used on small samples is representational difference analysis (RDA) (U.S. Pat. No. 5,436,142; Lisitsyn et al., 1993, Science 259:946-951). RDA is a subtractive DNA hybridization technique that is useful, e.g., to discover the differences between paired normal and tumor genomes. The first step of RDA requires making an “amplicon representation”, which is a highly reproducible simplification and amplification of a DNA population. Typically, an amplicon representation is a set of restriction endonuclease fragments of a limited size range generated by PCR (polymerase chain reaction). PCR generates sufficient amounts of DNA for subsequent processing, on the order of 100 µg, starting from as little as 3 ng of DNA (the amount of DNA isolatable from about 1000 cells).
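The degree of amplification implied by these figures can be estimated as follows. This is a rough sketch that assumes a constant per-cycle efficiency, which real PCR does not maintain (efficiency falls as the reaction approaches plateau), so the cycle count is only a lower bound under that idealization. The function names are hypothetical.

```python
import math

NG_PER_UG = 1000.0


def fold_amplification(start_ng, yield_ug):
    """Ratio of the final DNA mass to the starting DNA mass."""
    return (yield_ug * NG_PER_UG) / start_ng


def min_cycles(fold, efficiency=1.0):
    """Smallest whole number of cycles reaching `fold`, assuming each cycle
    multiplies the product by (1 + efficiency). An efficiency of 1.0 is the
    idealized perfect doubling."""
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))


fold = fold_amplification(start_ng=3.0, yield_ug=100.0)
cycles = min_cycles(fold)

print(f"fold amplification: about {fold:,.0f}")
print(f"minimum cycles at perfect doubling: {cycles}")
```

Going from 3 ng to 100 µg corresponds to an amplification of roughly 33,000-fold, or just over 15 doublings.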
One limitation of the amplicons used in RDA is that, for the subtractive hybridization to proceed effectively, the amplicon representation must have a much lower complexity than the genome from which it is derived. Such low complexity representations (LCRs) do not “capture” enough (typically, 7% or less) of the genome to be generally useful for other applications. The complexity of the representation is related to the frequency of cutting of the restriction enzyme used to generate the genomic fragments, combined with the amplification reaction steps, e.g., PCR, which tend to favor the smaller fragments.
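The relationship between cutting frequency, size selection and the fraction of the genome captured can be illustrated with a toy in silico digest. The sketch below makes several simplifying assumptions that are not part of the method described above: the cut is placed at the start of each recognition site rather than within it, the bias of PCR toward smaller fragments is modeled as a hard size window, end effects are ignored, and the sequence, site and size limits are arbitrary illustrative values.

```python
def digest(sequence, site):
    """Split a linear sequence at every occurrence of the recognition site.
    Simplification: the cut is placed at the start of the site."""
    cut_points = []
    pos = sequence.find(site)
    while pos != -1:
        cut_points.append(pos)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cut_points + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def representation(sequence, site, min_len, max_len):
    """Fragments retained after size selection (a stand-in for the tendency
    of amplification to favor a limited size range)."""
    return [f for f in digest(sequence, site) if min_len <= len(f) <= max_len]


def fraction_captured(sequence, kept):
    """Share of the original sequence length present in the representation."""
    return sum(len(f) for f in kept) / len(sequence)


# Toy "genome": spacers of known length separated by an example six-base site.
SITE = "AGATCT"
spacer_lengths = [30, 5, 12, 50, 8, 20]
genome = SITE.join("C" * n for n in spacer_lengths)

kept = representation(genome, SITE, min_len=10, max_len=20)
frac = fraction_captured(genome, kept)

print([len(f) for f in kept])
print(f"fraction captured: {frac:.3f}")
```

Narrowing the size window or choosing a site that occurs less often lowers the captured fraction, which is the sense in which complexity depends on both the enzyme and the amplification step. The toy fraction is not meant to match the 7% figure cited above.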
Whole genome amplification (WGA) is a method by which more complex amplifications of the DNA from minute samples are generated (Sun et al., 1995, Nucleic Acids Res. 23(15):3034-3040; Barrett et al., 1995, Nucleic Acids Res. 23(17):3488-3492). In WGA, PCR is performed on DNA isolated from small amounts of sample using random primers.
There are at least three disadvantages to the WGA method:
1. The amplified DNA cannot be used for Southern analysis. Because more than one primer can bind to a single gene, a heterogeneous mixture of different sized fragments can be generated from a single gene. This would result in a smear, not a band, being detected by Southern hybridization.
2. Due to the random nature of the amplification, each amplification results in a different mixture of fragments. Therefore the amplification is not reliably reproducible. This makes the use of such whole genomic amplifications for the purposes of sample to sample comparisons difficult.
3. Whole genomic amplifications are not useful for quantitating the copy number of genes present in the original sample. Because the primers are random, the representation of each gene can vary greatly with respect to the other genes. Thus, the abundance of each gene relative to other genes in the original sample is not preserved during the amplification, making quantitation of copy number impossible.
Thus, there remains a long-felt need for a method of obtaining sufficient amounts of genetic material from scant genomic samples to enable genetic analysis of small samples using techniques which previously were inapplicable due to the limited amount of DNA isolatable from such samples. There is also a long-felt need for a method of amplifying and storing DNA from scant, nonrenewable sources.