The present invention relates to improved methods for detecting and mapping genetic abnormalities associated with various diseases. In particular, it relates to the use of nucleic acid hybridization methods for comparing copy numbers of particular nucleic acid sequences in a collection of sequences relative to the copy number of these sequences in other collections of sequences.
Many genomic and genetic studies are directed to the identification of differences in gene dosage or expression among cell populations for the study and detection of disease. For example, many malignancies involve the gain or loss of DNA sequences resulting in activation of oncogenes or inactivation of tumor suppressor genes. Identification of the genetic events leading to neoplastic transformation and subsequent progression can facilitate efforts to define the biological basis for disease, improve prognostication of therapeutic response, and permit earlier tumor detection. In addition, perinatal genetic problems frequently result from loss or gain of chromosome segments, as in trisomy 21 or the microdeletion syndromes.
Cytogenetics is the traditional method for detecting amplified or deleted chromosomal regions. More recent methods permit assessing the amount of a given nucleic acid sequence in a sample using molecular techniques. These methods (e.g., Southern blotting) employ cloned DNA or RNA probes that are hybridized to isolated DNA. Southern blotting and related techniques are effective even if the genome is heavily rearranged so as to eliminate useful karyotype information. However, these methods require use of a probe specific for the sequence to be analyzed. Thus, it is necessary to employ very many individual probes, one at a time, to survey the entire genome of each specimen, if no prior information on particular suspect regions of the genome is available.
Comparative genomic hybridization (CGH) is a recent approach to detect the presence and identify the location of amplified or deleted sequences (see Kallioniemi et al., Science 258: 818-821 (1992) and U.S. Pat. No. 5,665,549). CGH reveals increases and decreases irrespective of genome rearrangement. In one implementation of CGH, genomic DNA is isolated from normal reference cells, as well as from test cells (e.g., tumor cells). The two DNA samples are differentially labeled and then hybridized in situ to metaphase chromosomes of a reference cell. The repetitive sequences in both the reference and test DNAs are either removed or their hybridization capacity is reduced by some means. Chromosomal regions in the test cells which are at increased or decreased copy number can be quickly identified by detecting regions where the ratio of signal from the two DNAs is altered. For example, those regions that have been decreased in copy number in the test cells will show relatively lower signal from the test DNA than from the reference DNA, compared to other regions of the genome. Regions that have been increased in copy number in the test cells will show relatively higher signal from the test DNA.
Improved CGH techniques have also been described. For instance, CGH applied to arrays allows for more precise localization of chromosome abnormalities than use of metaphase spreads as the target (see U.S. Pat. No. 5,830,645).
Despite these improvements, there is a constant need for improved methods of genetic analysis that provide fast, reliable results. The present invention addresses these and other needs.
The present invention provides methods for quantitatively comparing the copy number of a nucleic acid sequence in a first collection of labeled nucleic acid molecules relative to the copy number of that same sequence in a second collection of labeled nucleic acid sequences. The method comprises labeling the nucleic acid molecules in the first collection and the nucleic acid molecules in the second collection with first and second labels, respectively. The first and second labels should be distinguishable from each other. The collections are contacted to a plurality of target oligonucleotides (a microarray) under conditions such that nucleic acid hybridization to the target elements can occur. The two collections can be contacted to the target elements either simultaneously or serially.
The two collections of labeled nucleic acid sequences are prepared by specifically amplifying, from a source nucleic acid, sequences that hybridize specifically to the target oligonucleotides. This amplification produces a representative collection of nucleic acid sequences, meaning that the amplification is both quantitative and results in a collection of reduced complexity. As explained below, a representative collection of nucleic acid sequences is one in which the relative abundance of particular sequences in the source nucleic acids is maintained in the labeled nucleic acids used in the assays of the invention (i.e., the amplification is quantitative). In addition, the collection of labeled nucleic acid sequences has much lower complexity than the source nucleic acid molecules. The reduced complexity is advantageous because the rate of hybridization is enhanced, as compared to hybridization using highly complex collections of labeled nucleic acid sequences.
The target oligonucleotides and the labeled nucleic acid sequences may be, for example, RNA, DNA, or cDNA. The nucleic acid sequences may be derived from any organism. Usually the nucleic acid in the target elements and the labeled nucleic acid sequences are from the same species.
The target elements are typically arranged in separate discrete locations on a solid surface. The target oligonucleotides in a target element are those for which comparative copy number information is desired. For example, the oligonucleotides may originate from a chromosomal location known to be associated with disease, may be selected to be representative of a chromosomal region whose association with disease is to be tested, or may correspond to genes whose transcription is to be assayed.
After contacting the labeled nucleic acid sequences to the target elements, the amount of binding of each collection and the binding ratio are determined for each target element. Typically, the greater the ratio of the binding to a target element, the greater the copy number ratio of sequences in the two collections of labeled nucleic acid sequences that bind to that element. Thus, comparison of the ratios among target elements permits comparison of copy number ratios of different sequences in the labeled nucleic acid sequences.
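The ratio comparison described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the element names, signal values, median normalization, and the 0.3 log2 threshold are all assumptions chosen for illustration.

```python
from math import log2
from statistics import median


def binding_ratios(test_signal, reference_signal):
    """Return the test/reference signal ratio for each target element."""
    return {
        element: test_signal[element] / reference_signal[element]
        for element in test_signal
    }


def normalized_log2_ratios(ratios):
    """Express each ratio relative to the median ratio across all elements.

    Median centering is an assumption of this sketch; it is one simple way
    to compare ratios among target elements.
    """
    center = median(ratios.values())
    return {element: log2(ratio / center) for element, ratio in ratios.items()}


def classify(log_ratios, threshold=0.3):
    """Label each element as gain, loss, or unchanged (threshold is hypothetical)."""
    calls = {}
    for element, value in log_ratios.items():
        if value > threshold:
            calls[element] = "gain"
        elif value < -threshold:
            calls[element] = "loss"
        else:
            calls[element] = "unchanged"
    return calls


# Hypothetical fluorescence intensities for five target elements.
test = {"A": 1000.0, "B": 2100.0, "C": 480.0, "D": 990.0, "E": 1020.0}
reference = {"A": 1000.0, "B": 1000.0, "C": 1000.0, "D": 1000.0, "E": 1000.0}

ratios = binding_ratios(test, reference)
calls = classify(normalized_log2_ratios(ratios))
print(calls)
```

With these made-up values, element B is called a gain and element C a loss, while the others stay near the median ratio.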
The methods are typically carried out using techniques suitable for fluorescence in situ hybridization. Thus, the first and second labels are usually fluorescent labels.
In a typical embodiment, one collection of labeled nucleic acid sequences is prepared from a test cell, cell population, or tissue under study; and the second collection of labeled nucleic acid sequences is prepared from a reference cell, cell population, or tissue. Reference cells can be normal non-diseased cells, or they can be from a sample of diseased tissue that serves as a standard for other aspects of the disease. For example, if the reference nucleic acid is genomic DNA isolated from normal cells, then the copy number of each sequence in that collection relative to the others is known (e.g., two copies of each autosomal sequence, and one or two copies of each sex chromosomal sequence depending on gender). Comparison of this to DNA prepared from a test cell permits detection of variations from normal.
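Because the copy number of each sequence in a normal reference is known, a measured test/reference ratio can in principle be scaled to an estimated copy number in the test cells. The Python sketch below assumes an ideal, linear relationship between the normalized binding ratio and the copy number ratio, which the specification does not guarantee; the function names and example ratios are hypothetical.

```python
def reference_copy_number(chromosome, sex):
    """Expected copies per cell of a sequence in a normal reference genome."""
    if chromosome == "X":
        return 2 if sex == "female" else 1
    if chromosome == "Y":
        if sex == "female":
            raise ValueError("Y sequences are absent from a female reference")
        return 1
    return 2  # autosomal sequence


def estimated_test_copies(normalized_ratio, chromosome, sex):
    """Scale a test/reference ratio by the known reference copy number.

    Assumes the ratio is proportional to the copy number ratio.
    """
    return normalized_ratio * reference_copy_number(chromosome, sex)


# Hypothetical ratios: a gain on chromosome 21 and a single-copy loss on 17.
print(estimated_test_copies(1.5, "21", "female"))  # 1.5 * 2 copies
print(estimated_test_copies(0.5, "17", "male"))    # 0.5 * 2 copies
```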
Alternatively, the reference collection of labeled nucleic acid sequences may be prepared from genomic DNA from a primary tumor, which may contain substantial variations in copy number among its different sequences, and the test collection may be prepared from genomic DNA of metastatic cells from that tumor, so that the comparison shows the differences between the primary tumor and its metastasis. Further, both collections may be prepared from normal cells. For example, comparison of mRNA populations between normal cells of different tissues permits detection of differential gene expression that is a critical feature of tissue differentiation. Thus, in general, the terms test and reference are used for convenience to distinguish the two collections, but they do not imply other characteristics of the nucleic acid sequences they contain.
The invention also provides kits comprising materials useful for carrying out the methods of the invention. Kits of the invention comprise a solid support having an array of target nucleic acid sequences bound thereto and a container containing nucleic acid sequences representing a normal reference genome, or cDNA from a reference cell type, and the like. The kit may further comprise two different fluorochromes, reagents for labeling the test genomes, alternate reference genomes and the like.
Definitions
The term “complexity” is used here according to standard meaning of this term as established by Britten et al., Methods of Enzymol. 29:363 (1974). See, also Cantor and Schimmel Biophysical Chemistry: Part III at 1228-1230 for further explanation of nucleic acid complexity.
The terms “hybridizing specifically to” and “specific hybridization” and “selectively hybridize to,” as used herein refer to the binding, duplexing, or hybridizing of a nucleic acid molecule preferentially to a particular nucleotide sequence under stringent conditions. The term “stringent conditions” refers to conditions under which a probe will hybridize preferentially to its target subsequence, and to a lesser extent to, or not at all to, other sequences. “Stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization (e.g., as in array, Southern or Northern hybridizations) are sequence dependent, and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in, e.g., Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, part I, chapt. 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assays,” Elsevier, N.Y. (“Tijssen”). Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm for a particular probe. An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on an array or on a filter in a Southern or Northern blot is 42° C. using standard hybridization solutions (see, e.g., Sambrook (1989) Molecular Cloning: A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, N.Y., and detailed discussion, below), with the hybridization being carried out overnight. An example of highly stringent wash conditions is 0.15 M NaCl at 72° C. for about 15 minutes. An example of stringent wash conditions is a 0.2× SSC wash at 65° C. for 15 minutes (see, e.g., Sambrook, supra, for a description of SSC buffer). A typical stringent wash for an array hybridization is 50% formamide, 2× SSC at 35° C. to 60° C. Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1× SSC at 45° C. for 15 minutes. An example of a low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4× to 6× SSC at 40° C. for 15 minutes.
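The arithmetic behind these example conditions can be sketched in Python. The sketch relies on the standard composition of SSC buffer (1× SSC is 0.15 M NaCl and 0.015 M sodium citrate); the function names and the 70° C. melting point in the usage lines are hypothetical, and the wash table simply restates the examples given in the text.

```python
# 1x SSC contains 0.15 M NaCl and 0.015 M sodium citrate.
NACL_MOLAR_PER_1X_SSC = 0.15


def highly_stringent_temp_c(tm_c, offset_c=5.0):
    """Temperature about 5 degrees C below the Tm of the specific sequence."""
    return tm_c - offset_c


def ssc_to_nacl_molar(fold):
    """NaCl molarity of an SSC solution at the given fold concentration."""
    return fold * NACL_MOLAR_PER_1X_SSC


# (name, SSC fold range, temperature in C, minutes), as listed in the text.
WASHES = [
    ("stringent", (0.2, 0.2), 65, 15),
    ("medium stringency", (1.0, 1.0), 45, 15),
    ("low stringency", (4.0, 6.0), 40, 15),
]

# Hypothetical probe with a Tm of 70 C.
print(highly_stringent_temp_c(70.0))

for name, (low, high), temp_c, minutes in WASHES:
    print(
        f"{name}: {ssc_to_nacl_molar(low):.2f}-{ssc_to_nacl_molar(high):.2f} M NaCl, "
        f"{temp_c} C, {minutes} min"
    )
```

Lower salt and higher temperature both raise stringency, which is why the stringent wash pairs the lowest SSC concentration with the highest temperature in the table.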
The term “labeled nucleic acid sequence”, as used herein, refers to a nucleic acid molecule attached to a detectable composition, i.e., a label. The detection can be by, e.g., spectroscopic, photochemical, biochemical, immunochemical, physical or chemical means. For example, useful labels include 32P, 35S, 3H, 14C, 125I, 131I; fluorescent dyes (e.g., FITC, rhodamine, lanthanide phosphors, Texas red), electron-dense reagents (e.g., gold), enzymes, e.g., as commonly used in an ELISA (e.g., horseradish peroxidase, beta-galactosidase, luciferase, alkaline phosphatase), colorimetric labels (e.g., colloidal gold), magnetic labels (e.g., Dynabeads™), biotin, digoxigenin, or haptens and proteins for which antisera or monoclonal antibodies are available. The label can be directly incorporated into the nucleic acid, peptide or other target compound to be detected, or it can be attached to a probe or antibody that hybridizes or binds to the target. A peptide can be made detectable by incorporating predetermined polypeptide epitopes recognized by a secondary reporter (e.g., leucine zipper pair sequences, binding sites for secondary antibodies, transcriptional activator polypeptide, metal binding domains, epitope tags). Label can be attached by spacer arms of various lengths to reduce potential steric hindrance or impact on other useful or desired properties (see, e.g., Mansfield (1995) Mol Cell Probes 9: 145-156). It will be appreciated that combinations of labels can also be used. Thus, for example, in some embodiments, different nucleic acid sequences may be labeled with distinguishable (e.g., differently colored) labels.
The term “nucleic acid” as used herein refers to a deoxyribonucleotide or ribonucleotide in either single- or double-stranded form. The term encompasses nucleic acids, i.e., oligonucleotides, containing known analogues of natural nucleotides which have similar or improved binding properties, for the purposes desired, as the reference nucleic acid. The term also includes nucleic acids which are metabolized in a manner similar to naturally occurring nucleotides or at rates that are improved thereover for the purposes desired. The term also encompasses nucleic-acid-like structures with synthetic backbones. DNA backbone analogues provided by the invention include phosphodiester, phosphorothioate, phosphorodithioate, methylphosphonate, phosphoramidate, alkyl phosphotriester, sulfamate, 3′-thioacetal, methylene(methylimino), 3′-N-carbamate, morpholino carbamate, and peptide nucleic acids (PNAs); see Oligonucleotides and Analogues, a Practical Approach, edited by F. Eckstein, IRL Press at Oxford University Press (1991); Antisense Strategies, Annals of the New York Academy of Sciences, Volume 600, Eds. Baserga and Denhardt (NYAS 1992); Milligan (1993) J. Med. Chem. 36:1923-1937; Antisense Research and Applications (1993, CRC Press). PNAs contain non-ionic backbones, such as N-(2-aminoethyl) glycine units. Phosphorothioate linkages are described in WO 97/03211; WO 96/39154; Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197. Other synthetic backbones encompassed by the term include methyl-phosphonate linkages or alternating methylphosphonate and phosphodiester linkages (Strauss-Soukup (1997) Biochemistry 36: 8692-8698), and benzylphosphonate linkages (Samstag (1996) Antisense Nucleic Acid Drug Dev 6: 153-156). The term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide primer, probe and amplification product.
A “nucleic acid microarray” or “nucleic acid array” is a plurality of target elements, each comprising a target oligonucleotide immobilized on a solid surface to which labeled nucleic acids are hybridized. “Target oligonucleotides” of a target element are usually between about 10 and about 500 nucleotides, more usually between about 25 and about 250 nucleotides, and typically between about 50 and about 100 nucleotides in length. The oligonucleotides usually have their origin in a defined region of the genome. The target nucleic acids of a target element may, for example, contain sequences from specific genes or be from a chromosomal region suspected of being present at increased or decreased copy number in cells of interest, e.g., tumor cells. The target element may also be prepared from mRNA, or cDNA derived from such mRNA, suspected of being transcribed at abnormal levels.
Alternatively, a target element may comprise nucleic acid sequences of unknown significance or location. An array of such elements could represent locations that sample, either continuously or at discrete points, any desired portion of a genome, including, but not limited to, an entire genome, a single chromosome, or a portion of a chromosome. The number of target elements and the complexity of the nucleic acids in each would determine the density of sampling. Similarly, an array of target elements comprising nucleic acids from anonymous cDNA clones (including those containing 5′ untranslated regions or promoter sequences) permits identification of those that might be differentially expressed in some cells of interest, thereby focusing attention on study of these genes.
Generally, smaller target elements are preferred. Typically, a target element will be about 1 mm or less in diameter. Generally, element sizes can be from 1 μm to about 3 mm; preferably they are between about 5 μm and about 1 mm. The target elements of the arrays may be arranged on the solid surface at different densities. The target element densities will depend upon a number of factors, such as the nature of the label, the solid support, and the like. Techniques capable of producing high density arrays can also be used for this purpose (see, e.g., Fodor (1991) Science 767-773; Johnston (1998) Curr. Biol. 8: R171-R174; Schummer (1997) Biotechniques 23: 1087-1092; Kern (1997) Biotechniques 23: 120-124; U.S. Pat. No. 5,143,854).
The term “relative copy number” refers to the number of copies of one nucleic acid molecule or sequence relative to that of another molecule or sequence within a single collection of nucleic acid molecules. The term can also refer to a comparison of the number of copies of the same sequence present in two collections of nucleic acid molecules.
A “representative collection of nucleic acid sequences of reduced complexity” is a collection of nucleic acid sequences prepared using amplification techniques (e.g., PCR) and labeled as described below. The amplification methods are quantitative so that the relative copy number of particular sequences within a source nucleic acid is maintained in the amplified, labeled nucleic acid sequences used in the assays. In the context of this invention such a collection of labeled nucleic acid sequences is said to be representative of the source from which it is derived. In addition, as a result of the specific amplification of particular sequences, the complexity of the labeled nucleic acid sequences is much less than that of the source. The reduced complexity is advantageous because the hybridization time is shortened as compared to hybridization with more complex mixtures of labeled nucleic acid sequences.
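The two properties in this definition can be made concrete with a small Python sketch. It is illustrative only: the locus names, copy counts, the 0.05 tolerance, the approximate 3 × 10^9 bp figure for a haploid human genome, and the set of 1,000 amplified sequences of 100 bp each are assumptions, and complexity is approximated here as the total length of unique sequence.

```python
def relative_abundance(counts):
    """Fraction of the collection contributed by each sequence."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def is_representative(source_counts, amplified_counts, tolerance=0.05):
    """True if every sequence keeps its relative abundance within the tolerance."""
    source = relative_abundance(source_counts)
    amplified = relative_abundance(amplified_counts)
    return all(abs(source[name] - amplified[name]) <= tolerance for name in source)


def complexity_bp(unique_sequence_lengths):
    """Approximate complexity as the total length of unique sequence, in bp."""
    return sum(unique_sequence_lengths)


# Hypothetical copy numbers of three loci in the source and after amplification.
source = {"locus1": 2, "locus2": 4, "locus3": 1}
quantitative = {"locus1": 2000, "locus2": 4100, "locus3": 980}
skewed = {"locus1": 2000, "locus2": 2000, "locus3": 2000}

print(is_representative(source, quantitative))
print(is_representative(source, skewed))

# Hypothetical complexity comparison: whole genome versus 1,000 amplicons of 100 bp.
genome_complexity = 3_000_000_000
amplified_complexity = complexity_bp([100] * 1000)
fold_reduction = genome_complexity / amplified_complexity
print(fold_reduction)
```

In this sketch the first amplified collection preserves the 2:4:1 proportions of the source and counts as representative, whereas the second, in which every locus is amplified to the same level, does not.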
A “source of nucleic acid” or “source nucleic acid” as used herein is a sample comprising DNA or RNA (typically human) in a form suitable for amplification in the methods of the invention. The nucleic acid may be isolated, cloned or amplified; it may be, e.g., genomic DNA, mRNA, or cDNA from a particular chromosome, or selected sequences (e.g., particular promoters, genes, amplification or restriction fragments, cDNA, etc.) within particular amplicons or deletions known in the art. The nucleic acid sample may be extracted from particular cells or tissues. For example, the cell or tissue sample from which the nucleic acid sample is prepared may be taken from a patient suspected of having cancer associated with the amplicon amplification or deletion or translocation being detected. Methods of isolating cell and tissue samples are well known to those of skill in the art and include, but are not limited to, aspirations, tissue sections, needle biopsies, and the like. Frequently the sample will be a “clinical sample” which is a sample derived from a patient, including sections of tissues such as frozen sections or paraffin sections taken for histological purposes. The sample can also be derived from supernatants (of cells) or the cells themselves from cell cultures, cells from tissue culture and other media in which it may be desirable to detect chromosomal abnormalities or determine amplicon copy number.