1. Field of the Invention
This invention pertains to the field of molecular cytogenetics and, more specifically, to methods, kits and compositions used to suppress the binding of detectable nucleic acid probes to undesired sequences, such as randomly distributed repeat sequences, in genomic nucleic acid.
2. Background
Nucleic acid hybridization is a fundamental process in molecular biology. Probe-based assays are useful in the detection, identification, quantitation and/or analysis of nucleic acids. Nucleic acid probes have long been used to analyze samples for the presence of nucleic acid from bacteria, fungi, viruses or other organisms and are also useful in examining genetically-based disease states or clinical conditions of interest. Nonetheless, nucleic acid probe-based assays have been slow to achieve commercial success. This lack of commercial success is, at least partially, the result of difficulties associated with specificity, sensitivity and/or reliability.
Fluorescence in-situ hybridization (FISH) has become an important tool for determining the number, size and/or location of specific DNA sequences in mammalian cells. Typically, the hybridization reaction fluorescently stains the target sequences so that their location, size and/or number can be determined using fluorescence microscopy, flow cytometry or other suitable instrumentation. DNA sequences ranging from whole genomes down to several kilobases can be studied using current hybridization techniques in combination with commercially available instrumentation.
In Comparative Genomic Hybridization (CGH), whole genomes are stained and compared to normal reference genomes for the detection of regions with aberrant copy number. In the m-FISH technique (multicolor FISH), each separate normal chromosome is stained with a separate color (Eils et al., Cytogenetics Cell Genet. 82: 160-71 (1998)). When used on abnormal material, the probes stain the aberrant chromosomes, thereby allowing the normal chromosomes from which they are derived to be deduced (Macville M et al., Histochem Cell Biol. 108: 299-305 (1997)). Specific DNA sequences, such as the ABL gene, can be reliably stained using probes of only 15 kb (Tkachuk et al., Science 250: 559-62 (1990)). FISH-based staining is sufficiently distinct that the hybridization signals can be seen both in metaphase spreads and in interphase nuclei. Single and multicolor FISH, using nucleic acid probes, have been applied to different clinical applications generally known as molecular cytogenetics, including prenatal diagnosis, leukemia diagnosis, and tumor cytogenetics.
A large component of the human genome comprises repeat sequences. Heat denaturation and reannealing studies on DNA of higher organisms have distinguished three populations of eukaryotic DNA: a quickly reannealing component representing 25% of total DNA, an intermediate component that represents 30% of the total DNA, and a slow component that represents 45% of the total DNA (Britten et al., Science 161: 529-540 (1968)). Sequence analysis has shown that the slow component is made up of single-copy sequences, which include protein encoding genes, while the fast and intermediate components represent repetitive sequences. The fast component contains small (a few nucleotides long), highly repetitive DNA sequences, which are usually found in tandem, while the intermediate component contains the interspersed repetitive DNA (Novick et al., Human Genome Bioscience, 46(1): 32-41 (1996) and Brosius J., Science 251: 753 (1991)). The repetitive units of the intermediate component are interspersed within the genome and are the major reason that large genomic nucleic acid probes (i.e. >100 bp) derived from genomic nucleic acid are not well suited for hybridization analysis.
Interspersed repeated sequences are classified as either SINEs (short interspersed repeated sequences) or LINEs (long interspersed repeated sequences) (Korenberg et al., Cell, 53: 391-400 (1988)). In primates, each of these classes is dominated by a single DNA sequence family, both of which are classified as retroposons (Rogers J., International Review of Cytology, 93: 187-279 (1985)). The major human SINEs are the Alu-repeat DNA sequence family. The Alu-repeat DNA family members are characterized by a consensus sequence of approximately 280 to 300 bp which consists of two similar sequences arranged as a head to tail dimer. Approximately one million copies of the Alu repeat sequence are estimated to be present per haploid human genome, thereby representing about ten percent of the genome (Ausubel et al., Current Protocols In Molecular Biology, John Wiley & Sons, Inc., 1996). That estimate is consistent with the recent sequence determination of human chromosomes 21 and 22. These reports demonstrate that Alu repeats cover 9.48% and 16.80% of the DNA, respectively (Hattori et al., Nature, 405: 311-319 (2000) and Dunham I. et al., Nature, 402: 489-495 (1999)).
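The ten percent figure can be checked with simple arithmetic. The sketch below is illustrative only: the haploid genome size of about 3 x 10^9 bp is an assumption that is not stated in the text, and the function name alu_fraction is hypothetical.

```python
# Back-of-the-envelope check of the Alu genome fraction quoted above.
# ASSUMPTION: haploid human genome size of ~3.0e9 bp (not stated in the text).
HAPLOID_GENOME_BP = 3.0e9
ALU_COPIES = 1_000_000        # approximate copy number given in the text
ALU_LENGTHS_BP = (280, 300)   # consensus length range given in the text


def alu_fraction(copies: int, unit_bp: int, genome_bp: float) -> float:
    """Return the fraction of the genome occupied by `copies` repeats of `unit_bp` bp."""
    return copies * unit_bp / genome_bp


if __name__ == "__main__":
    for length in ALU_LENGTHS_BP:
        pct = 100 * alu_fraction(ALU_COPIES, length, HAPLOID_GENOME_BP)
        print(f"{length} bp consensus: about {pct:.1f}% of the genome")
```

Under these assumptions the result falls between roughly nine and ten percent, in line with the estimate cited above.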
Alu elements have amplified in the human genome through retroposition over the past 65 million years and have been organized into a wealth of overlapping subfamilies based on diagnostic mutations shared by subfamily members (See, for example: Batzer et al., J. Mol. Evol., 42: 3-6 (1996)). Batzer et al. described a consensus nomenclature for Alu repeat sequences, representing the oldest (J), intermediate (S) and young (Y) family branches. Only the Y family branch is still transcriptionally active, but it is very small, as each of the defined a5, a8 and b8 subfamilies has produced fewer than 2000 elements (Sherry et al., Genetics, 147: 1977-1982 (1997)). It has been calculated that of the primate Alu repeat family branches, approximately one-fifth belong to the J family and four-fifths to the S family (Britten, R. J., Proc. Natl. Acad. Sci. USA, 91: 6148-6150 (1994)). The S family is dominated by the Sx subfamily, as it represents more than 50% of the total S family branch.
In addition to SINEs and LINEs, there are several other types of repeats that are known to exist in genomic nucleic acid of humans as well as in other organisms. Chromosome telomeres are repeat sequences that appear to exist only, or else predominantly, at the termini of all chromosomes. They are believed to shorten during the life of an organism and may play a role in the aging of an organism (See: Lansdorp, P., WIPO Patent Application No. WO97/14026). Likewise, chromosome centromeres contain distinct repeat sequences that exist only, or else predominantly, in the central (centromere) region of a chromosome. Certain of the centromere repeat sequences can be detected in all chromosomes of an organism whilst other repeat sequences are unique to a particular chromosome and can be used to identify specific chromosomes (Taneja et al., Genes, Chromosomes & Cancer, 30: 57-63 (2001)).
Telomere and centromere repeat sequences differ from interspersed repetitive sequences, such as SINEs and LINEs, in that the telomere and centromere repeat sequences are localized within a certain region of the chromosome. By comparison, SINEs and LINEs, which are referred to herein as randomly distributed repeat sequences, are dispersed randomly throughout the entire genome (Ullu E., TIBS: 216-219 (June, 1982)). Thus, as used herein, the term “randomly distributed repeat sequence” is intended to refer to repeat sequences that occur randomly within all, or essentially all, genomic nucleic acid of an organism. These include, but are not limited to, Alu-repeats, Kpn-repeats, di-nucleotide repeats, tri-nucleotide repeats, tetra-nucleotide repeats, penta-nucleotide repeats and hexa-nucleotide repeats, all of which are more generally classified as SINEs or LINEs.
Detection of specific nucleic acid sequences by in situ hybridization using non-radioactive labels has been applied for almost twenty years. As stated above, however, the randomly distributed repeat sequences, such as SINEs and LINEs, are particularly problematic for the production of specific nucleic acid probes that are derived from large clones because the probes will inevitably comprise randomly distributed repeat sequences. The problem arises because the repeat sequences contained in the probes hybridize to the corresponding repeat sequences in the natural genomic nucleic acid found within all chromosomes. Because the detectable probes hybridize specifically to the target, as well as to repeat sequences found randomly in the genomic nucleic acid, a high degree of background signal is produced.
Refinement of non-radioactive detection and visualization methods resulted in improved detection limits and thereby allowed the localization of large single-copy sequences (Landegent et al., Nature, 317: 175-177 (1985)). In this study it was necessary to construct a mixture of seven subclones (a total of 22.3 kb derived from a cosmid DNA clone containing the 3′ end of the Tg gene) in order to eliminate highly repeated sequences present in the original genomic cosmid DNA. Although this was an improvement, a more attractive strategy, based on direct use of large genomic cloned segments in combination with Cot1 DNA, has been described. The use of Cot1 DNA eliminates background signal, caused by highly repetitive sequences, by introducing a competitive hybridization process (Landegent et al., Hum. Genet., 77: 366-370 (1987); U.S. Pat. No. 5,447,841, issued to Gray et al.; and U.S. Pat. No. 6,203,977 B1 issued to Ward et al.).
Cot1 DNA is a heterogeneous mixture of genomic nucleic acid that is prepared by degrading total human DNA and processing the resulting material to thereby select for genomic nucleic acid fragments that are enriched in the repeat sequences (Britten et al., Methods Enzymol 29: 363-418 (1986)). Although the use of Cot1 DNA has been proven to be effective in suppressing undesired binding of detectable nucleic acid fragments of greater than 100 bp to target genomic nucleic acid, there are several disadvantages to this method. One such disadvantage pertains to the preparation of the Cot1 DNA itself. Specifically, because the process relies on the availability of total human DNA, the starting material is itself not highly defined and is likely to vary from sample to sample. Moreover, the processing methods are likely to produce material that varies from batch to batch; this result being somewhat dependent upon the variability of the starting material and somewhat dependent upon the variability of the production process itself. Additionally, the Cot1 DNA is a heterogeneous mixture of fragments that is impossible to completely characterize and define. Hence, the batch to batch variability, as well as the inability to characterize the Cot1 DNA product, leaves substantial room for improvement. The present invention addresses these, as well as other, limitations of the art.
Despite its name, Peptide Nucleic Acid (PNA) is neither a peptide nor a nucleic acid, nor is it an acid. PNA is a non-naturally occurring polyamide (pseudopeptide) that can hybridize to nucleic acid (DNA and RNA) with sequence specificity (See: U.S. Pat. No. 5,539,082 and Egholm et al., Nature 365: 566-568 (1993)). Because they hybridize to nucleic acid with sequence specificity, PNA oligomers have become commonly used in probe-based applications for the analysis of nucleic acids.
Being a non-naturally occurring molecule, unmodified PNA is not known to be a substrate for the enzymes that are known to degrade peptides or nucleic acids. Therefore, PNA should be stable in biological samples, as well as have a long shelf-life. Unlike nucleic acid hybridization, which is very dependent on ionic strength, the hybridization of a PNA with a nucleic acid is fairly independent of ionic strength and is favored at low ionic strength, conditions that strongly disfavor the hybridization of nucleic acid to nucleic acid (Egholm et al., Nature, at p. 567). Because of their unique properties, it is clear that PNA is not the equivalent of a nucleic acid in either structure or function.
Labeled PNA probes have been used for the analysis of rRNA in ISH and FISH assays (See: WO95/32305 and WO97/18325). Labeled PNA probes have also been used in the analysis of mRNA (e.g. Kappa & Lambda Light Chain; Thisted M. et al., Cell Vision 3: 358-363 (1996)) and viral nucleic acid (e.g. EBV; Just T et al., J. Vir. Methods, 73: 163-174 (1998)). A labeled PNA probe has also been used to detect human X chromosome specific sequences in a PNA-FISH format (See: WO97/18325, now U.S. Pat. No. 5,888,733). The analysis of chromosome aberrations using PNA probes has also been disclosed (See: WO99/57309). The ISH-based analysis of eukaryotic chromosomes and cells, using polyamide nucleic acids, has also been suggested (See: U.S. Pat. No. 5,888,734).
Labeled peptide nucleic acids have been described for the analysis of both telomere and centromere repeat sequences in genomic nucleic acid (Lansdorp, P., WO97/14026). Likewise, labeled peptide nucleic acid oligomers have been used in the analysis of individual human chromosomes in a multiplex PNA-FISH assay (Taneja et al., Genes, Chromosomes & Cancer, 30: 57-63 (2001)). Similarly, the analysis of trinucleotide repeats in chromosomal DNA, using appropriate labeled PNA probes, has also been suggested (See: WO97/14026). Subsequently, DNA and PNA probes were used to examine cells for genetic defects associated with the expansion of trinucleotide repeats that manifest as the disease known as human myotonic dystrophy (See: Taneja, Biotechniques, 24: 472-476 (1998)). In all cases, labeled PNA probes were used to detect the specific target nucleic acid repeat sequences.
PNA oligomers comprising the triplet repeat sequence CAG have also been used for the selective isolation of transcriptionally active chromatin restriction fragments (See: Boffa et al., Proc. Nat'l. Acad. Sci. USA, 92: 1901-1905 (1995)).
Peptide nucleic acid oligomers have also been used to suppress the binding of detectable probes to non-target sequences (See: U.S. Pat. No. 6,110,676). Importantly, however, there is no specific description, suggestion or teaching of using peptide nucleic acid oligomers to suppress the binding of detectable nucleic acid probes to undesired randomly distributed repeat sequences of genomic nucleic acid.