The ability to clone large, repetitive DNA is an important step toward the development and construction of artificial chromosomes and gene therapy vehicles. In addition, stable cloning of repetitive DNA in microorganisms will be important for generating high-resolution physical maps of mammalian chromosomes.
A variety of cloning systems have been developed to facilitate the cloning and propagation of foreign DNA in microorganisms. Plasmids, bacteriophage, and yeast artificial chromosomes (YACs) have been used successfully to clone many mammalian DNA sequences. However, some types of repetitive DNA appear to be unstable in these vectors (Schalkwyk et al., Curr. Opin. Biotechnol. 6(1):37-43 (1995); Brutlag, D. et al., Cell 10:509-519 (1977)). This instability results in gaps in physical genomic maps and precludes the use of these vectors for propagating repetitive DNA, especially highly repetitive mammalian centromeric DNA.
Bacterial Artificial Chromosomes (BACs) have been constructed to allow the cloning of large DNA fragments in E. coli (O'Connor et al., Science 244(4910):1307-12 (1989); Shizuya et al., Proc. Natl. Acad. Sci. USA 89(18):8794-7 (1992); Hosoda et al., Nucleic Acids Res. 18(13):3863-9 (1990)). While this system appears to be capable of stably propagating mammalian DNA up to at least 300 kb, relatively few independent mammalian DNA fragments have been analyzed (Shizuya et al., Proc. Natl. Acad. Sci. USA 89(18):8794-7 (1992)). In addition, the few fragments that have been tested for stability in the BAC vector have not been extensively characterized with respect to the types of sequences present in each fragment. Thus, it is unknown whether these fragments contain repetitive DNA elements. In particular, it is clear, based on the restriction site and Southern analysis, that these fragments do not contain alpha satellite DNA.
Many mammalian DNA sequences appear stable in Yeast Artificial Chromosome (YAC) vectors, and yet certain repetitive elements of similar length are not (Neil et al., Nucleic Acids Res. 18(6):1421-8 (1990)). Knowledge of DNA properties derived from the YAC system thus suggests that large arrays of repeating units are inherently unstable, even under conditions where similarly sized DNA composed of non-repeating sequence is stable.
Thus, prior to the present invention, the stability in a BAC vector of large (greater than 20-100 kb) arrays of repeating units, such as alpha satellite DNA, was not predictable with any reasonable certainty. In addition, even if some alpha satellite arrays were stable in the BAC vector, it was not clear a priori whether arrays of sufficient size and sequence composition to facilitate centromere function could have been stably propagated in this vector.
The difficulties pertaining to cloning and propagating repeated DNA have been recognized in the literature. In addition, the literature clearly describes a correlation between the number of repeats and plasmid instability. Thus, when considering the relevance of a reference to the invention described and claimed herein, one must take into account the type of DNA repeat, its sequence composition, the size of the repeat unit, and the overall size of the repetitive array.
Hofer et al., Eur. J. Biochem. 167:307-313 (1987), describe the cloning of a tandem array of up to 6 copies of a 69 bp repeat. Tandem arrays of 2 and 4 repeats were found to be stable, but arrays of 6 repeats were unstable in the plasmid used in this study. In addition to establishing a correlation between the size of the array and stability, the authors note that "once the hexameric gene has survived the obviously crucial phase of transformation, it is stably replicated in recA, recBC, and rec+ hosts." This indicates that, in addition to the instability caused by repetitive DNA per se in the plasmid vector, the transformation into E. coli of plasmids containing direct repeats also results in instability of the tandem arrays.
Prior to the inventors' disclosure herein, there were no definitive data regarding plasmid instability resulting from transformation of large tandem arrays of sizes similar to those described in this application. Therefore, it was impossible to predict how unstable such arrays would be during the transformation process. Thus, aside from the lack of evidence pertaining to the feasibility of cloning large arrays into plasmid vectors, the study described in Hofer et al. suggests that regardless of the vector used, it would not have been possible to introduce intact large tandem arrays into E. coli without promoting recombination.
Sinden et al., Genetics 129:991-1005 (1991), discuss the structural instability of plasmids containing indirect repeats. As with direct repeats, this study and others show that there is a correlation between the size of the indirect repeat and the degree of structural instability. Sinden et al. note that "it is difficult to maintain inverted repeats greater than 150 bp in length in plasmid DNA in E. coli" and that "the inability to clone long inverted repeats and the genetic instability associated with inverted repeats have been reported by a large number of investigators."
Thus, these studies show that different classes of repetitive DNA have different stability properties, and that the properties correlate with the size of the repetitive array (and the number of repeat unit copies). In each case, instability was observed with tandem array sizes far smaller than those described in this application.
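As a rough illustration of the difference in scale, the repeat sizes reported in the studies above can be set against the 20-100 kb range mentioned earlier for large arrays. The following Python sketch is illustrative arithmetic only. The function and variable names are hypothetical, and using 20 kb and 100 kb as reference points for a "large" array is an assumption made for the comparison, not a statement of the array sizes actually cloned.

```python
# Illustrative arithmetic only. Repeat sizes come from the studies discussed
# above; the 20 kb and 100 kb reference points are an assumption drawn from
# the "greater than 20-100 kb" range mentioned for large arrays.

HOFER_REPEAT_UNIT_BP = 69           # Hofer et al.: 69 bp repeat unit
HOFER_COPY_NUMBERS = (2, 4, 6)      # 2 and 4 copies stable, 6 copies unstable
SINDEN_INVERTED_REPEAT_BP = 150     # Sinden et al.: hard to maintain above 150 bp
LARGE_ARRAY_REFERENCE_BP = (20_000, 100_000)  # assumed reference points


def tandem_array_size_bp(unit_bp: int, copies: int) -> int:
    """Total length of a tandem array of identical repeat units."""
    return unit_bp * copies


def fold_difference(large_bp: int, small_bp: int) -> float:
    """How many times longer the large array is than the small one."""
    return large_bp / small_bp


# Sizes of the tandem arrays examined by Hofer et al.
hofer_array_sizes = {
    copies: tandem_array_size_bp(HOFER_REPEAT_UNIT_BP, copies)
    for copies in HOFER_COPY_NUMBERS
}
largest_hofer_array_bp = hofer_array_sizes[max(HOFER_COPY_NUMBERS)]

# Fold difference between each reference point and the prior-art sizes.
fold_vs_hofer = [
    fold_difference(bp, largest_hofer_array_bp) for bp in LARGE_ARRAY_REFERENCE_BP
]
fold_vs_sinden = [
    fold_difference(bp, SINDEN_INVERTED_REPEAT_BP) for bp in LARGE_ARRAY_REFERENCE_BP
]

if __name__ == "__main__":
    for copies, size_bp in hofer_array_sizes.items():
        print(f"Hofer et al., {copies} copies: {size_bp} bp")
    for bp, fold in zip(LARGE_ARRAY_REFERENCE_BP, fold_vs_hofer):
        print(f"{bp} bp is about {fold:.0f}x the unstable 6-copy array")
    for bp, fold in zip(LARGE_ARRAY_REFERENCE_BP, fold_vs_sinden):
        print(f"{bp} bp is about {fold:.0f}x a 150 bp inverted repeat")
```

Under these assumptions, the unstable 6-copy array of Hofer et al. is 414 bp long, so even the 20 kb reference point is roughly two orders of magnitude larger than the arrays for which instability had been documented.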
Leonhardt et al., Gene 103:107-111 (1991), analyze the stability of a plasmid that contains direct and indirect repeats. However, in contrast to the repetitive arrays disclosed and claimed herein, the direct repeats described were small (on the order of 7 bp) and present in only 2 copies. These repeats were not arranged in tandem on the plasmid; they were separated by intervening plasmid DNA sequences, often several kb in length. Like the direct repeats, the indirect repeats were separated by a large amount of intervening plasmid sequence. Thus, the plasmids used in this study do not provide any indication that large repetitive arrays could be cloned using plasmids.