Tandem repeat arrays are present throughout the genomes of eukaryotes and play important roles in creating and maintaining specialized chromatin, e.g., at centromeres and telomeres, and are often associated with heterochromatin (Lee et al., Hum. Genet. 100:291-304, 1997; de Lange, Nat. Rev. Mol. Cell. Biol. 5:323-329, 2004). Small tandem repeat arrays also play a role in gene regulation (Lippman et al., Nature 430:471-476, 2004; Jasinska & Krzyzosiak, FEBS Lett. 567:136-141, 2004; Li et al., Mol. Biol. Evol. 21:991-1007, 2004), and variants have been linked to human disease or disease likelihood (Riley & Krieger, Gene 344:203-211, 2005; Mandola et al., Cancer Res. 63:2898-2904, 2003; Watanabe et al., Am. J. Pathol. 163:633-641, 2003; Everett & Wood, Brain 127:2385-2405, 2004). They may also play a role in rapid evolution (Fondon & Garner, Proc. Natl. Acad. Sci. USA. 101:18058-18063, 2004; Sinha & Siggia, Mol. Biol. Evol. [Epub], Jan. 19, 2005).
Centromeric tandem repeats are associated with the functional kinetochore, the structure that attaches to spindle microtubules for chromosome partitioning to daughter cells. The centromeres of most of the higher eukaryotes that have been studied so far contain tandem repeat arrays of hundreds to thousands of kilobases in size, including centromeres of plants, invertebrates, and vertebrates (Guenatri et al., J. Cell Biol. 166:493-505, 2004; Jiang et al., Trends Plant. Sci. 8:570-575, 2003; Sun et al., Genome Res. 13:182-194, 2003).
Alphoid (alpha-satellite) arrays at human centromeres can extend over many millions of base pairs. Type I arrays are composed of highly homogeneous higher-order repeats (HORs) of a 170 bp monomer that are unique to a specific chromosome or shared by a few chromosomes (Lee et al., Hum. Genet. 100:291-304, 1997). Type I arrays are believed to be an important DNA component of a functional centromere. These arrays associate with centromere proteins (such as CENP-A), which closely interact with DNA to form the kinetochore (Ando et al., Mol. Cell. Biol. 22:2229-2241, 2002; Spence et al., EMBO J. 21:5269-5280, 2002). Moreover, type I arrays are competent to form Human Artificial Chromosomes (HACs) when transformed into human cells (Harrington et al., Nat. Genet. 15:345-355, 1997; Ikeno et al., Nat. Biotechnol. 16:431-439, 1998; Ebersole et al., Hum. Mol. Genet. 9:1623-1631, 2000; Larin & Mejia, Trends Genet. 18:313-319, 2002; Laner et al., Cytogenet. Genome Res. 107:9-13, 2004; Ohzeki et al., J. Cell Biol. 159:765-775, 2002; Kouprina et al., Nucleic Acids Res. 31:922-934, 2003; Basu et al., Nucleic Acids Res. 33:587-596, 2005; Schueler et al., Science 294:109-115, 2001).
HACs represent extra chromosomes carrying all the required components of a functional kinetochore. HACs have various advantages as gene expression vectors with potential for use in gene therapy. They are stably maintained at a low copy number in the host nucleus. They also contain no viral genes or proteins, and therefore should not cause the severe immunogenic responses that have been found to be a serious problem with adenoviral vectors. HACs are particularly well suited for carrying intact mammalian genes surrounded by all their long-range controlling elements, which should confer physiological levels of fully regulated gene expression. Several groups have had success in complementing a genetic deficiency with HACs carrying the full-size gene (e.g., see discussion in Larin & Mejia, Trends Genet. 18:313-319, 2002).
Early HAC formation studies used only a few of the many subfamilies of alphoid DNA arrays that were identified in BAC and YAC libraries. Alphoid arrays with monomers containing the 17 bp CENP-B box from chromosomes 21, X, 17 and 5, cloned into YAC, BAC or PAC vectors, have been shown to be competent to form de novo artificial chromosomes in cultured cells, whereas arrays lacking the CENP-B box from the Y chromosome, chromosome 21 type II, and chromosome 22 have proved to be inefficient (Harrington et al., Nat. Genet. 15:345-355, 1997; Ikeno et al., Nat. Biotechnol. 16:431-439, 1998; Ebersole et al., Hum. Mol. Genet. 9:1623-1631, 2000; Larin & Mejia, Trends Genet. 18:313-319, 2002; Laner et al., Cytogenet. Genome Res. 107:9-13, 2004; Ohzeki et al., J. Cell Biol. 159:765-775, 2002; Kouprina et al., Nucleic Acids Res. 31:922-934, 2003; Basu et al., Nucleic Acids Res. 33:587-596, 2005). Recently, the requirement of the CENP-B box for de novo centromere and HAC assembly was demonstrated using synthetic type I alphoid DNAs containing functional CENP-B boxes or mutant CENP-B boxes (Ohzeki et al., J. Cell Biol. 159:765-775, 2002; Basu et al., Nucleic Acids Res. 33:587-596, 2005).
However, the presence of the CENP-B box is not sufficient to predict an effective array. X chromosome arrays that contain CENP-B boxes are relatively poor substrates when compared to chromosome 17-derived arrays (Schueler et al., Science 294:109-115, 2001). Replacement of the alphoid sequence outside the CENP-B box with GC-rich DNA in a synthetically constructed array demonstrated that the CENP-B box alone is not sufficient for centromere nucleation (Ohzeki et al., J. Cell Biol. 159:765-775, 2002). Although core residues within the 17-base CENP-B box that are required for efficient CENP-B binding have been identified (Muro et al., J. Cell Biol. 116:585-596, 1992; Masumoto et al., J. Cell. Biol. 109:1963-1973, 1989; Masumoto et al., In Chromosome and Aneuploidy (Vig, B K, ed.), pp. 31-43, Springer-Verlag, Berlin, 1993), it remains unknown which bases of the alphoid monomer apart from the CENP-B box are essential for successful centromere nucleation. AT richness is found in the centromere repeats of many organisms, including human alphoid repeats, but it has yet to be determined whether this is a meaningful feature or whether specific bases are critical.
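The two sequence features discussed above, AT richness and the presence of a CENP-B box, can be checked computationally for any candidate monomer. The following is a minimal sketch only. The consensus string `NTTCGNNNNANNCGGGN` is an assumption (a commonly cited core consensus for the 17 bp CENP-B box, not stated in the text above), and the function names `at_fraction` and `find_boxes` are hypothetical helpers introduced for illustration.

```python
import re

# ASSUMPTION: commonly cited core consensus for the 17 bp CENP-B box
# (N = any base). It is not given in the surrounding text.
CENP_B_BOX = "NTTCGNNNNANNCGGGN"


def at_fraction(seq):
    """Return the fraction of A/T bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def find_boxes(seq, consensus=CENP_B_BOX):
    """Return 0-based start positions where seq matches the consensus.

    Only the given strand is scanned; the reverse complement is not checked.
    """
    # Translate the consensus into a regular expression: N matches any base.
    pattern = "".join("[ACGT]" if c == "N" else c for c in consensus.upper())
    # A lookahead is used so that overlapping matches are also reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq.upper())]


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real alphoid monomer.
    toy = "A" * 10 + "CTTCGTTGGAAACGGGA" + "T" * 10
    print(at_fraction(toy))
    print(find_boxes(toy))
```

A scan of this kind reports only whether the motif is present and how AT-rich the monomer is; as noted above, neither property alone is known to predict whether an array will nucleate a centromere.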
Large alphoid tandem repeat DNA segments isolated from genomic libraries are difficult to fully characterize and cannot be modified readily. Therefore, further analysis of alphoid DNA arrays with a defined sequence is required to elucidate the structural requirements for efficient de novo assembly of centromere structure.