This invention relates to the fields of molecular biology, biochemistry, and drug design. More particularly, the present invention provides synthetic polyamides containing pyrrole and imidazole amino acids which bind specific base pair sequences of double helical DNA with affinities and specificities comparable to DNA binding proteins such as the transcription factors. A series of molecular templates is described which allows for rational targeting of any predetermined DNA sequence of therapeutic potential. This non-biological approach to DNA recognition provides an underpinning for the design of synthetic cell-permeable ligands for the control of gene expression.
In every human cell, genetic information is stored on a string-like DNA polymer which is approximately 1 meter in length and contains 3×10⁹ units of information in the form of base pairs, within which are encoded approximately 80,000 to 100,000 genes or sets of instructions. (Watson, J. D. Gene, 135, 309-315 (1993).) The specific interaction of proteins such as transcription factors with DNA controls the regulation of genes and hence cellular processes. (Roeder, R. G. TIBS, 9, 327-335 (1996).) A wide variety of human conditions ranging from cancer to viral infection arise from malfunctions in the biochemical machinery that regulates gene expression. (R. Tjian, Sci. Am., 2, 54-61 (1995).) Designed small molecules which target specific DNA sequences offer a potentially general approach for gene-specific regulation. (Gottesfeld, et al. Nature Accepted. (1997).) Such molecules could be powerful therapeutics for combating life-threatening diseases which result from misregulation in transcription.
Designed bifunctional small molecules which target specific DNA sequences offer a potentially general approach for gene-specific, sequence-specific, or organism-specific modification, detection or capture of plasmids, genes, cDNA, cosmids, or chromosomes. More specifically, a life-threatening disease may result from a single error within the 3×10⁹ units of information stored within the double helix. Sequence-specific polyamides may discriminate such small errors; hence bifunctional polyamides could have broad diagnostic applications which range from determining the molecular basis of life-threatening diseases to sequence-specific visualization of disease genes in living organisms.
The genetic information is, in fact, stored on two strands of DNA (in antiparallel orientation) in a structure termed the double helix. The DNA double helix consists of A,T and G,C base pairs held together by specific Watson-Crick hydrogen bonds like rungs on a twisted ladder. (Dickerson, et al. Science, 216, 475 (1982).) The common B-form of DNA is characterized by a wide (12 Å) and shallow major groove and a deep and narrow (4-6 Å) minor groove. Individual sequences may be distinguished by the pattern of hydrogen bond donors and acceptors displayed on the edges of the base pairs. (Principles of Nucleic Acid Structure, Saenger, W.; Springer-Verlag, N.Y., 1984.) In the minor groove, the A,T base pair presents two symmetrically placed hydrogen bond acceptors, the purine N3 and the pyrimidine O2 atoms. The G,C base pair presents these two acceptors, but in addition presents a hydrogen bond donor, the 2-amino group of guanine. (Steitz, T. A. Quart. Rev. Biophys. 23, 205.)
Small molecules isolated from natural sources which bind DNA are found to be a structurally diverse class, as evidenced by consideration of four representative molecules: chromomycin, distamycin, actinomycin D, and calicheamicin. (Gao, et al. J. Mol. Biol. 223, 259-279 (1992); Kamitori, et al. J. Mol. Biol. 225, 445-456 (1992); Paloma, et al. J. Am. Chem. Soc. 116, 3697-3708 (1994); Coll, et al. Proc. Natl. Acad. Sci. U.S.A. 84, 8385-8389 (1987).) There is no simple natural recognition code for the readout of specific sequences of DNA.
The structures of four small molecules isolated from natural sources are shown in FIG. 1. Among these DNA-binding molecules, distamycin is distinguished by its structural simplicity, having no chiral centers and an oligopyrrolecarboxamide core structure. (Zimmer, C. Prog. Nucleic Acid Res. Mol. Biol. (1975) 15, 285; Baguley, B. C. Molecular and Cellular Biochemistry (1982) 43, 167-181; Zimmer, et al., Prog. Biophys. Mol. Biol. 47, 31 (1986).) Structural studies of distamycin-DNA complexes reveal modular complexes in which adjacent pyrrolecarboxamides make similar contacts with adjacent DNA base pairs. The relative simplicity of distamycin, with respect both to its chemical structure and its complexes with DNA, guided the initial decision to use distamycin as a basis for designed polyamides having novel DNA-binding sequence specificity. (Dervan, P. B. Science 232, 464-471 (1986).)
A schematic representation of recognition of A,T rich sequences in the minor groove by Distamycin is shown below: 
Two distinct DNA binding modes exist for Distamycin A. In the first binding mode, a single molecule of Distamycin binds in the middle of the minor groove of a 5 base pair A,T rich sequence. The amide hydrogens of the N-methylpyrrole-carboxamides form bifurcated hydrogen bonds with adenine N3 and thymine O2 atoms on the floor of the minor groove. In the second binding mode, two Distamycin ligands form an antiparallel side-by-side dimer in the minor groove of a 5 base pair A,T rich site. (Pelton, J. G. and Wemmer, D. E. (1989) Proc. Natl. Acad. Sci. 86, 5723-5727; Pelton, J. G. and Wemmer, D. E. (1990) J. Am. Chem. Soc. 112, 1393-1399; Chen, et al. (1994) Nature Struct. Biol. 1, 169-175.) In the 2:1 model each polyamide subunit forms hydrogen bonds to a unique DNA strand in the minor groove.
Polyamides containing N-methylpyrrole (Py) and N-methylimidazole (Im) amino acids provide a model for the design of artificial molecules for recognition of double helical DNA. For side-by-side complexes of Py/Im-polyamides in the minor groove of DNA, the DNA binding sequence specificity depends on the sequence of side-by-side amino acid pairings. (Wade, et al. (1992). J. Am. Chem. Soc. 114, 8783-8794; Mrksich, et al. (1992). Proc. Natl. Acad. Sci. U.S.A. 89, 7586-7590; Wade, W. S., Mrksich, M. and Dervan, P. B. Biochemistry 32, 11385-11389 (1993).) A pairing of Im opposite Py targets a G•C base pair, while a pairing of Py opposite Im targets a C•G base pair. A Py/Py combination is degenerate, targeting both A•T and T•A base pairs. Specificity for G,C base pairs results from the formation of a putative hydrogen bond between the imidazole N3 and the exocyclic amine group of guanine. Validity of the pairing rules is supported by a variety of footprinting and NMR structure studies. (Mrksich, et al., J. Am. Chem. Soc., 115, 2572 (1993); Geierstanger, et al. Science, 266, 646 (1994); Mrksich et al., J. Am. Chem. Soc., 117, 3325 (1995).)
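The pairing rules stated above can be written as a small lookup table. The following Python sketch is illustrative only and is not part of the disclosure: the function names are hypothetical, and it assumes that the first subunit is read N-terminus to C-terminus alongside the DNA strand read 5′ to 3′, with the second subunit running antiparallel to it. Only the three pairings named in the text are encoded; positions covered by a turn or tail residue are not modeled.

```python
# Pairing rules as stated in the text:
# (ring on first subunit, ring on second subunit) -> bases allowed on the
# DNA strand that runs alongside the first subunit.
PAIRING_RULES = {
    ("Im", "Py"): frozenset("G"),   # Im opposite Py targets G.C
    ("Py", "Im"): frozenset("C"),   # Py opposite Im targets C.G
    ("Py", "Py"): frozenset("AT"),  # Py/Py is degenerate: A.T or T.A
}


def ring_pairs(first_subunit, second_subunit):
    """Pair rings of two antiparallel subunits (assumed alignment)."""
    if len(first_subunit) != len(second_subunit):
        raise ValueError("subunits must contain the same number of rings")
    # Antiparallel: first ring of one subunit faces last ring of the other.
    return list(zip(first_subunit, reversed(second_subunit)))


def allowed_bases(first_subunit, second_subunit):
    """Return, for each ring pair, the set of bases the pair can target."""
    allowed = []
    for pair in ring_pairs(first_subunit, second_subunit):
        if pair not in PAIRING_RULES:
            raise ValueError(f"pairing {pair} is not covered by the stated rules")
        allowed.append(PAIRING_RULES[pair])
    return allowed


def matches_site(first_subunit, second_subunit, site):
    """True if every base of `site` is allowed by the facing ring pair."""
    allowed = allowed_bases(first_subunit, second_subunit)
    if len(site) != len(allowed):
        return False
    return all(base in ok for base, ok in zip(site.upper(), allowed))


if __name__ == "__main__":
    im_py_py = ["Im", "Py", "Py"]
    py_py_py = ["Py", "Py", "Py"]
    for site in ("GTT", "GAT", "CTT"):
        print(site, matches_site(im_py_py, py_py_py, site))
```

Under these assumptions, the ImPyPy and PyPyPy pair accepts a G followed by two A or T bases, which is consistent with a G,C base pair being read by the single Im/Py pairing and A,T base pairs being read by the Py/Py pairings.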
A schematic representation of the polyamide pairing rules is shown below:
In parallel with the elucidation of the scope and limitations of the pairing rules, efforts have been made to increase the DNA-binding affinity and specificity of pyrrole-imidazole polyamides by covalently linking polyamide subunits. (Mrksich, M. and Dervan, P. B. (1993). J. Am. Chem. Soc. 115, 9892-9899; Dwyer, et al. (1993). J. Am. Chem. Soc. 115, 9900-9906; Mrksich, M. and Dervan, P. B. (1994). J. Am. Chem. Soc. 116, 3663-3664; Chen, Y. H. and Lown, J. W. (1994) J. Am. Chem. Soc. 116, 6995-7005; Chen, Y. H. and Lown, J. W. Heterocycles 41, 1691-1707 (1995); Geierstanger, et al., Nature Structural Biology, 3, 321 (1996); Chen, et al. J. Biomol. Struct. Dyn. 14, 341-355 (1996); Cho, et al. Proc. Natl. Acad. Sci. USA, 92, 10389 (1995).) A simple hairpin polyamide motif with γ-aminobutyric acid (γ) serving as a turn-specific internal-guide-residue provides a synthetically accessible method of linking polyamide subunits within the 2:1 motif. The head-to-tail linked polyamide ImPyPy-γ-PyPyPy-dimethylaminopropylamide (Dp) was shown to specifically bind the designated target site 5′-TGTTA-3′ with an equilibrium association constant of Ka = 8×10⁷ M⁻¹, an increase of 300-fold relative to the unlinked three-ring polyamide pair ImPyPy and PyPyPy. (Mrksich, et al. J. Am. Chem. Soc. 116, 7983-7988.) The hairpin polyamide model is supported by footprinting, affinity cleaving and NMR structure studies. (Church, et al. Biochemistry 1990, 29, 6827; He, et al. J. Am. Chem. Soc. 1993, 115, 7061; de Clairac, et al. J. Am. Chem. Soc. submitted.)
A schematic representation of recognition of a 5′-TGTTA-3′ sequence by unlinked subunits (left) and γ-aminobutyric acid linked subunits (right) is shown below:
Closing the ends of the hairpin to form a cyclic polyamide increases the overall energetics of DNA binding, presumably by restricting the conformational space available to the molecule. (Lown, J. W. and Krowicki, K. J. Org. Chem. 1985, 50, 3774.) A cyclic polyamide cyclo-(ImPyPy-γ-PyPyPy-γ-) was shown to specifically bind the designated target site 5′-TGTTA-3′ with an equilibrium association constant of Ka = 2.9×10⁹ M⁻¹, an increase of 40-fold relative to the corresponding hairpin polyamide of sequence composition ImPyPy-γ-PyPyPy. The sequence-specificity versus single base pair mismatch sites drops from 30-fold for the hairpin polyamide to 2-fold for the cyclic polyamide.
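The association constants quoted for the hairpin and cyclic polyamides can be compared on a free-energy scale using the standard relation ΔG° = −RT ln Ka. The Python sketch below is illustrative and not part of the disclosure: the function names are hypothetical, and the temperature of 298.15 K is an assumption, since the passage does not state the conditions under which the constants were measured.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)

# Association constants quoted in the text, in M^-1.
KA_HAIRPIN = 8e7    # ImPyPy-gamma-PyPyPy-Dp at 5'-TGTTA-3'
KA_CYCLIC = 2.9e9   # cyclo-(ImPyPy-gamma-PyPyPy-gamma-) at 5'-TGTTA-3'


def binding_free_energy(ka, temperature_k=298.15):
    """Standard binding free energy in kcal/mol: dG = -R*T*ln(Ka).

    The default temperature is an assumption, not a value from the text.
    """
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(ka)


def fold_change(ka_new, ka_reference):
    """Ratio of two association constants."""
    return ka_new / ka_reference


if __name__ == "__main__":
    for name, ka in (("hairpin", KA_HAIRPIN), ("cyclic", KA_CYCLIC)):
        print(f"{name}: Ka = {ka:.1e} M^-1, dG = {binding_free_energy(ka):.2f} kcal/mol")
    print(f"cyclic/hairpin ratio = {fold_change(KA_CYCLIC, KA_HAIRPIN):.1f}")
```

The ratio of the two quoted constants is roughly 36, of the same order as the 40-fold increase stated in the text. The same relation converts a specificity ratio into a free-energy penalty for binding a mismatch site, so a drop from 30-fold to 2-fold corresponds to a much smaller penalty.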
A schematic representation of a cyclic polyamide recognizing the minor groove is shown below: 
Despite the design breakthrough in molecular recognition of DNA, the binding affinities of linked and unlinked polyamide dimers of the prior art are modest when compared to those found with natural DNA binding proteins. (Clemens, et al. J. Mol. Biol. 244, 23-35 (1994).) For example, DNA-binding transcription factors recognize their cognate sites at subnanomolar concentrations. (Jamieson, et al. Biochemistry 33, 5689-5695 (1994); Choo, Y. and Klug, A. Proc. Natl. Acad. Sci. U.S.A. 91, 11168-11172 (1994); Greisman, H. A. and Pabo, C. O. Science 275, 657-661 (1997).) Six-ring hairpin polyamides require concentrations greater than 10 nM to occupy their target sites. The only polyamides described in the prior art with affinities similar to those of DNA-binding proteins are the six-ring cyclic polyamides; however, this class of molecules lacks the sequence-specificity of proteins (i.e., an energetic penalty for binding a single base pair mismatch site) and therefore currently has no potential for therapeutic applications.
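The relationship between an association constant and the concentration needed to occupy a site can be illustrated with a simple binding isotherm. The Python sketch below is illustrative and not part of the disclosure: it assumes a 1:1 ligand-site equilibrium with the free ligand concentration approximately equal to the total ligand concentration, and its function names are hypothetical.

```python
def fractional_occupancy(ka, ligand_conc_molar):
    """Fraction of sites bound for a 1:1 equilibrium: Ka*[L] / (1 + Ka*[L]).

    Assumes free ligand concentration is approximately the total added.
    """
    x = ka * ligand_conc_molar
    return x / (1.0 + x)


def half_saturation_conc(ka):
    """Ligand concentration (M) at which half of the sites are occupied."""
    return 1.0 / ka


if __name__ == "__main__":
    ka_hairpin = 8e7  # M^-1, six-ring hairpin value quoted earlier in the text
    nanomolar = 1e-9
    print(f"half saturation at {half_saturation_conc(ka_hairpin) / nanomolar:.1f} nM")
    for conc_nm in (0.5, 1, 10, 100):
        theta = fractional_occupancy(ka_hairpin, conc_nm * nanomolar)
        print(f"{conc_nm:>5} nM -> occupancy {theta:.2f}")
```

Under this model, an association constant of 8×10⁷ M⁻¹ gives half occupancy at 12.5 nM, in line with the statement that six-ring hairpin polyamides require concentrations greater than 10 nM, whereas half occupancy at subnanomolar concentrations requires an association constant above 10⁹ M⁻¹.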
Two prior approaches for the development of synthetic transcriptional antagonists have been reported. Oligodeoxynucleotides which recognize the major groove of double helical DNA via triple helix formation bind a broad sequence repertoire with high affinity and specificity. (Moser, H. E. and Dervan, P. B. Science 238, 645-650 (1987); Thuong, et al. Angew. Chem. Int. Ed. Engl. 32, 666-690 (1993).) Although oligonucleotides and their analogs have been shown to interfere with gene expression (Maher, et al. Biochemistry 31, 70-81 (1992); Duval-Valentin, et al. Proc. Natl. Acad. Sci. U.S.A. 89, 504-508 (1992)), the triple helix approach is limited to purine tracts and suffers from poor cellular uptake. There are a few examples of cell-permeable carbohydrate-based ligands that interfere with transcription factor function. (Ho, et al. Proc. Natl. Acad. Sci. USA 91, 9203-9207 (1994); Liu, C. et al. Proc. Natl. Acad. Sci. USA 93, 940-944 (1996).) However, oligosaccharides are not yet amenable to recognition of a broad range of predetermined DNA sequences.
Because of the small size and hydrophobic nature of polyamides (MW≈1200), and because the parent ligand Distamycin is itself cell-permeable, these ligands have the potential to underpin a new field of small molecule regulation of gene expression. It remained to be determined whether low molecular weight (MW≈1200) pyrrole-imidazole polyamides could be constructed which recognize predetermined DNA sites at subnanomolar concentrations without compromising sequence-selectivity.
This invention provides improved polyamides for selectively binding a DNA molecule. Compounds of the present invention comprise a polyamide of the formula: 
where R1, Ra, Rb, Re, Rf, Ri, Rj, Rn, and Ro are chosen independently from H, Cl, NO, N-acetyl, benzyl, C1-6 alkyl, C1-6 alkylamine, C1-6 alkyldiamine, C1-6 alkylcarboxylate, C1-6 alkenyl, and C1-6 alkynyl;
R2 is selected from the group consisting of H, NH2, SH, Cl, Br, F, N-acetyl, and N-formyl;
R3, Rd, R1 and Rq are selected independently from the group consisting of H, NH2, OH, SH, Br, Cl, F, OMe, CH2OH, CH2SH, CH2NH2;
R4 is -NH(CH2)0-6NR5R6 or NH(CH2)rCONH(CH2)0-6NR5R6 or NHR5 or NH(CH2)rCONHR5, where R5 and R6 are independently chosen from H, Cl, NO, N-acetyl, benzyl, C1-6 alkyl, C1-6 alkylamine, C1-6 alkyldiamine, C1-6 alkylcarboxylate, C1-6 alkenyl, C1-6L, where L groups are independently chosen from biotin, oligodeoxynucleotide, N-ethylnitrosourea, fluorescein, bromoacetamide, iodoacetamide, DL-α-lipoic acid, acridine, ethyl red, 4-(psoralen-8-yloxy)-butyrate, tartaric acid, (+)-α-tocopherol, and C1-6 alkynyl, where r is an integer having a value ranging from 0 to 6;
X, Xa, Xb, Xe, Xf, Xi, Xj, Xn, Xo are chosen independently from the group consisting of N, CH, COH, CCH3, CNH2, CCl, CF; and
a, b, c, d, e, f, i, j, k, and m are integers chosen independently, having values ranging from 0 to 5; or a pharmaceutically acceptable salt thereof.
The invention further comprises a polyamide having the formula: 
where R1, Ra(i,m) and Rb(j,m) are chosen independently from H, Cl, NO, N-acetyl, benzyl, C1-6 alkyl, C1-6 alkylamine, C1-6 alkyldiamine, C1-6 alkylcarboxylate, C1-6 alkenyl, and C1-6 alkynyl;
R2 is selected from the group consisting of H, NH2, SH, Cl, Br, F, N-acetyl, and N-formyl;
Rf(m) and Rc(k,m) are selected independently from the group consisting of H, NH2, OH, SH, Br, Cl, F, OMe, CH2OH, CH2SH, CH2NH2;
R4 is -NH(CH2)0-6NR5R6 or NH(CH2)rCONH(CH2)0-6NR5R6 or NHR5 or NH(CH2)rCONHR5, where R5 and R6 are independently chosen from H, Cl, NO, N-acetyl, benzyl, C1-6 alkyl, C1-6 alkylamine, C1-6 alkyldiamine, C1-6 alkylcarboxylate, C1-6 alkenyl, C1-6L, where L groups are independently chosen from biotin, oligodeoxynucleotide, N-ethylnitrosourea, fluorescein, bromoacetamide, iodoacetamide, DL-α-lipoic acid, acridine, ethyl red, 4-(psoralen-8-yloxy)-butyrate, tartaric acid, (+)-α-tocopherol, and C1-6 alkynyl, where r is an integer having a value ranging from 0 to 6;
X, Xa(i,m) and Xb(j,m) are chosen independently from the group consisting of N, CH, COH, CCH3, CNH2, CCl, CF; and
a, b, c, d, e, f, g, h, i, j, k, l, m, n, o and p are integers chosen independently, having values ranging from 0 to 5;
or a pharmaceutically acceptable salt thereof.
By “alkyl” or “lower alkyl” in the present invention is meant C1-C6 alkyl, i.e., straight or branched chain alkyl groups having 1-6 carbon atoms, such as, for example, methyl, ethyl, propyl, isopropyl, n-butyl, sec-butyl, tert-butyl, pentyl, 2-pentyl, isopentyl, neopentyl, hexyl, 2-hexyl, 3-hexyl, and 3-methylpentyl. Preferred C1-C6 alkyl groups are methyl, ethyl, propyl, butyl, cyclopropyl or cyclopropylmethyl. Particularly preferred are C1-C3 alkyl groups such as methyl, ethyl, and propyl.