This invention is directed to compounds that form triple-stranded structures with single-stranded and double-stranded nucleic acids. It is further directed to the use of such compounds to cause strand displacement in double-stranded nucleic acids. The invention further is directed to processes for modifying double-stranded nucleic acids utilizing such strand displacement. Such processes for modifying double-stranded nucleic acids include cleavage of the nucleic acid strand or strands. In particular, such cleavage includes sequence-specific cleavage of double-stranded nucleic acids using a nuclease which normally is not sequence specific. Such processes also include transcription inhibition or arrest as well as transcription initiation. The processes of the invention are effected, in particular, with compounds that include naturally-occurring nucleobases or other nucleobase-binding moieties covalently bound to a polyamide backbone.
The function of a gene starts by transcription of its information to a messenger RNA (mRNA) which, by interaction with the ribosomal complex, directs the synthesis of a protein coded for by the mRNA sequence. The synthetic process is known as translation. Translation requires the presence of various co-factors and building blocks, the amino acids, and their transfer RNAs (tRNA), all of which are present in normal cells.
Transcription initiation requires specific recognition of a promoter DNA sequence by the RNA-synthesizing enzyme, RNA polymerase. In many cases in prokaryotic cells, and probably in all cases in eukaryotic cells, this recognition is preceded by sequence-specific binding of a protein transcription factor to the promoter. Other proteins which bind to the promoter, but whose binding prohibits action of RNA polymerase, are known as repressors. Thus, gene activation typically is regulated positively by transcription factors and negatively by repressors.
Most conventional drugs function by interaction with and modulation of one or more targeted endogenous proteins, e.g., enzymes. Typical daily doses of drugs are from 10^-5 to 10^-1 millimoles per kilogram of body weight, or 10^-3 to 10 millimoles for a 100 kilogram person. If this modulation instead could be effected by interaction with and inactivation of mRNA, a dramatic reduction in the amount of drug necessary likely could be achieved, along with a corresponding reduction in side effects. Further reductions could be effected if such interaction could be rendered site-specific. Given that a functioning gene continually produces mRNA, it would thus be even more advantageous if gene transcription could be arrested in its entirety.
Synthetic reagents that bind sequence selectively to single-stranded and especially to double-stranded nucleic acids are of great interest in molecular biology and medicinal chemistry, since such reagents may provide the tools for developing gene-targeted drugs and other sequence-specific gene modulators. Until now, oligonucleotides and their close analogues have presented the best candidates for such reagents.
Oligodeoxyribonucleotides as long as 100 base pairs (bp) are routinely synthesized by solid phase methods using commercially available, fully automatic synthesis machines. Oligoribonucleotides, however, are much less stable than oligodeoxyribonucleotides, a fact which has contributed to the more prevalent use of oligodeoxyribonucleotides in medical and biological research directed to, for example, gene therapy and the regulation of transcription and translation. Synthetic oligodeoxynucleotides are being investigated for use as antisense probes to block and eventually break down mRNA.
It also may be possible to modulate the genome of an animal by, for example, triple helix formation using oligonucleotides or other DNA recognizing agents. However, there are a number of drawbacks associated with oligonucleotide triple helix formation. For example, triple helix formation generally has only been obtained using homopurine sequences and requires unphysiologically high ionic strength and low pH. Whether used as antisense reagents or as triplexing structures, unmodified oligonucleotides are impractical because they have short in vivo half-lives. To circumvent this, oligonucleotide analogues have been used.
These areas for concern have resulted in an extensive search for improvements and alternatives. For example, the problems arising in connection with double-stranded DNA (dsDNA) recognition through triple helix formation have been diminished by a clever "switch back" chemical linking whereby a sequence of polypurine on one strand is recognized, and by "switching back", a homopurine sequence on the other strand can be recognized. See, e.g., McCurdy, Moulds, and Froehler, Nucleosides, in press. Also, helix formation has been obtained by using artificial bases, thereby improving binding conditions with regard to ionic strength and pH.
In order to improve half-life as well as membrane penetration, a large number of variations in polynucleotide backbones have been undertaken. These variations include the use of methylphosphonates, monothiophosphates, dithiophosphates, phosphoramidates, phosphate esters, bridged phosphoramidates, bridged phosphorothioates, bridged methylenephosphonates, dephospho internucleotide analogs with siloxane bridges, carbonate bridges, carboxymethyl ester bridges, acetamide bridges, carbamate bridges, thioether, sulfoxy, sulfono bridges, various "plastic" DNAs, α-anomeric bridges, and borane derivatives. The great majority of these backbone modifications have decreased the stability of hybrids formed between a modified oligonucleotide and its complementary native oligonucleotide, as assayed by measuring Tm values. Consequently, it is generally believed in the art that backbone modifications destabilize such hybrids, i.e., result in lower Tm values, and should be kept to a minimum.
The discovery of sequence specific endonucleases (restriction enzymes) was an essential step in the development of biotechnology, enabling DNA to be cut at precisely specified locations containing specific base sequences. However, although the range of restriction enzymes now known is extensive, there is still a need to obtain greater flexibility in the ability to recognize particular sequences in double-stranded nucleic acids and to cleave the nucleic acid specifically at or about the recognized sequence.
Most restriction enzymes recognize quartet or sextet DNA sequences and only a very few require octets for recognition. However, restriction enzymes have been identified and isolated only for a small subset of all possible sequences within these constraints. A need exists, especially in connection with the study of large genomic DNA molecules, in general, and with the human genome project, in particular, to recognize and specifically cleave DNA molecules at more rarely occurring sites, e.g., sites defined by about fifteen base pairs.
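The scarcity argument above can be illustrated with a back-of-the-envelope estimate. The sketch below assumes, purely for illustration, that the four bases occur independently and with equal probability, which real genomes do not strictly satisfy; the function name is hypothetical.

```python
def mean_site_spacing(site_length: int) -> int:
    """Expected number of base pairs between occurrences of one specific
    recognition sequence of the given length, assuming (as a simplification)
    that all four bases are equally likely and independent."""
    return 4 ** site_length


if __name__ == "__main__":
    # Quartet, sextet and octet sites versus a site of about fifteen base pairs.
    for n in (4, 6, 8, 15):
        print(f"{n:>2}-bp site: about one occurrence per {mean_site_spacing(n):,} bp")
```

Under this simplified model a sextet site recurs every few thousand base pairs, whereas a site of fifteen base pairs recurs on the order of once per billion base pairs, which is the scale relevant to large genomic DNA molecules.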
Efforts have therefore been made to create artificial "restriction enzymes" or to modify the procedures for using existing restriction enzymes for this purpose. Methods investigated include the development of oligonucleotides capable of binding sequence specifically via triple helix formation to double-stranded DNA tagged with chemical groups (e.g., photochemical groups) able to cleave DNA or with non-specific DNA cleaving enzymes and other such modifications consistent with an "Achilles heel" general strategy. Such methods are described by: Francois, J. C., et al. PNAS 86, 9702-9706 (1989); Perrouault, L., et al. Nature 344, 358-360 (1990); Strobel, S. A. and Dervan P. B. Science 249, 73-75 (1990); Pei, D., Corey D. R. and Schultz P. G. PNAS 87, 9858 (1990); Beal, P. A. and Dervan P. B., Science 251, 1360 (1991); Hanvey, J. C., Shimizu M. and Wells R. D. NAR 18, 157-161 (1990); Koob, H. and Szybalski W. Science 250, 271 (1990); Strobel, S. A. and Dervan P. B. Nature 350, 172 (1991); and Ferrin, L. J. and Camerini-Otero R. D. Science 254, 1494-1497 (1991).
In Patent Cooperation Treaty Applications No. PCT/EP92/01220 and PCT/EP92/01219, both filed on 22nd May 1992, we described certain nucleic acid analogue compounds that have a strong sequence-specific DNA binding ability. Examples of such compounds were also disclosed by us in Science 1991, 254, 1497-1500. We have shown that a nucleic acid analogue of this type containing 10 bases hybridized to a non-terminal region of a double-stranded DNA and rendered the strand of DNA which was non-complementary to the nucleic acid analogue susceptible to degradation by S1 nuclease. No increased cleavage of the DNA strand complementary to the nucleic acid analogue was seen, so no cleavage of double-stranded DNA was obtained.
It is an object of the invention to provide compounds that bind DNA and RNA strands.
It is a further object of the invention to provide triplex structures between DNA or RNA strands and these compounds.
It is yet another object to provide compounds other than RNA that can bind one strand of a double-stranded polynucleotide, thereby displacing the other strand.
It is still another object to provide therapeutic, diagnostic, and prophylactic methods that employ such compounds.
In the cell, DNA exists as a double-stranded structure. During certain cellular events, for instance transcription or cell division, portions of the double-stranded DNA are transiently denatured to single strands. Further, DNA can be isolated outside of a cell and can be purposefully denatured to single-stranded DNA. RNA generally exists as a single-stranded structure; however, in a local area of secondary structure, for instance the stem of a stem-loop structure, the RNA can exist as a double-stranded structure.
We have found that certain compounds that have nucleobases attached to an aminoethylglycine backbone and other like backbones including polyamides, polythioamides, polysulfinamides and polysulfonamides, which compounds we call peptide nucleic acids or PNA, surprisingly bind strongly and sequence selectively to both RNA and DNA.
We have surprisingly found that these PNA compounds recognize and bind sequence-selectively and strand-selectively to double-stranded DNA (dsDNA). We have found that the binding to double-stranded DNA is accomplished via strand displacement, in which the PNA binds via Watson-Crick binding to its complementary strand and extrudes the other strand in a virtually single-stranded conformation. We have also surprisingly found that these PNA compounds recognize and bind sequence-selectively to single-stranded DNA (ssDNA) and to RNA.
The recognition of PNA to RNA, ssDNA or dsDNA can take place in sequences at least 5 bases long. A more preferred recognition sequence length is 5-60 base pairs long. Sequences between 10 and 20 bases are of particular interest since this is the range within which unique DNA sequences of prokaryotes and eukaryotes are found. Sequences of 17-18 bases are of special interest since this is the length of unique sequences in the human genome.
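The relationship between genome size and the length of a statistically unique sequence can be sketched as follows. This is a rough estimate only: it assumes random, equiprobable bases, counts both strands as possible binding positions, and uses approximate genome sizes (about 3 x 10^9 bp for human and about 4.6 x 10^6 bp for E. coli) that are illustrative assumptions rather than values from this specification. The function name is hypothetical.

```python
def min_unique_length(genome_bp: int, both_strands: bool = True) -> int:
    """Smallest sequence length n for which the number of possible sequences
    (4**n) exceeds the number of positions in the genome, so that a given
    n-mer is expected to occur less than once by chance."""
    positions = genome_bp * (2 if both_strands else 1)
    n = 1
    while 4 ** n <= positions:
        n += 1
    return n


if __name__ == "__main__":
    # Approximate genome sizes, used only as illustrative assumptions.
    print("human  :", min_unique_length(3_000_000_000))
    print("E. coli:", min_unique_length(4_600_000))
```

With these assumptions the estimate falls within the 10 to 20 base range for both a prokaryotic and a eukaryotic genome and gives 17 bases for a genome of human size; repeated sequences and biased base composition would shift the figure in practice.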
We have further surprisingly found that the PNA compounds are able to form triple helices with dsDNA. We have found that PNA compounds are able to form triple helices with RNA and ssDNA. The resulting triplexes, e.g., (PNA)2/DNA or (PNA)2/RNA, surprisingly have very high thermal stability. It has been found that the PNA binds with a DNA or RNA in either orientation, i.e., the antiparallel orientation where the amino-terminal of the PNA faces the 3′ end of the nucleic acid or the parallel orientation where the amino-terminal of the PNA faces the 5′ end of the nucleic acid. PNAs are able to form triple helices wherein a first PNA strand binds with RNA or ssDNA and a second PNA strand binds with the resulting double helix or with the first PNA strand.
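The two binding orientations can be made concrete with a small sketch that derives a Watson-Crick complementary PNA sequence, written from amino terminus to carboxy terminus, for a target written 5′ to 3′. The function and variable names are hypothetical, and only Watson-Crick pairing with the standard bases is modelled; Hoogsteen binding of a second strand is not.

```python
# Watson-Crick partners; U is included so that RNA targets can be given directly.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def pna_sequence(target_5to3: str, orientation: str = "antiparallel") -> str:
    """Return the PNA nucleobase sequence (amino to carboxy terminus) that is
    Watson-Crick complementary to a target written 5' to 3'.

    antiparallel: the PNA amino terminus faces the 3' end of the target, so
                  the complement is read in reverse.
    parallel:     the PNA amino terminus faces the 5' end of the target, so
                  the complement is read in the same direction as the target.
    """
    complement = [_COMPLEMENT[base] for base in target_5to3.upper()]
    if orientation == "antiparallel":
        complement.reverse()
    elif orientation != "parallel":
        raise ValueError("orientation must be 'antiparallel' or 'parallel'")
    return "".join(complement)


if __name__ == "__main__":
    target = "AAGG"  # illustrative target, 5' to 3'
    print(pna_sequence(target, "antiparallel"))
    print(pna_sequence(target, "parallel"))
```

For the illustrative target AAGG, the antiparallel PNA reads CCTT and the parallel PNA reads TTCC.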
We further have found that the PNA compounds are able to form triple helices wherein a first PNA strand binds with the ssDNA or RNA or to one of the strands of dsDNA and in doing so displaces the other strand, and a second PNA strand then binds with the resulting double helix. While we do not wish to be bound by theory, it is further believed that other triple helices might be formed wherein a single PNA strand binds to two single-stranded nucleic acid strands. In binding with nucleic acids, both Watson-Crick and Hoogsteen binding are utilized. It is further believed that PNA might also bind via reverse Hoogsteen binding.
We have further surprisingly found that the PNA compounds form double helices with RNA and ssDNA. Such double helices are hetero duplex structures between the PNA and the respective nucleic acid. Such double helices are preferably helices formed when the PNA strand includes a mixture of both pyrimidine and purine nucleobases.
For therapeutic use of PNA compounds, the targets of the PNA compounds would generally be double-stranded DNA and RNA. For diagnostic use, and for investigational methods and reagents where DNA is isolated outside of a cell, the DNA can be denatured to single-stranded DNA, and use of the PNA compound would be targeted to such single-stranded DNA as well as RNA.
PNA compounds useful to effect binding to RNA, ssDNA and dsDNA and to form duplex and triplex complexes are in one sense polymeric strands formed from a polyamide, polythioamide, polysulfinamide or polysulfonamide backbone with a plurality of ligands located at spaced locations along the backbone. At least some of the ligands are capable of hydrogen bonding with other ligands either on the compounds or nucleic acid ligands.
More preferred PNA compounds according to the invention have the formula: 
wherein:
n is at least 2,
each of L1-Ln is independently selected from the group consisting of hydrogen, hydroxy, (C1-C4)alkanoyl, naturally occurring nucleobases, non-naturally occurring nucleobases, aromatic moieties, DNA intercalators, nucleobase-binding groups, heterocyclic moieties, and reporter ligands;
each of C1-Cn is (CR6R7)y, where R6 is hydrogen and R7 is selected from the group consisting of the side chains of naturally occurring alpha amino acids, or R6 and R7 are independently selected from the group consisting of hydrogen, (C2-C6)alkyl, aryl, aralkyl, heteroaryl, hydroxy, (C1-C6)alkoxy, (C1-C6)alkylthio, NR3R4 and SR5, where R3 and R4 are as defined below, and R5 is hydrogen, (C1-C6)alkyl, hydroxy-, alkoxy-, or alkylthio-substituted (C1-C6)alkyl, or R6 and R7 taken together complete an alicyclic or heterocyclic system;
each of D1-Dn is (CR6R7)z, where R6 and R7 are as defined above;
each of y and z is zero or an integer from 1 to 10, the sum y+z being greater than 2 but not more than 10;
each of G1-Gn is -NR3CO-, -NR3CS-, -NR3SO- or -NR3SO2-, in either orientation, where R3 is as defined below;
each of A1-An and B1-Bn are selected such that:
(a) A is a group of formula (IIa), (IIb), (IIc) or (IId), and B is N or R3N+; or
(b) A is a group of formula (IId) and B is CH; 
where:
X is O, S, Se, NR3, CH2 or C(CH3)2;
Y is a single bond, O, S or NR4;
each of p and q is zero or an integer from 1 to 5, the sum p+q being not more than 10;
each of r and s is zero or an integer from 1 to 5, the sum r+s being not more than 10;
each R1 and R2 is independently selected from the group consisting of hydrogen, (C1-C4)alkyl which may be hydroxy- or alkoxy- or alkylthio-substituted, hydroxy, alkoxy, alkylthio, amino and halogen; and
each R3 and R4 is independently selected from the group consisting of hydrogen, (C1-C6)alkyl, hydroxy- or alkoxy- or alkylthio-substituted (C1-C6)alkyl, hydroxy, alkoxy, alkylthio and amino;
Q is -CO2H, -CONR′R″, -SO3H or -SO2NR′R″ or an activated derivative of -CO2H or -SO3H; and
I is -NHR‴R″″ or -NR‴C(O)R″″, where R′, R″, R‴ and R″″ are independently selected from the group consisting of hydrogen, alkyl, amino protecting groups, reporter ligands, intercalators, chelators, peptides, proteins, carbohydrates, lipids, steroids, nucleosides, nucleotides, nucleotide diphosphates, nucleotide triphosphates, oligonucleotides, oligonucleosides and soluble and non-soluble polymers.
In the above structures wherein R′, R″, R‴ and R″″ are oligonucleotides or oligonucleosides, such structures can be considered chimeric structures between PNA compounds and the oligonucleotide or oligonucleoside.
Preferred PNA-containing compounds useful to effect binding to RNA, ssDNA and dsDNA and to form triplexing structures are compounds of the formula III, IV or V: 
wherein:
each L is independently selected from the group consisting of hydrogen, phenyl, heterocyclic moieties, naturally occurring nucleobases, and non-naturally occurring nucleobases;
each R7′ is independently selected from the group consisting of hydrogen and the side chains of naturally occurring alpha amino acids;
n is an integer greater than 1;
each k, l, and m is, independently, zero or an integer from 1 to 5;
each p is zero or 1;
Rb is OH, NH2 or -NHLysNH2; and
Ri is H or COCH3.
The improved binding of the PNA compounds of the invention with single-stranded RNA and DNA renders them useful as antisense agents. In addition, the binding to double-stranded DNA renders these compounds useful for gene inhibition via various mechanisms. Further, the binding to double-stranded DNA renders these compounds useful as gene activators to initiate transcription.
In one embodiment, the present invention provides methods for inhibiting the expression of particular genes in the cells of an organism, comprising administering to said organism a reagent as defined above which binds specifically to sequences of said genes.
In a further embodiment, the invention provides methods for inhibiting transcription and/or replication of particular genes or for modifying double-stranded DNA as, for instance, by inducing degradation of particular regions of double-stranded DNA in cells of an organism, comprising administering to said organism a reagent as defined above.
In a still further embodiment, the invention provides methods for killing cells or virus by contacting said cells or virus with a reagent as defined above which binds specifically to sequences of the genome of said cells or virus.
A novel strategy for sequence-selective cleavage of double-stranded DNA is described. For cases where two closely positioned homopyrimidine stretches (of 7-10 bases and preferably on opposite strands) can be identified, this can be done by synthesizing pairs of PNAs complementary to two parts of this DNA sequence and separated by several base pairs. These PNA molecules are then reacted with the dsDNA and the resulting complex is allowed to react with an endonuclease.
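Identifying candidate sites for this strategy can be sketched as a simple sequence scan. The example below is a minimal illustration, not the procedure of the invention: it takes the top strand written 5′ to 3′, treats a run of purines on the top strand as a homopyrimidine stretch on the opposite strand, and pairs runs on opposite strands that lie within a chosen gap. The function names and the `max_gap` value are hypothetical; the text specifies only "several base pairs".

```python
import re


def homopyrimidine_runs(top_strand: str, min_len: int = 7):
    """Return sorted (start, end, strand) tuples for homopyrimidine runs of at
    least min_len bases. A purine run on the top strand is reported as a
    homopyrimidine run on the bottom (complementary) strand."""
    seq = top_strand.upper()
    runs = []
    for match in re.finditer(r"[CT]{%d,}" % min_len, seq):
        runs.append((match.start(), match.end(), "top"))
    for match in re.finditer(r"[AG]{%d,}" % min_len, seq):
        runs.append((match.start(), match.end(), "bottom"))
    return sorted(runs)


def candidate_pairs(top_strand: str, min_len: int = 7, max_gap: int = 10):
    """Pair runs on opposite strands separated by 1..max_gap base pairs.
    max_gap is an illustrative assumption, not a value from the text."""
    runs = homopyrimidine_runs(top_strand, min_len)
    pairs = []
    for i, first in enumerate(runs):
        for second in runs[i + 1:]:
            gap = second[0] - first[1]
            if 0 < gap <= max_gap and first[2] != second[2]:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    # Illustrative sequence: a pyrimidine run, a short spacer, a purine run.
    example = "GC" + "T" * 8 + "GCGC" + "A" * 8 + "CG"
    for first, second in candidate_pairs(example):
        print(first, second)
```

Each reported pair marks two stretches to which a pair of PNAs could be designed; whether a given pair is actually suitable would still have to be established experimentally.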
In practicing certain embodiments of the invention, the PNA compounds are able to recognize duplex DNA by displacing one strand, thereby presumably generating a hetero duplex with the other one. Such recognition can take place with dsDNA sequences 5-60 base pairs long. Sequences between 10 and 20 bases are of interest since this is the range within which unique DNA sequences of prokaryotes and eukaryotes are found. Reagents which recognize 17-18 bases are of particular interest since this is the length of unique sequences in the human genome.
The PNA compounds are able to form triple helices with dsDNA, ssDNA or RNA and double helices with RNA or ssDNA. In one embodiment of the invention, the PNA compounds form triple helices wherein a first PNA strand binds with a nucleic acid strand forming a hetero duplex and a second PNA strand then binds with the resulting hetero duplex. In other embodiments of the invention, a PNA compound or a PNA chimera compound forms triple helices wherein a single PNA strand or PNA chimera strand binds with two nucleic acid strands, with a nucleic acid strand and a PNA chimera strand or with two chimera PNA strands.
The invention further provides methods for inhibiting the action of restriction enzymes at restriction sites in nucleic acids. Such methods comprise contacting a nucleic acid with a reagent as defined above under conditions effective to bind such reagent to the nucleic acid proximal to a restriction site.
The invention further provides methods of sequencing DNA by binding the DNA with a reagent as defined above at a site proximal to a restriction site, cleaving the DNA with a restriction enzyme, and identifying the cleaved products.
The invention further provides methods for initiating transcription in cells or organisms comprising administering to the organism a reagent as defined above which initiates transcription of a gene in such cells or organisms.
The invention further provides methods for modulating binding of RNA polymerase to dsDNA by binding the dsDNA with a reagent as defined above that binds with the DNA and then exposing the complex formed thereby to an RNA polymerase.
The invention further provides methods for initiating transcription of a gene by binding the gene with a reagent as defined above that interacts with the gene to melt the double-stranded DNA of the gene and to form a transcription elongation loop.
The invention further provides methods for binding RNA polymerase to dsDNA by contacting the dsDNA with a reagent as defined above that is capable of interacting with the DNA and then exposing the complex formed thereby to an RNA polymerase. More particularly, the interaction is via binding to said dsDNA.
The invention further provides a hybrid complex for modulating transcription wherein the complex comprises dsDNA and a reagent as defined above that binds with the dsDNA, and an RNA transferase.
The invention further provides a synthetic transcription factor comprising a dsDNA and a reagent as defined above that is capable of interaction with the dsDNA. More particularly, the interaction is via binding to said dsDNA.
The invention further provides specific gene activators comprising first and second strands, as defined above, that have specific sequences that bind to selected DNA regions of the gene.
The invention further provides chimeric structures comprising PNAs and DNA or RNA. Such chimeric structures will be used in place of or in addition to a normal PNA strand to effect duplexing, triplexing, nucleic acid binding or protein binding.