The invention concerns heterocyclic compounds which can be used to label, detect and sequence nucleic acids.
Nucleic acids are of major importance in the living world as carriers and transmitters of genetic information. Since their discovery by F. Miescher they have aroused a wide scientific interest which has led to the elucidation of their function, structure and mechanism of action.
An important tool for elucidating these relationships and for solving the associated problems was, and remains, the detection of nucleic acids, both with regard to their specific detection and with regard to their sequence, i.e. their primary structure.
The specific detectability of nucleic acids is based on the property of these molecules to interact, i.e. to hybridize, with other nucleic acids by forming base pairs via hydrogen bonds. Nucleic acids (probes) labelled in a suitable manner, i.e. provided with indicator groups, can thus be used to detect complementary nucleic acids (target).
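The principle of probe and target complementarity can be illustrated by a minimal sketch. It assumes ideal Watson-Crick pairing of unmodified DNA bases and perfect matches only; the function names `reverse_complement` and `find_probe_sites` are hypothetical and serve purely as illustration.

```python
from typing import List

# Watson-Crick pairing for DNA; a simplification that ignores mismatches,
# wobble pairs and modified bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with seq, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_probe_sites(target: str, probe: str) -> List[int]:
    """Return 0-based start positions in target where probe pairs perfectly."""
    site = reverse_complement(probe)
    target = target.upper()
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]
```

A call such as `find_probe_sites("TTGCATAA", "ATGC")` reports the position at which the target contains the sequence complementary to the probe. Real hybridization also depends on temperature, salt concentration and probe length, none of which is modelled here.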
The determination of the primary structure (sequence), i.e. the sequence of the heterocyclic bases of a nucleic acid, is achieved by means of sequencing techniques. Knowledge of the sequence is in turn a basic requirement for a targeted and specific use of nucleic acids for molecular biological problems and working techniques.
The sequencing also ultimately utilizes the principle of specific hybridization of nucleic acids to one another. As mentioned above labelled nucleic acid fragments are also used for this. Hence a suitable labelling of nucleic acids is an essential prerequisite for any detection method.
At an early period radioactive labelling was mainly used with suitable isotopes such as 32P or 35S. However, the disadvantages of using radioactive reagents are obvious: such work requires special room facilities and licences as well as a controlled and elaborate disposal of the radioactive waste. The reagents for radioactive labelling are expensive. It is not possible to store such labelled probes for long periods due to the short half-life of the above-mentioned nuclides.
Therefore many attempts have been made in recent years to circumvent these serious disadvantages i.e. to get away from using a radioactive label. However, the high sensitivity of this type of label should be retained as far as possible.
Major advances have in fact already been achieved [see e.g. Nonradioactive Labeling and Detection of Biomolecules (Kessler, C., publ.) Springer Verlag Berlin, Heidelberg 1992].
An essential requirement for any detection of a nucleic acid is the prior labelling. As indicated above it is desirable to achieve this in a non-radioactive manner. Whereas radioactive labelling of nucleic acids is usually carried out by the enzymatically catalysed incorporation of appropriate radioactive nucleoside triphosphates, non-radioactive labelling has to be achieved by incorporating a suitable signal or reporter group.
Haptens (such as biotin or digoxigenin), enzymes (such as alkaline phosphatase or peroxidase) or fluorescent dyes (such as fluorescein or rhodamine) have, among others, mainly proven to be suitable as non-radioactive indicator molecules. These signal groups can be attached to or incorporated in nucleic acids by various methods.
A relatively simple procedure is, for example, to label the 5′ end of an oligonucleotide provided with a terminal amino group by means of activated indicator molecules of the above-mentioned type. However, this only allows one or a few indicator molecules to be introduced into a low molecular weight oligomer. A denser labelling of longer-chain, high molecular weight nucleic acids, aimed at achieving a high sensitivity, usually has to be accomplished by using polymerases to incorporate nucleoside triphosphates provided with reporter groups, as in a de novo synthesis.
Such current methods are known to a person skilled in the art as nick translation [Rigby, P. W. et al. (1977) J.Mol.Biol. 113, 237] and random primed labelling [Feinberg, A. P. and Vogelstein, B. (1984) Anal.Biochem. 137, 266]. A further method is the so-called 3′-tailing reaction with the aid of the enzyme terminal transferase [e.g. Schmitz, G. et al. (1991) Anal.Biochem. 192, 222].
The nucleoside triphosphates which have been previously used in these methods are almost exclusively appropriately modified derivatives of the heterocyclic bases adenine, guanine, cytosine and thymine in the deoxyribonucleotide series or adenine, guanine, cytosine and uracil in the ribonucleotide series. Such derivatives are described for example by Langer et al. in Proc.Natl.Acad.Sci. USA 78, 6635 (1981), Mühlegger et al. Biol.Chem. Hoppe-Seyler 371, 953 (1990) and in EP 0 063 879. In this case the building blocks which occur naturally in DNA and RNA are used in a labelled form, i.e. provided with signal groups.
The main disadvantages of these N-nucleosides are that the N-glycosidic bond is sensitive to acidic pH conditions and that they can be degraded by nucleases.
Furthermore, individual C-nucleosides (see e.g. Suhadolnik, R. J. in “Nucleoside Antibiotics”, Wiley-Interscience, New York 1970) and their use in the therapeutic (antiviral or cancerostatic) field have also been known for a long time. In addition, fluorescent C-nucleoside derivatives and their incorporation into DNA and RNA oligonucleotides have been described (WO 93/16094). The so-called intrinsic fluorescence of these nucleosides is, however, many times lower with regard to quantum yield than that of special fluorophores such as fluorescein or corresponding rhodamine derivatives. A further disadvantage of the self-fluorescent C-nucleosides is their comparatively low excitation and emission wavelengths. As a result, detection systems based on such derivatives have, on the one hand, only a low sensitivity of detection and, on the other hand, are strongly affected by spectrally interfering influences of the measuring environment (such as biological material, autofluorescence of gel matrices etc.). Hence the known nucleosides and nucleoside derivatives have a series of disadvantages which in particular adversely affect the detection of nucleic acids.
The object of the invention is therefore to provide nucleoside derivatives modified with signal groups for the detection of nucleic acids which do not have the afore-mentioned disadvantages, i.e. which in particular are more stable, are at the same time capable of being processed enzymatically and are suitable for the detection of nucleic acids at a practicable wavelength.
The object is achieved by heterocyclic compounds of the general formula I 
in which
R1 and R2 can be the same or different and represent hydrogen, oxygen, halogen, hydroxy, thio or substituted thio, amino or substituted amino, carboxy, lower alkyl, lower alkenyl, lower alkinyl, aryl, lower alkyloxy, aryloxy, aralkyl, aralkyloxy or a reporter group,
R3 and R4 each represent hydrogen, hydroxy, thio or substituted thio, amino or substituted amino, lower alkyloxy, lower alkenoxy, lower alkinoxy, a protecting group or a reporter group,
R5 represents hydrogen, hydroxy, thio or substituted thio, amino or a substituted amino group, a reactive trivalent or pentavalent phosphorus group such as e.g. a phosphoramidite or H-phosphonate group, an ester or amide residue that can be cleaved in a suitable manner or a reporter group,
R4 and R5 together form a further bond between C-2′ and C-3′ or an acetal group,
R6 represents hydrogen or a hydroxy, thio or substituted thio, amino or substituted amino group,
R7 represents hydrogen, a monophosphate, diphosphate or triphosphate group or the alpha, beta or gamma thiophosphate analogue of this phosphoric acid ester or a protective group as well as possible tautomers and salts thereof.
X denotes methylene or methine substituted with halogen, hydroxy, thio or substituted thio, amino or substituted amino, carboxy, lower alkyl, lower alkenyl, lower alkinyl, aryl, lower alkyloxy, aryloxy, aralkyl, aralkyloxy or a reporter group, or oxygen and n=0 or 1, Z denotes nitrogen or carbon provided that if Z denotes nitrogen, m is zero (0) and if X represents methylene, substituted methylene or substituted methine, Z cannot be carbon and if X denotes oxygen, Z cannot be nitrogen.
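The two exclusion rules stated for X and Z can be expressed as a simple check. The following is only an illustrative sketch: the function name `is_allowed_combination` and the string labels are hypothetical, substituents are not modelled, and the indices n and m are left out because their meaning depends on the drawing of formula I, which is not reproduced in the text.

```python
def is_allowed_combination(x: str, z: str) -> bool:
    """Check an X/Z pairing against the two exclusion rules stated above.

    x: "methylene", "methine" (substituted or not) or "oxygen"
    z: "nitrogen" or "carbon"
    """
    if x not in {"methylene", "methine", "oxygen"}:
        raise ValueError(f"unknown X: {x!r}")
    if z not in {"nitrogen", "carbon"}:
        raise ValueError(f"unknown Z: {z!r}")
    # Rule 1: if X is (substituted) methylene or methine, Z cannot be carbon.
    if x in {"methylene", "methine"} and z == "carbon":
        return False
    # Rule 2: if X is oxygen, Z cannot be nitrogen.
    if x == "oxygen" and z == "nitrogen":
        return False
    return True
```

Under these two rules only the pairings X = oxygen with Z = carbon and X = methylene or methine with Z = nitrogen remain.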
All detectable groups come into consideration as a reporter group such as in particular haptens, a fluorophore, a metal-chelating group, a lumiphore, a protein or an intercalator.
Those compounds of the general formula I are preferred in which the acetal group of the residues R4 and R5 is substituted with a reporter group. The reporter group can be bound directly or indirectly i.e. via a linker group.
In addition those compounds of the general formula I have proven to be particularly suitable in which R1 can represent oxygen, R2 can represent hydrogen or a reporter group, R3 and R4 can represent hydrogen, R5 can represent hydroxy, hydrogen, a reactive trivalent or pentavalent phosphorus group, R6 can represent hydrogen and R7 can represent hydrogen, monophosphate, diphosphate or triphosphate groups.
Compounds of the general formula I are also preferred in which the reporter group is bound to the heterocyclic or tetrahydrofuran ring by means of a so-called linker group. Suitable linker groups are known to a person skilled in the art (see e.g. Mühlegger, K. et al. (1990) Biol.Chem. Hoppe-Seyler 371, 953-965 or Livak, K. J. et al. (1992) Nucl.Acids Res. 20, 4831-4837).
Compounds of the general formula I are additionally preferred in which R1 represents hydrogen, hydroxy, an amino group, an optionally substituted amino group or a reporter group, R2 represents an optionally substituted amino group or a reporter group, R3 represents hydrogen, R4 represents hydrogen, hydroxy, amino or substituted amino, lower alkyloxy, lower alkenoxy, lower alkinoxy, R5 represents hydrogen, hydroxy, thio, an optionally substituted amino group, a phosphoramidite or a reporter group, R4 and R5 together represent an acetal group, R6 represents hydrogen and R7 represents a triphosphate group.
Compounds of formula I are also preferred in which X denotes oxygen and at the same time Z represents carbon substituted with R2 or Z denotes nitrogen and at the same time X represents methylene or methine substituted with amino or substituted amino, carboxy or with a reporter group.
A further preferred embodiment comprises compounds according to formula I in which X=O and Z represents methine substituted with amino or substituted amino, carboxy or with a reporter group.
The compounds according to the invention can be synthesized in various ways. In some cases one can start with naturally occurring precursors such as for example 3-(3,4-dihydroxy-5-hydroxymethyl-tetrahydrofuran-2-yl)-pyrrol-2,5-dione or 3-(3,4-dihydroxy-5-hydroxymethyl-tetrahydrofuran-2-yl)-oxazine-2,6-dione. The important 3-(3-deoxy-4-hydroxy-5-hydroxymethyl-tetrahydrofuran-2-yl) derivatives are synthesized from these precursors by deoxygenation, preferably according to Barton (Barton, D. H. R. and Motherwell, W. B. (1981) Pure Appl.Chem. 53, 15).
In addition the chemical synthesis of the new heterocyclic compounds can be carried out, for example, as described in detail by K. A. Watanabe in “Chemistry of Nucleosides and Nucleotides” 3, 421-535 (L.B. Townsend, publ.) Plenum Press, New York and London, 1994.
Other syntheses of the said starting compounds have for example been described by Hosmane, R. S. et al. in Bioorg. and Med.Chem.Lett. 3, 2847 (1993) and by Townsend, L. B. et al. in Tetrahedron Lett. 36, 8363 (1995).
The use of the compounds according to the invention to label nucleic acids with diverse, defined signal groups and hence to detect and sequence nucleic acids has proven to be particularly advantageous.
The substances according to the invention of the general formula I have a number of advantages especially compared to the classical nucleosides and nucleotides such as adenosine, guanosine, cytidine, thymidine, uridine etc. and their corresponding phosphoric acid esters.
One advantage is chemical stability, i.e. towards acidic pH conditions. A further major advantage is the stability of these compounds towards enzymatic degradation by endonucleases and exonucleases. These enzymes are present in biological material and can severely interfere with nucleic acid detection. On the other hand it is known that DNA and RNA polymerases are selective with regard to the acceptance of more or less modified nucleoside 5′-triphosphates, i.e. with regard to the recognition and incorporation of such nucleotides as substrates in de novo synthesis. Experience has shown that the attachment of signal groups to nucleotides influences in particular their incorporation and incorporation rate.
The fact that the derivatives according to the invention can be incorporated into nucleic acids very efficiently by suitable polymerases, e.g. by the aforementioned methods of nick translation or random primed labelling, cannot be inferred from the prior art and must therefore be regarded as surprising for a person skilled in the art.
The said methods are used quite generally in nucleic acid detection e.g. for quantitative detection using blotting techniques on membranes or also in microtitre plates.
In sequencing, i.e. detecting the sequence of a nucleic acid, a complementary opposite strand is newly synthesized on the nucleic acid to be sequenced with the aid of a short (start) oligonucleotide (primer) and the addition of labelled nucleoside triphosphates and a polymerase. Subsequently so-called termination reactions are carried out and the nucleic acid fragments that are generated in this process are separated by gel chromatography.
In principle the same process, i.e. the specific incorporation of labelled nucleotides, takes place within the cell in in situ hybridization for the detection of certain genes or genome sections.
The above-mentioned primers, i.e. short-chain oligonucleotides, should form stable base pairs with the template strand and should not be attacked by endogenous nucleases in order to ensure an optimal function.
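How a sequence can be read from the fragments produced in base-specific termination reactions may be illustrated by the following idealized sketch. It assumes one termination reaction per base, complete coverage of every position and error-free separation by length; the function names `termination_fragments` and `read_sequence` are hypothetical and are not taken from any particular sequencing protocol.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def termination_fragments(template: str) -> dict:
    """Map each terminating base to the lengths of the fragments it ends.

    template is given 3'->5' so that the new strand is read 5'->3';
    lengths count newly added nucleotides only (primer excluded).
    """
    new_strand = "".join(COMPLEMENT[base] for base in template.upper())
    lengths = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lengths[base].append(position)
    return lengths


def read_sequence(lengths: dict) -> str:
    """Rebuild the new strand by ordering all fragments from short to long."""
    ordered = sorted((n, base) for base, ns in lengths.items() for n in ns)
    return "".join(base for _, base in ordered)
```

For the template `"TACG"` the sketch yields one fragment per reaction, and ordering these by length gives back the newly synthesized strand, which is complementary to the template.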
This is fulfilled by oligonucleotides which contain the compounds according to the invention as building blocks instead of the classical nucleosides.
The same applies to longer chain polynucleotides and nucleic acids which contain such building blocks. These are also a subject matter of the present invention.
Corresponding oligonucleotides and their preparative precursors in the form of so-called phosphoramidites and H-phosphonates are therefore also a subject matter of the invention.
Oligonucleotides are nowadays usually produced by known methods in automated DNA/RNA synthesizers by solid phase synthesis.
Such methods of synthesis are based essentially on the stepwise reaction of the aforementioned phosphoramidites or H-phosphonates and hence the continuous linkage of these monomeric building blocks to form oligomers (see e.g. T. Brown and D. J. S. Brown in Oligonucleotides and Analogues-A Practical Approach, (1991) (Eckstein, F., publ.) IRL Press at Oxford University Press, Oxford, New York, Tokyo).