This invention concerns base analogues which may be used to make nucleoside analogues and nucleotide analogues which may be incorporated into nucleic acids and nucleic acid analogues e.g. PNA. Some of these analogues are base-specific and may be incorporated into DNA or RNA or PNA in the place of a single native base i.e. A, T, G, or C. Other analogues have the potential for base-pairing with more than one native base or base analogue.
The present invention provides a compound having the structure 
where X=O or NH or S
Y=N or CHR6 or CR6 or CO
W=N or NR6 or CHR6 or CR6 or S or CO
n=1 or 2 or 3
each R6 is independently H or alkyl or alkenyl or alkoxy or aryl or a reporter moiety,
where necessary (i.e. when Y and/or W is N or CR6) a double bond is present between Y and W or W and W, and
Q is H or a sugar or a sugar analogue,
provided that
i) when n is 2 and X is NH and W is CHR6 or CR6, and Y is CO, then at least one reporter moiety is present,
ii) when n is 1 and X is NH and W is N or NR6, then at least one reporter moiety is present,
iii) when n is 1 and X is O and Y is CHR6 or CR6 and W is CHR6 or CR6, then at least one R6 is a reporter moiety which is a reactive group or signal moiety or solid surface joined to the remainder of the molecule by a linker of at least 3 chain atoms,
iv) when n is 1 and X is NH and Y is CHR6 or CR6 and W is CHR6 or CR6, then at least one reporter moiety is present,
v) when W is S, then n is 2 and Wn is -CHR6-S- or =CR6-S-,
vi) when n is 2 and X is NH and Y is CHR6 or CR6, then at least one R6 is a reporter moiety which is a reactive group or signal moiety or solid surface joined to the remainder of the molecule by a linker of at least 3 chain atoms.
Q may be 
where Z is O, S, Se, SO, NR9 or CH2,
R1, R2, R3 and R4 are the same or different and each is H, OH, F, NH2, N3, O-hydrocarbyl or a reporter moiety,
R5 is OH, SH or NH2 or mono-, di- or tri-phosphate or -thiophosphate, or corresponding boranophosphate,
or one of R2 and R5 is a phosphoramidite or other group for incorporation in a polynucleotide chain, or a reporter moiety,
or Q consists of one of the following modified sugar structures 
or Q is a nucleic acid backbone consisting of sugar-phosphate repeats or modified sugar-phosphate repeats, e.g. LNA (Koshkin et al, 1998, Tetrahedron 54, 3607-30), or a backbone analogue such as peptide or polyamide nucleic acid (PNA) (Nielsen et al, 1991, Science 254, 1497-1500).
When Q is H, these compounds are base analogues. When Q is a sugar or sugar analogue or a modified sugar, these compounds are nucleotide analogues or nucleoside analogues. When Q is a nucleic acid backbone or a backbone analogue, these compounds are herein called nucleic acids or polynucleotides.
Preferred general structures covered by the invention are 
In the context of this invention, a nucleotide is a naturally occurring compound comprising a heterocyclic base and a backbone including a phosphate. A nucleoside is a corresponding compound in which a backbone phosphate may or may not be present. Nucleotide analogues and nucleoside analogues are analogous compounds having different bases and/or different backbones. A nucleoside analogue is a compound which is capable of forming part of a nucleic acid (DNA or RNA or PNA) chain, and is there capable of base-pairing with a base in a complementary chain or base stacking in the appropriate nucleic acid chain. A nucleoside analogue may be specific, by pairing with only one complementary nucleotide; or degenerate, by base pairing with more than one of the natural bases, e.g. with pyrimidines (T/C) or purines (A/G); or universal, by pairing with each of the natural bases with little discrimination; or it may pair with another analogue or itself.
In one preferred aspect of the invention, the base analogue is linked to a sugar moiety such as ribose or deoxyribose to form a nucleoside analogue. When the group R5 is triphosphate, the nucleoside triphosphate analogues of the invention are capable of being incorporated by enzymatic means into nucleic acid chains.
Preferably n is 1 or 2, and W is N or NR6 or CR6 or CHR6.
In another preferred aspect, the nucleoside analogue or nucleotide analogue which contains a base analogue as defined is labelled with at least one reporter moiety. A reporter moiety may take any of several forms. It may be a radioisotope by means of which the nucleoside analogue is rendered easily detectable, for example 32-P or 33-P or 35-S incorporated in a phosphate or thiophosphate or phosphoramidite or H-phosphonate group, or alternatively 3-H or 14-C or an iodine isotope. It may be an isotope detectable by mass spectrometry or NMR. It may be a signal moiety e.g. an enzyme, hapten, fluorophore, chromophore, chemiluminescent group, Raman label or electrochemical label. The reporter moiety may comprise a signal moiety and a linker group joining it to the remainder of the molecule, which linker group may be a chain of up to 30 carbon, nitrogen, oxygen and sulphur atoms, rigid or flexible, unsaturated or saturated, as is well known in the field. The reporter moiety may comprise a solid surface and a linker group joining it to the rest of the molecule. Linkage to a solid surface enables nucleic acid fragments containing nucleoside analogues to be used in assays, including bead based probe assays or assays involving arrays of nucleic acid samples or oligonucleotides which are interrogated with e.g. oligonucleotide or nucleic acid or even peptide or protein probes. The reporter moiety may consist of a linker group with a terminal or other reactive group, e.g. NH2, OH, COOH, CONH2 or SH, by which a signal moiety and/or solid surface may be attached, before or after incorporation of the nucleoside analogue in a nucleic acid chain.
To avoid risk of steric hindrance, a linker preferably has at least three chain atoms, e.g. -(CH2)n- where n is at least 3.
Two (or more) reporter moieties may be present, e.g. a signal moiety and a solid surface, or a hapten and a different signal moiety, or two fluorescent signal groups to act as donor and acceptor. Various formats of these arrangements may be useful for separation or detection purposes.
Purine and pyrimidine nucleoside derivatives labelled with reporter moieties are well known and well described in the literature. Labelled nucleoside derivatives have the advantage of being readily detectable during sequencing or other molecular biology techniques.
R1, R2, R3 and R4 may each be H, OH, F, NH2, N3, O-alkyl or a reporter moiety. Thus ribonucleosides, and deoxyribonucleosides and dideoxyribonucleosides are envisaged together with other nucleoside analogues. These sugar substituents may contain a reporter moiety in addition to any that might be present on the base.
R5 is OH, SH, NH2 or mono-, di- or tri-phosphate or -thiophosphate or corresponding boranophosphate. Alternatively, one of R2 and R5 may be a phosphoramidite or H-phosphonate or methylphosphonate or phosphorothioate or amide, or an appropriate linkage to a solid surface e.g. hemisuccinate controlled pore glass, or other group for incorporation, generally by chemical means, in a polynucleotide chain. The use of phosphoramidites and related derivatives in synthesising oligonucleotides is well known and described in the literature.
In the new base or nucleoside analogues to which this invention is directed, at least one reporter moiety is preferably present in the base analogue or in the sugar moiety or a phosphate group. Reporter moieties may be introduced into the sugar moiety of a nucleoside analogue by literature methods (e.g. J. Chem. Soc. Chem. Commun. 1990, 1547-8; J. Med. Chem., 1988, 31, 2040-8). Reporters in the form of isotopic labels may be introduced into phosphate groups by literature methods (Analytical Biochemistry, 214, 338-340, 1993; WO 95/15395).
Nucleoside analogues of this invention are useful for labelling DNA or RNA or for incorporating in oligonucleotides or PNA. A reporter moiety is attached at a position where it does not have a significant detrimental effect on the physical or biochemical properties of the nucleoside analogue, in particular its ability to be incorporated in single stranded or double stranded nucleic acid.
A template containing the incorporated nucleoside analogue of this invention may be suitable for copying in nucleic acid synthesis. If a reporter moiety of the incorporated nucleoside analogue consists of a linker group, then a signal moiety can be introduced into the incorporated nucleoside analogue by being attached through a terminal or other reactive group of the linker group.
A nucleoside analogue triphosphate of this invention may be incorporated by enzymes such as terminal transferase to extend the 3′ end of nucleic acid chains in a non-template directed manner. Tails of the nucleoside analogue triphosphate produced in this way may be detected directly in the absence of any reporter label by use of antibodies directed against the nucleoside analogue. The analogues when incorporated into oligonucleotides or nucleic acids may be acted upon by nucleic acid modification enzymes such as ligases or restriction endonucleases.
In primer walking sequencing, a primer/template complex is extended with a polymerase and chain terminated to generate a nested set of fragments where the sequence is read after electrophoresis and detection (radioactive or fluorescent) or directly in a mass spectrometer. A second primer is then synthesised using the sequence information near to the end of the sequence obtained from the first primer. This second ("walking") primer is then used for sequencing the same template. Primer walking sequencing is more efficient in terms of generating less redundant sequence information than the alternative "shot gun" approach.
The main disadvantage with primer walking is the need to synthesise a primer after each round of sequencing. Cycle sequencing requires primers that have annealing temperatures near to the optimal temperature for the polymerase used for the cycle sequencing. Primers between 18 and 24 residues long are generally used for cycle sequencing. The size of the presynthesised walking primer set that would be required has made primer walking cycle sequencing an impractical proposition. The use of base analogues that are degenerate or universal addresses this problem. The use of such analogues that are also labelled, e.g. the nucleoside analogues of this invention, will also help to overcome the problem. Preferred reporters for this purpose are radioactive isotopes or fluorescent groups, such as are used in conventional cycle sequencing reactions. Where the nucleoside analogues are base specific chain terminators they may be used in chain terminating sequencing protocols.
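The scale of the primer set problem can be sketched with a short calculation. This is an illustration only, assuming that a fully defined presynthesised set must contain every possible sequence of a given length (4^n members) and that each universal base analogue pairs with all four natural bases, so that k universal positions reduce the set to 4^(n-k) members. The function name and the example lengths are hypothetical and are not taken from this specification.

```python
def primer_set_size(length: int, universal_positions: int = 0) -> int:
    """Number of distinct primers needed to cover every target of `length`
    residues when `universal_positions` of those residues are universal
    base analogues (assumed to pair with A, C, G and T alike)."""
    if not 0 <= universal_positions <= length:
        raise ValueError("universal_positions must lie between 0 and length")
    # Only the fully specified positions contribute a factor of four each.
    return 4 ** (length - universal_positions)


if __name__ == "__main__":
    for n, k in [(18, 0), (18, 10), (24, 16)]:
        print(f"{n}-mer, {k} universal positions: {primer_set_size(n, k):,} primers")
```

Under these assumptions an 18-mer with ten universal positions needs the same number of primers as a fully defined 8-mer, while keeping the length needed for a suitable annealing temperature.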
The final analysis step in DNA sequencing involves the use of a denaturing polyacrylamide electrophoresis gel to separate the DNA molecules by size. Electrophoretic separation based solely on size requires the complete elimination of secondary structure from the DNA. For most DNA this is typically accomplished by using high concentrations of urea in the polyacrylamide matrix and running the gels at elevated temperatures. However, certain sequences, for example those capable of forming "stem loop" structures, retain secondary structure and, as a result, display compression artefacts under standard electrophoresis conditions. Here, adjacent bands of the sequence run at nearly the same position on the gel, "compressed" tightly together. Such loops are typically formed when a number of GC pairs are able to interact, since GC pairs can form 3 hydrogen bonds compared to the 2 hydrogen bonds of AT pairs.
A second form of compression artefact is seen when rhodamine-labelled terminators are used and there is a G residue close to the terminus. In these cases, anomalous mobility of the DNA strand in a gel is often seen, possibly due to an interaction between the dye and the G residue.
Thus, compression artefacts appear to be caused whenever stable secondary structures exist in the DNA under the conditions prevailing in the gel matrix during electrophoresis. The folded structure runs faster through the gel matrix than an equivalent unfolded DNA.
Currently, gel compression artefacts are eliminated in one of two ways. One is to change to a stronger denaturing condition for the gel, for example 40% formamide with 7 M urea. The other method is to incorporate a derivative of dGTP during the synthesis of DNA. An alternative method would involve the use of a dCTP analogue which reduced the hydrogen bonding potential of the G-C base pair. The nucleoside analogues of this invention may be useful in this regard.
The nucleoside analogues of this invention can also be used in any of the existing applications which use native nucleic acid probes labelled with haptens, fluorophores or other reporter groups, for example on Southern blots, dot blots and in polyacrylamide or agarose gel based methods or solution hybridization assays and other assays in microtitre plates or tubes or assays of oligonucleotides or nucleic acids such as on microchips. The probes may be detected with antibodies targeted either against haptens which are attached to the base analogues or against the base analogues themselves which would be advantageous in avoiding additional chemical modification. Antibodies used in this way are normally labelled with a detectable group such as a fluorophore or an enzyme. Fluorescent detection may also be used if the base analogue itself is fluorescent or if there is a fluorophore attached to the nucleoside analogue.
The different mass of the nucleoside analogue may also be used as a means of detection, as may the addition of a specific mass tag identifier to it. Methods for the analysis and detection of specific oligonucleotides, nucleic acid fragments and oligonucleotide primer extension products based on mass spectrometry have been reported (Beavis R. C., Chait B. T., U.S. Pat. No. 5,288,644; Wu K. J. et al. Rapid Commun. Mass Spectrom. 7, 142 (1993); Koster H. WO 94/16101).
These methods are usually based on matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometry. They measure the total mass of an oligonucleotide or fragment, and from this the sequence of the specific oligonucleotide may be ascertained. In some cases the mass of the oligonucleotide or fragment may not be unique for a specific sequence. This will occur when different sequences have the same composition of the four natural bases, A, C, G and T.
For example, a simple 4-mer oligonucleotide, ACGT, will have the same mass as the 23 other possible 4-mers of the same composition, for example CAGT, CATG, CGTA, CTAG, CTGA etc.
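Because the total mass depends only on base composition, every reordering of the same four residues has the same mass. A minimal sketch enumerating those orderings follows; the function name is illustrative. Four distinct residues give 4! = 24 orderings in all, that is, 23 besides ACGT itself.

```python
from itertools import permutations


def same_composition_sequences(seq: str) -> set[str]:
    """All distinct orderings of the residues in `seq`; each has the same
    base composition and therefore the same total mass."""
    return {"".join(p) for p in permutations(seq)}


isomers = same_composition_sequences("ACGT")
others = sorted(isomers - {"ACGT"})
print(len(isomers), len(others))  # 24 23
print(others[:5])
```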
With longer nucleic acid fragments it may be difficult to resolve the differences in mass between 2 fragments because of a lack of resolution in the spectrum at higher molecular weights. The incorporation of the analogues described here can be used to help identify the specific oligonucleotide or nucleic acid fragment as their masses are different from those of the natural bases.
For example, the two sequences ACGT and CAGT can be identified in the presence of one another by mass spectrometry if, for example one of the natural nucleotides in one of the sequences is replaced with one of the analogues described in this invention. For example, in the oligonucleotide CAGT the T can be replaced by an analogue with little effect on a specific application, for example hybridisation or enzymatic incorporation. Yet the two sequences can be readily identified in the mass spectrometer because of the change in mass due to the introduction of the analogue base.
Not only can mass modifications be made to the bases or linkers but also to the sugars or inter nucleotide linkages. For example thio sugars or phosphorothioate linkages will also result in distinctive mass changes.
A mixture of modifications at the base, linker, nucleoside or nucleotide either separately or together can give rise to a number of molecules with different masses which will be useful to define a specific sequence accurately by its mass, especially in multiplex nucleic acid hybridisation or sequencing applications.
The nucleoside analogues of the present invention with the combination of molecular diversity and increased numbers of positions where reporter groups may be added can result in a series of improved enzyme substrates.
Another preferred aspect of the invention is to incorporate the nucleoside analogue triphosphate into DNA by means of a polymerase but without a reporter label for the purpose of random mutagenesis. It has been shown by Zaccolo et al, 1996, J. Mol. Biol. 255, 589-603 that when nucleotide analogues with ambivalent base pairing potential are incorporated by the PCR into DNA products, they induce the formation of random mutations within the DNA products. In the above publication, the nucleotide analogue dPTP was shown to be incorporated into DNA by Taq polymerase in place of TTP and, with lower efficiency, dCTP. After 30 cycles of DNA amplification, the four transition mutations A→G, T→C, G→A and C→T were produced. The compound 8-oxodGTP was also used to cause the formation of the transversion mutations A→C and T→G. The nucleoside analogue triphosphates with ambivalent base pairing potential described within this invention may be used for a similar purpose.
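The distinction between the two kinds of mutation named above can be made explicit: a transition exchanges a purine for a purine or a pyrimidine for a pyrimidine, whereas a transversion exchanges a purine for a pyrimidine or vice versa. The following minimal sketch classifies a substitution accordingly; the function name is illustrative only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt must differ")
    # Same chemical class on both sides means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("T", "C"), ("G", "A"), ("C", "T"), ("A", "C"), ("T", "G")]:
    print(f"{ref}->{alt}: {classify_substitution(ref, alt)}")
```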
RNA is an extremely versatile biological molecule. Experimental studies by several laboratories have shown that in vitro selection techniques can be employed to isolate short RNA molecules from RNA libraries that bind with high affinity and specificity to proteins not normally associated with RNA binding, including a few antibodies (Gold, Allen, Binkley, et al, 1993, 497-510 in The RNA World, Cold Spring Harbor Press, Cold Spring Harbor N.Y.; Gold, Polisky, Uhlenbeck, and Yarus, 1995, Annu. Rev. Biochem. 64:763-795; Tuerk and Gold, 1990, Science 249:505-510; Joyce, 1989, Gene 82:83-87; Szostak, 1992, Trends Biochem. Sci 17:89-93; Tsai, Kenan and Keene, 1992, PNAS 89:8864-8868; Doudna, Cech and Sullenger, 1995, PNAS 92:2355-2359). Some of these RNA molecules have been proposed as drug candidates for the treatment of diseases like myasthenia gravis and several other auto-immune diseases.
The basic principle involves adding an RNA library to the protein or molecule of interest, washing to remove unbound RNA, and then specifically eluting the RNA bound to the protein or other molecule of interest. This eluted RNA is then reverse transcribed and amplified by PCR. The DNA is then transcribed using modified nucleotides (2′ modifications to give nuclease resistance, e.g. 2′-F, 2′-NH2 or 2′-OCH3, and/or C5-modified pyrimidines and/or C8-modified purines). Those molecules that are found to bind the protein or other molecule of interest are cloned and sequenced to look for common ("consensus") sequences. This sequence is optimised to produce a short oligonucleotide which shows improved specific binding and which may then be used as a therapeutic.
The base analogues described here, when converted to the deoxy- or ribonucleoside triphosphate or deoxy- or ribonucleoside phosphoramidite, will significantly increase the molecular diversity available for this selection process. This may lead to oligonucleotides with increased binding affinity to the target that is not available using the current building blocks.
The secondary structure of nucleic acids is also important when considering ribozyme function. The base analogues of the present invention may cause the formation of secondary structures which would otherwise be unavailable using native bases or other modified nucleotide derivatives.
The hybridization binding properties of nucleic acids incorporating base analogues of the present invention may have particular application in the antisense or antigene field.
The base analogues of the present invention may have properties which are different to those of the native bases and therefore are particularly suited to other important applications. In particular, the interaction of these base analogues with enzymes may be extremely important in vivo and may result in the development of new anti-viral therapeutics or therapeutics for non-viral diseases.
A wide range of nucleoside and nucleotide analogues have been developed to form an original class of antiviral agents. Some of these compounds have already been approved by the US FDA for use in the treatment of viral diseases. Examples are compounds like 3′-deoxy-3′-azidothymidine (AZT, Zidovudine) and 2′,3′-dideoxy-3′-thiacytidine (3TC, Lamivudine) for the treatment of HIV infections. Other compounds like (S)-1-(3-hydroxy-2-phosphonylmethoxypropyl)cytosine (HPMPC, cidofovir), 9-(2-phosphonyl-methoxyethyl)adenine (PMEA, adefovir) and (R)-9-(2-phosphonylmethoxypropyl)adenine (PMPA), the acyclic nucleoside phosphonate analogues, are in clinical trials. These compounds either act as absolute DNA chain terminators or result in termination after incorporation of consecutive molecules causing inhibition of the viral DNA polymerase. It should be noted that some of these compounds are the unnatural β-L enantiomers which show significantly decreased interaction with the host DNA polymerases compared to the viral polymerases.
One of the problems in the treatment of viral infections with nucleoside and nucleotide drugs is the ability of the virus to develop resistance by a series of mutations to the viral reverse transcriptase gene that are selected as a result of drug pressure. Therefore, it is often necessary to use combination drug therapies to overcome this problem. However, the number of suitable, available compounds for therapy is limited. The subject of this invention could be useful in expanding the range of nucleoside and nucleotide antiviral drugs.
Those skilled in the art of organic chemistry will recognise that a variety of synthetic approaches can be taken to the compounds within the scope of the claims. In addition to those approaches detailed in the experimental section, those illustrated below are possible.
In order to synthesise a compound where W=CR6 and R6 contains a reporter group the reaction sequence in scheme 1 can be used.
Diacetyl protected deoxycytidine can be treated with ethyltrifluorobutynoate to get an enamine intermediate. The enamine is expected to undergo oxidative cyclization under Pd(OAc)2/DMA/70 °C reaction conditions (Fukuda et al, Bioorganic and Medicinal Chemistry Letters, 1997, 7, 1683). The ester group thus incorporated can be exploited to conjugate with a fluorescent dye, e.g. after converting to a suitable functional group such as an active ester it can be either reacted directly with an amine containing fluorescent dye or with a suitably protected diamine to extend the linker group prior to signal attachment.
When X and W both equal N the following approach in scheme 2 can be undertaken 
Treatment of the known 5-aminocytidine (Kalman and Goldman, BBRC, 1981, 102, 682) with ethylformate leads to the product (R=H).
For the 8-oxoG analogue (i.e. R=O) treatment of the initial diamine with a variety of reagents (COCl2, carbonyl diimidazole, diphenyl carbonate) leads to the desired product.
These can be converted to their 5′-triphosphates using methods outlined in the experimental section and to their 5′-dimethoxytrityl-3′-phosphoramidites by standard literature methods.
The introduction of a linker (R6) to the 8-position (conventional purine numbering) can be carried out using the reactivity of this position to bromination followed by alkylation with, for example, hexane diamine.
Ion exchange (IE) HPLC was performed on a Waters analytical system running under Millennium Chromatography Manager software. For analytical IE analysis an Amersham Pharmacia Biotech µRPC ST (C2/C18) reverse phase column (4.6×100 mm) was used, with (method A) a gradient of 0-25% buffer B over 30 min at a flow rate of 1 mL/min or (method B) a gradient of 0-50% buffer B over 30 min and the same flow rate as method A. Buffer A was 0.1 M TEAB and buffer B was 100% acetonitrile.
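For reference, the two gradients can be expressed as a simple function of run time. The sketch below assumes that each gradient is linear from 0% buffer B at the start to its end value at 30 min, which the text does not state explicitly; the function and constant names are illustrative.

```python
def percent_buffer_b(t_min: float, end_percent: float,
                     duration_min: float = 30.0, start_percent: float = 0.0) -> float:
    """Buffer B percentage at time `t_min`, assuming a linear gradient from
    `start_percent` to `end_percent` over `duration_min` minutes."""
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("t_min must lie within the gradient duration")
    return start_percent + (end_percent - start_percent) * t_min / duration_min


METHOD_A_END = 25.0  # method A: 0-25% buffer B over 30 min
METHOD_B_END = 50.0  # method B: 0-50% buffer B over 30 min

for t in (0, 15, 30):
    print(t, percent_buffer_b(t, METHOD_A_END), percent_buffer_b(t, METHOD_B_END))
```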
TLC analysis was performed on 0.2 mm-thick precoated Merck silica gel 60 F254 plates.
Flash silica gel chromatography was performed with 230-400 mesh 60-Å silica from Merck.
1H NMR spectra (300 MHz) were recorded on a Varian Gemini system.