This invention concerns base analogues which may be used to make nucleoside analogues and nucleotide analogues which may be incorporated into nucleic acids. Some of these analogues are base-specific and may be incorporated into DNA or RNA in the place of a single native base i.e. A, T, G, or C. Other analogues have the potential for base-pairing with more than one native base or base analogue.
The 2′-deoxyribosides of such analogues as (1) (P: Kong Thoo Lin and Brown, 1989, Nucleic Acids Research, 17, 10373-83), N6-methoxyadenine and N6-methoxy-2,6-diaminopurine have been shown to be extremely useful in mixed sequence oligonucleotide primers used in PCR and DNA sequencing. In addition, the base pairing characteristics of the 2′-deoxynucleoside 5′-triphosphate of P have been exploited in a PCR-based random mutagenesis procedure (Zaccolo et al, 1996, J. Mol. Biol., 255, 589-603):
Synthesis of a nucleoside with a tricyclic base analogue where the third ring is between the 6 and 7 positions (purine nomenclature) was described by Schram and Townsend (1971) Tetrahedron Lett. 49, 4757-4760. However, this has a blocked hydrogen bonding face and would not be expected to participate in the base pairing observed in native nucleic acids or, as a triphosphate, in polymerase mediated incorporation reactions.
The present invention provides a compound having the structure (2)
where
W is an alkylene or alkenylene chain of 0-5 carbon atoms any of which may carry a substituent R8,
X is O or N or NR12 or CR10,
X′ is O or S or N,
provided that when X′ is O or S, X is C,
Y is CH or N,
R6 is H or NH2 or SMe or SO2Me or NHNH2,
each of R7 and R8 is independently H or F or alkyl or alkenyl or aryl or acyl or a reporter moiety,
each of R9 and R12 is independently H or alkyl or alkenyl or aryl or acyl or a reporter moiety,
R10 is H or =O or F or alkyl or alkenyl or aryl or acyl or a reporter moiety,
the dotted line indicates one optional double bond,
Q is H or a sugar moiety or a sugar analogue including but not limited to the structure 
where
Z is O, S, Se, SO, NR9 or CH2,
R1, R2, R3 and R4 are the same or different and each is H, OH, F, NH2, N3, O-hydrocarbyl or a reporter moiety,
R5 is OH, SH or NH2 or mono-, di- or tri-phosphate or -thiophosphate, or corresponding boranophosphate,
or one of R2 and R5 is a phosphoramidite or other group for incorporation in a polynucleotide chain, or a reporter moiety,
or Q consists of one of the following modified sugar structures 
or Q is a nucleic acid backbone consisting of sugar-phosphate repeats or modified sugar-phosphate repeats (e.g. LNA) (Koshkin et al, 1998, Tetrahedron 54, 3607-30), or a backbone analogue such as peptide or polyamide nucleic acid (PNA) (Nielsen et al, 1991, Science 254, 1497-1500).
Preferably:
W=CH2 
X=O
X′=N
Y=CH or N
R6=H or NH2 
R7=H or reporter moiety.
The dotted line in the structure (2) shows that either -X-CHR7-W- or -X=CR7-W- or -X-CR7=W- is present. Of course, when X is O the second of these structures is not possible.
Depending on the identity of W, the ring containing X′ and X contains from 6 to 11 members.
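The relationship between the length of W and the size of the third ring can be illustrated with a minimal sketch. It assumes that each carbon atom of W contributes exactly one ring member, which is consistent with the stated ranges (0-5 carbon atoms, 6 to 11 members). The function name `third_ring_size` is hypothetical and used only for illustration.

```python
def third_ring_size(w_carbons: int) -> int:
    """Return the number of members in the ring containing X' and X.

    Assumption (not stated explicitly in the text): each carbon atom
    of W adds one member to a six-membered ring obtained when W has
    zero carbon atoms.
    """
    if not 0 <= w_carbons <= 5:
        raise ValueError("W is defined as a chain of 0-5 carbon atoms")
    return 6 + w_carbons


if __name__ == "__main__":
    for n in range(6):
        print(f"W with {n} carbon atom(s): {third_ring_size(n)}-membered ring")
```

Under this assumption the preferred case W=CH2 corresponds to a seven-membered ring.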
Alkyl, alkenyl, aryl and acyl groups herein preferably contain 1-20 carbon atoms.
Any unfilled valencies are to be understood as being filled by H.
The 2′-deoxynucleoside analogue termed P (Kong Thoo Lin and Brown, 1989, Nucleic Acids Research 17, 10373-83) can exist in two tautomeric forms as indicated below: 
Base analogues of this invention where X′=N and X=O or N or CR10 also exhibit similar types of tautomerism, e.g. 
The exact ratio of these tautomers can be affected by the X substituent. Changes in the tautomeric ratio can have subtle effects on the hybridisation properties of the analogue. By convention, only one tautomer is drawn within this patent, though it is implicit that both can be present.
When Q is H, these compounds are base analogues. When Q is a sugar moiety or sugar analogue or a modified sugar, these compounds are nucleotide analogues or nucleoside analogues. When Q is a nucleic acid backbone or a backbone analogue, these compounds are herein called nucleic acids or polynucleotides.
In the context of this invention, a nucleotide is a naturally occurring compound comprising a base and a sugar backbone including a phosphate. A nucleoside is a corresponding compound in which a backbone phosphate may or may not be present. Nucleotide analogues and nucleoside analogues are analogous compounds having different bases and/or different backbones. A nucleoside analogue is a compound which is capable of forming part of a nucleic acid (DNA or RNA) chain, and is there capable of base-pairing with a base in a complementary chain or base stacking in the appropriate nucleic acid chain. A nucleoside analogue may be specific, by pairing with only one complementary nucleotide; or degenerate, by base pairing with two or three of the natural bases, e.g. with pyrimidines (T/C) or purines (A/G); or universal, by pairing with each of the natural bases without discrimination; or it may pair with another analogue or itself.
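The three pairing categories defined above can be expressed as a short illustrative sketch. It considers only pairing with the four natural DNA bases and ignores pairing with other analogues or with itself. The function name `classify_pairing` and the string labels are assumptions made for this example.

```python
NATURAL_BASES = frozenset("ATGC")


def classify_pairing(partners) -> str:
    """Classify an analogue by the set of natural bases it pairs with.

    `partners` is any iterable of single-letter natural bases.
    One partner -> "specific"; two or three -> "degenerate";
    all four -> "universal".
    """
    partner_set = frozenset(partners)
    if not partner_set or not partner_set <= NATURAL_BASES:
        raise ValueError("partners must be a non-empty subset of A, T, G, C")
    if len(partner_set) == 1:
        return "specific"
    if len(partner_set) == len(NATURAL_BASES):
        return "universal"
    return "degenerate"


if __name__ == "__main__":
    print(classify_pairing("A"))     # pairs with one base only
    print(classify_pairing("AG"))    # pairs with both purines
    print(classify_pairing("ATGC"))  # pairs with every natural base
```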
In one preferred aspect of the invention, the base analogue is linked to a sugar moiety such as ribose or deoxyribose to form a nucleoside analogue. When the group R5 is triphosphate, the nucleoside triphosphate analogues of the invention are capable of being incorporated by enzymatic means into nucleic acid chains.
In another preferred aspect, the nucleoside analogue or nucleotide analogue which contains a base analogue as defined is labelled with at least one reporter moiety. A reporter moiety may take any of several forms. It may be a radioisotope by means of which the nucleoside analogue is rendered easily detectable, for example 32-P or 33-P or 35-S incorporated in a phosphate or thiophosphate or phosphoramidite or H-phosphonate group, or alternatively 3-H or 14-C or an iodine isotope. It may be an isotope detectable by mass spectrometry or NMR. It may be a signal moiety, e.g. an enzyme, hapten, fluorophore, chromophore, chemiluminescent group, Raman label, electrochemical label or signal compound adapted for detection by mass spectrometry. The reporter moiety may comprise a signal moiety and a linker group joining it to the remainder of the molecule, which linker group may be a chain of up to 30 carbon, nitrogen, oxygen and sulphur atoms, rigid or flexible, unsaturated or saturated, as is well known in the field. The reporter moiety may comprise a solid surface and a linker group joining it to the rest of the molecule. Linkage to a solid surface enables nucleic acid fragments containing nucleoside analogues to be used in assays including bead based probe assays or assays involving arrays of nucleic acid samples or oligonucleotides which are interrogated with e.g. oligonucleotide or nucleic acid probes. The reporter moiety may consist of a linker group with a terminal or other reactive group, e.g. NH2, OH, COOH, CONH2 or SH, by which a signal moiety and/or solid surface may be attached, before or after incorporation of the nucleoside analogue in a nucleic acid chain, before or after hybridisation.
Two (or more) reporter groups may be present, e.g. a signal group and a solid surface, or a hapten and a different signal group, or two fluorescent signal groups to act as donor and acceptor. Various formats of these arrangements may be useful for separation purposes.
Purine and pyrimidine nucleoside derivatives labelled with reporter moieties are well known and well described in the literature. Labelled nucleoside derivatives have the advantage of being readily detectable during sequencing or other molecular biology techniques.
R1, R2, R3 and R4 may each be H, OH, F, NH2, N3, O-alkyl or a reporter moiety. Thus ribonucleosides, deoxyribonucleosides and dideoxyribonucleosides are envisaged together with other nucleoside analogues. These sugar substituents may contain a reporter moiety in addition to one or two present in the base.
R5 is OH, SH, NH2 or mono-, di- or tri-phosphate or -thiophosphate or corresponding boranophosphate. Alternatively, one of R2 and R5 may be a phosphoramidite or H-phosphonate or methylphosphonate or phosphorothioate, or an appropriate linkage to a solid surface e.g. hemisuccinate controlled pore glass, or other group for incorporation, generally by chemical means, in a polynucleotide chain. The use of phosphoramidites and related derivatives in synthesising oligonucleotides is well known and described in the literature.
In the new base or nucleoside analogues to which this invention is directed, at least one reporter moiety is preferably present in the base analogue or in the sugar moiety or a phosphate group. Reporter moieties may be introduced into the sugar moiety of a nucleoside analogue by literature methods (e.g. J. Chem. Soc. Chem. Commun. 1990, 1547-8; J. Med. Chem., 1988, 31, 2040-8). Reporters in the form of isotopic labels may be introduced into phosphate groups by literature methods (Analytical Biochemistry, 214, 338-340, 1993; WO 95/15395).
Examples within this specification show how analogues where W is one carbon atom can readily be synthesised and how reactive groups and signal moieties can be included as required. It has been found that two basic approaches to tricyclic formation can be undertaken. A leaving group can be generated at the 6 position (purine nomenclature) of the precursor bicyclic base heterocycle and subsequently displaced by an incoming nucleophile species attached to a side arm, such as a hydroxylamine derivative in Example 1.10. This approach was found not to be applicable under the conditions tried when W is one carbon atom and Y=N. Instead, the synthetic approach of initially displacing a leaving group in the 6 position (purine nomenclature) with hydroxylamine (Example 4.8) and then effecting ring closure by reaction of this with triisopropylbenzenesulphonyl chloride and displacement with an alkoxide derivative was found to be effective (Example 4.11). A similar strategy was applied in making 5-β-D-ribofuranosyl-3H,5H,7H-pyrimido[4,5-c][1,2]oxazol-6-one, a bicyclic base analogue (Loakes and Brown, 1994, Nucleosides and Nucleotides 13, 679-706). By such a strategy the previously unmade compound in which W is zero carbon atoms and Y=CH or N could then be prepared.
Nucleoside analogues of this invention are useful for labelling DNA or RNA or for incorporating in oligonucleotides. Some have the possible advantage over conventional hapten labelled nucleotides such as fluorescein-dUTP of being able to replace more than one base. A reporter moiety is attached at a position where it does not have a significant detrimental effect on the physical or biochemical properties of the nucleoside analogue, in particular its ability to be incorporated in single stranded or double stranded nucleic acid.
A template containing the incorporated nucleoside analogue of this invention may be suitable for copying in nucleic acid synthesis. If a reporter moiety of the incorporated nucleoside analogue consists of a linker group, then a signal moiety can be introduced into the incorporated nucleoside analogue by being attached through a terminal or other reactive group of the linker group.
A nucleoside analogue triphosphate of this invention may be incorporated by enzymes such as terminal transferase to extend the 3′ end of nucleic acid chains in a non-template directed manner. Tails of the nucleoside analogue triphosphate produced in this way may be detected directly in the absence of any reporter label by use of antibodies directed against the nucleoside analogue (as described in Example 13 of WO 97/28177). The analogues when incorporated into oligonucleotides or nucleic acids may be acted upon by nucleic acid modification enzymes such as ligases or restriction endonucleases.
In primer walking sequencing, a primer/template complex is extended with a polymerase and chain terminated to generate a nested set of fragments where the sequence is read after electrophoresis and detection (radioactive or fluorescent). A second primer is then synthesised using the sequence information near to the end of the sequence obtained from the first primer. This second ("walking") primer is then used for sequencing the same template. Primer walking sequencing is more efficient in terms of generating less redundant sequence information than the alternative "shot gun" approach.
The main disadvantage with primer walking is the need to synthesise a primer after each round of sequencing. Cycle sequencing requires primers that have annealing temperatures near to the optimal temperature for the polymerase used for the cycle sequencing. Primers between 18 and 24 residues long are generally used for cycle sequencing. The size of the presynthesised walking primer set that would be required has made primer walking cycle sequencing an impractical proposition. The use of base analogues that are degenerate or universal addresses this problem. The use of such analogues that are also labelled, e.g. the nucleoside analogues of this invention, will also help to overcome the problem. Preferred reporters for this purpose are radioactive isotopes or fluorescent groups, such as are used in conventional cycle sequencing reactions. Where the nucleoside analogues are base specific chain terminators they may be used in chain terminating sequencing protocols.
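The scale of the primer set problem can be made concrete with a rough calculation. The sketch below assumes an idealised complete set that covers every possible sequence of a given length, that a universal analogue position needs only one variant, and that a twofold degenerate analogue (for example one pairing with both purines or both pyrimidines) needs two variants per position. The function name `primer_set_size` and its parameters are hypothetical.

```python
def primer_set_size(length: int,
                    universal_positions: int = 0,
                    twofold_degenerate_positions: int = 0) -> int:
    """Number of primers needed to cover every sequence of `length` bases.

    Assumptions (illustrative only): standard positions need 4 variants,
    twofold degenerate analogue positions need 2, and universal analogue
    positions need 1.
    """
    standard = length - universal_positions - twofold_degenerate_positions
    if min(length, universal_positions,
           twofold_degenerate_positions, standard) < 0:
        raise ValueError("position counts must be non-negative and fit the length")
    return (4 ** standard) * (2 ** twofold_degenerate_positions)


if __name__ == "__main__":
    print(primer_set_size(18))                                   # all native bases
    print(primer_set_size(18, universal_positions=10))           # 8 native positions
    print(primer_set_size(18, twofold_degenerate_positions=18))  # fully degenerate
```

For an 18-residue primer made only of native bases the complete set contains 4^18 (about 6.9 x 10^10) members, whereas with ten universal positions it falls to 4^8 = 65,536 under these assumptions.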
The final analysis step in DNA sequencing involves the use of a denaturing polyacrylamide electrophoresis gel to separate the DNA molecules by size. Electrophoretic separation based solely on size requires the complete elimination of secondary structure from the DNA. For most DNA this is typically accomplished by using high concentrations of urea in the polyacrylamide matrix and running the gels at elevated temperatures. However, certain sequences, for example those capable of forming "stem loop" structures, retain secondary structure and, as a result, display compression artefacts under standard electrophoresis conditions. Here, adjacent bands of the sequence run at nearly the same position on the gel, "compressed" tightly together. Such loops are typically formed when a number of GC pairs are able to interact since GC pairs can form 3 hydrogen bonds compared to the 2 hydrogen bonds of AT pairs.
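The difference between GC-rich and AT-rich stems can be illustrated by counting Watson-Crick hydrogen bonds. The sketch below assumes a perfectly paired stem in which every base of the given strand pairs with its complement, and uses the hydrogen bond count only as a crude proxy; it is not a thermodynamic stability model. The function name `stem_hydrogen_bonds` is hypothetical.

```python
# Hydrogen bonds per Watson-Crick pair, keyed by the base on one strand:
# G and C take part in GC pairs (3 bonds), A and T in AT pairs (2 bonds).
BONDS_PER_BASE = {"G": 3, "C": 3, "A": 2, "T": 2}


def stem_hydrogen_bonds(stem_strand: str) -> int:
    """Count hydrogen bonds in a fully paired stem given one of its strands."""
    total = 0
    for base in stem_strand.upper():
        if base not in BONDS_PER_BASE:
            raise ValueError(f"unexpected base: {base!r}")
        total += BONDS_PER_BASE[base]
    return total


if __name__ == "__main__":
    for strand in ("GCGCGC", "ATATAT"):
        print(strand, stem_hydrogen_bonds(strand))
```

A six-pair stem made only of GC pairs has 18 hydrogen bonds against 12 for an AT-only stem of the same length.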
A second form of compression artefact is seen when rhodamine-labelled terminators are used and there is a G residue close to the terminus. In these cases, anomalous mobility of the DNA strand in a gel is often seen, possibly due to an interaction between the dye and the G residue.
Thus, compression artefacts appear to be caused whenever stable secondary structures exist in the DNA under the conditions prevailing in the gel matrix during electrophoresis. The folded structure runs faster through the gel matrix than an equivalent unfolded DNA.
Currently, gel compression artefacts are eliminated in one of two ways. One is to change to a stronger denaturing condition for the gel, for example 40% formamide with 7 M urea. The other method is to incorporate a derivative of dGTP during the synthesis of DNA.
Two nucleotides are currently used to remove compression artefacts. The first, 7-deaza-dGTP, can remove specific artefacts seen in fluorescent sequencing where the rhodamine dye-labelled terminator appears to interact with a nearby G residue. It can also reduce Hoogsteen interactions which may contribute to some compression artefacts. However, it does not remove all sequencing artefacts as it still has the same Watson and Crick (and wobble) H bonding capabilities as dGTP. The second nucleotide, dITP, will remove all sequencing artefacts. It has reduced hydrogen bonding capabilities, so preventing secondary structure being a problem. The downside of this analogue is that it is a very poor DNA polymerase substrate. It requires lower temperature and longer extension times than dGTP in cycle sequencing reactions. This analogue produces sequences with large variations in peak heights (fluorescent sequencing) and band intensities (radioactive sequencing). In fact it is only really suited to use with [α-33P] ddNTP and ThermoSequenase™ sequencing protocols due to the exceptionally high quality of the banding pattern. Therefore there is a need for a dGTP analogue that is a good DNA polymerase substrate which has the combined characteristics of 7-deaza-dGTP and dITP. This may be achieved by the use of analogues described herein.
The nucleoside analogues of this invention can also be used in any of the existing applications which use native nucleic acid probes labelled with haptens, fluorophores or other reporter groups, for example on Southern blots, dot blots and in polyacrylamide or agarose gel based methods or solution hybridization assays and other assays in microtitre plates or tubes or arrays of oligonucleotides or nucleic acids such as on microchips. The probes may be detected with antibodies targeted either against haptens which are attached to the base analogues or against the base analogues themselves which would be advantageous in avoiding additional chemical modification. Antibodies used in this way are normally labelled with a detectable group such as a fluorophore or an enzyme. Fluorescent detection may also be used if the base analogue itself is fluorescent or if there is a fluorophore attached to the nucleoside analogue.
The nucleoside analogues of the present invention with the combination of molecular diversity and increased numbers of positions where reporter groups may be added can result in a series of improved enzyme substrates.
Another preferred aspect of the invention is to incorporate the nucleoside analogue triphosphate into DNA by means of a polymerase but without a reporter label for the purpose of random mutagenesis. It has been shown by Zaccolo et al, 1996, J. Mol. Biol. 255, 589-603 that when nucleotide analogues with ambivalent base pairing potential are incorporated by the PCR into DNA products, they induce the formation of random mutations within the DNA products. In the above publication, the nucleotide analogue dPTP was shown to be incorporated into DNA by Taq polymerase in place of TTP and, with lower efficiency, dCTP. After 30 cycles of DNA amplification, the four transition mutations A→G, T→C, G→A and C→T were produced. The compound 8-oxodGTP was also used to cause the formation of the transversion mutations A→C and T→G. The nucleoside analogue triphosphates with ambivalent base pairing potential described within this invention may be used for a similar purpose.
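The distinction between the transition and transversion mutations listed above can be sketched in a few lines. A transition exchanges a purine for a purine or a pyrimidine for a pyrimidine; a transversion exchanges a purine for a pyrimidine or vice versa. The function name `classify_substitution` is hypothetical and the sketch covers DNA bases only.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(original: str, mutated: str) -> str:
    """Return "transition" or "transversion" for a single-base substitution."""
    original, mutated = original.upper(), mutated.upper()
    valid = PURINES | PYRIMIDINES
    if original not in valid or mutated not in valid:
        raise ValueError("bases must each be one of A, C, G, T")
    if original == mutated:
        raise ValueError("no substitution: bases are identical")
    same_class = (original in PURINES) == (mutated in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # The mutations named in the paragraph above.
    for ref, alt in [("A", "G"), ("T", "C"), ("G", "A"), ("C", "T"),
                     ("A", "C"), ("T", "G")]:
        print(f"{ref}->{alt}: {classify_substitution(ref, alt)}")
```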
RNA is an extremely versatile biological molecule. Experimental studies by several laboratories have shown that in vitro selection techniques can be employed to isolate short RNA molecules from RNA libraries that bind with high affinity and specificity to proteins not normally associated with RNA binding, including a few antibodies (Gold, Allen, Binkley, et al, 1993, 497-510 in The RNA World, Cold Spring Harbor Press, Cold Spring Harbor N.Y.; Gold, Polisky, Uhlenbeck and Yarus, 1995, Annu. Rev. Biochem. 64: 763-795; Tuerk and Gold, 1990, Science 249:505-510; Joyce, 1989, Gene 82:83-87; Szostak, 1992, Trends Biochem. Sci 17:89-93; Tsai, Kenan and Keene, 1992, PNAS 89:8864-8868; Doudna, Cech and Sullenger, 1995, PNAS 92:2355-2359). Some of these RNA molecules have been proposed as drug candidates for the treatment of diseases like myasthenia gravis and several other auto-immune diseases.
The basic principle involves adding an RNA library to the protein or molecule of interest; washing to remove unbound RNA; then specifically eluting the RNA bound to the protein. This eluted RNA is then reverse transcribed and amplified by PCR. The DNA is then transcribed using modified nucleotides (either 2′ modifications to give nuclease resistance e.g. 2′ F, 2′ NH2, 2′ OCH3 and/or C5 modified pyrimidines and/or C8 modified purines). Those molecules that are found to bind the protein or other molecule of interest are cloned and sequenced to look for common ("consensus") sequences. This sequence is optimised to produce a short oligonucleotide which shows improved specific binding and which may then be used as a therapeutic, or member of a binding pair.
The base analogues described here, when converted to the ribonucleoside triphosphate or ribonucleoside phosphoramidite, or to the deoxyribonucleoside triphosphate or deoxyribonucleoside phosphoramidite, will significantly increase the molecular diversity available for this selection process. This may lead to oligonucleotides with increased binding affinity to the target that is not available using the current building blocks.
The secondary structure of nucleic acids is also important when considering ribozyme function. The base analogues of the present invention may cause the formation of secondary structures which would otherwise be unavailable using native bases or other modified nucleotide derivatives.
The hybridization binding properties of nucleic acids incorporating base analogues of the present invention may have particular application in the antisense or antigene field.
The base analogues of the present invention may have properties which are different to those of the native bases and therefore are particularly suited to other important applications. In particular, the interaction of these base analogues with enzymes may be extremely important in vivo and may result in the development of new anti-viral therapeutics.