The invention relates to 5′-modified nucleotides and to nucleic acids which contain these nucleotides. Processes for incorporating the 5′-modified nucleotides into nucleic acids, and the subsequent site-specific cleavage of the nucleic acids at the 5′-modified monomer building blocks, are also disclosed. These processes can be employed for nucleic acid sequencing, for generating nucleic acid libraries, for detecting mutations, for preparing support-bound nucleic acids and for pharmaceutical purposes.
The processes which are nowadays routinely used for sequencing nucleic acids generally comprise polymerizing a nucleic acid strand which is complementary to a template and generating a mixture of nucleic acid fragments of all possible lengths (1). This nucleic acid fragment mixture can be obtained by terminating the polymerization or degrading using exonucleases (2), by iterative sequencing methods (3), by adding individual bases and detecting the release of pyrophosphate (4), by chemical methods using elimination reactions (5), by chemicoenzymic methods, involving incorporating modified nucleosides and cleaving by attack on phosphorothioate- or boron-modified nucleotides (6), by incorporating ribonucleosides into DNA and subsequently cleaving under basic conditions (7) or by incorporating 3′-dye-labelled nucleotides while at the same time or subsequently eliminating the dye (8). In addition to these methods, strategies are also available which involve sequencing by hybridizing (9) and a physical production of fragments by means of mass spectrometry (10). The possibility of detecting by means of atomic force microscopy (11) has also been discussed.
However, in the last twenty years, the method of choice has been the enzymic chain termination method. This method makes possible automation and high-throughput sequencing, as required when sequencing entire genomes. The automation was achieved by using dye primers (12), internal labelling (13) or dye terminators (14). Sequencing with dye primers or with internal labelling suffers, however, from the disadvantage that irregular termination events occur in the sequence ladder and can lead to erroneous interpretation of the sequence data. Dye terminators suffer from the disadvantage that they are sometimes incorporated at incorrect sites and only permit a limited length to be read since they are modified substrates.
In addition to this, there is a need to reduce the quantity of DNA which is required for a sequence determination. There is currently only one single cyclic sequencing method available for this purpose (15), which method, however, in contrast to PCR, in which an exponential amplification takes place, only leads to linear amplification of the products. The direct sequencing of PCR products in turn displays disadvantages since relatively large quantities of triphosphates and primer molecules are present in the reaction vessels and can lead to impairment of the sequencing reaction or the sequence determination (16). However, the purification of the PCR products is time-consuming and represents an additional procedural step. While triphosphates can be cleaved using enzymic methods (17), this is also time-consuming and increases the costs of carrying out the sequencing reaction. As an alternative, a direct exponential amplification and sequencing method (DEXAS) is available for sequencing small quantities of DNA material (18); however, it has so far not been possible to use this method for a standard sequencing and, contrary to its name, the method is not directly exponential.
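The difference between linear and exponential amplification can be illustrated with a short calculation. The following Python sketch is only an idealised model: the function names, the perfect per-cycle efficiency and the example numbers are assumptions made for illustration and are not taken from the cited methods.

```python
def linear_products(templates: int, cycles: int) -> int:
    """Idealised cyclic sequencing: every cycle yields one new product per
    original template, so the product count grows linearly with cycles."""
    return templates * cycles


def exponential_copies(templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised PCR: every cycle multiplies the number of copies by
    (1 + efficiency); efficiency = 1.0 means perfect doubling (assumption)."""
    return templates * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical example: 100 template molecules, 30 cycles.
    print(linear_products(100, 30))        # 3000
    print(exponential_copies(100, 30))     # 107374182400.0
```

Under these assumptions, thirty cycles give a 30-fold increase in the linear case but roughly a 10^9-fold increase in the exponential case, which is why combining sequencing with an exponential amplification reduces the quantity of starting DNA required.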
The present invention makes available a novel process for sequencing nucleic acid, which process at least partially avoids the disadvantages of the state of the art. In particular, this process avoids the problem of substrate specificity with regard to dye terminators and makes possible rapid DNA sequencing using very small quantities of DNA starting material in combination with a nucleic acid amplification reaction such as PCR. The process also improves the readable length of the sequenceable templates.
The process according to the invention is based on using compounds of the general formula (I): 
in which:
B denotes a nucleobase, i.e. a natural or unnatural base which is suitable for hybridizing to complementary nucleic acid strands, such as A, C, G, T, U, I, 7-deaza-G, 7-deaza-A, 5-methyl-C, etc.,
W and Z in each case denote OR1, SR1, N(R1)2 or R1, where R1, in each case independently, on each occurrence represents hydrogen or an organic radical, e.g. an alkyl, alkenyl, hydroxyalkyl, amine, ester, acetal or thioester radical, preferably having up to 10 carbon atoms and particularly preferably having up to 6 carbon atoms,
X denotes OR2, SR2 or B(R2)3, where R2, in each case independently, denotes hydrogen, a cation, e.g. an alkali metal ion or ammonium ion, or an organic radical, e.g. a dye such as fluorescein, rhodamine, cyanine and their derivatives,
Y denotes NR3 or S, in particular NR3, where R3 represents hydrogen or an organic radical, e.g. a saturated or unsaturated hydrocarbon radical, in particular a C1-C4 radical or a dye radical, with hydrogen also being understood to mean the isotopes deuterium and tritium, and
R denotes hydrogen, a cation, an organic radical or an optionally modified phosphate group or diphosphate group, in particular a diphosphate group,
for incorporation into nucleic acids and for the subsequent site-specific cleavage of the nucleic acids, preferably by hydrolysing the P-Y bond, resulting in the formation of nucleic acid fragments having an HY-CH2-5′ end.
The group R can denote an organic radical, for example a lipophilic radical, which facilitates the penetration of the substance into a cell. R is preferably a phosphate group: 
or a diphosphate group: 
This phosphate or diphosphate group can be modified. Thus, one or more terminal oxygen atoms can carry substituents, e.g. organic radicals. On the other hand, one or more terminal oxygen atoms and, in the case of the diphosphate group, the bridging oxygen atom as well, can be replaced by groups such as S, NR3 or C(R3)2, with R3 being defined as before. In addition to this, 2 substituents on terminal oxygen atoms can also be bridged with each other.
When substituents are present, they are preferably located on oxygen atoms belonging to the phosphorus atom which is in each case terminal, particularly preferably on the γ-phosphorus atom. Examples of suitable substituents are organic radicals such as alkyl radicals, which can themselves be substituted, or a salicyl group, which can form a 6-membered cyclic diester with 2 oxygen atoms belonging to the terminal phosphorus. The aromatic nucleus of the salicyl groups can again itself carry one or more additional substituents, e.g. those defined as for R1 or halogen atoms. Additionally preferred substituents on the oxygen atom are radicals such as C1-C10-alkyl, -(CH2)n-N3, (CH2)nN(R3)2 or -(CH2)nNHOCO(CH2)m-N(R3)2, where n and m are integers from 1 to 8, preferably from 2 to 5, and R3 is defined as above, but can, in addition, preferably denote an aromatic radical such as phenyl or dinitrophenyl.
The incorporation of compounds of the general formula (I) into nucleic acids preferably takes place enzymically. However, a chemical synthesis is also possible. For an enzymic incorporation, preference is given to using enzymes which are selected from the group consisting of DNA-dependent DNA polymerases, DNA-dependent RNA polymerases, RNA-dependent DNA polymerases, RNA-dependent RNA polymerases and terminal transferases. Particular preference is given to T7 DNA polymerase and related enzymes, such as T3 DNA polymerase or SP6 DNA polymerase, or modifications of these enzymes. Correspondingly, the nucleic acids into which the compounds of the formula (I) are incorporated can be DNAs and/or RNAs which can, where appropriate, carry one or more additional modified nucleotide building blocks.
Nucleic acids which contain, as monomeric building blocks, at least one compound of the general formula (I) can be cleaved site-specifically at the nucleotide building block which contains the P-Y bond. This site-specific cleavage can be effected, for example, at the P-Y bond itself by raising the temperature, e.g. to at least 37 °C, by establishing acid conditions, e.g. pH ≤ 5, by microwave treatment or by laser treatment, e.g. using an infrared laser, and/or on the 3′ side of the nucleotide containing the P-Y bond by means of enzymic digestion, for example using exonucleases or endonucleases or phosphodiesterases, e.g. 3′→5′ snake venom phosphodiesterase.
The process according to the invention can also be carried out in combination with an amplification reaction, e.g. a PCR. This enables extremely small quantities of DNA starting material to be used for generating labelled complementary nucleic acid strands. The nucleic acid amplification is preferably carried out using thermostable enzymes in several cycles.
The compounds according to (I) can be incorporated into the nucleic acids in solution. Alternatively, however, the compounds can also be incorporated into support-bound nucleic acids. After the synthesis, the nucleic acids can then be released from the support, where appropriate by site-specific cleavage of the P-Y bond, or by other methods.
The site-specific cleavage of the nucleic acids results in the production of nucleic acid fragments which preferably possess the group Y-CH2- at their 5′ ends and/or a phosphate group at their 3′ ends. Previously, nucleic acids which had been modified in this way had to be produced in a complicated manner by means of chemical synthesis (19) or by means of enzymic reactions (20, 21). The process according to the invention is considerably faster and cheaper and enables the compounds to be handled more easily. The modified nucleic acids which are prepared in this way can be used for therapeutic purposes and/or for molecular biological investigations, e.g. investigations of mechanisms for the uptake and metabolism of nucleic acids in cells, since it is readily possible to couple a labelling group to the 5′-Y group. The 3′ phosphate group in turn constitutes a protecting group in relation to a ligation and/or an enzymic elongation using polymerases. If desired, labelling groups can be added to the phosphorylated 3′ end of the nucleic acid fragments, e.g. if a dephosphorylation is carried out and oligonucleotides, which are labelled by an enzymic reaction, for example using ligase or terminal transferase, or dideoxynucleoside triphosphates, which are labelled using a polymerase, are added to the resulting 3′-OH group, or if the 3′-phosphate group contains a reactive group, e.g. a sulphur atom.
Furthermore, as a consequence of the defined group at their 5′ ends, the nucleic acid fragments according to the invention can readily be immobilized on a support which contains a functional surface which reacts with the Y group. On the other hand, it is also possible for the nucleic acid fragments to bind adsorptively to a surface by way of the Y group. Suitable supports are those which possess surfaces which are composed, for example, of metal, glass, ceramic and/or plastic. Particular preference is given to supports which possess glass and/or silicon surfaces. The supports can furthermore be of any desired form, e.g. microparticles, such as magnetic microparticles, or semiconductor materials, such as biochips, e.g. DNA or RNA chips, which, where appropriate, can contain several defined surfaces, in the form of array arrangements, which are able to bind specifically to nucleic acids.
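The fragment pattern produced by cleaving the P-Y bond can be sketched in a few lines of Python. This is a purely illustrative model and not part of the disclosed process: the function name `cleave_at_modified`, the representation of a strand as a 5′→3′ string with 0-based positions of the modified building blocks, and the end labels are all assumptions. The sketch assumes that the P-Y bond lies on the 5′ side of each modified nucleotide, so that the modified nucleotide becomes the first residue of the downstream fragment and the upstream fragment retains the phosphate at its 3′ end.

```python
from typing import Iterable, List, Tuple


def cleave_at_modified(
    sequence: str, modified_positions: Iterable[int]
) -> List[Tuple[str, str, str]]:
    """Split a 5'->3' strand at every modified nucleotide.

    Returns a list of (five_prime_end, fragment, three_prime_end) tuples.
    The labels "HY-CH2", "phosphate" and "native" are illustrative only.
    """
    modified = {p for p in modified_positions if 0 <= p < len(sequence)}
    # A cut lies on the 5' side of each modified nucleotide; position 0
    # has no upstream neighbour, so it produces no additional fragment.
    cuts = sorted(p for p in modified if p > 0)
    bounds = [0] + cuts + [len(sequence)]

    fragments = []
    for start, end in zip(bounds, bounds[1:]):
        five_prime = "HY-CH2" if start in modified else "native"
        three_prime = "phosphate" if end < len(sequence) else "native"
        fragments.append((five_prime, sequence[start:end], three_prime))
    return fragments


if __name__ == "__main__":
    # Hypothetical strand with modified building blocks at positions 3 and 6.
    for fragment in cleave_at_modified("ACGTACGT", [3, 6]):
        print(fragment)
    # ('native', 'ACG', 'phosphate')
    # ('HY-CH2', 'TAC', 'phosphate')
    # ('HY-CH2', 'GT', 'native')
```

In this model every internal fragment carries both the Y-CH2- group at its 5′ end and a phosphate group at its 3′ end, while the two outermost fragments each keep one end of the original strand.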
The nucleic acid fragments according to the invention can also be coupled to a support when they are in the form of a mixture of different fragments. This results in the production of supports on which nucleic acid fragments are arranged randomly. This has advantages if, for example, a subsequent amplification is carried out on the support surface using primers which encode a predetermined nucleic acid sequence, for example a gene.
If a heterogeneous nucleic acid mixture is produced when the nucleic acids are cleaved, this mixture can then be used for preparing a nucleic acid library, in particular a random library. Such libraries can also be produced by means of multiple, random incorporation of compounds of the formula (I) into a nucleic acid strand followed by site-specific cleavage. In addition, degenerate primers, which bind randomly to nucleic acid templates, can also be employed for generating random nucleic acid libraries.
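How multiple, random incorporation followed by site-specific cleavage yields a heterogeneous fragment mixture can be illustrated with a small simulation. The Python sketch below is an assumption-laden toy model: the function name `random_library`, the parameter `fraction_modified` (the probability that a given occurrence of the chosen base is a modified building block) and the fixed random seed are hypothetical and serve only to show the principle.

```python
import random
from typing import List


def random_library(
    sequence: str,
    base: str,
    fraction_modified: float,
    n_strands: int,
    seed: int = 0,
) -> List[str]:
    """Simulate n_strands copies of a 5'->3' strand in which each occurrence
    of `base` is, independently, a modified building block with probability
    `fraction_modified`, then cleave on the 5' side of every modified one."""
    rng = random.Random(seed)
    library: List[str] = []
    for _ in range(n_strands):
        cuts = [
            i
            for i, b in enumerate(sequence)
            if i > 0 and b == base and rng.random() < fraction_modified
        ]
        bounds = [0] + cuts + [len(sequence)]
        library.extend(sequence[s:e] for s, e in zip(bounds, bounds[1:]))
    return library


if __name__ == "__main__":
    # Hypothetical template; 30 % of the T positions carry the modification.
    fragments = random_library("ACGTTGCATGCTAGCT", "T", 0.3, n_strands=5)
    print(sorted(set(fragments), key=len))
```

Because the modified positions differ from strand to strand, the pooled fragments overlap one another, which is the property a random library relies on. With `fraction_modified` set to 1.0 every strand is cut at every occurrence of the base and the mixture is no longer random.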
The fragments in the nucleic acid library can be reassembled combinatorially either without or after further enzymic or chemical treatment (DNA shuffling). Since the 5′ end of each fragment is provided with a Y group (with the exception of the 5′ end of the first fragment), the other fragments can only be assembled such that the original first fragment forms the first fragment once again. The complete combinatorial scope can be exploited after having subjected the library, or individual fragments from it, to further enzymic or chemical treatment.
After the site-specific cleavage, the nucleic acid fragments which have been produced by the process according to the invention can be subjected to a detection reaction. This detection reaction can be effected using any methods which are known for this purpose. Preference is given to carrying out a mass spectrometric analysis and/or an electrophoresis, e.g. a polyacrylamide gel electrophoresis.
The detection reaction can, for example, be employed for detecting mutations, e.g. point mutations in nucleic acids. Two protocols for analysing point mutations are described in detail below.
Another important application of the process according to the invention is that of nucleic acid sequencing. Such sequencing processes can be carried out in a number of different variants. For example, the process according to the invention is also suitable for carrying out a cyclic sequencing in combination with a nucleic acid amplification and/or a bidirectional sequence analysis on one nucleic acid strand. Preferred examples of sequencing processes are described in detail below.
The present invention also relates to a pharmaceutical composition which comprises, as the active component, a compound of the general formula (I), where appropriate in combination with pharmaceutically tolerated excipients, adjuvants and/or fillers. In addition to this, the invention also relates to pharmaceutical compositions which comprise, as the active component, a nucleic acid into which at least one compound of the general formula (I) has been incorporated, and also, where appropriate, pharmaceutically tolerated excipients, adjuvants and/or fillers. The pharmaceutical compositions are suitable for use as agents for gene therapy, as anti-viral agents and as anti-tumour agents, or for antisense applications.
Thus, nuclease-resistant 5′-amino compounds, or nucleic acids which contain these compounds, can be introduced into living cells and incorporated by cellular and/or viral enzymes, e.g. polymerases or reverse transcriptases, into nucleic acids in these cells. If the cellular polymerase is, for example, unable to read the modified genes, and does not even accept the modified triphosphates as substrates, the viral genetic information cannot then be amplified. Furthermore, the use of the 5′-modified 5′-nucleoside triphosphates results in the viral genes being disintegrated since the P-Y bond, in particular the P-N bond, which has been introduced is labile under physiological conditions.
The invention additionally relates to a process for preparing nucleic acid fragments, which process comprises the steps of:
(a) providing a nucleic acid which contains at least one compound of the general formula (I) as a monomeric building block, and
(b) subjecting the nucleic acid to site-specific cleavage.
Compounds of the formula (I) according to the invention can be used as constituents of reagent kits for detecting nucleic acids, e.g. as sequencing kits or as kits for mutation analysis, where appropriate together with additional detection components. Examples of these additional detection components are enzymes, in particular polymerases, such as DNA polymerases or reverse transcriptases, oligonucleotides which can be used as primers and which can, where appropriate, carry a label at their 5′ ends and/or on their side chains, deoxynucleoside triphosphates which can, where appropriate, carry a label, and dideoxynucleoside triphosphates (chain termination molecules) which can optionally carry a label, and also additional reagents, e.g. buffers, etc., and solid supports. The reagent kits according to the invention preferably comprise the constituents which are specified in the following figures.