The rapid sequencing of nucleic acids (NA) in a biological sample, whether to characterize single nucleotide polymorphisms (SNPs) and complex mutations or for de-novo sequencing, is of growing interest in the art. Such sequencing can be performed directly on biological samples that contain sufficient amounts of the target nucleic acids, or after amplification of the NA within the biological sample.
Sequencing of nucleic acids is mainly performed using the Sanger method with analysis by capillary electrophoresis (Smith A J H Methods Enzymol. 65 (1980) 560-580). The Sanger method is based on controlled termination of the enzymatic replication process and subsequent analysis of the chain termination products. These products are generated in four different amplification reactions. In each reaction, one of the normal nucleotides is partially replaced (1-4%) by the corresponding dideoxynucleotide (ddNTP), labeled with a fluorescent dye, so that replication terminates wherever this ddNTP is randomly incorporated. The four reactions may be performed simultaneously in one preparation, using a different fluorescent dye for each of the four terminating bases, or separately in four individual preparations, in which case a single fluorescent dye is sufficient. Although classical Sanger sequencing with electrophoretic separation of the chain termination products is well established, it is time consuming, cannot be multiplexed and requires labeled ddNTPs together with expensive enzymes. On the other hand, the Sanger method can be used for de-novo sequencing.
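The four-lane readout can be illustrated with a toy model. The Python sketch below is a simplification rather than a model of the chemistry: it treats the synthesized strand as known, ignores the primer, and assumes each ddNTP can terminate synthesis at every position where its base is incorporated. The helper names (`termination_lengths`, `read_ladder`, `termination_probability`) are hypothetical. Sorting all product lengths across the four lanes recovers the sequence, and a simple geometric model shows why the yield of long products falls off when a small fraction of a nucleotide is replaced by its ddNTP.

```python
def termination_lengths(synthesized: str, base: str) -> list[int]:
    """Lengths of the chain-termination products in the lane for `base`.

    Simplifying assumptions: `synthesized` is the newly made strand (5'->3'),
    the primer is ignored, and the ddNTP may terminate synthesis at every
    position where `base` is incorporated.
    """
    return [i + 1 for i, b in enumerate(synthesized) if b == base]


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Rebuild the sequence by sorting all product lengths from the four lanes."""
    base_at_length: dict[int, str] = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


def termination_probability(k: int, p: float) -> float:
    """Chance that a strand stops at the k-th occurrence (1-based) of a base,
    if each incorporation of that base is a ddNTP with probability p
    (geometric model; ignores any polymerase discrimination)."""
    return (1 - p) ** (k - 1) * p


if __name__ == "__main__":
    strand = "GATTACAGGC"
    lanes = {b: termination_lengths(strand, b) for b in "ACGT"}
    print(lanes["A"])          # [2, 5, 7]
    print(read_ladder(lanes))  # GATTACAGGC
```

With p around 0.01 to 0.04, the probability of stopping at a late occurrence shrinks by a factor of (1 - p) per preceding occurrence, which is one way to see why signal for long products weakens.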
An alternative to classical Sanger sequencing with electrophoresis is sequencing by mass spectrometry (MS), a technique that does not suffer from the problems mentioned above. Review articles in the literature summarize the genotyping of SNPs by mass spectrometry (Tost et al Mass Spec Review 21(6) (2002) 388-418) and the use of mass spectrometry in genomics (Meng et al Biomol. Eng. 21(1) (2004) 1-13). Three methods are predominantly used for MS-based sequencing: a) ladder sequencing (exonuclease digestion followed by determination of the molecular weights (MWs) of the products; Smirnov I P et al. Anal. Biochem. 238 (1996) 19-25), b) Sanger sequencing followed by determination of the MWs of the chain termination products (Kirpekar F et al. Nucl. Acids Research 26 (11) (1998) 2554-2559) and c) sequencing by collision-induced dissociation, so-called CID fragmentation (WO 03/025219 A2; Oberacher et al J. Am. Soc. Mass Spec 15(1) (2004) 32-42).
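For ladder sequencing (a), the sequence is read from mass differences: each exonuclease step removes one nucleotide, so consecutive ladder peaks differ by one residue mass. The Python sketch below is illustrative only. It assumes a 3'-to-5' exonuclease (so the ladder consists of 5' prefixes), uses approximate average masses of DNA nucleotide residues, and lumps end-group chemistry into a hypothetical `end_mass` constant; the function names are likewise hypothetical.

```python
# Approximate average masses (Da) of DNA nucleotide residues within a chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ladder_masses(seq: str, end_mass: float = 0.0) -> list[float]:
    """Masses of the full-length strand and every product shortened from the
    3' end by a 3'->5' exonuclease (i.e. all 5' prefixes of `seq`).
    `end_mass` is a hypothetical constant standing in for terminal groups."""
    return [sum(RESIDUE_MASS[b] for b in seq[:n]) + end_mass
            for n in range(len(seq), 0, -1)]


def read_mass_ladder(masses: list[float], end_mass: float = 0.0,
                     tol: float = 0.5) -> str:
    """Assign a base to each gap between consecutive ladder peaks."""
    # Anchor the ladder at the mass of an empty chain so the first base is read too.
    peaks = [end_mass] + sorted(masses)
    bases = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        gap = heavier - lighter
        base, ref = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - gap))
        if abs(ref - gap) > tol:
            raise ValueError(f"mass gap {gap:.2f} Da matches no residue")
        bases.append(base)
    return "".join(bases)
```

A and T residues differ by only about 9 Da, and C and T by about 15 Da, so this naive reader depends on adequate mass accuracy and on detecting every ladder peak; a single missing peak produces a gap that matches no residue and raises an error.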
Mass spectrometric analysis involving CID fragmentation is also called tandem mass spectrometry or the MS/MS technique (more generally, MS^n). Tandem mass spectrometry comprises isolation of a parent molecular ion, fragment formation in the gas phase via collision or resonance activation, and determination of the molecular weights of the fragments. The application of tandem mass spectrometry to peptide sequence analysis is well known in the literature (U.S. Pat. No. 6,017,693).
For nucleic acids, comparing the theoretical fragments of a given reference sequence with the experimental fragment mass spectra allows the NA to be identified. The reference sequence is systematically altered through permutation until the best fit between experimental and theoretical data is obtained (WO 03/025219 A2; Oberacher et al. Nucleic Acids Research 30(14) (2002) e67). Using this approach, it is possible to reliably verify sequences (re-sequencing) or to specifically detect oligonucleotides in a biological sample.
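The permutation-and-fit idea can be sketched as a search over candidate sequences, each scored against the observed fragment masses. The Python sketch below is a simplified stand-in, not the procedure of WO 03/025219 A2. It replaces real CID fragment ions (for oligonucleotides typically the a-B and w series, with specific end-group masses) by plain prefix and suffix residue-mass sums, limits the "permutations" to single-base substitutions of the reference, and uses hypothetical function names and an assumed 0.5 Da tolerance.

```python
from collections.abc import Iterator

# Approximate average masses (Da) of DNA nucleotide residues within a chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
BASES = "ACGT"


def fragment_masses(seq: str) -> list[float]:
    """Simplified fragment model: masses of all prefixes and suffixes of
    length 1..n-1 (real CID spectra would need ion-type-specific offsets)."""
    n = len(seq)
    prefixes = [sum(RESIDUE_MASS[b] for b in seq[:k]) for k in range(1, n)]
    suffixes = [sum(RESIDUE_MASS[b] for b in seq[k:]) for k in range(1, n)]
    return prefixes + suffixes


def score(seq: str, observed: list[float], tol: float = 0.5) -> int:
    """Number of observed masses explained by some theoretical fragment."""
    theoretical = fragment_masses(seq)
    return sum(1 for m in observed
               if any(abs(m - t) <= tol for t in theoretical))


def single_substitutions(seq: str) -> Iterator[str]:
    """All sequences differing from `seq` at exactly one position."""
    for i, ref_base in enumerate(seq):
        for b in BASES:
            if b != ref_base:
                yield seq[:i] + b + seq[i + 1:]


def best_fit(reference: str, observed: list[float],
             tol: float = 0.5) -> tuple[str, int]:
    """Keep the reference unless a substitution explains strictly more peaks."""
    best, best_score = reference, score(reference, observed, tol)
    for candidate in single_substitutions(reference):
        s = score(candidate, observed, tol)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score
```

For a sequence of length n this scores 3n + 1 candidates, so the search stays cheap for re-sequencing; an unrestricted de-novo search has no reference to anchor it and grows far faster.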
However, the major problem of MS-based sequencing is that the methods described above cover only rather short sequence lengths. With the MALDI (matrix-assisted laser desorption/ionization) Sanger method, a maximum of 30 bp can be sequenced. The same limitation applies to the method according to U.S. Pat. No. 6,017,693, where problems already become apparent at NA lengths of 10 bp. Sequence verification according to WO 03/025219 A2 runs into difficulties for oligonucleotides longer than 40 to 60 bp. In addition, for de-novo sequencing, comparing the theoretical data for all possible sequences with the experimental data becomes increasingly time consuming as the nucleic acid length grows.
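A rough count makes the last point concrete. Assuming, for illustration only, that every position of an oligonucleotide of length n may be any of the four bases and that no pruning is applied, the number of candidate sequences for a de-novo comparison grows exponentially, whereas checking a reference plus all of its single-base substitutions grows only linearly:

```latex
N_{\text{de novo}}(n) = 4^{n},
\qquad
N_{\text{reference + single substitutions}}(n) = 3n + 1
```

For n = 10 this gives 4^10 = 1,048,576 versus 31 candidates; for n = 40 it gives 4^40 = 2^80, roughly 1.2 x 10^24, versus 121.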
If longer target nucleic acids have to be analyzed, several approaches known in the art allow the nucleic acids to be fragmented in a controlled fashion.
Controlled fragmentation of nucleic acids may be achieved using base-specific reagents, such as digestion or restriction enzymes. For ribonucleic acids, several RNases are known that cleave the target molecules, e.g. the G-specific RNase T1 or the A-specific RNase U2. Dicer enzymes (RNase III family) cut double-stranded RNA into well-defined pieces of about 20 bases. For DNA, it is possible to use e.g. uracil-DNA glycosylase (UDG), which excises incorporated uracil and leaves abasic sites that can subsequently be cleaved, or restriction endonucleases that recognize a specific base sequence and cut within or near this region. Nicking endonucleases can be used to cut only one strand of a double-stranded DNA (dsDNA) helix.
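Base-specific and sequence-specific cleavage can be predicted in silico before an experiment. The Python sketch below assumes complete digestion and models only the cut positions, not the chemistry of the fragment ends. `cleave_after` mimics a G-specific RNase such as RNase T1, which cuts on the 3' side of G; `restriction_cuts` finds every occurrence of a recognition site and applies a fixed top-strand cut offset (EcoRI, which cuts G^AATTC, serves as the usage example). All function names are hypothetical.

```python
def cleave_after(seq: str, bases: set[str]) -> list[str]:
    """Complete cleavage on the 3' side of every residue in `bases`."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b in bases:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def restriction_cuts(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions: each (possibly overlapping) occurrence of
    `site`, shifted by `cut_offset` bases into the site."""
    cuts, i = [], seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    return cuts


def split_at(seq: str, cuts: list[int]) -> list[str]:
    """Split `seq` at the given cut positions, dropping empty pieces."""
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


if __name__ == "__main__":
    # RNase T1-like digestion of a short RNA.
    print(cleave_after("AGCUGGA", {"G"}))  # ['AG', 'CUG', 'G', 'A']
    # EcoRI-like cut, G^AATTC, i.e. offset 1 within the site.
    dna = "TTGAATTCAA"
    print(split_at(dna, restriction_cuts(dna, "GAATTC", 1)))  # ['TTG', 'AATTCAA']
```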
As an alternative, Gelfand et al (U.S. Pat. No. 5,939,292) introduced a thermostable polymerase with reduced discrimination against ribonucleotides (NTPs, also called ribo-NTPs or ribo-bases). After amplification with this polymerase, the product contains a mixture of incorporated deoxyribonucleotides (dNTPs) and NTPs, so that a simple alkaline hydrolysis step can be used for controlled fragmentation at the ribo-base positions. The resulting fragments may then be analyzed by electrophoresis to obtain information on the nucleic acid sequence.
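The combination of random ribonucleotide incorporation and alkaline cleavage can be illustrated with a toy model. Alkaline hydrolysis cleaves the phosphodiester bond on the 3' side of a ribonucleotide, because the 2'-hydroxyl group attacks the adjacent phosphate, while the DNA backbone is stable under the same conditions. The Python sketch below assumes that each occurrence of one chosen base is incorporated in its ribo form independently with a fixed probability; this incorporation model and the function names are hypothetical and not taken from U.S. Pat. No. 5,939,292.

```python
import random


def incorporate(seq: str, ribo_base: str, ribo_fraction: float,
                rng: random.Random) -> list[bool]:
    """Flag positions that received a ribonucleotide during amplification.

    Toy assumption: each occurrence of `ribo_base` is independently incorporated
    in its ribo form with probability `ribo_fraction`; all other positions are DNA.
    """
    return [b == ribo_base and rng.random() < ribo_fraction for b in seq]


def alkaline_hydrolysis(seq: str, is_ribo: list[bool]) -> list[str]:
    """Cut on the 3' side of every ribonucleotide; DNA linkages stay intact."""
    fragments, start = [], 0
    for i, ribo in enumerate(is_ribo):
        if ribo:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the example is reproducible
    seq = "GATTACAGATCCACA"
    flags = incorporate(seq, "C", 0.5, rng)
    print(alkaline_hydrolysis(seq, flags))
```

Because incorporation is partial, different molecules are cut at different subsets of the chosen base, which is what yields a ladder of fragments rather than a single digest pattern.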
A fragmentation-based mass spectrometric method for the analysis of sequence variations is disclosed in WO 2004/050839. WO 2004/097369, from the same applicant, discloses a mass spectrometric method for the analysis and sequencing of biomolecules by fragmentation. U.S. Pat. No. 6,468,748 B1 (Genetrace Systems Inc.) describes a method for the analysis of biomolecules comprising mass spectrometry and a fragmentation step. U.S. Pat. No. 6,777,188 B1 discloses a method for genotyping a diploid organism that comprises a comparison of masses and a cleavage step at modified nucleotides. Methexis Inc. describes sequence analysis based on mass spectrometry, a cleavage reaction and comparison with reference nucleic acids (WO 00/66771).
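Several of these approaches share one step: predicting the cleavage fragments of a reference sequence and comparing their masses with those observed. The Python sketch below illustrates only that generic comparison and is not the method of any cited patent. It assumes complete cleavage on the 3' side of one base, approximates fragment masses as sums of average residue masses (ignoring end groups), and uses hypothetical function names and an assumed 0.5 Da tolerance. A missing expected peak together with an unexpected observed peak points to a sequence variation within that fragment.

```python
# Approximate average masses (Da) of DNA nucleotide residues within a chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def cleave_after(seq: str, base: str) -> list[str]:
    """Complete cleavage on the 3' side of every `base`."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


def fragment_mass(fragment: str) -> float:
    """Sum of residue masses; terminal groups are ignored in this sketch."""
    return sum(RESIDUE_MASS[b] for b in fragment)


def compare_to_reference(reference: str, observed: list[float], base: str,
                         tol: float = 0.5) -> tuple[list[float], list[float]]:
    """Return (expected-but-missing masses, observed-but-unexpected masses)."""
    expected = [fragment_mass(f) for f in cleave_after(reference, base)]
    missing = [m for m in expected
               if not any(abs(m - o) <= tol for o in observed)]
    unexpected = [o for o in observed
                  if not any(abs(o - m) <= tol for m in expected)]
    return missing, unexpected


if __name__ == "__main__":
    reference = "GATTACAGATTACA"
    sample = "GATTACAGATGACA"  # hypothetical T->G variant at position 11 (1-based)
    observed = [fragment_mass(f) for f in cleave_after(sample, "C")]
    print(compare_to_reference(reference, observed, "C"))
```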