Throughout this application, various publications are referenced in parentheses by author and year. Full citations for these references may be found at the end of the specification immediately preceding the claims. The disclosures of these publications in their entireties are hereby incorporated by reference into this application to more fully describe the state of the art to which this invention pertains.
The ability to sequence deoxyribonucleic acid (DNA) accurately and rapidly is revolutionizing biology and medicine. The massive Human Genome Project is driving exponential growth in the development of high-throughput genetic analysis technologies. This rapid technological development, involving chemistry, engineering, biology, and computer science, makes it possible to move from studying single genes at a time to analyzing and comparing entire genomes.
With the completion of the first entire human genome sequence map, many areas in the genome that are highly polymorphic in both exons and introns will be known. The pharmacogenomics challenge is to comprehensively identify the genes and functional polymorphisms associated with the variability in drug response (Roses, 2000). Resequencing of polymorphic areas in the genome that are linked to disease development will contribute greatly to the understanding of diseases, such as cancer, and to therapeutic development. Thus, high-throughput, accurate methods for resequencing the highly variable intron/exon regions of the genome are needed in order to explore the full potential of the complete human genome sequence map. The current state-of-the-art technology for high-throughput DNA sequencing, such as that used for the Human Genome Project (Pennisi 2000), is capillary array DNA sequencers using laser-induced fluorescence detection (Smith et al., 1986; Ju et al. 1995, 1996; Kheterpal et al. 1996; Salas-Solano et al. 1998). Improvements in the polymerase that lead to uniform termination efficiency and the introduction of thermostable polymerases have also significantly improved the quality of sequencing data (Tabor and Richardson, 1987, 1995). Although capillary array DNA sequencing technology to some extent addresses the throughput and read length requirements of large-scale DNA sequencing projects, the throughput and accuracy required for mutation studies need to be improved for a wide variety of applications ranging from disease gene discovery to forensic identification. For example, electrophoresis-based DNA sequencing methods have difficulty detecting heterozygotes unambiguously and are not 100% accurate in regions rich in nucleotides comprising guanine or cytosine due to compressions (Bowling et al. 1991; Yamakawa et al. 1997).
In addition, the first few bases after the priming site are often masked by the high fluorescence signal from excess dye-labeled primers or dye-labeled terminators, and are therefore difficult to identify. Therefore, the requirement of electrophoresis for DNA sequencing is still the bottleneck for high-throughput DNA sequencing and mutation detection projects.
The concept of sequencing DNA by synthesis without using electrophoresis was first revealed in 1988 (Hyman, 1988) and involves detecting the identity of each nucleotide as it is incorporated into the growing strand of DNA in a polymerase reaction. Such a scheme coupled with the chip format and laser-induced fluorescent detection has the potential to markedly increase the throughput of DNA sequencing projects. Consequently, several groups have investigated such a system with an aim to construct an ultra high-throughput DNA sequencing procedure (Cheeseman 1994, Metzker et al. 1994). Thus far, no complete success of using such a system to unambiguously sequence DNA has been reported. The pyrosequencing approach that employs four natural nucleotides (comprising a base of adenine (A), cytosine (C), guanine (G), or thymine (T)) and several other enzymes for sequencing DNA by synthesis is now widely used for mutation detection (Ronaghi 1998). In this approach, the detection is based on the pyrophosphate (PPi) released during the DNA polymerase reaction, the quantitative conversion of pyrophosphate to adenosine triphosphate (ATP) by sulfurylase, and the subsequent production of visible light by firefly luciferase. This procedure can only sequence up to 30 base pairs (bps) of nucleotide sequences, and each of the 4 nucleotides needs to be added separately and detected separately. Long stretches of the same bases cannot be identified unambiguously with the pyrosequencing method.
More recent work in the literature exploring DNA sequencing by a synthesis method is mostly focused on designing and synthesizing a photocleavable chemical moiety that is linked to a fluorescent dye to cap the 3′-OH group of deoxynucleoside triphosphates (dNTPs) (Welch et al. 1999). Limited success for the incorporation of the 3′-modified nucleotide by DNA polymerase is reported. The reason is that the 3′-position on the deoxyribose is very close to the amino acid residues in the active site of the polymerase, and the polymerase is therefore sensitive to modification in this area of the deoxyribose ring. On the other hand, it is known that modified DNA polymerases (Thermo Sequenase and Taq FS polymerase) are able to recognize nucleotides with extensive modifications with bulky groups such as energy transfer dyes at the 5-position of the pyrimidines (T and C) and at the 7-position of purines (G and A) (Rosenblum et al. 1997, Zhu et al. 1994). The structures of the ternary complexes of rat DNA polymerase, a DNA template-primer, and dideoxycytidine triphosphate (ddCTP) have been determined (Pelletier et al. 1994), which supports this fact. As shown in FIG. 1, the 3-D structure indicates that the area surrounding the 3′-position of the deoxyribose ring in ddCTP is very crowded, while there is ample space for modification on the 5-position of the cytidine base.
The approach disclosed in the present application is to make nucleotide analogues by linking a unique label such as a fluorescent dye or a mass tag through a cleavable linker to the nucleotide base or an analogue of the nucleotide base, such as to the 5-position of the pyrimidines (T and C) and to the 7-position of the purines (G and A), to use a small cleavable chemical moiety to cap the 3′-OH group of the deoxyribose to make it nonreactive, and to incorporate the nucleotide analogues into the growing DNA strand as terminators. Detection of the unique label will yield the sequence identity of the nucleotide. Upon removing the label and the 3′-OH capping group, the polymerase reaction will proceed to incorporate the next nucleotide analogue and detect the next base.
It is also desirable to use a photocleavable group to cap the 3′-OH group. However, a photocleavable group is generally bulky, and thus the DNA polymerase will have difficulty incorporating nucleotide analogues containing a photocleavable moiety capping the 3′-OH group. If small chemical moieties that can be easily cleaved chemically with high yield can be used to cap the 3′-OH group, such nucleotide analogues should also be recognized as substrates for DNA polymerase. It has been reported that 3′-O-methoxy-deoxynucleotides are good substrates for several polymerases (Axelrod et al. 1978). 3′-O-allyl-dATP was also shown to be incorporated by Ventr(exo-) DNA polymerase in the growing strand of DNA (Metzker et al. 1994). However, the procedure to chemically cleave the methoxy group is stringent and requires anhydrous conditions. Thus, it is not practical to use a methoxy group to cap the 3′-OH group for sequencing DNA by synthesis. An ester group was also explored to cap the 3′-OH group of the nucleotide, but it was shown to be cleaved by the nucleophiles in the active site in DNA polymerase (Canard et al. 1995). Chemical groups with electrophiles such as ketone groups are not suitable for protecting the 3′-OH of the nucleotide in enzymatic reactions due to the existence of strong nucleophiles in the polymerase. It is known that MOM (-CH2OCH3) and allyl (-CH2CH=CH2) groups can be used to cap an -OH group, and can be cleaved chemically with high yield (Ireland et al. 1986; Kamal et al. 1999).
The approach disclosed in the present application is to incorporate nucleotide analogues, which are labeled with cleavable, unique labels such as fluorescent dyes or mass tags and where the 3′-OH is capped with a cleavable chemical moiety such as either a MOM group (-CH2OCH3) or an allyl group (-CH2CH=CH2), into the growing DNA strand as terminators. The optimized nucleotide set (3′-RO-A-LABEL1, 3′-RO-C-LABEL2, 3′-RO-G-LABEL3, 3′-RO-T-LABEL4, where R denotes the chemical group used to cap the 3′-OH) can then be used for DNA sequencing by the synthesis approach.
There are many advantages of using mass spectrometry (MS) to detect small and stable molecules. For example, the mass resolution can be as good as one dalton. Thus, compared to gel electrophoresis sequencing systems and the laser induced fluorescence detection approach which have overlapping fluorescence emission spectra, leading to heterozygote detection difficulty, the MS approach disclosed in this application produces very high resolution of sequencing data by detecting the cleaved small mass tags instead of the long DNA fragment. This method also produces extremely fast separation in the time scale of microseconds. The high resolution allows accurate digital mutation and heterozygote detection. Another advantage of sequencing with mass spectrometry by detecting the small mass tags is that the compressions associated with gel based systems are completely eliminated.
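The following is a minimal sketch, offered only as an illustration, of how cleaved mass tags resolved to about one dalton could be turned into discrete base calls. The tag masses, the 0.5 Da tolerance, and the function names `call_base` and `call_bases` are hypothetical; the specification does not give values for any of them.

```python
# Hypothetical tag masses in daltons; the specification does not give values.
TAG_MASS_DA = {"A": 100.0, "C": 110.0, "G": 120.0, "T": 130.0}


def call_base(peak_da, tolerance_da=0.5):
    """Return the base whose tag mass is nearest the peak, or None if none is within tolerance."""
    base, mass = min(TAG_MASS_DA.items(), key=lambda item: abs(item[1] - peak_da))
    return base if abs(mass - peak_da) <= tolerance_da else None


def call_bases(peaks_da, tolerance_da=0.5):
    """Call every distinct base seen at one position; two bases suggest a heterozygote."""
    calls = {call_base(peak, tolerance_da) for peak in peaks_da}
    calls.discard(None)  # drop peaks that matched no tag
    return sorted(calls)
```

Under these assumed masses, two peaks near 100 Da and 130 Da at the same position would be reported as both A and T, which is the kind of digital heterozygote call the passage describes.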
In order to maintain a continuous hybridized primer extension product with the template DNA, a primer that contains a stable loop to form an entity capable of self-priming in a polymerase reaction can be ligated to the 3′ end of each single-stranded DNA template that is immobilized on a solid surface such as a chip. This approach will solve the problem of washing off the growing extension products in each cycle.
Saxon and Bertozzi (2000) developed an elegant and highly specific coupling chemistry linking a specific group that contains a phosphine moiety to an azido group on the surface of a biological cell. In the present application, this coupling chemistry is adopted to create a solid surface which is coated with a covalently linked phosphine moiety, and to generate polymerase chain reaction (PCR) products that contain an azido group at the 5′ end for specific coupling of the DNA template with the solid surface. One example of a solid surface is glass channels which have an inner wall with an uneven or porous surface to increase the surface area. Another example is a chip.
The present application discloses a novel and advantageous system for DNA sequencing by the synthesis approach that employs a stable DNA template, able to self-prime for the polymerase reaction and covalently linked to a solid surface such as a chip, and four unique nucleotide analogues (3′-RO-A-LABEL1, 3′-RO-C-LABEL2, 3′-RO-G-LABEL3, 3′-RO-T-LABEL4). The success of this novel system will allow the development of an ultra high-throughput and high-fidelity DNA sequencing system for polymorphism and pharmacogenetics applications and for whole-genome sequencing. This fast and accurate DNA resequencing system is needed in such fields as detection of single nucleotide polymorphisms (SNPs) (Chee et al. 1996), serial analysis of gene expression (SAGE) (Velculescu et al. 1995), identification in forensics, and genetic disease association studies.
This invention is directed to a method for sequencing a nucleic acid by detecting the identity of a nucleotide analogue after the nucleotide analogue is incorporated into a growing strand of DNA in a polymerase reaction, which comprises the following steps:
(i) attaching a 5′ end of the nucleic acid to a solid surface;
(ii) attaching a primer to the nucleic acid attached to the solid surface;
(iii) adding a polymerase and one or more different nucleotide analogues to the nucleic acid to thereby incorporate a nucleotide analogue into the growing strand of DNA, wherein the incorporated nucleotide analogue terminates the polymerase reaction and wherein each different nucleotide analogue comprises (a) a base selected from the group consisting of adenine, guanine, cytosine, thymine, and uracil, and their analogues; (b) a unique label attached through a cleavable linker to the base or to an analogue of the base; (c) a deoxyribose; and (d) a cleavable chemical group to cap an -OH group at a 3′-position of the deoxyribose;
(iv) washing the solid surface to remove unincorporated nucleotide analogues;
(v) detecting the unique label attached to the nucleotide analogue that has been incorporated into the growing strand of DNA, so as to thereby identify the incorporated nucleotide analogue;
(vi) adding one or more chemical compounds to permanently cap any unreacted -OH group on the primer attached to the nucleic acid or on a primer extension strand formed by adding one or more nucleotides or nucleotide analogues to the primer;
(vii) cleaving the cleavable linker between the nucleotide analogue that was incorporated into the growing strand of DNA and the unique label;
(viii) cleaving the cleavable chemical group capping the -OH group at the 3′-position of the deoxyribose to uncap the -OH group, and washing the solid surface to remove cleaved compounds; and
(ix) repeating steps (iii) through (viii) so as to detect the identity of a newly incorporated nucleotide analogue into the growing strand of DNA;
wherein if the unique label is a dye, the order of steps (v) through (vii) is: (v), (vi), and (vii); and
wherein if the unique label is a mass tag, the order of steps (v) through (vii) is: (vi), (vii), and (v).
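The cycle of steps (iii) through (viii), including the label-dependent ordering of steps (v) through (vii), can be sketched as an idealized simulation. This is an illustrative model only, not the claimed method itself. It assumes error-free incorporation of exactly one analogue per cycle, treats washing and permanent capping as no-ops, and takes the template string in the order the polymerase reads it. The names `Analogue`, `step_order`, and `sequence_by_synthesis` are hypothetical, and the label names mirror the LABEL1 to LABEL4 placeholders used in the text.

```python
from dataclasses import dataclass
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Placeholder labels, mirroring LABEL1..LABEL4 in the specification.
LABEL_FOR_BASE = {"A": "LABEL1", "C": "LABEL2", "G": "LABEL3", "T": "LABEL4"}
BASE_FOR_LABEL = {label: base for base, label in LABEL_FOR_BASE.items()}


@dataclass
class Analogue:
    base: str
    label: Optional[str]  # unique label on a cleavable linker; None once cleaved
    capped: bool = True   # True while the 3'-OH capping group is present


def step_order(label_type):
    """Return the order of steps (v) through (vii) for the given label type."""
    if label_type == "dye":
        return ["v", "vi", "vii"]   # detect while the dye is still attached
    if label_type == "mass_tag":
        return ["vi", "vii", "v"]   # cleave first, then detect the released tag
    raise ValueError(f"unknown label type: {label_type!r}")


def sequence_by_synthesis(template, label_type="dye"):
    """Simulate idealized cycles and return the bases read from the growing strand."""
    read = []
    for template_base in template:
        base = COMPLEMENT[template_base]
        analogue = Analogue(base, LABEL_FOR_BASE[base])  # step (iii): incorporate
        # Step (iv), washing, is a no-op in this idealized model.
        released = detected = None
        for step in step_order(label_type):
            if step == "v":
                detected = analogue.label if label_type == "dye" else released
            elif step == "vii":
                released, analogue.label = analogue.label, None  # cleave the linker
            # Step (vi), permanent capping of unreacted -OH groups, is a no-op here.
        analogue.capped = False  # step (viii): uncap the 3'-OH for the next cycle
        read.append(BASE_FOR_LABEL[detected])
    return "".join(read)
```

In this sketch a dye is read while still attached to the incorporated analogue, whereas a mass tag is read only after step (vii) has released it, which is why the two label types order steps (v) through (vii) differently.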
The invention provides a method of attaching a nucleic acid to a solid surface which comprises:
(i) coating the solid surface with a phosphine moiety,
(ii) attaching an azido group to a 5′ end of the nucleic acid, and
(iii) immobilizing the 5′ end of the nucleic acid to the solid surface through interaction between the phosphine moiety on the solid surface and the azido group on the 5′ end of the nucleic acid.
The invention provides a nucleotide analogue which comprises:
(a) a base selected from the group consisting of adenine or an analogue of adenine, cytosine or an analogue of cytosine, guanine or an analogue of guanine, thymine or an analogue of thymine, and uracil or an analogue of uracil;
(b) a unique label attached through a cleavable linker to the base or to an analogue of the base;
(c) a deoxyribose; and
(d) a cleavable chemical group to cap an -OH group at a 3′-position of the deoxyribose.
The invention provides a parallel mass spectrometry system, which comprises a plurality of atmospheric pressure chemical ionization mass spectrometers for parallel analysis of a plurality of samples comprising mass tags.