The present invention relates to a method of sequencing a nucleic acid. In particular, the invention relates to a method for determining a target nucleic acid sequence where the target nucleic acid sequence is comprised in a preparation comprising a non-target nucleic acid sequence. The invention also relates to a method for determining the haplotype of a subject.
The sequencing of nucleic acids, in particular DNA, is of fundamental importance in many areas of biological research, clinical diagnosis and treatment. Sequencing of DNA is typically carried out by a method based on the Sanger dideoxy chain-termination method (Sanger, F., Nicklen, S., and Coulson, A. R. (1977) “DNA Sequencing with chain-terminating inhibitors” PNAS USA 74:5463-5467). In this method, a labelled oligonucleotide primer complementary to a known sequence adjacent to the target sequence is used to initiate DNA polymerase-catalysed elongation into the target sequence. Typically, four polymerase reactions are carried out for each round of sequencing. Each reaction contains all four deoxynucleotides (dNTPs: dCTP, dTTP, dGTP and dATP) plus a small amount of one dideoxynucleotide (ddNTP: ddCTP, ddTTP, ddGTP or ddATP). Because ddNTPs have no 3′ hydroxyl group, elongation of the nascent strand is occasionally terminated by incorporation of a ddNTP. Thus the sequencing reaction produces a series of labelled strands whose lengths are indicative of the location of a particular base in the sequence. The resultant labelled strands are typically separated according to size by polyacrylamide gel electrophoresis and visualised by detecting the label, for example by autoradiography where the primer was radiolabelled. More recently, the Sanger sequencing method has been adapted in various ways, in particular for large-scale automated sequencing using multiple fluorescent labels and capillary gel electrophoresis.
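The relationship between strand length and base position described above can be illustrated with a short, idealised sketch. It assumes that termination occurs at every possible position in each of the four reactions and ignores all reaction chemistry; the function names `termination_ladder` and `read_ladder`, the example sequence and the primer length of 20 are hypothetical and chosen purely for illustration.

```python
# Idealised illustration of chain-termination sequencing (not a chemical model).
# Assumption: every position of the nascent strand is terminated at least once.

def termination_ladder(nascent, primer_length):
    """Map each ddNTP reaction (A, C, G, T) to the lengths of its terminated strands.

    `nascent` is the sequence synthesised downstream of the primer, 5' to 3'.
    """
    ladder = {base: [] for base in "ACGT"}
    for offset, base in enumerate(nascent, start=1):
        # A strand ending at this base has the primer plus `offset` added bases.
        ladder[base].append(primer_length + offset)
    return ladder


def read_ladder(ladder):
    """Recover the sequence by ordering all terminated strands by length."""
    by_length = sorted(
        (length, base) for base, lengths in ladder.items() for length in lengths
    )
    return "".join(base for _, base in by_length)


ladder = termination_ladder("GATTACA", primer_length=20)
print(ladder["A"])          # strand lengths observed in the ddATP reaction
print(read_ladder(ladder))  # sequence read from shortest to longest strand
```

Ordering the strands by length corresponds to reading the gel from the shortest product to the longest; the reaction in which each length appears identifies the base at that position.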
One problem with sequencing methods based on the Sanger method arises when the target nucleic acid to be sequenced is provided in a preparation comprising one or more different nucleic acids or sequences which show some sequence identity to the target sequence. In particular, if a primer-binding sequence is found in both the target sequence and a second or further sequences, the sequencing reaction will lead to products which are derived from primer binding to the second or further sequences, as well as to the target sequence. Where the target sequence diverges from the second or further sequences, the resultant gel or chromatogram will reveal two or more bases as being present at a particular location. Because the method does not allow discrimination between the products of the target sequence and those of the second or further sequences, the target sequence cannot be determined unambiguously.
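By way of illustration only, the ambiguity described above can be sketched as follows. The sketch assumes that all templates share the primer-binding sequence, are aligned and are of equal length, and that every base present at a position is detected; the function names and example sequences are hypothetical.

```python
# Illustrative sketch: base calls obtained when one primer anneals to
# several templates in the same preparation.
# Assumption: templates are aligned and of equal length downstream of the primer.

def mixed_base_calls(templates):
    """Return, for each position, the bases observed across all templates."""
    calls = []
    for bases in zip(*templates):
        # Every template contributes a terminated strand of this length,
        # so each distinct base at this position appears in the read-out.
        calls.append("".join(sorted(set(bases))))
    return calls


def ambiguous_positions(calls):
    """Return the 1-based positions at which more than one base is observed."""
    return [pos for pos, call in enumerate(calls, start=1) if len(call) > 1]


target = "ACGTAC"
non_target = "ACCTAT"
calls = mixed_base_calls([target, non_target])
print(calls)
print(ambiguous_positions(calls))
```

At positions where the two sequences agree a single base is called, whereas at divergent positions two bases are called and the read-out alone does not indicate which base belongs to the target sequence.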
This problem is particularly significant when it is desired to determine the sequence of one allele of a heterozygous pair at a polymorphic location in a single individual. Many eukaryotic cells are diploid, having two copies of most chromosomes, and sequence differences usually exist between the two copies of a particular chromosome. Because DNA prepared from one individual will normally contain both copies of each chromosome, standard sequencing methods are unable to differentiate between sequences derived from each copy. Where there is a single nucleotide difference between the two alleles, the DNA sequence of each chromosome will nevertheless be clear (although it would not be possible to ascribe each sequence to a particular paternal or maternal chromosome). Where the polymorphism extends for two or more nucleotides, or where there are two or more polymorphic sites (alleles) separated by regions of common sequence, it is not possible to discern the sequence of the two alleles. In particular, standard sequencing methods are not able to determine the combination of alleles existing on a particular chromosome (the haplotype).
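The reason why a single heterozygous position is unambiguous while two or more are not can be illustrated by enumerating the pairs of chromosome sequences consistent with a given read-out. The following is a minimal sketch under the assumptions that each position is reported either as one base (homozygous) or as two bases (heterozygous) and that the two chromosomes are not otherwise distinguished; the function name and example data are hypothetical.

```python
from itertools import product

# Illustrative sketch: which pairs of haplotypes are consistent with
# unphased base calls from a diploid sample?
# Assumption: each call is a string of one base (homozygous) or two bases
# (heterozygous), e.g. ["A", "CG", "T", "CT"].

def consistent_haplotype_pairs(calls):
    """Return all unordered haplotype pairs that would yield `calls`."""
    het_sites = [i for i, call in enumerate(calls) if len(call) == 2]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        first = [call[0] for call in calls]
        second = [call[-1] for call in calls]
        for site, pick in zip(het_sites, choice):
            # Assign one base to each chromosome at this heterozygous site.
            first[site] = calls[site][pick]
            second[site] = calls[site][1 - pick]
        # A frozenset ignores which chromosome is listed first.
        pairs.add(frozenset(("".join(first), "".join(second))))
    return sorted(tuple(sorted(pair)) for pair in pairs)


print(consistent_haplotype_pairs(["A", "CG", "T"]))        # one heterozygous site
print(consistent_haplotype_pairs(["A", "CG", "T", "CT"]))  # two heterozygous sites
```

With one heterozygous site only one pair of sequences is possible, so both sequences are known. With two heterozygous sites two different pairs fit the same read-out, and in general n heterozygous sites give 2^(n-1) candidate pairs, so the haplotype cannot be read from the unphased calls.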
Following the mapping of the human genome, interest has grown in the use of single nucleotide polymorphisms (SNPs) to identify target genes associated with disease or drug response. In some instances, the presence of a particular SNP alone may be sufficient to cause a particular disease or to explain the individual variability in sensitivity to drugs.
However, it is not clear how often knowledge of an individual SNP will have utility in the clinic or in drug development. Research has shown that in asthma, at least, the association of individual SNPs to form a complete haplotype may be more relevant in predicting drug response than knowledge of isolated individual SNPs. In many cases it may be necessary to obtain a haplotype sequence involving the characterisation of two or more SNPs on each chromosome. It is therefore highly desirable to determine the combination of SNPs that co-exist on a single chromosome.
HLA (human leukocyte antigen or human leukocyte associated antigen A) genotyping is one area where haplotyping is important. Determination of the two haplotype sequences of the HLA genes is crucial to the success of organ transplantation. The individual haplotypes of the donor must be matched with the recipient before transplantation to avoid rejection of the transplant. Methods for evaluating HLA allele types have been described in the past. One such method relies on performing family studies, which is very time-consuming. An alternative method based on DNA sequencing is disclosed in WO 97/23650. However, where heterozygous alleles exist, this method relies on prior knowledge of existing haplotype sequences, so that ambiguous bases can be ascribed to one allele or another.
Many of the methods used for haplotyping in the past rely on preparing a composition comprising only a single haplotype sequence before sequencing. One way of doing this is by converting a diploid cell into a haploid cell. This approach requires a high investment and is labour-intensive and slow, but it gives complete haplotype separation. Alternatively, human chromosomes can be cloned into yeast in order to obtain a haploid for that particular chromosome. This suffers from the same drawbacks in terms of time and cost.
Another way of obtaining a preparation comprising only a single haplotype sequence is to amplify DNA by polymerase chain reaction (PCR) using allele-specific primers. An approach of this type for sequencing both alleles of a deletion polymorphism in intron 6 of the human dopamine 2 receptor gene (DRD2) is described by Finck et al., DNA Sequence Vol 6 (2), pp 87-94 (1996). In this method, allele-specific primers are used to amplify individual allele sequences by PCR. The primers are designed to produce amplicons of differing lengths, so that the products of each allele can be discriminated by agarose gel electrophoresis when both alleles are simultaneously amplified in the same reaction tube. The amplicons from each allele are then extracted from the gel and sequenced using conserved primers. The disadvantage of this approach is that it requires prior knowledge of at least two sufficiently separated regions of dissimilarity between the alleles, so that appropriate allele-specific primers producing different-sized products can be designed. In addition, it requires a time-consuming gel separation and extraction step prior to sequencing.
A related approach is described by Kaneoka et al., Biotechniques Vol 10 (1), pp 30, 32 and 34 (1991). Biotinylated allele-specific oligonucleotide primers coupled to streptavidin-coated magnetic beads are used to amplify DNA from one haplotype by PCR, and a conserved primer is then used for solid-phase direct DNA sequencing.
WO 92/15711 discloses a method for determining a major histocompatibility complex genotype of a subject in a sample containing nucleic acid. The method involves PCR amplification of the gene locus of interest, with all alleles for the gene locus to be sequenced being amplified with one conserved oligonucleotide primer pair and at least one allele for the gene locus being amplified with one conserved oligonucleotide primer and one non-conserved oligonucleotide primer. The amplicons for each allele are then sequenced with a conserved primer.
A different method for determining haplotype sequences involves analysis of PCR amplified sequences covering a polymorphic region by hybridisation rather than sequencing. PCR amplicons are contacted with oligonucleotide probes complementary to the sequence of either the maternal or paternal chromosome in a region comprising an SNP. Probes complementary to the maternal or paternal chromosomes are immobilised in different areas of a solid phase. A second set of oligonucleotide probes, labelled in a different way and complementary to the sequence of either the maternal or paternal chromosome in a region comprising a second SNP, is then used to identify which sequence at the first SNP is on the same chromosome as a particular sequence at the second SNP.
Other approaches have been adopted in the past for determining a target nucleic acid sequence when the target sequence is contained in a preparation comprising a non-target nucleic acid sequence. In one method described in WO 97/46711, a primer is selected that complements one strand but not the other, and an artificial mismatch is introduced into the primer. By selecting suitable hybridisation conditions so that stable duplexes form between the primer and one allele but not between the primer and the other allele, chain-extension sequencing of a single allele is achieved. A disadvantage of this method is that the selection of appropriate hybridisation conditions is time-consuming and not necessarily straightforward.
WO 00/20628 describes a method by which multiple genomic loci can be sequenced in the same reaction mixture. This method allows the sequencing of a second locus in the mixture by using primers which are longer than the longest product formed from the sequencing reaction in relation to a first locus. Different primers are used for each locus. However, this document does not disclose a method for haplotyping particular alleles of a single locus.