This application describes methods for the genetic analysis of biologically, medically and economically significant traits in mammals and other organisms, including humans. Genetic analysis refers to the determination of the nucleotide sequence of a gene or genes of interest in a subject organism, including methods for analysis of one site of sequence variation (i.e. genotyping methods) and methods for analysis of a collection of sequence variations (haplotyping methods). Genetic analysis further includes methods for correlating sequence variation with disease risk, diagnosis, prognosis or therapeutic management.
The use of novel genotyping and haplotyping methods for genetic analysis of the apolipoprotein E (ApoE) gene is described. These methods entail use of novel ApoE DNA sequence polymorphisms and haplotypes. The ApoE alleles and genetic analysis methods of this application will allow more sensitive measurement of the contribution of ApoE genetic variation to medically important phenotypes such as risk of heart disease, risk of Alzheimer's disease and response to various therapeutic interventions, including pharmacotherapy.
This application also describes new methods for genotyping a DNA sample based on analysis of the mass of cleaved DNA fragments using mass spectrometry. These genotyping methods are better suited to the present and future requirements of DNA testing than current genotyping methods as a result of improved accuracy, decreased set-up and reagent costs, reduced complexity and excellent compatibility with automation.
At present, DNA diagnostic testing is largely concerned with identification of rare polymorphisms related to Mendelian traits. These tests have been in use for well over a decade. In the future genetic testing will come into much wider clinical and research use, as a means of making predictive, diagnostic, prognostic and pharmacogenetic assessments. These new genetic tests will in many cases involve multigenic conditions, where the correlation of genotype and phenotype is significantly more complex than for Mendelian phenotypes. To produce genetic tests with the requisite accuracy will require new methods that can simultaneously track multiple DNA sequence variations at low cost and high speed, without compromising accuracy. Many tests will be evaluated in the clinical research setting but only a small fraction will become major diagnostic tests; the clinical research process will reveal that most polymorphisms lack significant functional effects. The genetic analysis methods described in this application are relatively inexpensive to set up and run, while providing extremely high accuracy, and, most important, enabling sophisticated genetic analysis. They are therefore optimally suited to the exigencies of genetic test development in coming years.
The association of specific genotypes with disease risk, prognosis, and diagnosis as well as selection of optimal therapy for disease are some of the benefits expected to ensue from the human genome project. At present, the most common type of genetic study design for testing the association of genotypes with medically important phenotypes is a case control study where allele frequencies are measured in one or more phenotypically defined groups of cases and compared to allele frequencies in controls. (Alternatively, phenotype frequencies in two or more genotypically defined groups are compared.) The majority of such published genetic association studies have focused on measuring the contribution of a single polymorphic site (usually a single nucleotide polymorphism, abbreviated SNP) to variation in a medically important phenotype or phenotypes. In these studies one polymorphism serves as a proxy for all variation in a gene (or even a cluster of adjacent genes).
The limitations of such single polymorphism association analysis are becoming increasingly apparent. Recent articles (e.g. Terwilliger, J. and K. M. Weiss. Linkage disequilibrium mapping of complex disease: fantasy or reality? Current Opinion in Biotechnology 9: 578-594, 1998) have drawn attention to the low quality of most association studies using single polymorphic sites (evidenced by their low degree of reproducibility). Some of the reasons for the lack of reproducibility of many association studies are apparent. In particular, the extent of human DNA polymorphism (most genes contain 10 or more polymorphic sites, and many genes contain over 100 polymorphic sites) is such that a single polymorphic site can only rarely serve as a reliable proxy for all variation in a gene (which typically covers at least several thousand nucleotides and can extend over 1,000,000 nucleotides). Even in cases where one polymorphic site is responsible for significant biological variation, there is no reliable method for identifying such a site. The haplotyping and genetic analysis methods described in this application provide a systematic way to identify such polymorphic sites.
Several recent studies have begun to outline the extent of human molecular genetic variation. For example, a comprehensive survey of genetic variation in the human lipoprotein lipase (LPL) gene (Nickerson, D. A., et al. Nature Genetics 19: 233-240, 1998; Clark, A. G., et al. American Journal of Human Genetics 63: 595-612, 1998) compared 71 human subjects and found 88 varying sites in a 9.7 kb region. On average any two versions of the gene differed at 17 sites. This and other studies show that sequence variation may be present at approximately 1 in 100 nucleotides when 50 to 100 unrelated subjects are compared. The implication of these data is that, in order to create genetic diagnostic tests of sufficient specificity and selectivity to justify widespread medical use, more sophisticated methods are needed for measuring human genetic variation.
Beyond tests that measure the status of a single polymorphic site, the next level of sophistication in genetic testing is to genotype two or more polymorphic sites and keep track of the genotypes at each of the polymorphic sites when calculating the association between genotypes and phenotypes (e.g. using multiple regression methods). However, this approach, while an improvement on the single polymorphism method in terms of considering possible interactions between polymorphisms, is limited in power as the number of polymorphic sites increases. The reason is that the number of genetic subgroups that must be compared increases exponentially as the number of polymorphic sites increases. In a medical study of fixed size this has the effect of dramatically increasing the number of groups that must be compared, while reducing the size of each subgroup to a small number. The consequence of these effects is an unacceptable loss of statistical power. Consider, for example, a clinical study of a gene that contains 10 variable sites. If each site is biallelic then there are 2^10 = 1024 possible combinations of polymorphic sites. If the study population is 500 subjects then it is likely that many genetically defined subgroups will contain only a small number of subjects. Thus, consideration of multiple polymorphisms (as can be determined from DNA sequence data, for example) does not get at the problem that the DNA sequence from a diploid subject does not sufficiently constrain the sequence of the subject's two chromosomes to be very useful for statistical analysis. Only direct determination of the DNA sequence on each chromosome (a haplotype) can constrain the number of genetic variables in each subject to two (allele 1 and allele 2), while accounting for all, or preferably at least a substantial subset of, the polymorphisms.
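The arithmetic behind this loss of power can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the calculation assumes, as the example in the text does, two states per site and (as a best case) subjects spread evenly across subgroups.

```python
def subgroup_count(n_sites: int, states_per_site: int = 2) -> int:
    """Number of possible combinations across n_sites polymorphic sites."""
    return states_per_site ** n_sites


def mean_subjects_per_subgroup(n_subjects: int, n_sites: int) -> float:
    """Average subgroup size if subjects were spread evenly (a best case)."""
    return n_subjects / subgroup_count(n_sites)


if __name__ == "__main__":
    # A study of 500 subjects, as in the example in the text.
    for n_sites in (1, 2, 5, 10):
        groups = subgroup_count(n_sites)
        mean_size = mean_subjects_per_subgroup(500, n_sites)
        print(f"{n_sites:2d} sites: {groups:5d} subgroups, "
              f"{mean_size:.2f} subjects per subgroup on average")
```

With 10 sites the average subgroup holds fewer than one subject, which is the loss of power described in the text.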
A much more powerful measure of variation in a DNA segment, then, is a haplotype, that is, the set of polymorphisms that are found on a single chromosome. Because of the evolutionary history of human populations, only a small fraction of all possible haplotypes (given a set of polymorphic sites at a locus) actually occur at appreciable frequency. For example, in a gene with 10 polymorphic sites only a small fraction (perhaps in the range of 1%) of the 1,024 possible haplotypes is likely to exist at a frequency greater than 5% in a human population. Further, as described below, haplotypes can be clustered in groups of related sequences to facilitate genetic analysis. Thus determination of haplotypes is a simplifying step in performing a genetic association study (compared to the analysis of multiple polymorphisms), particularly when applied to DNA segments characterized by many polymorphic sites. There is also a potent biological rationale for sorting genes by haplotype, rather than by genotype at one polymorphic site: polymorphic sites on the same chromosome may interact in a specific way to determine gene function. For example, consider two sites of polymorphism in a gene, both of which encode amino acid changes. The two polymorphic residues may lie in close proximity in three dimensional space (i.e. in the folded structure of the encoded protein). If one of the polymorphic amino acids encoded at each of the two sites has a bulky side chain and the other a small side chain then one can imagine a situation in which proteins that have either [bulky-small], [small-bulky] or [small-small] pairs of polymorphic residues are fully functional, but proteins with [bulky-bulky] residues at the two sites are impaired, on account of a disruptive shape change caused by the interaction of the two bulky side groups. Now consider a subject whose genotype is heterozygous bulky/small at both polymorphic sites. 
The possible haplotype pairs in such a subject are [bulky-small]/[small-bulky], or [small-small]/[bulky-bulky]. The functional implications of these two haplotype pairs are quite different: active/active or active/inactive, respectively. A genotype test would simply reveal that the subject is doubly heterozygous. Only a haplotype test would reveal the biologically consequential structure of the variation. The interaction of polymorphic sites need not involve amino acid changes, of course, but could also involve virtually any combination of polymorphic sites.
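The ambiguity of a doubly heterozygous genotype can be made concrete by enumerating every haplotype pair consistent with it. The following Python sketch is illustrative only; the function names and the rule that only a [bulky-bulky] haplotype is impaired are taken from the hypothetical example above, not from any real gene.

```python
from itertools import product


def haplotype_pairs(genotype):
    """Return all distinct haplotype pairs consistent with an unphased genotype.

    genotype is a list of (allele, allele) tuples, one per polymorphic site.
    """
    pairs = set()
    # For each site, choose which of its two alleles goes on chromosome 1;
    # the other allele necessarily goes on chromosome 2.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = tuple(site[c] for site, c in zip(genotype, choice))
        hap2 = tuple(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(tuple(sorted((hap1, hap2))))  # chromosome order is irrelevant
    return pairs


def is_active(haplotype):
    """Hypothetical rule from the example: only bulky-bulky is impaired."""
    return haplotype != ("bulky", "bulky")


if __name__ == "__main__":
    double_het = [("bulky", "small"), ("bulky", "small")]
    for hap1, hap2 in sorted(haplotype_pairs(double_het)):
        status = "/".join("active" if is_active(h) else "inactive"
                          for h in (hap1, hap2))
        print(hap1, hap2, "->", status)
```

A genotype test sees the same input (`double_het`) for both outcomes; only the phased pairs distinguish active/active from active/inactive.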
The genetic analysis of complex traits can be made still more powerful by use of schemes to cluster haplotypes into related groups based on parsimony, for example. Templeton and coworkers have demonstrated the power of cladograms for analysis of haplotype data. (Templeton, A. R., Boerwinkle, E. and C. F. Sing. A Cladistic Analysis of Phenotypic Associations With Haplotypes Inferred From Restriction Endonuclease Mapping. I. Basic Theory and an Analysis of Alcohol Dehydrogenase Activity in Drosophila. Genetics 117: 343-351, 1987. Templeton, A. R., Crandall, K. A. and C. F. Sing. A Cladistic Analysis of Phenotypic Associations With Haplotypes Inferred From Restriction Endonuclease Mapping and DNA Sequence Data. III. Cladogram Estimation. Genetics 132: 619-633, 1992. Templeton, A. R. and C. F. Sing. A Cladistic Analysis of Phenotypic Associations With Haplotypes Inferred From Restriction Endonuclease Mapping. IV. Nested Analyses with Cladogram Uncertainty and Recombination. Genetics 134: 659-669, 1993. Templeton A. R., Clark A. G., Weiss K. M., Nickerson D. A., Boerwinkle E. and C. F. Sing. Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am J Hum Genet. 66: 69-83, 2000). These analyses describe a set of rules for clustering haplotypes into hierarchical groups based on their presumed evolutionary relatedness. The phylogenetic trees can be constructed using standard software packages for phylogenetic analysis such as PHYLIP or PAUP (Felsenstein, J. Phylogenies from molecular sequences: inference and reliability. Annu Rev Genet. 22:521-65, 1988; Retief, J. D. Phylogenetic analysis using PHYLIP. Methods Mol Biol. 132:243-58, 2000), and hierarchical haplotype clustering can be accomplished using the rules described by Templeton and co-workers. The methods described by Templeton and colleagues further provide for a nested analysis of variance between different haplotype groups at each level of clustering. 
The results of this analysis can lead to identification of polymorphic sites responsible for phenotypic variation, or at a minimum narrow the possible phenotypically important sites. Thus, methods for determination of haplotypes have great utility in studies designed to test association between genetic variation and variation in phenotypes of medical interest, such as disease risk and prognosis and response to therapy.
Currently available methods for the experimental determination of haplotypes are unsatisfactory, particularly methods for the determination of haplotypes over long distances (e.g. greater than 5 kb). One of the few experimental haplotyping methods currently in use outside the research group that devised it is based on allele specific amplification using oligonucleotide primers that terminate at polymorphic sites (Newton, C. R. et al. Amplification refractory mutation system for prenatal diagnosis and carrier assessment in cystic fibrosis. Lancet. December 23-30; 2 (8678-8679):1481-3, 1989; Newton, C. R. et al., Analysis of any point mutation in DNA. The amplification refractory mutation system (ARMS) Nucleic Acids Res. Vol. 17, 2503-2516, 1989). The method is referred to by the acronym ARMS (for amplification refractory mutation system). The ARMS system was subsequently further developed (Lo, Y. M. et al., Direct haplotype determination by double ARMS: specificity, sensitivity and genetic applications. Nucleic Acids Research July 11;19 (13):3561-7, 1991) and has since been used in a number of other studies. ARMS is the subject of U.S. Pat. Nos. 5,595,890 and 5,853,989. The drawbacks of this method are that (i) the usual limitations of PCR apply in terms of the difficulty of amplifying long DNA segments; (ii) during amplification cycles, an incompletely extended primer extension product may switch (between one or more cycles) from one allelic template strand to the other, resulting in artefactual hybrid haplotypes; (iii) because different DNA samples will be heterozygous at different combinations of nucleotides, different primers and assay conditions for allele specific amplification must be established for each polymorphic site that is to be haplotyped. For example, consider a locus with five polymorphic sites. Subject A is heterozygous at sites 1, 2 and 4; subject B at sites 2 and 3, and subject C at sites 3 and 5. 
To haplotype A requires allele specific amplification conditions from sites 1 or 4; to haplotype B requires allele specific amplification conditions from sites 2 or 3, and to haplotype C requires allele specific amplification conditions from sites 3 or 5 (with the allele specific primer from site 3 on the opposite strand from that used to haplotype B).
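The bookkeeping burden in drawback (iii) can be illustrated with a short Python sketch. It assumes, as a reading of the five-site example rather than a rule stated in the text, that the allele specific primer is placed at one of the outermost heterozygous sites so that the amplified segment spans the remaining heterozygous sites. The function name is hypothetical.

```python
def allele_specific_primer_sites(het_sites):
    """Candidate sites for the allele specific primer.

    Assumption for illustration: the primer sits at an outermost heterozygous
    site so the amplicon covers every other heterozygous site of the subject.
    """
    if len(het_sites) < 2:
        return ()  # zero or one heterozygous site: phase is unambiguous
    return (min(het_sites), max(het_sites))


if __name__ == "__main__":
    subjects = {"A": [1, 2, 4], "B": [2, 3], "C": [3, 5]}
    needed = set()
    for name, sites in subjects.items():
        options = allele_specific_primer_sites(sites)
        needed.update(options)
        print(f"Subject {name}: allele specific primer at site "
              f"{options[0]} or {options[1]}")
    print("Sites that may need their own assay conditions:", sorted(needed))
```

Even with three subjects, assay conditions may have to be worked out at every one of the five sites, depending on which option is chosen for each subject.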
A similar method for achieving allele specific amplification takes advantage of some thermostable polymerases' ability to proofread and remove a mismatch at the 3' end of a primer. Again, primers are designed with the 3' terminal base positioned opposite to the variant base in the template. In this case the 3' base of the primer is modified in a way that prevents it from being extended by the 5'-3' polymerase activity of a DNA polymerase. Upon hybridization of the end-blocked primer to the complementary template sequence, the 3' base is either matched or mismatched, depending on which alleles are present in the sample. If the 3' base of the primer is properly base paired the polymerase does not remove it from the primer and thus the blocked 3' end remains intact and the primer cannot be extended. However, if there is a mismatch between the 3' end of the primer and the template, then the 3'-5' proofreading activity of the polymerase removes the blocked base and then the primer can be extended and amplification occurs. This method suffers from the same limitations described above for the ARMS procedure.
Other allele specific PCR amplification methods include further methods in which the 3' terminal primer forms a match with one allele and a mismatch with the other allele (U.S. Pat. No. 5,639,611), PCR amplification and analysis of intron sequences (U.S. Pat. Nos. 5,612,179 and 5,789,568), or amplification and identification of polymorphic markers in a chromosomal region of DNA (U.S. Pat. No. 5,851,762). Further, methods for allele-specific reverse transcription and PCR amplification to detect mutations (U.S. Pat. No. 5,804,383), and a primer-specific and mispair extension assay to detect mutations or polymorphisms (PCT/CA99/00733) have been described. Several of these methods are directed to genotyping, not to haplotyping.
Other haplotyping methods that have been described are based on analysis of single sperm cells (Hubert R., Stanton, V. P. Jr, Aburatani H, et al. Sperm typing allows accurate measurement of the recombination fraction between D3S2 and D3S3 on the short arm of human chromosome 3. Genomics. April 1992;12(4):683-687); on limiting dilution of a DNA sample (until only one template molecule is present in each test tube, on average) (Ruano, G., Kidd, K. K. and J. C. Stephens. Haplotype of multiple polymorphisms resolved by enzymatic amplification of single DNA molecules. Proc Natl Acad Sci U S A August 1990;87(16):6296-6300), or on cloning DNA into various vectors and host microorganisms (U.S. Pat. No. 5,972,614). These methods are not practical for clinical studies of human subjects, and generally have not been used in studies of human disease risk or drug response. For example, sperm based haplotyping methods are not generally useful for clinical studies because no sperm has the same haplotype as its host. Limiting dilution methods are technically challenging (two rounds of PCR amplification are required, with stringent controls for preventing contamination by exogenous DNA) and not compatible with the high throughput, accuracy and reliability required in human clinical studies.
This invention concerns methods for determining the sequence of a DNA sample at a polymorphic site, often referred to as genotyping. Many genotyping methods are known in the art; however, the methods described in this application have the advantages of being robust, highly accurate, and inexpensive to set up and perform. For these reasons the methods described herein are preferable to currently available methods. The genotyping methods described in the specification may be used in the genotyping steps of the haplotyping methods of this invention, or they may be used for genotyping alone, i.e. not associated with a haplotyping test.
The present invention also concerns methods for determining the organization of DNA sequence polymorphisms on individual chromosomes, i.e. haplotypes, as well as methods for using either genotype or haplotype information, or a combination of the two, to make diagnostic tests useful for disease risk assessment, for prognostic prediction of the course or outcome of a disease, to diagnose a disease or condition, or to select optimal therapy for a disease or condition. As described above, haplotypes are often not directly inferable from genotypes; therefore specialized methods are required to determine haplotypes. Further, as noted, currently available haplotyping methods are cumbersome and/or are limited by the type of samples that can be analyzed. The several haplotyping methods of this invention are superior to previously described methods with respect to technical ease, sample throughput, length of DNA that can be haplotyped, and compatibility with automation. These novel methods provide the basis for more sophisticated analyses of the contribution of variation at candidate genes (such as ApoE) to intersubject variation in medical or other phenotypes of interest. These methods are applicable to patients with a disease or disorder as well as to apparently normal subjects in whom a predisposition to a disease or disorder may be discovered or quantified as a result of a haplotyping test described herein. Application of the haplotyping methods of this invention will provide for improved medical care by increasing the accuracy of genetic diagnostic tests of all kinds.
This invention further concerns genetic analysis of the ApoE gene to determine disease and drug response traits in humans, particularly traits that may be affected by genetic variation at the ApoE gene, and further concerns methods for improving medical care for individual patients based on the results of ApoE genetic testing. Variation at the ApoE gene has been associated with risk of Alzheimer's disease and other neurodegenerative diseases, recovery from organic or traumatic brain injury, and response to pharmacotherapy of AD as well as coronary heart disease, dyslipidemia, and other conditions. The methods of this application also provide for more efficient use of medical resources, and therefore are also of use to organizations that pay for health care, such as managed care organizations, health insurance companies and the federal government. The invention provides methods for performing genotyping and haplotyping tests on a human subject to formulate or assist in the formulation of a diagnosis, a prognosis or the selection of an optimal treatment method based on ApoE genotype or haplotype. These methods are applicable to patients with a disease or disorder affecting the cardiovascular or nervous systems, as well as patients with any disease or disorder that is affected by lipid metabolism. The ApoE haplotyping methods of this invention are equally applicable to apparently normal subjects in whom predisposition to a disease or disorder may be discovered as a result of an ApoE genotyping or haplotyping test described herein. 
Application of the methods of this invention will provide for improved medical care by, for example, allowing early implementation of preventive measures in patients at risk of diseases such as atherosclerosis, dementia, Parkinson's disease, Huntington's disease or other organic or vascular neurodegenerative process; or optimal selection of therapy for patients with diseases or conditions such as hyperlipidemia, cardiovascular disease (including coronary heart disease as well as peripheral or central nervous system atherosclerosis), neurological diseases including but not limited to Alzheimer's disease, stroke, head or brain trauma, amyotrophic lateral sclerosis, and psychiatric diseases such as psychosis, bipolar disease and depression.
Genotyping Methods
The disadvantages of existing genotyping methods include unproven or inadequate accuracy (particularly for medical research or clinical practice, where very high accuracy is required), high set up costs (which are unacceptable when relatively small numbers of subjects are being studied, e.g. in the clinical research setting), technical difficulty in performing the test or interpreting the results, and incompatibility with full automation.
Methods described in the present invention first use amplification (preferably PCR amplification) using amplification oligonucleotides (primers) flanking a polymorphic site. The 3' end of one of the primers is close to, highly preferably within 16 nucleotides of, a polymorphic site in template DNA. The second primer may lie at any distance from the first primer on the opposite side of the polymorphic site providing effective amplification. The first primer is designed so that it introduces two restriction endonuclease recognition sites into the amplified product during the amplification process. Preferably the two restriction sites are created by inserting a sequence of 15 or fewer nucleotides into the primer. This short inserted sequence in general does not base pair to the template strand, but rather loops out when the primer is bound to template. However, when the complementary strand is copied by polymerase the inserted sequence is incorporated into the amplicon. Incubation of the resulting amplification product with the appropriate restriction endonucleases results in the excision of a small (generally less than 20 bases) polynucleotide fragment that contains the polymorphic nucleotide. The small size of the excised fragment allows it to be easily and robustly analyzed by mass spectrometry to determine the identity of the base at the polymorphic site. The primer with the restriction sites can be designed so that the restriction enzymes: (i) are easy to produce, or inexpensive to obtain commercially, (ii) cleave efficiently in the same buffer, i.e. all potential cleavable amplicons are fully cleaved in one step, (iii) cleave multiple different amplicons, so as to facilitate multiplex analysis (that is, the analysis of two or more samples simultaneously).
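Why a small excised fragment is easier to genotype by mass can be sketched numerically. The Python example below is an illustration under stated assumptions: the residue masses are commonly quoted average masses (in daltons) of nucleotide residues within a DNA chain, supplied here rather than taken from this application; terminal-group corrections are ignored because they cancel when two alleles of the same fragment are compared; and the fragment sequences and function names are hypothetical.

```python
# Commonly quoted average masses (Da) of nucleotide residues in a DNA chain.
# These values are an assumption of this sketch, not taken from the text.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def residue_mass_sum(fragment: str) -> float:
    """Sum of residue masses; end-group corrections are deliberately omitted."""
    return sum(RESIDUE_MASS[base] for base in fragment)


def with_allele(fragment: str, site_index: int, allele: str) -> str:
    """Return the fragment carrying the given base at the polymorphic site."""
    return fragment[:site_index] + allele + fragment[site_index + 1:]


def allele_mass_difference(fragment, site_index, allele_1, allele_2) -> float:
    """Absolute mass difference between the two allelic forms of a fragment."""
    m1 = residue_mass_sum(with_allele(fragment, site_index, allele_1))
    m2 = residue_mass_sum(with_allele(fragment, site_index, allele_2))
    return abs(m1 - m2)


def relative_difference(fragment, site_index, allele_1, allele_2) -> float:
    """Mass difference as a fraction of the fragment's total mass."""
    total = residue_mass_sum(with_allele(fragment, site_index, allele_1))
    return allele_mass_difference(fragment, site_index, allele_1, allele_2) / total


if __name__ == "__main__":
    short = "ACGTACGTACGT"   # hypothetical 12-base excised fragment
    long_ = short * 10       # hypothetical 120-base uncut product
    for name, frag in (("12-mer", short), ("120-mer", long_)):
        diff = allele_mass_difference(frag, 5, "A", "T")
        rel = relative_difference(frag, 5, "A", "T")
        print(f"{name}: A/T difference {diff:.2f} Da, {rel:.5%} of total mass")
```

The absolute difference between the alleles is the same for both fragments, but it is a ten times larger fraction of the total mass in the short fragment, so less resolving power is needed.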
An enhancement of the basic method is to select a combination of restriction enzymes that will cleave the amplified product so as to produce staggered ends with a 5' extension, such that the polymorphic site is contained in the extension. Elimination of natural nucleotides from the reaction (for example using Shrimp Alkaline Phosphatase or other alkaline phosphatase) and addition of at least one modified nucleotide corresponding to one of the two nucleotides present at the polymorphic site (for example 5'-bromodeoxyuridine if T is one of the two polymorphic nucleotides) will result in fill-in of the recessed 3' end to produce fragments differing in mass by more than the natural mass difference of the two polymorphic nucleotides. One or more modified nucleotides can be selected to maximize the differential mass of the two allelic fill-in products. This enhancement of the basic method has the advantage of reducing the mass spectrometric resolution required to reliably determine the presence of two alleles vs. one allele, thereby improving the performance of base-calling software and the ease with which a genotyping system can be automated.
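The gain in mass separation from a modified nucleotide can be estimated with a simplified calculation. The sketch below is a rough illustration only: it models the enhancement as substituting a bromodeoxyuridine residue for thymidine at the polymorphic position and ignores any difference in the length of the fill-in products. The residue masses are commonly quoted averages (in daltons), and the bromodeoxyuridine value is approximated by replacing the thymine methyl group (about 15.03 Da) with bromine (about 79.90 Da); none of these numbers come from this application.

```python
# Approximate average residue masses (Da); assumptions of this sketch.
NATURAL = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

# Approximation: thymidine residue with CH3 (15.03) replaced by Br (79.90).
BRDU_RESIDUE = round(NATURAL["T"] - 15.03 + 79.90, 2)


def site_mass_difference(allele_1: str, allele_2: str, modified=None) -> float:
    """Mass difference at the polymorphic position.

    modified maps a base to the residue mass of the analog used in its place.
    """
    masses = dict(NATURAL)
    if modified:
        masses.update(modified)
    return abs(masses[allele_1] - masses[allele_2])


if __name__ == "__main__":
    for a1, a2 in (("T", "C"), ("T", "A"), ("T", "G")):
        natural = site_mass_difference(a1, a2)
        enhanced = site_mass_difference(a1, a2, modified={"T": BRDU_RESIDUE})
        print(f"{a1}/{a2}: natural {natural:.2f} Da, "
              f"with analog in place of T {enhanced:.2f} Da")
```

Under these assumptions the smallest natural difference (A versus T, about 9 Da) grows to more than 50 Da, which is the kind of margin that relaxes the resolution requirement described above.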
Another modification of the basic system is to use a third restriction enzyme that cleaves only one of the two alleles, such that the presence of the site yields shorter fragments than are observed in its absence. Such a modification is not universally applicable because not all polymorphisms alter restriction sites; however, this limitation can be partially addressed by including part of the restriction enzyme recognition site in the primer. For example, an interrupted palindrome recognition site like that of Mwo I (GCNNNNN/NNGC) can be positioned such that the first GC is in the primer while the second GC includes the polymorphic nucleotide. Only the allele corresponding to GC at the second site will be cleaved. Use of such restriction endonucleases simplifies the sequence requirements at and about the polymorphic site (in this example all that is required is that one allele at the polymorphic site include the dinucleotide GC), thereby increasing the number of polymorphic sites that can be analyzed in this way.
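The allele specificity of an interrupted palindrome site can be checked in silico. The Python sketch below searches for the Mwo I recognition pattern as written above (GC, seven unspecified bases, GC) and reports the top-strand cut position implied by the slash. The amplicon sequences are hypothetical, with the first GC standing in for the primer-encoded half of the site.

```python
import re

# GCNNNNN/NNGC: GC, then 5 + 2 = 7 unspecified bases, then GC.
MWOI_SITE = re.compile(r"GC[ACGT]{7}GC")
CUT_OFFSET = 7  # the slash falls after GC plus five bases (top strand)


def mwoi_cut_position(sequence: str):
    """Return the top-strand cut index of the first Mwo I site, or None."""
    match = MWOI_SITE.search(sequence)
    return None if match is None else match.start() + CUT_OFFSET


def amplicon(polymorphic_base: str) -> str:
    """Hypothetical amplicon: primer supplies the first GC, the polymorphic
    base is the first position of the second dinucleotide."""
    return "TTAGC" + "ATCGATT" + polymorphic_base + "C" + "TTA"


if __name__ == "__main__":
    for base in "GA":
        seq = amplicon(base)
        cut = mwoi_cut_position(seq)
        if cut is None:
            print(f"allele {base}: {seq} not cleaved")
        else:
            print(f"allele {base}: {seq} cleaved into {seq[:cut]} + {seq[cut:]}")
```

Because the recognition pattern is its own reverse complement, searching one strand is sufficient.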
In additional aspects, the invention provides methods that are applicable to both genotyping and haplotyping. The methods use biased amplification of nucleic acid sequences that include variance sites, and utilize primers that are designed so that a hairpin loop will form, generally in the complementary strand formed in an amplification reaction. The primer is designed to have a mismatch in its 5' end to a particular nucleotide at a particular site, generally a polymorphic site in a gene. If the particular nucleotide is present at the site, then amplification will be inhibited because the complementary strand formed in the amplification reaction will form a sufficiently stable hairpin loop to effectively compete with binding of the primer, and so inhibit further amplification. In contrast, a variant sequence with a different nucleotide at that site will not form a sufficiently stable hairpin to effectively compete with primer binding.
Thus, in one aspect, the invention provides a method for biasing the amplification of one allele (e.g., one form of a SNP at a particular site). As explained above, the biasing depends on the identity of a specific nucleotide at a polymorphic site in a target nucleic acid sample. The method involves contacting a segment of DNA with two primers encompassing the polymorphic site under amplification conditions. One primer contains a region at its 5' end that is not complementary to the target nucleic acid but which, when incorporated into the amplification product, will cause the 3' end of the strand complementary to this primer in the amplification product to form a sufficiently stable hairpin loop by hybridizing with the sequence including the polymorphic site to inhibit further amplification only if the specific nucleotide is present at the polymorphic site. The method also involves determining whether the segment is amplified. Amplification (or preferential amplification) of the segment is indicative that the polymorphic site contains an alternative to the specific nucleotide.
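The logic of hairpin-biased amplification can be sketched with a deliberately simple model. In the Python example below, a hairpin is counted as "sufficiently stable" only when the 3' end of the strand complementary to the tailed primer is perfectly complementary to an internal stretch of the same strand; real hairpin stability depends on thermodynamics that this sketch does not model. All sequences, names and the minimum loop length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def forms_3prime_hairpin(strand: str, stem_len: int, min_loop: int = 3) -> bool:
    """True if the last stem_len bases can pair perfectly with an upstream
    stretch of the same strand, leaving at least min_loop unpaired bases."""
    search_end = len(strand) - stem_len - min_loop
    if search_end <= 0:
        return False
    three_prime_end = strand[-stem_len:]
    return strand.find(revcomp(three_prime_end), 0, search_end) != -1


def site_region(allele: str) -> str:
    """Hypothetical sequence surrounding the polymorphic site."""
    return "GATTAC" + allele + "CTTGGA"


PRIMER_BODY = "ACGTTGCAAGGCTA"            # template-binding part of the primer
PRIMER_TAIL = revcomp(site_region("G"))   # 5' tail targeting the G allele


def amplification_inhibited(allele: str) -> bool:
    """Build the strand complementary to the primer extension product and
    test whether its 3' end can fold back onto the polymorphic site."""
    downstream = "TTCAGGATCCA" + site_region(allele) + "TTAGC"
    extension_product = PRIMER_TAIL + PRIMER_BODY + downstream
    complementary_strand = revcomp(extension_product)
    return forms_3prime_hairpin(complementary_strand, stem_len=len(PRIMER_TAIL))


if __name__ == "__main__":
    for allele in "GA":
        state = "inhibited" if amplification_inhibited(allele) else "amplified"
        print(f"allele {allele}: {state}")
```

In this model the G allele is inhibited and the A allele amplifies, so detecting product indicates the alternative nucleotide, as in the method described.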
In particular embodiments, the nucleic acid sample can be single stranded DNA or double stranded DNA, and can be genomic or cDNA. RNA can also be utilized, preferably by forming cDNA. In certain embodiments, the amplification of the segment is detected by detection of the presence of defined size fragments following restriction enzyme digestion of any amplification products. The polymorphic site can be a restriction fragment length polymorphism (RFLP), and a digestion can be performed with a restriction enzyme corresponding to the RFLP, where the defined size fragments differ in size depending on the nucleotide present at the polymorphic site.
The method is not restricted to a single site, so in preferred embodiments, the method involves carrying out the contacting and determining for each of a plurality of different polymorphic sites. For example, at least 2, 3, 4, 5, 6, 8, 10, 15, 20, 30, 40, 50, or 100 sites can be analyzed in a coordinated set of determinations (e.g., in genotyping an individual for a plurality of different sites, which may be in one or a plurality of different genes). In certain embodiments, the plurality of different polymorphic sites provides a haplotype for a gene, can independently or also include at least one polymorphic site in a plurality of different genes, and/or provide haplotypes for a plurality of different genes.
Such biased amplification can be used to determine the nucleotide present at a particular polymorphic site. Thus, in a related aspect, the invention provides a method for determining whether a particular nucleotide is present at a polymorphic site in a target nucleic acid sequence, by contacting a segment of DNA containing the polymorphic site with a primer under amplification conditions, such that extension products and/or amplification products will be formed. The primer has a sequence at its 5' end that is the same as a sequence including the polymorphic site for a particular nucleotide present at that site. The opposite strand extension product or amplification product will form a sufficiently stable hairpin loop by hybridization between a sequence including the polymorphic site and a sequence derived from the 5' end of the primer for a specific nucleotide at the polymorphic site to inhibit amplification. Amplification is not inhibited for an alternative nucleotide at said site. The method also includes determining whether the segment is amplified. Amplification of the segment indicates that the polymorphic site contains an alternative nucleotide instead of the specific nucleotide. In general, a second primer, constituting a primer pair, is also used under amplification conditions such that extension products or amplification products or both will be formed. Particular embodiments include those as described for the aspect above.
Haplotyping Methods
This invention concerns methods for determining the sequence of individual chromosomes, starting with diploid DNA that contains two chromosomes, and methods for using that information to make genetic tests useful for disease risk assessment, for diagnosing a disease or condition, for assessing disease prognosis or to select optimal therapy for a disease or condition. The sequence of a chromosome segment is referred to as a haplotype. Since homologous chromosome segments (e.g. the sequence of two alleles of the ApoE gene) are very similar in sequence (greater than 99%), the distinguishing elements of haplotypes occur at polymorphic sites. A haplotype can be thought of as the nucleotide sequence of a DNA segment at some or all of the sites that vary in a population. Thus a haplotype may consist of the sequence at 10 polymorphic sites in a 5,000 nucleotide DNA segment.
The pattern of genetic variation in most species, including man, is not random; as a result of human evolutionary history some sets of polymorphisms occur together on chromosomes, so that knowing the sequence of one polymorphic site may allow one to predict with some probability the sequence of certain other sites on the same chromosome. Once the relationships between a set of polymorphic sites have been worked out, a subset of all the polymorphic sites may be used in the development of a haplotyping test. In preferred embodiments of the haplotyping methods of this invention, a subset of all the polymorphic sites at a locus is used to develop a haplotyping test. The polymorphisms that comprise a haplotype may be of any type.
Most polymorphisms (about 90% of all DNA polymorphisms) involve the substitution of one nucleotide for another, and are referred to as single nucleotide polymorphisms (SNPs). The other main type of polymorphism involves change in the length of a DNA segment as a result of an insertion or deletion of anywhere from one nucleotide to thousands of nucleotides. Insertion/deletion polymorphisms (also referred to as indels) account for most non-SNP polymorphisms. Common kinds of indels include variation in the length of homopolymeric sequences (e.g. AAAAAA vs. AAAAA), variation in the number of short tandem repeat sequences such as CA (e.g. 13 repeats of CA vs. 15 repeats), and variation in the number of more complex repeated sequences (sometimes referred to as VNTR polymorphisms, for variable number of tandem repeats), as well as any other type of inter-individual variation in the length of a given DNA segment. The repeat units may also vary in sequence.
Haplotypes are often not directly inferable from genotypes (except in the special case of families, where haplotypes can often be inferred by analysis of pedigrees); specialized methods are therefore required for determining haplotypes from samples derived from unrelated subjects. Currently available haplotyping methods are cumbersome and expensive and limited either by the type of samples that can be analyzed (e.g. sperm cells) or by the limitations of PCR or other DNA amplification methods. The limits of DNA amplification methods such as PCR include incomplete allele-specificity of priming when using a 3′ terminal primer mismatch to achieve allele discrimination (such as in the ARMS method); that is, there may be some amplification of the non-selected allele. PCR is also limited in the length of DNA segment that can be amplified.
The present application provides methods for determining the haplotypes present in a DNA sample or cDNA sample, preferably drawn from one subject. However, these methods may also be used to determine the population of haplotypes present in a complex mixture, such as may be produced by mixing DNA samples from multiple subjects. The methods described herein are applicable to genetic analysis of any diploid organism, or any polyploid organism in which there are only two unique alleles. Application of the methods of this invention will provide for improved genetic analysis, enabling advances in medicine, agriculture and animal breeding. For example, by improving the accuracy of genetic tests for diagnosing predisposition to disease, or for predicting response to medical therapy, it will be possible to make safer and more efficient use of appropriate preventive or therapeutic measures in patients. The methods of this invention also provide for improved genetic analysis in a variety of basic research problems, including the identification of alleles of human genes that are associated with disease risk or disease prognosis.
Certain methods for determining haplotypes present in a DNA sample from a diploid organism include the following steps: (i) genotyping at least a portion of the sample (meaning a sequence portion) to identify sites of heterozygosity; (ii) enriching for an allele, by a method not requiring amplification, to a ratio of at least 1.5:1 based on a starting ratio of 1:1, where the information from (i) is used to select a preferred or optimal heterozygous site or sites for allele enrichment; (iii) genotyping the enriched material to determine the nucleotides present at said heterozygous site or sites; and (iv) determining the haplotype of the enriched allele by inspecting the genotypes from (iii). This method may further include determining the haplotype of the non-enriched allele by comparing the genotype determined in step (i) with the haplotype determined in step (iv). Such a haplotyping method as described above may include additional steps including (a) performing an allele enrichment procedure for the second allele on the same starting material; (b) genotyping the enriched material for the second allele to determine the nucleotides present at said heterozygous site or sites; and (c) determining the haplotype of the enriched second allele by inspecting the genotypes from (b).
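The comparison of the unphased genotype from step (i) with the haplotype of the enriched allele from step (iv) can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the example site numbers are assumptions made for illustration only; they are not part of the described methods.

```python
def infer_other_haplotype(genotype, enriched_haplotype):
    """Deduce the haplotype of the non-enriched allele.

    genotype: dict mapping site -> (nucleotide, nucleotide), unphased.
    enriched_haplotype: dict mapping site -> nucleotide on the enriched allele.
    Returns a dict mapping site -> nucleotide on the other allele.
    """
    other = {}
    for site, (first, second) in genotype.items():
        known = enriched_haplotype[site]
        if known == first:
            other[site] = second
        elif known == second:
            other[site] = first
        else:
            # The enriched haplotype carries a nucleotide not seen in the genotype.
            raise ValueError(f"site {site}: {known!r} is inconsistent with the genotype")
    return other


# Hypothetical example: two heterozygous sites and one homozygous site.
genotype = {25: ("A", "C"), 100: ("G", "T"), 200: ("C", "C")}
enriched = {25: "A", 100: "T", 200: "C"}
other = infer_other_haplotype(genotype, enriched)
```

At a homozygous site both alleles carry the same nucleotide, so the deduction is trivial there; the informative sites are the heterozygous ones identified in step (i).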
Additional methods for determining the haplotypes present in DNA from a diploid organism include the following steps: (i) genotyping at least a portion of the DNA in a sample from said organism to identify sites of heterozygosity; (ii) performing an allele-selective amplification procedure on the sample such that the allele ratio is changed from a starting ratio of 1:1 to at least 1.5:1, wherein the information from (i) is used to select an optimal polymorphic site or sites for designing primers to achieve said allele-selective amplification; (iii) genotyping the selectively amplified material; and (iv) determining the haplotype of the selectively amplified allele by inspecting the genotypes. Methods may include further determination of the haplotype of the selectively non-amplified allele by comparing the genotype determined in (i) with the haplotype determined in (iv). In addition, methods may include determining the haplotype of the selectively non-amplified allele by (a) performing an allele-selective amplification procedure for the second allele using the same starting material; (b) genotyping the selectively amplified second allele material; and (c) determining the haplotype of the selectively amplified second allele by inspecting the genotypes.
Also, methods for determining the haplotypes present in DNA from a diploid organism include (i) genotyping at least a portion of a DNA sample from said organism to identify sites of heterozygosity that affect restriction enzyme cleavage sites; (ii) restriction endonuclease digesting the DNA, using natural or synthetic endonucleases, such that one allele is restricted at a specific site and the other is not; (iii) performing an amplification procedure on the sample, using the information from step (i) to select optimal sites for designing primers to achieve allele-selective amplification; (iv) genotyping the selectively amplified material; and (v) determining the haplotype of the selectively amplified allele by inspecting the genotypes. These haplotyping methods further include determining the haplotype of the selectively non-amplified allele by comparing the genotype determined in step (i) with the haplotype determined in step (v). In addition, methods may include (a) isolating the second allele utilizing size difference; (b) genotyping the size-selected material corresponding to the second allele; and (c) determining the haplotype of the size-selected second allele by inspecting the genotypes.
Still further methods for determining the haplotypes present in DNA from a diploid organism include the steps of (i) genotyping at least a portion of the DNA from the sample to identify sites of heterozygosity that affect restriction enzyme cleavage sites; (ii) restriction endonuclease digesting the DNA, using natural or synthetic endonucleases, such that only one allele is restricted at a specific polymorphic site, thereby creating partially overlapping allele 1 and allele 2 fragments of different length, wherein information from (i) is utilized to select a restriction site that produces a useful difference in allele length; (iii) separating the restricted molecules according to their size by electrophoresis or centrifugation, such that the two allelic restriction fragments are resolved, and isolating DNA molecules corresponding to the size of allele 1 and, optionally, allele 2; (iv) genotyping the size-selected material corresponding to allele 1 and, optionally, allele 2; and (v) determining the haplotype of the size-selected allele 1 by inspecting the genotypes. These methods may include determination of the haplotype of allele 2 by comparing the genotypes determined in (i) with the haplotype determined in (v).
Additional embodiments of methods for haplotyping double stranded DNA fragments include (i) genotyping at least a portion of a DNA sample to identify sites of heterozygosity in the DNA fragment of interest; (ii) immobilizing double stranded DNA fragments on a solid support; (iii) adding two or more components that bind at polymorphic sites in the immobilized DNA fragment of interest to produce a detectable structure under conditions that promote preferential binding to only one strand of the target immobilized fragment; and (iv) determining the location of target fragments. In these methods, the two or more components may be two or more oligonucleotides complementary to polymorphic sites in the aforementioned immobilized DNA fragment of interest. The components are added under conditions that promote D loop formation in the case of oligonucleotides perfectly matched to one strand of the target immobilized fragment, but not in the case of oligonucleotides containing one or more mismatched nucleotides. The formation of D loops may be enhanced by the addition of RecA protein or alternatively by the alteration of salt concentration within the mixture. The two or more components may further include two or more peptide nucleic acids (PNA) or two or more zinc finger proteins. In methods including PNA, the peptide nucleic acids are complementary to polymorphic sites in the immobilized DNA fragment of interest, and are added under conditions that promote D loop formation in the case of PNAs perfectly matched to one strand of the target immobilized fragment, but not in the case of PNAs containing one or more mismatched nucleotides. In methods including zinc finger proteins, proteins that can bind to one of two alleles at a polymorphic nucleotide may be used and are added as described for the oligonucleotide components. The two or more zinc finger proteins can be detectably labeled.
The immobilized target DNA fragments may be first subjected to a size selection procedure and/or immobilized to a prepared glass surface. These methods may then be used to determine the location of the target fragments by optical mapping. In this more specific method for detection, two or more oligonucleotides are detectably labeled.
Further embodiments provide a method for determining the haplotypes of DNA fragments present in a DNA sample from a diploid organism, including: a) selectively amplifying one haplotype from the mixture by the allele specific clamp PCR procedure; and b) determining the genotype of two or more polymorphic sites in the amplified DNA fragment. The selective amplification may be preceded by determining the genotype of the DNA sample at two or more polymorphic sites in order to devise an optimal genotyping approach, and the DNA sample may be a mixture of several DNA samples.
Additional haplotyping methods and embodiments of this invention are described in the Detailed Description below.
APOE Genotyping and Haplotyping
Several United States patents relate to methods for determining ApoE haplotype and using that information to predict whether a patient is likely to develop late onset type Alzheimer's Disease (U.S. Pat. Nos. 5,508,167, 5,716,828), whether a patient with cognitive impairment is likely to respond to a cholinomimetic drug (U.S. Pat. No. 5,935,781), or whether a patient with a non-Alzheimer's neurological disease is likely to respond to therapy (U.S. Pat. No. 5,508,167).
The ApoE test practiced in all the cited patents (and virtually all the other publications) is based on a classification of ApoE into three alleles, termed epsilon 2, epsilon 3 and epsilon 4 (and abbreviated e2, e3 and e4). These three alleles are distinguishable on the basis of two polymorphic sites in the ApoE gene. The status of both sites must be tested to determine the alleles present in a subject. The two polymorphic sites are at nucleotides 448 and 586 of the ApoE cDNA (numbering from GenBank accession K00396), corresponding to amino acids 112 and 158 of the processed ApoE protein. The nucleotide polymorphism at both sites is T vs. C, and at both sites it is associated with a cysteine vs. arginine amino acid polymorphism, wherein the codon with T encodes cysteine and the codon with C encodes arginine. The presence of T at both polymorphic sites (cysteine at both residues 112 and 158) is designated e2; T at position 448 and C at position 586 (cysteine at 112, arginine at 158) is designated e3; and C at both variable sites (arginine at both 112 and 158) is designated e4. These three alleles (as well as rarer alleles) occur in virtually all human populations, with the frequency of the alleles varying from population to population. The e3 allele is the most common in all populations, while the frequency of e2 and e4 varies. Numerous studies have demonstrated association between ApoE alleles and risk of various diseases or biochemical abnormalities. For example, the e4 allele is associated with risk of late onset Alzheimer's disease and elevated serum cholesterol.
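The two-site classification described above amounts to a small lookup table. The following Python sketch is offered only as an illustration; the function and variable names are hypothetical, and the input is assumed to be the pair of nucleotides found on a single chromosome (that is, already phased).

```python
# Nucleotides at cDNA positions 448 and 586 (GenBank K00396 numbering),
# mapped to the allele designations given in the text.
EPSILON_ALLELES = {
    ("T", "T"): "e2",  # Cys112, Cys158
    ("T", "C"): "e3",  # Cys112, Arg158
    ("C", "C"): "e4",  # Arg112, Arg158
}

# At both sites the codon with T encodes cysteine and the codon with C arginine.
RESIDUE = {"T": "Cys", "C": "Arg"}


def classify_apoe_allele(nt448, nt586):
    """Return 'e2', 'e3' or 'e4' for one chromosome, or 'unclassified'
    for the combination (C at 448, T at 586) not covered by the three names."""
    key = (nt448.upper(), nt586.upper())
    for nucleotide in key:
        if nucleotide not in RESIDUE:
            raise ValueError(f"expected T or C, got {nucleotide!r}")
    return EPSILON_ALLELES.get(key, "unclassified")


def residues(nt448, nt586):
    """Amino acids at positions 112 and 158 implied by the two nucleotides."""
    return RESIDUE[nt448.upper()], RESIDUE[nt586.upper()]
```

Because the classification is defined per chromosome, a subject who is heterozygous at both sites cannot be assigned alleles from unphased genotypes alone, which is one reason both sites must be tested together.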
It has been apparent for several years that the e2, e3, e4 classification does not provide sufficient sensitivity or specificity to be used alone as a diagnostic test for assessing risk of or making a diagnosis of either dyslipidemia, heart disease or Alzheimer's disease (AD) in asymptomatic individuals. Even the use of ApoE testing as a tool in the differential diagnosis of dementia (e.g. to increase the certainty of a clinical diagnosis of Alzheimer's type dementia in a patient with early signs of dementia in whom the diagnosis of Alzheimer's is being considered) is debated. Thus, while many important associations between ApoE genotype and medically important conditions or treatment responses have been described and repeatedly confirmed, it is evident that the strength of these associations is not as great as would be desirable for a routine predictive, diagnostic or prognostic test, and in fact may not be sufficient to justify ApoE genetic testing for any non-research purpose.
The lack of sensitivity and specificity that limits the use of current ApoE genotype tests is likely attributable to two factors. First, the current ApoE test may not measure all the functional variation in the ApoE gene. For example, it does not take full account of any genetically determined variation in transcription regulation; variation in RNA processing, including splicing, polyadenylation and export to the cytoplasm; variation in mRNA translational efficiency and half life; or variation in protein activity including receptor binding, interaction with regulatory factors, half life, etc. This is true particularly insofar as such variation may be determined by polymorphisms other than those that account for the e2, e3, e4 classification. Second, there may be variables besides ApoE allele status that affect the various conditions for which ApoE genotyping has been tested. Other relevant variables for neurodegenerative diseases such as AD include variation in the genes that encode protein components of AD lesions, such as tau protein or amyloid precursor protein; the proteases that produce pathological forms of these proteins, such as beta and gamma secretase and the memapsins; AD disease genes such as presenilin 1 and 2; genes involved in brain inflammatory response pathways; and other groups of genes implicated in neurodegeneration by biochemical, genetic or epidemiological evidence. Variables that may interact with ApoE genotype or haplotype to affect cholesterol and triglyceride levels and heart disease risk include the genes encoding ApoE receptors (low density lipoprotein receptor, and the low density lipoprotein receptor related protein), and genes encoding other apolipoproteins and their receptors, as well as the genes of cholesterol biosynthesis, including hydroxymethylglutaryl CoA reductase, mevalonate synthetase, mevalonate kinase, phosphomevalonate kinase, squalene synthase and other enzymes.
The present invention addresses the first limitation of current ApoE testing (failure of current ApoE tests to record all the alleles of ApoE that have distinct biochemical or clinical effects) by providing for a much more sensitive test of ApoE variation. Specifically, we describe 20 DNA polymorphisms in and around the ApoE gene (including the two polymorphisms that are traditionally studied). We also describe the commonly occurring haplotypes at the ApoE locus (that is, the sets of polymorphic nucleotides that occur together on individual chromosomes) and novel methods for determining haplotypes in clinical samples. Also described are data analysis strategies for extracting the maximum information from the ApoE haplotypes, so as to enhance their utility in clinical settings.
The ApoE haplotypes include any haplotype that can be assembled from the sequence polymorphisms described herein in Table 2, or any subset of those polymorphisms. Thus, the invention expressly includes a haplotype including either of the alternative nucleotides at any 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 of the identified polymorphic sites. The haplotypes expressly include each combination of sites with each selection of alternative nucleotide at each site included in the haplotype. The haplotypes may also include one or more additional polymorphic sites which are known in the art or which may be identified in the future. Among the haplotypes described below are a set of haplotypes that parallel the current e2, e3, e4 classification but do not involve either of the nucleotides that specify the e2, e3, e4 system.
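To give a sense of the scope of "each combination of sites with each selection of alternative nucleotide", the number of formally possible haplotypes can be counted. The sketch below is illustrative only; it assumes each of the 20 sites has exactly two alternative nucleotides (consistent with "either of the alternative nucleotides"), and the function name is hypothetical.

```python
from math import comb


def count_haplotypes(n_sites=20, min_sites=2, alleles_per_site=2):
    """Count haplotypes defined on any subset of at least `min_sites` sites.

    For each subset size k there are comb(n_sites, k) ways to choose the
    sites and alleles_per_site ** k ways to choose the nucleotides.
    """
    return sum(
        comb(n_sites, k) * alleles_per_site ** k
        for k in range(min_sites, n_sites + 1)
    )


total = count_haplotypes()
```

With the defaults, the sum equals 3**20 - 41, or 3,486,784,360 formal combinations. Only a small number of these are expected to occur at appreciable frequency, since polymorphisms on the same chromosome tend to occur together.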
The present invention also addresses the second potential limitation of current ApoE testing: failure to test for the interaction of ApoE genotype or haplotype with other genetic determinants of nervous system disease or cardiovascular disease risk, prognosis or response to therapy. The phenotypes for which ApoE genotyping or haplotyping have been tested are determined by multiple genes, and therefore require the simultaneous analysis of variation in two or more genetic loci. The haplotyping methods of this application facilitate such analysis by providing a basis for (i) identifying substantially all haplotypes that exist at appreciable frequency in a population or populations, and (ii) clustering said haplotypes in groups of two or more haplotypes to facilitate statistical analysis, thereby increasing the power of association studies.
As used herein, "population" refers to a group of individuals that share geographic (including, but not limited to, national), ethnic or racial heritage. A population may also comprise individuals with a particular disease or condition ("disease population"). The concept of a population is useful because the occurrence and/or frequency of DNA polymorphisms and haplotypes, as well as their medical implications, often differs between populations. Therefore knowing the population to which a subject belongs may be useful in interpreting the health consequences of having specific haplotypes. A population preferably encompasses at least ten thousand, one hundred thousand, one million or more individuals, with the larger numbers being more preferable. In embodiments of this invention, the allele (haplotype) frequency, heterozygote frequency, or homozygote frequency of two or more alleles of a gene or genes is known in a population. In preferred embodiments of this invention, the frequency of one or more variances that may predict response to a treatment is determined in one or more populations using a diagnostic test.
In one aspect, the invention provides a method for determining a genotype for ApoE in an individual, comprising determining the nucleotide present at at least one polymorphic site different from nucleotides 21250 and 21388 in an ApoE allele from an individual. In preferred embodiments, the polymorphic site is selected from the group consisting of nucleotides 16541, 16747, 16965, 17030, 17098, 17387, 17785, 17874, 17937, 18145, 18476, 19311, 20234, 21349, 23524, 23707, 23759, 23805, and 37237. In certain embodiments, the method also comprises determining the nucleotide present at at least one of nucleotides 21250 and 21388. The determining can be performed by a method comprising variance specific nucleic acid hybridization. The variance specific nucleic acid hybridization can be performed on an array, preferably an array composed of immobilized oligonucleotides or in situ synthesized oligonucleotides, in which case the hybridizing species are DNA fragments. In certain embodiments, the DNA fragments are PCR amplification products. In some embodiments, the array is composed of immobilized DNA fragments and the hybridizing species are oligonucleotides.
Determining the nucleotide present at a polymorphic site can be performed using a primer extension method distinguishing between nucleotides present at said at least one site, for example, a method using dideoxynucleotides to effect nucleic acid chain termination. That determining can alternatively be performed using a method involving chemical cleavage of a nucleic acid molecule including a said polymorphic site. The nucleic acid fragment masses following said chemical cleavage are preferably determined using mass spectrometry.
In other embodiments, determining the nucleotide present at a polymorphic site is performed using a cleavase-based signal amplification method.
The nucleotide determination can also be performed using a bead-based method, preferably where the beads have a bound oligonucleotide species which is perfectly matched or one base mismatched to the target.
Again alternatively, the determining can be performed using a FRET-based method.
In another aspect, the invention provides a method for determining a haplotype for ApoE in an individual, by genotyping at least two polymorphic sites in ApoE sequence on at least one allele of said individual, preferably where at least one of said polymorphic sites is different from nucleotides 21250 and 21388. As in the preceding aspect, in preferred embodiments, the polymorphic sites include at least one site selected from the group consisting of nucleotides 16541, 16747, 16965, 17030, 17098, 17387, 17785, 17874, 17937, 18145, 18476, 19311, 20234, 23524, 23707, 21349, 23759, 23805, and 37237.
In preferred embodiments, the genotyping is performed on two alleles of said individual.
In preferred embodiments, the genotyping is performed for at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 of the polymorphic sites.
Embodiments of the preceding two aspects can also be applied in connection with additional aspects, particularly aspects concerning ApoE described herein.
The invention also provides a method for classifying ApoE haplotypes for a plurality of individuals, by determining at least one ApoE haplotype for each of the plurality of individuals, determining the sequence similarity of the haplotypes (using methods for determining sequence similarity as known to those of ordinary skill in the art), and assigning the haplotypes to groups of haplotypes based on said sequence similarities. This method thus constructs groups of related ApoE haplotypes based on sequence relationship.
Further, the invention provides a method for providing an indication of the risk for an individual to develop a disease or condition, by determining a haplotype of ApoE in the individual, where the haplotype provides a measure of the risk.
In preferred embodiments of this aspect and other aspects relating to ApoE and a disease, the disease is selected from the group consisting of coronary heart disease, a non-Alzheimer's Disease neurological disease, Alzheimer's disease, stroke, brain trauma, amyotrophic lateral sclerosis, temporal lobe epilepsy, Wilson's disease, continuous ambulatory peritoneal dialysis, glycogen storage disease type Ia, and age-related macular degeneration.
The method (and other methods described herein relating to ApoE and disease) can also include determining a genotype or haplotype of at least one additional gene, where the haplotype of ApoE together with the genotype or haplotype of the additional gene(s) provides a measure of the risk.
The invention provides a method for diagnosing the presence of a disease in an individual, by determining whether the individual has an ApoE haplotype associated with the disease.
In preferred embodiments, the method also includes determining a genotype or haplotype of at least one additional gene and determining whether the individual has a combination of the haplotype of ApoE and the genotype or haplotype of the at least one additional gene associated with the disease.
Likewise, the invention provides a method for predicting the clinical course for a patient suffering from a disease, by determining an ApoE haplotype for the individual, where at least one ApoE haplotype is associated with the clinical course of the disease.
In preferred embodiments, the clinical course comprises a treatment prognosis for a particular method of treatment, or the clinical course comprises at least one clinical disease parameter selected from the group consisting of rate of disease development, time interval to death, time interval to dementia, and time interval to inability to live independently.
The invention also provides a method for selecting a subject for prophylactic treatment of a disease, by identifying a subject having an ApoE haplotype associated with an elevated risk of developing the disease, wherein said prophylactic treatment can provide a clinical benefit to the subject.
The invention also provides a method for selecting a patient for treatment of a disease, involving determining whether the patient has an ApoE haplotype associated with favorable clinical prognosis with a particular treatment.
Similarly, the invention provides a method for selection of a treatment for a patient suffering from a disease. The method involves determining an ApoE haplotype for the patient; and identifying a treatment associated with favorable clinical prognosis for a patient having that ApoE haplotype.
As ApoE haplotype is associated with treatment selection and prognosis, the invention also provides a method of treating a patient suffering from a disease, by determining an ApoE haplotype for the patient, identifying a treatment associated with favorable clinical prognosis for a patient having that ApoE haplotype, and administering that treatment to the patient.
ApoE haplotype and genotype information also can be utilized in identifying individuals, or the individual source of a biological sample. Thus, the invention provides a method for determining whether a biological sample was from an individual, by determining the nucleotides present at a plurality of ApoE polymorphic sites in the individual and in DNA obtained from the sample, and determining whether the nucleotides present at the polymorphic sites are the same or different. The presence of the same nucleotides at respective sites is indicative that said sample is from said individual, and the presence of different nucleotides is indicative that said sample is not from said individual. The ApoE genotype or haplotype information can also be usefully combined with similar information for polymorphic sites in other genes or other nucleic acid sequences from the individual and the sample. In preferred embodiments, the plurality of ApoE polymorphic sites comprises an ApoE haplotype.
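The site-by-site comparison described above can be sketched as follows. The function name and data layout are assumptions for illustration; genotypes are treated as unordered pairs, and only sites typed in both the individual and the sample are compared.

```python
def genotypes_match(individual, sample):
    """Compare unphased genotypes at the polymorphic sites typed in both.

    individual, sample: dicts mapping site -> (nucleotide, nucleotide).
    Returns True if every shared site shows the same pair of nucleotides.
    """
    shared = individual.keys() & sample.keys()
    if not shared:
        raise ValueError("no polymorphic sites typed in both")
    return all(sorted(individual[site]) == sorted(sample[site]) for site in shared)


# Hypothetical genotypes at three of the ApoE sites listed in this application.
person = {16541: ("T", "G"), 17030: ("C", "C"), 21349: ("A", "G")}
same = {16541: ("G", "T"), 17030: ("C", "C"), 21349: ("G", "A")}
different = {16541: ("T", "T"), 17030: ("C", "C"), 21349: ("A", "G")}
```

A mismatch at any site indicates that the sample is not from the individual, whereas a match is only indicative; its weight grows with the number of sites compared and with the inclusion of polymorphic sites from other genes.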
The invention also provides a method for determining whether an ApoE haplotype is associated with a disease risk. This method involves determining ApoE haplotypes for each individual in a set of individuals, dividing the set of individuals into at least two groups based on ApoE haplotypes, and determining whether individuals having a particular ApoE haplotype or individuals in a group differ from individuals having a different ApoE haplotype or in a different group in incidence, prevalence, severity, or progression of disease, or a combination thereof. This aspect can also be combined with embodiments of other aspects described herein involving ApoE and disease, disease treatment and other such aspects.
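A minimal sketch of such a group comparison follows, assuming disease status is recorded as affected or unaffected and that individuals are divided into carriers and non-carriers of one haplotype. The text does not prescribe a statistical test; a two-sided Fisher exact test on the resulting 2x2 table is used here only as one common choice, and all names and data are hypothetical.

```python
from math import comb


def contingency_table(records, haplotype):
    """records: iterable of (set of haplotypes carried, affected flag).
    Returns (carriers affected, carriers unaffected,
             non-carriers affected, non-carriers unaffected)."""
    a = b = c = d = 0
    for carried, affected in records:
        if haplotype in carried:
            a, b = (a + 1, b) if affected else (a, b + 1)
        else:
            c, d = (c + 1, d) if affected else (c, d + 1)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x affected carriers with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Hypothetical study of eight individuals.
records = (
    [({"H1", "H2"}, True)] * 3 + [({"H1", "H3"}, False)]
    + [({"H2", "H3"}, True)] + [({"H2"}, False)] * 3
)
table = contingency_table(records, "H1")
p_value = fisher_exact_two_sided(*table)
```

Severity or progression measured on a continuous scale would call for a different comparison between groups; the 2x2 table applies only to incidence or prevalence counts.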
The invention also provides a method for determining whether a combination of an ApoE haplotype and a genotype or haplotype of at least one additional gene is associated with a disease risk. The method includes determining ApoE haplotypes and genotypes or haplotypes for the at least one additional gene for each individual in a set of individuals, dividing the set of individuals into at least two groups based on the combinations of ApoE haplotypes and genotype or haplotype of said at least one additional gene, and determining whether individuals having a particular combination or individuals in a group differ from individuals having a different combination or in a different group in incidence, prevalence, severity, or progression of said disease, or a combination thereof.
The invention further provides a method for determining whether an ApoE haplotype is associated with a pharmacologic parameter, by measuring the parameter for cells of at least one individual with said ApoE haplotype, measuring the parameter for cells of at least one individual with a different ApoE haplotype, and comparing the measures. Preferably a larger number, e.g., at least 3, 5, 10, 20, 30, 50, 100, or even more, of individuals are utilized, thereby providing additional correlation information. Correlation or other statistical measure of relatedness between haplotype and pharmacologic parameter can be used by one of ordinary skill in the art.
As used herein xe2x80x9cpolymorphismxe2x80x9d refers to DNA sequence variation in the cellular genomes of plants or animals, preferably mammals, and more preferably humans. These sequence variations include mutations, single nucleotide changes and insertions and deletions. xe2x80x9cSingle nucleotide polymorphismxe2x80x9d (SNP) refers to those differences among samples of DNA in which a single nucleotide base pair has been substituted by another.
As used herein xe2x80x9cvariancexe2x80x9d or xe2x80x9cvariantsxe2x80x9d is synonymous with polymorphism, and refers to DNA sequence variations. The terms xe2x80x9cvariant form of a genexe2x80x9d, xe2x80x9cform of a genexe2x80x9d, or xe2x80x9callelexe2x80x9d refer to one specific sequence of a gene that has at least two sequences, the specific forms differing from other forms of the same gene at at least one, and frequently more than one, variant sites within the gene. The sequences at these variant sites that differ between different alleles of the gene are variously termed xe2x80x9callelesxe2x80x9d, xe2x80x9cgene sequence variancesxe2x80x9d, xe2x80x9cvariancesxe2x80x9d or xe2x80x9cvariantsxe2x80x9d. The term xe2x80x9calternative formxe2x80x9d refers to an allele that can be distinguished from other alleles by having distinct variances at least one, and frequently more than one, variant sites within the gene sequence. Other terms known in the art to be equivalent include mutation and polymorphism, although mutation is often used to refer to an allele associated with a deleterious phenotype.
As used herein "phenotype" refers to any observable or otherwise measurable physiological, morphological, biological, biochemical or clinical characteristic of an organism. The point of genetic studies is to detect consistent relationships between phenotypes and DNA sequence variation (genotypes). DNA sequence variation will seldom completely account for phenotypic variation, particularly with medical phenotypes of interest (e.g. commonly occurring diseases). Environmental factors are also frequently important.
As used herein, "genotype" refers to the genetic constitution of an organism. More specifically, "genotyping" as used herein refers to the analysis of DNA in a sample obtained from a subject to determine the DNA sequence in a specific region of the genome, e.g. at a gene that influences a disease or drug response. The term "genotyping" may refer to the determination of DNA sequence at one or more polymorphic sites.
As used herein, "haplotype" refers to the partial or complete sequence of a segment of DNA from a single chromosome. The DNA segment may include part of a gene, an entire gene, several genes, or a region devoid of genes (but which perhaps contains DNA sequence that regulates the function of nearby genes). The term "haplotype", then, refers to a cis arrangement of two or more polymorphic nucleotides on a particular chromosome, e.g., in a particular gene. The haplotype preserves information about the phase of the polymorphic nucleotides, that is, which set of variances was inherited from one parent (and is therefore on one chromosome), and which from the other. A genotyping test does not provide information about phase. For example, a subject heterozygous at nucleotide 25 of a gene (both A and C are present) and also at nucleotide 100 of the same gene (both G and T are present) could have haplotypes 25A-100G and 25C-100T, or alternatively 25A-100T and 25C-100G. Only a haplotyping test can discriminate these two cases definitively. Haplotypes are generally inherited as units, except in the event of a recombination during meiosis that occurs within the DNA segment spanned by the haplotype, a rare occurrence for any given sequence in each generation. By "haplotyping", or "determining the haplotype", as used herein is meant determining the sequence of two or more polymorphic sites on a single chromosome. Usually the sample to be haplotyped consists initially of two admixed copies of the chromosome segment to be haplotyped, i.e. DNA from a diploid subject.
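The phase ambiguity in the example above can be made concrete with a short sketch that enumerates every pair of haplotypes consistent with an unphased genotype. The function name `possible_haplotype_pairs` and the dictionary layout of the genotype are hypothetical conveniences for illustration, not part of the methods described in this application.

```python
from itertools import product


def possible_haplotype_pairs(genotype):
    """List the haplotype pairs compatible with an unphased genotype.

    genotype: dict mapping a nucleotide position to the two alleles
    observed at that position in a diploid sample (hypothetical layout).
    Returns a sorted list of (haplotype, haplotype) tuples, where each
    haplotype is written like "25A-100G".
    """
    sites = sorted(genotype)
    choices = []
    for position in sites:
        first, second = genotype[position]
        if first == second:
            # Homozygous site: both chromosomes carry the same allele.
            choices.append([(first, second)])
        else:
            # Heterozygous site: either allele may sit on chromosome 1.
            choices.append([(first, second), (second, first)])

    pairs = set()
    for assignment in product(*choices):
        chromosome_1 = "-".join(
            f"{pos}{alleles[0]}" for pos, alleles in zip(sites, assignment)
        )
        chromosome_2 = "-".join(
            f"{pos}{alleles[1]}" for pos, alleles in zip(sites, assignment)
        )
        # The two chromosomes are unordered, so store a sorted tuple.
        pairs.add(tuple(sorted((chromosome_1, chromosome_2))))
    return sorted(pairs)


# The doubly heterozygous subject from the text.
example_pairs = possible_haplotype_pairs({25: ("A", "C"), 100: ("G", "T")})
```

For the doubly heterozygous subject this yields the two alternatives named in the text. In general a genotype with k heterozygous sites is compatible with 2^(k-1) distinct haplotype pairs, which is why genotyping alone cannot resolve phase.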
As used herein "genetic testing" or "genetic screening" refers to the genotyping or haplotyping analyses performed to determine the alleles present in an individual, a population, or a subset of a population.
"Disease risk" as used herein refers to the probability that, for a specific disease (e.g. coronary heart disease), an individual who is free of evident disease at the time of testing will subsequently be affected by the disease.
"Disease diagnosis" as used herein refers to the ability of a clinician to appropriately determine and identify whether the expressed symptomatology, pathology or physiology of a patient is associated with a disease, disorder, or dysfunction.
"Disease prognosis" as used herein refers to the forecast of the probable course and/or outcome of a disease, disorder, or dysfunction.
"Therapeutic management" as used herein refers to the treatment of diseases, disorders, or dysfunctions by various medical methods. By "disease management protocol" or "treatment protocol" is meant a means for devising a therapeutic plan for a patient using laboratory, clinical and genetic data, including the patient's diagnosis and genotype. The protocol clarifies therapeutic options and provides information about probable prognoses with different treatments. The treatment protocol may provide an estimate of the likelihood that a patient will respond positively or negatively to a therapeutic intervention. The treatment protocol may also provide guidance regarding optimal drug dose and administration, and likely timing of recovery or rehabilitation. A "disease management protocol" or "treatment protocol" may also be formulated for asymptomatic and healthy subjects in order to forecast future disease risks based on laboratory, clinical and genetic variables. In this setting the protocol specifies optimal preventive or prophylactic interventions, including use of compounds, changes in diet or behavior, or other measures. The treatment protocol may include the use of a computer program.
The term "associated with" in connection with the relationship between a genetic characteristic, e.g., a gene, allele, haplotype, or polymorphism, and a disease or condition means that there is a statistically significant level of relatedness between them based on any generally accepted statistical measure of relatedness. Those skilled in the art are familiar with selecting an appropriate statistical measure for a particular experimental situation or data set. The genetic characteristic, e.g., the gene or haplotype, may, for example, affect the incidence, prevalence, development, severity, progression, or course of the disease. For example, ApoE or a particular allele(s) or haplotype of the gene is related to a disease if the ApoE gene is involved in the disease or condition as indicated, or if a particular sequence variance, haplotype, or allele is so involved.
As used herein, a "gene" is a sequence of DNA present in a cell that directs the expression of a "biologically active" molecule or "gene product", most commonly by transcription to produce RNA and translation to produce protein. Such a gene may also be manipulated by many different molecular biology techniques, and thus, for example, can be isolated or purified or otherwise separated from its natural environment. The "gene product" is most commonly an RNA molecule or protein, or an RNA or protein that is subsequently modified by reacting with, or combining with, other constituents of the cell. Such modifications may include, without limitation, modification of proteins to form glycoproteins, lipoproteins, and phosphoproteins, or other modifications known in the art. RNA may be modified without limitation by polyadenylation, splicing, capping or export from the nucleus or by covalent or noncovalent interactions with proteins. The term "gene product" refers to any product directly resulting from transcription of a gene. In particular this includes partial, precursor, and mature transcription products (i.e., pre-mRNA and mRNA), and translation products with or without further processing including, without limitation, lipidation, phosphorylation, glycosylation, or combinations of such processing.
As used herein the term "hybridization", when used with respect to DNA fragments or polynucleotides, encompasses methods including natural polynucleotides, non-natural polynucleotides or a combination of both. Natural polynucleotides are those that are polymers of the four natural deoxynucleotides (deoxyadenosine triphosphate [dA], deoxycytidine triphosphate [dC], deoxyguanosine triphosphate [dG] or deoxythymidine triphosphate [dT], usually designated simply thymidine triphosphate [T]) or polymers of the four natural ribonucleotides (adenosine triphosphate [A], cytidine triphosphate [C], guanosine triphosphate [G] or uridine triphosphate [U]). Non-natural polynucleotides are made up in part or entirely of nucleotides that are not natural nucleotides; that is, they have one or more modifications. Also included among non-natural polynucleotides are molecules related to nucleic acids, such as peptide nucleic acid [PNA]. Non-natural polynucleotides may be polymers of non-natural nucleotides, polymers of natural and non-natural nucleotides (in which there is at least one non-natural nucleotide), or otherwise modified polynucleotides. Non-natural polynucleotides may be useful because their hybridization properties differ from those of natural polynucleotides. As used herein the term "complementary", when used in respect to DNA fragments, refers to the base pairing rules established by Watson and Crick: A pairs with T or U; G pairs with C. Complementary DNA fragments have sequences that, when aligned in antiparallel orientation, conform to the Watson-Crick base pairing rules at all positions or at all positions except one. As used herein, complementary DNA fragments may be natural polynucleotides, non-natural polynucleotides, or a mixture of natural and non-natural polynucleotides.
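The definition of "complementary" given above (Watson-Crick pairing in antiparallel alignment at all positions, or at all positions except one) can be expressed as a small check. This is a minimal sketch that assumes both fragments are written 5' to 3', are of equal length, and contain only the natural bases A, C, G, T and U; the function name `is_complementary` is hypothetical.

```python
# Watson-Crick pairs as stated in the definition: A pairs with T or U,
# and G pairs with C.
WATSON_CRICK_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def is_complementary(fragment_a, fragment_b, max_mismatches=1):
    """Return True if two fragments are complementary as defined above.

    Both fragments are assumed to be written 5' to 3' (an assumption of
    this sketch), so fragment_b is reversed to obtain the antiparallel
    alignment. Fragments of unequal length are treated as not
    complementary, which is a simplification.
    """
    if len(fragment_a) != len(fragment_b):
        return False
    aligned = zip(fragment_a.upper(), reversed(fragment_b.upper()))
    mismatches = sum(pair not in WATSON_CRICK_PAIRS for pair in aligned)
    return mismatches <= max_mismatches


perfect = is_complementary("ATGC", "GCAT")       # no mismatches
one_off = is_complementary("ATGC", "GCAA")       # one mismatch tolerated
two_off = is_complementary("ATGC", "GGAA")       # two mismatches rejected
```

Setting `max_mismatches=0` restricts the check to perfectly paired fragments.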
As used herein "amplify" when used with respect to DNA refers to a family of methods for increasing the number of copies of a starting DNA fragment. Amplification of DNA is often performed to simplify subsequent determination of DNA sequence, including genotyping or haplotyping. Amplification methods include the polymerase chain reaction (PCR), the ligase chain reaction (LCR) and methods using Q beta replicase, as well as transcription-based amplification systems such as the isothermal amplification procedure known as self-sustained sequence replication (3SR, developed by T. R. Gingeras and colleagues), strand displacement amplification (SDA, developed by G. T. Walker and colleagues) and the rolling circle amplification method (developed by P. Lizardi and D. Ward).
By "comprising" is meant including, but not limited to, whatever follows the word "comprising". Thus, use of the term "comprising" indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present. By "consisting of" is meant including, and limited to, whatever follows the phrase "consisting of". Thus, the phrase "consisting of" indicates that the listed elements are required or mandatory, and that no other elements may be present. By "consisting essentially of" is meant including any elements listed after the phrase, and limited to other elements that do not interfere with or contribute to the activity or action specified in the disclosure for the listed elements. Thus, the phrase "consisting essentially of" indicates that the listed elements are required or mandatory, but that other elements are optional and may or may not be present depending upon whether or not they affect the activity or action of the listed elements.
Other features and advantages of the invention will be apparent from the following description of the preferred embodiments thereof, and from the claims.
Table 1 The table lists the masses of the normal nucleotides and BrdU and the mass differences between each of the possible pairs of nucleotides.
Table 2 Twenty polymorphic sites in the ApoE gene. The ApoE genomic sequence is taken from GenBank accession AB012576. The gene is composed of four exons and three introns. The transcription start site (beginning of first exon) is at nucleotide (nt) 18371 of GenBank accession AB012576, while the end of the transcribed region (end of the 3′ untranslated region, less polyA tract) is at nt 21958. The twenty polymorphic sites are depicted as shaded nucleotides in the Table, and are as follows (nucleotide position and possible nucleotides): 16541 (T/G); 16747 (T/G); 16965 (T/C); 17030 (G/C); 17098 (A/G); 17387 (T/C); 17785 (G/A); 17874 (T/A); 17937 (C/T); 18145 (G/T); 18476 (G/C); 19311 (A/G); 20334 (A/G); 21250 (C/T); 21349 (T/C); 21388 (T/C); 23524 (A/G); 23707 (A/C); 23759 (C/T); 23805 (G/C); and 37237 (G/A). The bold sequence listing indicates the transcribed sequence of the ApoE gene; the grey shaded region indicates the ApoE gene enhancer element; the underlined sequence depicts the coding region of the ApoE gene. Where polymorphisms result in a change of the amino acid sequence, the amino acid alteration is indicated; for example, at nucleotide position 20334 the A/T polymorphism results in an alanine/threonine, respectively, at amino acid position 18 of the ApoE gene product. As described in the Detailed Description below, the polymorphisms at GenBank nucleotide positions 17874, 17937, 18145, 18476, 21250, and 21388 have been previously described.
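For orientation, the positions listed for Table 2 can be located relative to the transcribed region using the two coordinates given above (nt 18371 and nt 21958 of AB012576). The sketch below is illustrative only: the function names are hypothetical, the boundaries are treated as inclusive, and "upstream" and "downstream" assume the gene is read in the direction of increasing GenBank coordinates.

```python
# Coordinates stated in the Table 2 description (GenBank AB012576).
TRANSCRIPTION_START = 18371
TRANSCRIPTION_END = 21958

# Polymorphic site positions exactly as listed in the Table 2 description.
SITE_POSITIONS = [
    16541, 16747, 16965, 17030, 17098, 17387, 17785, 17874, 17937,
    18145, 18476, 19311, 20334, 21250, 21349, 21388, 23524, 23707,
    23759, 23805, 37237,
]


def classify_site(position):
    """Place a GenBank position relative to the transcribed region.

    Boundaries are treated as inclusive (an assumption of this sketch).
    """
    if position < TRANSCRIPTION_START:
        return "upstream"
    if position <= TRANSCRIPTION_END:
        return "transcribed"
    return "downstream"


def offset_from_start(position):
    """Signed distance in nucleotides from the transcription start site."""
    return position - TRANSCRIPTION_START


sites_by_region = {}
for site in SITE_POSITIONS:
    sites_by_region.setdefault(classify_site(site), []).append(site)
```

Grouping the sites this way separates those lying 5' of the transcription start site from those within the transcribed sequence and those 3' of it.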
Table 3 This table provides experimentally derived ApoE haplotypes. The haplotypes encompass nine polymorphic sites within the ApoE gene (GenBank accession number AB012576). The Table has nine columns with haplotype data at nine specific sites within the ApoE gene. The column listed as "WWP #" gives a Coriell number, which is the catalogue number of an established human cell line. The "VGNX_Symbol" row provides an internal identifier for the gene; the "VGNX database" row identifies the base pair number of the ApoE cDNA; and the "GenBank" row identifies the GenBank base pair number of the sequence for the ApoE gene. The abbreviations are as follows: A=adenine nucleotide, C=cytosine nucleotide, G=guanosine nucleotide, and T=thymidine nucleotide. The abbreviated nucleotides in brackets indicate that either nucleotide may be present in the sample. Thus, for example, under column GEN-CBX and WWP#1, the genotype identified at GenBank position 17874 is an "A", whereas under column GEN-CBX at GenBank position 18476 the genotype for WWP#1 is either a "T" or a "G".
Table 4 This table provides the sequence of ApoE haplotypes comprising up to 20 polymorphic sites. There are 42 ApoE haplotypes listed in the Table. The top row of the table provides the location of the polymorphic nucleotides in the ApoE gene (see Table 2). The numbers (16541, 16747, and so forth) correspond to the numbering in GenBank accession AB012576-1, which provides the sequence of a cosmid clone that contains the entire ApoE gene and flanking DNA. Each column shows the sequence of the ApoE gene at the position indicated at the top of the column. Abbreviations are as follows: A=adenine nucleotide, C=cytosine nucleotide, G=guanosine nucleotide, and T=thymidine nucleotide. Each row provides the sequence of an individual haplotype.
Table 5 This table provides the sequence of haplotypes at the ApoE gene determined by 5 polymorphic sites. These haplotypes allow classification of ApoE alleles into the e2, e3 and e4 groups without recourse to the polymorphic sites conventionally used to determine e2, e3, e4 status. In this table the haplotypes are specified by SNPs at positions 16747, 17030, 17785, 19311, and 23707, listed as column headings. The GENOTYPE column provides the classic ApoE genotype/phenotype (e2, e3 and e4) corresponding to the haplotype indicated in each row.