The present invention is in the field of osteoporosis diagnosis and therapy. The present invention specifically provides previously unknown single nucleotide polymorphisms (SNPs) in genes that have been identified as being involved in pathologies associated with osteoporosis. Since these genes are known to be associated with osteoporosis, the presently disclosed naturally occurring polymorphisms (variants) are valuable for association and linkage analysis. Specifically, the identified SNPs are useful for such applications as screening for osteoporosis susceptibility, prevention of osteoporosis, development of diagnostics and therapies for osteoporosis, development of drugs for osteoporosis, and development of individualized drug treatments based on an individual's SNP profile. The SNPs provided by the present invention are also useful for human identification. Methods and reagents for detecting the presence of these polymorphisms are provided.
Human bones are generally subdivided into cortical bone and cancellous bone. Cortical bone is dense osseous tissue and is represented by the tubular diaphyses of the appendicular skeleton. Cancellous bone, on the other hand, consists mainly of trabeculae. Such cancellous bone is present, for example, in the epiphysial portions of long shaft bones, vertebrae, carpal bones, calcanei, tali, and tarsal bones. Because cancellous bone has a larger surface in contact with soft tissue containing vasculature, it shows higher metabolic turnover and is predisposed to rapid changes under bone diseases or treatments.
Metabolic bone diseases include those conditions producing diffusely decreased bone density (osteopenia) and diminished bone strength. These diseases are categorized by histologic appearance: osteoporosis, which is common and defined by decreased mineral and bone matrix, and osteomalacia, which is unusual and defined by decreased mineral but intact bone matrix.
Osteoporosis is the most common metabolic bone disease and the cause of hundreds of thousands of fractures every year. The morbidity and indirect mortality rates are very high. Since the usual form of the disease is clinically evident in middle life and beyond and since women are more frequently affected than men, it may be referred to as “postmenopausal” osteoporosis. It is characterized by a decrease in the amount of bone present to a level below which it is capable of maintaining the structural integrity of the skeleton. The rate of bone formation is often normal, whereas the rate of bone resorption is increased. There is a greater loss of trabecular bone than compact bone, accounting for the primary features of disease, i.e., crush fractures of vertebrae, fractures of the neck of the femur, and fractures of the distal end of the radius. Whatever bone is present is normally mineralized.
The causes of osteoporosis include, but are not limited to, hormone deficiency (estrogen or androgen), hormone excess (Cushing's syndrome or glucocorticoid administration, thyrotoxicosis, hyperparathyroidism, excessive vitamin D administration), immobilization, tobacco, malignancy, idiopathic or geriatric causes, and genetic disorders (type I collagen mutations, Ehlers-Danlos syndrome, Marfan's syndrome, homocystinuria).
Among the genetic disorders, osteogenesis imperfecta is caused by a major mutation in the gene encoding type I collagen, the major collagen constituent of bone, and causes severe osteoporosis. Marfan's syndrome is caused by mutations in the fibrillin gene on chromosome 15. Homocystinuria is caused by cystathionine beta-synthase deficiency and exhibits an autosomal recessive pattern of inheritance.
Researchers believe that genetic factors play a dominant role in the etiology of this disease, including its ethnic and gender differences. Several genes have been shown to be associated with low bone density, and research has focused on identifying those genes that may act as markers of disease. Common allelic variations of the vitamin D receptor gene have been found to be associated with decreased bone density in certain populations, including premenopausal women and young girls (Wood, R. J. and Fleet, J. C. Ann. Rev. Nutrit. 1998 18:233-258). Bone mineral density has also been associated with genetic variation in the estrogen receptor gene, both by itself and in conjunction with variations in the vitamin D receptor gene (Willing et al. J. Bone Min. Res. 1998 13:695-705). In Japanese women, the HLA-A*24-B*07-DRB*01 haplotype has been linked to low peak bone mass (Tsuji et al. Hum. Immunol. 1998 59:243-249). A variant of the gene encoding transforming growth factor-beta 1 has also been associated with low bone mass in osteoporotic women and with low bone mass and increased bone turnover in normal women (Langdahl et al. Bone 1997 20:289-294). A polymorphism of the COLIA1 gene has been identified as a potential marker for low bone mass and vertebral fracture in women (Grant et al. Nat. Genet. 1996 14:203-205). Devoto et al. (Eur. J. Hum. Genet. 1998 6:151-157) determined that a gene or genes on human chromosome 1 were linked to low bone density. Polymorphisms linked to osteoporosis have been described in the TGF-β1 gene, whose protein product is abundant in bone and an important regulator of bone resorption and formation (Langdahl et al., 1997; Yamada et al., 1998; WO 97/28280). A polymorphism in the gene on chromosome 1 for tumor necrosis factor alpha receptor 2 has now been shown to be associated with low bone density (Spotila et al. WO 0032826).
Genes associated with osteoporosis include, but are not limited to: calcitonin receptor, collagen subunit (alpha-1 (X)) 3, Kuestner et al., Mol. Pharmacol. 46 (2), 246-255 (1994); insulin-like growth factor binding protein 1, Brewer et al., Biochem. Biophys. Res. Commun. 152 (3), 1289-1297 (1988), Brinkman et al., EMBO J. 7 (8), 2417-2423 (1988), Cubbage et al., Mol. Endocrinol. 3 (5), 846-851 (1989), Alitalo et al., Hum. Genet. 83 (4), 335-338 (1989), Ekstand et al., Genomics 6 (3), 413-418 (1990), Suwanichkul et al., J. Biol. Chem. 265 (34), 21185-21193 (1990), Ehrenborg et al., Genomics 12 (3), 497-502 (1992); insulin-like growth factor 1 receptor beta chain, Francke et al., Cold Spring Harb. Symp. Quant. Biol. 51, 855-866 (1986), Ullrich et al., EMBO J. 5 (10), 2503-2512 (1986), Flier et al., Proc. Natl. Acad. Sci. U.S.A. 83 (3), 664-668 (1986), Abbott et al., J. Biol. Chem. 267 (15), 10759-10763 (1992), Werner et al., Proc. Natl. Acad. Sci. U.S.A. 93 (16), 8318-8323 (1996), Grant et al., J. Clin. Endocrinol. Metab. 83 (9), 3252-3257 (1998); interleukin 4 receptor, Idzerda et al., J. Exp. Med. 171 (3), 861-873 (1990), Pritchard et al., Genomics 10 (3), 801-806 (1991); Werner syndrome, Goto et al., Nature 355 (6362), 735-738 (1992).
Diagnosis of osteoporosis is most often done in conjunction with a study of bone density by radiography. Although clinical laboratory tests such as levels of calcium and phosphorus in blood can be examined, these measures are usually normal in osteoporotic patients. Only about 20% of postmenopausal women with osteoporosis exhibit hypercalciuria, or increased excretion of calcium in urine.
Therefore, such laboratory findings are not indicative of the presence of disease, and clearly would not be indicative of risk of developing disease. To date, the prediction of risk of developing disease relies on family history of the disease. However, no genetic test is currently available to screen individuals.
The genomes of all organisms undergo spontaneous mutation in the course of their continuing evolution, generating variant forms of progenitor sequences (Gusella, Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may confer an evolutionary advantage or disadvantage relative to a progenitor form or may be neutral. In some instances, a variant form confers a lethal disadvantage and is not transmitted to subsequent generations of the organism. In other instances, a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form. Additionally, the effect of a variant form may be both beneficial and detrimental, depending on the circumstances. For example, a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal. In many instances, both progenitor and variant form(s) survive and co-exist in a species population. The coexistence of multiple forms of a sequence gives rise to polymorphisms, such as SNPs.
The reference allelic form is arbitrarily designated and may be, for example, the most abundant form in a population, or the first allelic form to be identified, and other allelic forms are designated as alternative, variant or polymorphic alleles. The allelic form occurring most frequently in a selected population is sometimes referred to as the “wild type” form.
Approximately 90% of all polymorphisms in the human genome are single nucleotide polymorphisms (SNPs). SNPs are single base pair positions in DNA at which different alleles, or alternative nucleotides, exist in some population. The SNP position, or SNP site, is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations). An individual may be homozygous or heterozygous for an allele at each SNP position. As defined by the present invention, the least frequent allele at a SNP position can have any frequency that is less than the frequency of the more frequent allele, including a frequency of less than 1% in a population. A SNP can, in some instances, be referred to as a “cSNP” to denote that the nucleotide sequence containing the SNP is an amino acid coding sequence.
A SNP may arise due to a substitution of one nucleotide for another at the polymorphic site. Substitutions can be transitions or transversions. A transition is the replacement of one purine nucleotide by another purine nucleotide, or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine, or vice versa. A SNP may also be a single base insertion/deletion variant (referred to as “indels”). A substitution that changes a codon coding for one amino acid to a codon coding for a different amino acid is referred to as a non-synonymous codon change, or missense mutation. A synonymous codon change, or silent mutation, is one that does not result in a change of amino acid due to the degeneracy of the genetic code. A nonsense mutation is a type of non-synonymous codon change that results in the formation of a stop codon, thereby leading to premature termination of a polypeptide chain and a defective protein.
SNPs, in principle, can be bi-, tri-, or tetra-allelic. However, tri- and tetra-allelic polymorphisms are extremely rare, almost to the point of non-existence (Brookes, Gene 234 (1999) 177-186). For this reason, SNPs are often referred to as “bi-allelic markers”, or “di-allelic markers”.
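The substitution classes described above can be illustrated with a minimal Python sketch. It is not part of the disclosed invention. The function names `substitution_class` and `codon_change` are hypothetical, and the codon table is the standard genetic code, which is an assumption about the translation context.

```python
# Illustrative sketch only; function names are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("ref and alt must be two different DNA bases")
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"

# Standard genetic code (assumed), bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop codon mutation"  # a stop codon is replaced by an amino acid codon
    return "missense"
```

For example, the sickle cell substitution GAG to GTG is an A-to-T transversion and a missense change (Glu to Val) under this classification.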
Causative SNPs are those SNPs that produce alterations in gene expression or in the expression or function of a gene product, and therefore are most predictive of a possible clinical phenotype. One such class includes SNPs falling within regions of genes encoding a polypeptide product, i.e., cSNPs. These SNPs may result in an alteration of the amino acid sequence of the polypeptide product (i.e., non-synonymous codon changes) and give rise to the expression of a defective or other variant protein. Furthermore, in the case of nonsense mutations, a SNP may lead to premature termination of a polypeptide product. Such variant products can result in a pathological condition, e.g., genetic disease. Examples of genetic diseases in which a polymorphism within a coding sequence of a gene gives rise to the disease include sickle cell anemia and cystic fibrosis. Causative SNPs do not necessarily have to occur in coding regions; causative SNPs can occur in any region that can ultimately affect the expression and/or activity of the protein encoded by the nucleic acid. Such gene areas include those involved in transcription, such as SNPs in promoter regions, in gene areas involved in transcript processing, such as SNPs at intron-exon boundaries that may cause defective splicing, or SNPs in mRNA processing signal sequences such as polyadenylation signal regions. For example, a SNP may inhibit splicing of an intron and result in mRNA containing a premature stop codon, leading to a defective protein. Consequently, SNPs in regulatory regions can have substantial phenotypic impact.
Some SNPs that are not causative SNPs nevertheless are in close association with, and therefore segregate with, a disease-causing sequence. In this situation, the presence of the SNP correlates with the presence of, or susceptibility to, the disease. These SNPs are invaluable for diagnostics and disease susceptibility screening.
Clinical trials have shown that patient response to treatment with pharmaceuticals is often heterogeneous. Thus there is a need for improved approaches to pharmaceutical agent design and therapy. SNPs can be used to help identify patients most suited to therapy with particular pharmaceutical agents (this is often termed “pharmacogenomics”). Pharmacogenomics can also be used in pharmaceutical research to assist the drug selection process (Linder et al. (1997), Clinical Chemistry, 43, 254; Marshall (1997), Nature Biotechnology, 15, 1249; International Patent Application WO 97/40462, Spectra Biomedical; and Schafer et al. (1998), Nature Biotechnology, 16, 3).
Population genetics is the study of how Mendel's laws and other genetic principles apply to entire populations. Such a study is essential to a proper understanding of the genetic basis of osteoporosis and of SNP-based association studies and linkage disequilibrium mapping. Population genetics thus seeks to understand and to predict the effects of such genetic phenomena as segregation, recombination, and mutation; at the same time, population genetics must take into account such ecological and evolutionary factors as population size, patterns of mating, geographic distribution of individuals, migration and natural selection.
Linkage is the coinheritance of two or more nonallelic genes because their loci are in close proximity on the same chromosome, such that after meiosis they remain associated more often than the 50% expected for unlinked genes. During meiosis there is physical crossing over: during the production of germ cells, maternal and paternal genetic contributions are physically exchanged between individual chromatids. This exchange necessarily separates genes in chromosomal regions that were contiguous in each parent and, by mixing them with retained linear order, results in “recombinants”. The process of forming recombinants through meiotic crossing-over is an essential feature in the reassortment of genetic traits and is central to understanding the transmission of genes.
Recombination generally occurs between large segments of DNA. This means that contiguous stretches of DNA and genes are likely to be moved together. Conversely, regions of the DNA that are far apart on a given chromosome are more likely to become separated during the process of crossing-over than regions of the DNA that are close together.
It is possible to use polymorphic molecular markers, such as SNPs, to clarify the recombination events that take place during meiosis. They are used as position markers and regional identifying characters along chromosomes and can also be used to distinguish paternally derived gene regions from maternally derived gene regions.
The pattern of a set of markers along a chromosome is referred to as a “Haplotype”. Therefore sets of alleles on the same small chromosomal segment tend to be transmitted as a block through a pedigree. By analyzing the haplotypes in a series of offspring of parents whose haplotypes are known, it is possible to establish which parental segment of which chromosome was transmitted to which child. When not broken up by recombination, haplotypes can be treated for mapping purposes as alleles at a single highly polymorphic locus.
The existence of a preferential occurrence of a disease gene in association with specific alleles of linked markers, such as SNPs, is called “Linkage Disequilibrium” (LD). This sort of disequilibrium generally implies that most of the disease chromosomes carry the same mutation and the markers being tested are quite close to the disease gene. For example, there is considerable linkage disequilibrium across the entire HLA locus. The A3 allele is in LD with the B7 and B14 alleles, and as a result B7 and B14 are also highly associated with hemochromatosis. Thus, HLA typing alone can significantly alter the estimate of risk for hemochromatosis, even if other family members are not available for formal linkage analysis. Consequently, by using a combination of several markers surrounding the presumptive location of the gene, a haplotype can be determined for affected and unaffected family members.
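Linkage disequilibrium between two bi-allelic loci is commonly quantified from haplotype counts. The following Python sketch computes the standard population-genetics measures D, D′ and r². These measures, the function name `ld_stats`, and its parameters are not taken from the present disclosure and are shown only as an assumed illustration of how such disequilibrium might be expressed numerically.

```python
# Illustrative sketch only; names and the choice of D, D', r^2 are assumptions.
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, D_prime, r_squared) from counts of the four two-locus haplotypes.

    A/a are the alleles at the first locus and B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype must be observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total          # frequency of allele A
    p_B = (n_AB + n_aB) / total          # frequency of allele B
    d = p_AB - p_A * p_B                 # deviation from independence
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r_squared
```

With hypothetical counts of 50 AB and 50 ab haplotypes and no Ab or aB haplotypes, the two loci are in complete disequilibrium (D′ = 1, r² = 1); with 25 of each haplotype, D is zero.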
SNPs are useful in association studies for identifying particular SNPs, or other polymorphisms, associated with pathological conditions, such as osteoporosis. Association studies may be conducted within the general population and are not limited to studies performed on related individuals in affected families (linkage studies). An association study using SNPs involves determining the frequency of the SNP allele in many patients with the disorder of interest, such as osteoporosis, as well as controls of similar age and race. The appropriate selection of patients and controls is critical to the success of SNP association studies. Therefore, a pool of individuals with well-characterized phenotypes is extremely desirable. For example, blood pressure and heart rate can be correlated with SNP patterns in hypertensive individuals in whom these physiological parameters are known in order to find associations between particular SNP genotypes and known phenotypes. Significant associations between particular SNPs or SNP haplotypes and phenotypic characteristics can be determined by standard statistical methods. Association analysis can either be direct or LD based. In direct association analysis, causative SNPs are tested that are candidates for the pathogenic sequence itself.
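As one example of the standard statistical methods mentioned above, a case-control comparison of allele counts can be tested with a Pearson chi-square test on a 2x2 table. The Python sketch below is a minimal illustration under that assumption. The function name `allelic_association` and the counts in the usage note are hypothetical and are not data from the present disclosure.

```python
# Illustrative sketch only; the test choice and all names are assumptions.
import math

def allelic_association(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    case_a/case_b: counts of the two SNP alleles among patients.
    ctrl_a/ctrl_b: counts of the two SNP alleles among controls.
    Returns (chi_square, p_value, odds_ratio).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column must contain observations")
    cross = case_a * ctrl_b - case_b * ctrl_a
    chi_square = n * cross * cross / (row_case * row_ctrl * col_a * col_b)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    # Odds of carrying allele a in cases relative to controls.
    if case_b * ctrl_a:
        odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    else:
        odds_ratio = math.inf
    return chi_square, p_value, odds_ratio
```

For hypothetical counts of 60 and 40 alleles in patients versus 40 and 60 in controls, the statistic is 8.0 with an odds ratio of 2.25. In practice, a multiple-testing correction would normally be applied when many SNPs are tested.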
In LD based SNP association analysis, random SNPs are tested over a large genomic region, possibly the entire genome, in order to find a SNP in LD with the true pathogenic sequence or pathogenic SNP. For this approach, high density SNP maps are required in order for random SNPs to be located close enough to an unknown pathogenic locus to be in linkage disequilibrium with that locus in order to detect an association. SNPs tend to occur with great frequency and are spaced uniformly throughout the genome. The frequency and uniformity of SNPs means that there is a greater probability, compared with other types of polymorphisms such as tandem repeat polymorphisms, that a SNP will be found in close proximity to a genetic locus of interest. SNPs are also mutationally more stable than tandem repeat polymorphisms, such as VNTRs. LD-based association studies are capable of finding a disease susceptibility gene without any a priori assumptions about what or where the gene is.
Currently, however, it is not feasible to do SNP association studies over the entire human genome; therefore, candidate genes associated with osteoporosis are targeted for SNP identification and association analysis. The candidate gene approach uses a priori knowledge of disease pathogenesis to identify genes that are hypothesized to directly influence development of the disease. The candidate gene approach may focus on a gene that is directly targeted by a drug used to treat the disorder. To discover SNPs associated with an increased susceptibility to osteoporosis, candidate genes can be selected from systems physiologically implicated in the disease pathway. SNPs found in these genes are then tested for statistical association with disease in individuals who have the disease compared with appropriate controls. The candidate gene approach has the advantages of drastically reducing the number of candidate SNPs, and the number of individuals, that need to be typed, compared with LD-based association studies of random SNPs over large areas of, or complete, genomes. Furthermore, in the candidate gene approach, no assumptions are made about the extent of LD over any particular area of the genome.
Combined with the use of a high density map of appropriately spaced, sufficiently informative SNP markers, association studies, including linkage disequilibrium-based genome wide association studies, will enable the identification of most genes involved in complex disorders, such as osteoporosis. This will enhance the selection of candidate genes most likely to contain causative SNPs associated with a particular disease. All of the SNPs disclosed by the present invention can be employed as part of genome-wide association studies or as part of candidate gene association studies.
The present invention advances the state of the art and provides commercially useful embodiments by providing previously unidentified SNPs in genes known in the art to be associated with osteoporosis.
The present invention is based on the identification of novel SNPs and previously unknown haplotypes in genes known in the art to be associated with the pathologies of osteoporosis. Such polymorphisms/haplotypes can lead to a variety of pathologies and disorders associated with osteoporosis that are mediated/modulated by a variant gene/protein. Further, such polymorphisms are important reagents in studying the pathologies of osteoporosis.
Based on these identifications, the present invention provides methods of detecting these novel variants as well as reagents needed to accomplish this task. The invention specifically provides novel SNPs in genes known to be involved in osteoporosis, variant proteins encoded by the novel SNP forms of these genes, antibodies to the variant proteins, computer-based and data storage systems containing the novel SNP information, methods of detecting these SNPs in a sample, methods of determining a risk of having or developing a disorder mediated by a variant gene/protein, methods of screening for compounds used to treat disorders mediated by a variant gene/protein, methods of treating disorders mediated by a variant gene/protein, and methods of using the novel SNPs of the present invention for human identification. The present invention also provides genomic nucleotide sequences, transcript sequences, encoded amino acid sequences, and context sequences that contain the SNPs of the present invention.
NOTE: Two duplicate copies of the CD-R are submitted herewith, labeled “Copy 1” and “Copy 2”. The material on each of the duplicate CD-Rs is identical. Thus, descriptions or references herein to the CD-R labeled CL00789CDR and the files contained thereon apply to both “Copy 1” and “Copy 2”.
The CD-R labeled CL00789CDR contains the following two files:
1) File SEQLIST_789.txt provides the Sequence Listing of the transcript sequences (SEQ ID NOS:1-13), protein sequences (SEQ ID NOS:14-26), and genomic sequences (SEQ ID NOS:27-39) for each osteoporosis-associated gene that contains a SNP of the present invention. Also provided are the context sequences (SEQ ID NOS:40-848) that flank each SNP of the present invention. The context sequences generally provide about 300 bp upstream (5′) and 300 bp downstream (3′) of each SNP, with the SNP about in the middle of the sequence, for a total of about 600 bp of context sequence surrounding each SNP. The transcript, protein, genomic, and context sequences provided in the Sequence Listing are also provided in Table 1, where they are identified by their SEQ ID NO. The Sequence Listing is provided in text (ASCII) format. SEQLIST_789.txt is 2,308 KB in size and was created on Sep. 10, 2001.
2) File TABLE1_789.txt provides Table 1 in text (ASCII) format, which discloses the SNP and associated gene information of the present invention as indicated below in the “Detailed Description of Table 1”, including the context sequences (SEQ ID NOS:40-848) flanking each SNP and the transcript (SEQ ID NOS:1-13), protein (SEQ ID NOS:14-26), and genomic sequences (SEQ ID NOS:27-39) of the osteoporosis-associated genes that contain each SNP. File TABLE1_789.txt is 1,910 KB in size and was created on Sep. 10, 2001.
The material contained on the CD-R labeled CL00789CDR is hereby incorporated by reference pursuant to 37 CFR 1.77(b)(4).
Table 1 discloses the SNP and associated gene information of the present invention. For each SNP, Table 1 provides gene information followed by SNP information.
The gene information includes: a gene number, a Celera hCT number and/or a RefSeq NM number (the NM number is a reference number to an annotated human gene that is publicly known and whose role in disease processes is understood to the point of providing commercial uses for the naturally occurring variants herein described; the public gene identified by the NM number may be the same as the gene identified by the hCT number, or may be a homolog or paralog thereof), the art-known gene name, the art-known protein name, Celera genomic axis position and chromosomal position/cytoband of the gene where available, a public reference (e.g., OMIM reference information, which can readily be used by one of ordinary skill in the art to associate the allelic variants of each gene provided herein with medically significant disease conditions and pathologies, thus providing readily apparent commercial utilities for the SNP information of the present invention) to the gene/protein name supporting the medical significance of the gene/protein, transcript sequence (corresponding to SEQ ID NOS:1-13 of the Sequence Listing), protein sequence (corresponding to SEQ ID NOS:14-26 of the Sequence Listing), and genomic sequence (corresponding to SEQ ID NOS:27-39 of the Sequence Listing) of the assembled genomic region containing the gene. The SEQ ID NOS provided for each sequence in Table 1 correspond with the SEQ ID NOS in the Sequence Listing, provided in file SEQLIST_789.TXT on the accompanying CD-R, labeled CL00789CDR. NOTE: the genomic sequences always correspond to Celera genomic sequences; where both a Celera hCT number and an NM number are provided for a gene, the transcript and protein sequences correspond to the Celera sequences identified by the hCT number; where only an NM number is provided for a gene, the transcript and protein sequences correspond to the public sequences identified for the NM number.
The SNP information includes: 300 bp of 5′ and 3′ context sequence (corresponding to SEQ ID NOS:40-848 of the Sequence Listing; in some instances, the context sequences may be reverse complemented relative to the gene/transcript sequences), Celera CV identification number for internal tracking, identified alleles, populations seen with alleles (“cau”=Caucasian, “his”=Hispanic, “chn”=Chinese, “afr”=African, “jpn”=Japanese, “ind”=Indian, “mex”=Mexican, “ain”=American Indian, “cra”=Celera donor, “no_pop”=no population information available), SNP type [“MIS-SENSE MUTATION”=SNP causes a change in the encoded amino acid (i.e., a non-synonymous coding SNP); “INTERGENIC/UNKNOWN”=SNP occurs in an intergenic region of the genome; “UNKNOWN”=SNP is located in an uncharacterized genomic region; “SILENT MUTATION”=SNP does not cause a change in the encoded amino acid (i.e., a synonymous coding SNP); “STOP CODON MUTATION”=SNP is located in a stop codon; “NONSENSE MUTATION”=SNP creates a stop codon; “INTRON”=SNP is located in an intron; “UTR 5”=SNP is located in a 5′ UTR of a transcript; “UTR 3”=SNP is located in a 3′ UTR of a transcript; “PUTATIVE UTR 5”=SNP is located in a putative 5′ UTR; “PUTATIVE UTR 3”=SNP is located in a putative 3′ UTR; “DONOR SPLICE SITE”=SNP is located in a donor splice site (5′ intron boundary); “ACCEPTOR SPLICE SITE”=SNP is located in an acceptor splice site (3′ intron boundary); “REPEATS”=SNP is located in a repeat element; “CODING REGION”=generally, the SNP is an insertion/deletion (“indel”) polymorphism that may cause a frameshift that alters the encoded protein downstream of the SNP; “EXON”=SNP is located in an exon; “HUMAN-MOUSE CONSERVED REGION”=SNP is located in a region of the human genome that shares a high degree of sequence similarity with the mouse; “CONSERVED SEGMENT PUTATIVE”=generally, SNP is located in a segment of the genome that is a putative regulatory region conserved between human and mouse; “CORE PROMOTER PREDICTION PUTATIVE”=SNP is located in a predicted core promoter; “TRANSCRIPTION FACTOR SITE PUTATIVE”=SNP is located in a predicted transcription factor binding site; “REGULATORY REGION”=SNP is located in a regulatory region; and “PUTATIVE REGULATORY REGION”=SNP is located in a putative regulatory region], affected protein (including Celera hCP or Genbank GI number, position of the amino acid residue within the protein identified by the hCP or GI number that is encoded by the codon containing the SNP, and alternative amino acids represented by 1-letter amino acid codes that are encoded by the alternative SNP alleles), and source [whether the SNP is found only in Celera data and is novel to the present invention (“Celera”), or at least one SNP allele has been found in a public database as well as in Celera data but the map position of the SNP may not be publicly known (“Celera+”)].
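The relationship between a SNP position and its flanking context sequence can be sketched in Python as follows. This is a hypothetical illustration and not the procedure used to generate Table 1: the function names, the 0-based indexing, and the truncation of flanks near sequence ends are all assumptions. The reverse-complement helper corresponds to the note above that some context sequences may be reverse complemented relative to the gene/transcript sequences.

```python
# Illustrative sketch only; names, 0-based indexing and end handling are assumptions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def context_sequence(genomic: str, snp_index: int, flank: int = 300):
    """Return (upstream, snp_base, downstream) around a 0-based SNP position.

    Up to `flank` bases are taken on each side; flanks are shorter when the
    SNP lies near either end of the supplied sequence.
    """
    if not 0 <= snp_index < len(genomic):
        raise IndexError("snp_index is outside the genomic sequence")
    start = max(0, snp_index - flank)
    upstream = genomic[start:snp_index]                         # 5' flank
    downstream = genomic[snp_index + 1:snp_index + 1 + flank]   # 3' flank
    return upstream, genomic[snp_index], downstream
```

With the default `flank` of 300, a SNP in the interior of a sufficiently long sequence yields about 600 bp of context with the SNP in the middle, as described for the context sequences.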