The genomes of all organisms undergo spontaneous mutation in the course of their continuing evolution, generating variant forms of progenitor nucleic acid sequences (Gusella, Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may confer an evolutionary advantage or disadvantage relative to a progenitor form, or may be neutral. In some instances, a variant form confers a lethal disadvantage and is not transmitted to subsequent generations of the organism. In other instances, a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form. In many instances, both progenitor and variant form(s) survive and co-exist in a species population. The coexistence of multiple forms of a sequence gives rise to polymorphisms.
Several different types of polymorphism have been reported. A restriction fragment length polymorphism (RFLP) is a variation in DNA sequence that alters the length of a restriction fragment (Botstein et al., Am. J. Hum. Genet. 32, 314-331 (1980)). The restriction fragment length polymorphism may create or delete a restriction site, thus changing the length of the restriction fragment. RFLPs have been widely used in human and animal genetic analyses (see WO 90/13668; WO 90/11369; Donis-Keller, Cell 51, 319-337 (1987); Lander et al., Genetics 121, 85-99 (1989)). When a heritable trait can be linked to a particular RFLP, the presence of the RFLP in an individual can be used to predict the likelihood that the individual will also exhibit the trait.
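As an illustration of how a single base change can create or delete a restriction site and so change fragment lengths, the following is a minimal sketch. The sequences are invented toy strings, the recognition site GAATTC (EcoRI) with a cut one base into the site is used only as a familiar example, and the function name digest is hypothetical; none of these come from the work described herein.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`, `cut_offset` bases into the site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy reference allele with two recognition sites (illustrative only).
reference = "ATGC" * 3 + "GAATTC" + "ATGC" * 5 + "GAATTC" + "ATGC" * 2

# Toy variant allele: one base change (A to G) destroys the first site.
variant = reference.replace("GAATTC", "GAGTTC", 1)

print(digest(reference))  # three fragments
print(digest(variant))    # two fragments, one of them longer
```

In this toy case the loss of one site merges two fragments into one longer fragment, which is the length difference an RFLP assay detects.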
Other polymorphisms take the form of short tandem repeats (STRs) that include tandem di-, tri- and tetra-nucleotide repeated motifs. These tandem repeats are also referred to as variable number tandem repeat (VNTR) polymorphisms. VNTRs have been used in identity and paternity analysis (U.S. Pat. No. 5,075,217; Armour et al., FEBS Lett. 307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys, EP 370,719), and in a large number of genetic mapping studies.
Other polymorphisms take the form of single nucleotide variations between individuals of the same species. Such polymorphisms are far more frequent than RFLPs, STRs and VNTRs. Some single nucleotide polymorphisms (SNPs) occur in protein-coding nucleic acid sequences (coding sequence SNP (cSNP)), in which case, one of the polymorphic forms may give rise to the expression of a defective or otherwise variant protein and, potentially, a genetic disease. Examples of genes in which polymorphisms within coding sequences give rise to genetic disease include β-globin (sickle cell anemia), apoE4 (Alzheimer's Disease), Factor V Leiden (thrombosis), and CFTR (cystic fibrosis). cSNPs can alter the codon sequence of the gene and therefore specify an alternative amino acid. Such changes are called “missense” when another amino acid is substituted, and “nonsense” when the alternative codon specifies a stop signal in protein translation. When the cSNP does not alter the amino acid specified, the cSNP is called “silent”.
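The missense, nonsense and silent categories can be made concrete with a short sketch that translates a reference codon and a variant codon using the standard genetic code and compares the results. The function name classify_csnp and the example codons are illustrative assumptions; in particular, the codons below are not taken from any sequence described herein.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_csnp(ref_codon: str, var_codon: str) -> str:
    """Classify a coding-sequence SNP as 'silent', 'nonsense' or 'missense'.

    Both codons must be three DNA bases long and differ at one position.
    A change from a stop codon to an amino acid is not treated specially
    here and would be reported as 'missense'.
    """
    ref_codon, var_codon = ref_codon.upper(), var_codon.upper()
    differences = sum(a != b for a, b in zip(ref_codon, var_codon))
    if len(ref_codon) != 3 or len(var_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one base")
    ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
    if var_aa == ref_aa:
        return "silent"
    if var_aa == "*":
        return "nonsense"
    return "missense"


# Illustrative codons only (hypothetical, not from any SEQ ID NO).
print(classify_csnp("AAT", "AGT"))  # asparagine to serine
print(classify_csnp("TGG", "TGA"))  # tryptophan to stop
print(classify_csnp("GCT", "GCC"))  # alanine unchanged
```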
Other single nucleotide polymorphisms occur in noncoding regions. Some of these polymorphisms may also result in defective protein expression (e.g., as a result of defective splicing). Other single nucleotide polymorphisms have no phenotypic effects.
Single nucleotide polymorphisms can be used in the same manner as RFLPs and VNTRs, but offer several advantages. Single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism. The greater frequency and uniformity of single nucleotide polymorphisms means that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The different forms of characterized single nucleotide polymorphisms are often easier to distinguish than other types of polymorphism (e.g., by use of assays employing allele-specific hybridization probes or primers).
Only a small percentage of the total repository of polymorphisms in humans and other organisms has been identified. The limited number of polymorphisms identified to date is due to the large amount of work required for their detection by conventional methods. For example, a conventional approach to identifying polymorphisms might be to sequence the same stretch of DNA in a population of individuals by dideoxy sequencing. In this type of approach, the amount of work increases in proportion to both the length of sequence and the number of individuals in a population and becomes impractical for large stretches of DNA or large numbers of persons.
Work described herein pertains to the identification of polymorphisms which can predispose individuals to disease, by resequencing large numbers of genes in a large number of individuals. Various genes from a number of individuals have been resequenced as described herein, and SNPs in these genes have been discovered (see the Table and FIG. 3). Some of these SNPs are cSNPs which specify a different amino acid sequence, some of the SNPs are silent cSNPs and some of these cSNPs specify a stop signal in protein translation. Some of the identified SNPs were located in non-coding regions.
The invention relates to a gene which comprises a single nucleotide polymorphism at a specific location. In a particular embodiment the invention relates to the variant allele of a gene having a single nucleotide polymorphism, which variant allele differs from a reference allele by one nucleotide at the site(s) identified in the Table and FIG. 3. Complements of these nucleic acid sequences are also included. The nucleic acid molecules can be DNA or RNA, and can be double- or single-stranded. Nucleic acid molecules can be, for example, 5-10, 5-15, 10-20, 5-25, 10-30, 10-50 or 10-100 bases long.
The invention further provides allele-specific oligonucleotides that hybridize to the reference or variant allele of a gene comprising a single nucleotide polymorphism or to the complement thereof. These oligonucleotides can be probes or primers.
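As a rough sketch of what a pair of allele-specific oligonucleotides looks like, the code below builds a reference probe and a variant probe of equal length centered on a polymorphic site, together with a helper for the complement strand. The toy sequence, the 7-base flank, and the names allele_specific_probes and reverse_complement are assumptions made for illustration; practical probe design also weighs melting temperature, secondary structure and specificity, which are not modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_specific_probes(seq: str, position: int, ref: str, var: str,
                           flank: int = 7) -> dict[str, str]:
    """Build two probes that differ only at the polymorphic base.

    `position` is 1-based. Each probe is 2 * flank + 1 bases long.
    """
    i = position - 1
    if seq[i] != ref:
        raise ValueError("sequence does not carry the reference base at this position")
    if i - flank < 0 or i + flank >= len(seq):
        raise ValueError("not enough flanking sequence around the site")
    left, right = seq[i - flank:i], seq[i + 1:i + 1 + flank]
    return {"reference": left + ref + right, "variant": left + var + right}


# Toy sequence with a hypothetical A/G polymorphism at position 13.
toy = "ATGGCCTTAGCAATCGGATCCTTGAC"
probes = allele_specific_probes(toy, position=13, ref="A", var="G")
for allele, probe in probes.items():
    print(allele, probe, reverse_complement(probe))
```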
The invention further provides a method of analyzing a nucleic acid from an individual. The method determines which base is present at any one of the polymorphic sites shown in the Table and/or FIG. 3. Optionally, a set of bases occupying a set of the polymorphic sites shown in the Table and/or FIG. 3 is determined. This type of analysis can be performed on a number of individuals, who are tested for the presence of a disease phenotype. The presence or absence of disease phenotype is then correlated with a base or set of bases present at the polymorphic site or sites in the individuals tested.
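The correlation step can be sketched as a two-by-two tabulation of genotype against phenotype. The text above does not name a particular statistic; the odds ratio used below is one common choice and is an assumption, as are the function names and the counts, which are invented for illustration and are not study data.

```python
from collections import Counter


def contingency(records: list[tuple[bool, bool]]) -> tuple[int, int, int, int]:
    """Tabulate (carries_variant_base, has_phenotype) records.

    Returns (a, b, c, d):
      a = variant base, phenotype present
      b = no variant base, phenotype present
      c = variant base, phenotype absent
      d = no variant base, phenotype absent
    """
    counts = Counter(records)
    return (counts[(True, True)], counts[(False, True)],
            counts[(True, False)], counts[(False, False)])


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of the phenotype among carriers relative to non-carriers."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Invented counts for 100 hypothetical individuals.
records = ([(True, True)] * 30 + [(False, True)] * 20 +
           [(True, False)] * 15 + [(False, False)] * 35)

table = contingency(records)
print(table, odds_ratio(*table))
```

An odds ratio above 1 in such a table would suggest that the base is found more often in individuals with the phenotype; a formal analysis would add a significance test and confidence interval.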
Thus, the invention further relates to a method of predicting the presence, absence, likelihood of the presence or absence, or severity of a particular phenotype or disorder associated with a particular genotype. The method comprises obtaining a nucleic acid sample from an individual and determining the identity of one or more bases (nucleotides) at polymorphic sites of genes described herein, wherein the presence of a particular base is correlated with a specified phenotype or disorder, thereby predicting the presence, absence, likelihood of the presence or absence, or severity of the phenotype or disorder in the individual.
The thrombospondins are a family of extracellular matrix (ECM) glycoproteins that modulate many cell behaviors including adhesion, migration, and proliferation. Thrombospondins (also known as thrombin sensitive proteins or TSPs) are large molecular weight glycoproteins composed of three identical disulfide-linked polypeptide chains. The results described herein also reveal an important association between alterations, particularly SNPs, in TSP genes, particularly TSP-1 and TSP-4, and vascular disease. In particular, SNPs in these genes which are associated with premature coronary artery disease (CAD) (or coronary heart disease) and myocardial infarction (MI) have been identified and represent a potentially vital marker of upstream biology influencing the complex process of atherosclerotic plaque generation and vulnerability.
Thus, the invention relates to the TSP gene SNPs identified as described herein, both singly and in combination, as well as to the use of these SNPs, and others in TSP genes, particularly those nearby in linkage disequilibrium with these SNPs, for diagnosis, prediction of clinical course and treatment response for vascular disease, development of new treatments for vascular disease based upon comparison of the variant and normal versions of the gene or gene product, and development of cell-culture based and animal models for research and treatment of vascular disease. The invention further relates to novel compounds and pharmaceutical compositions for use in the diagnosis and treatment of such disorders. In preferred embodiments, the vascular disease is CAD or MI.
The invention relates to isolated nucleic acid molecules comprising all or a portion of the variant allele of TSP-1 (e.g., as exemplified by SEQ ID NO: 1), and to isolated nucleic acid molecules comprising all or a portion of the variant allele of TSP-4 (e.g., as exemplified by SEQ ID NO: 3). Preferred portions are at least 10 contiguous nucleotides and comprise the polymorphic site, e.g., a portion of SEQ ID NO: 1 which is at least 10 contiguous nucleotides and comprises the “G” at position 2210, or a portion of SEQ ID NO: 3 which is at least 10 contiguous nucleotides and comprises the “C” at position 1186. The invention further relates to isolated gene products, e.g., polypeptides or proteins, which are encoded by a nucleic acid molecule comprising all or a portion of the variant allele of TSP-1 or TSP-4 (e.g., SEQ ID NO: 1 or SEQ ID NO: 3, respectively). The invention also relates to nucleic acid molecules which hybridize to and/or share identity with the variant alleles identified herein (or their complements) and which also comprise the variant nucleotide at the SNP site.
The invention further relates to isolated proteins or polypeptides comprising all or a portion of the variant amino acid sequence of TSP-1 (e.g., as exemplified by SEQ ID NO: 2), and to isolated proteins or polypeptides comprising all or a portion of the variant amino acid sequence of TSP-4 (e.g., as exemplified by SEQ ID NO: 4). Preferred polypeptides are at least 10 contiguous amino acids and comprise the polymorphic amino acid, e.g., a portion of SEQ ID NO: 2 which is at least 10 contiguous amino acids and comprises the serine at residue 700, or a portion of SEQ ID NO: 4 which is at least 10 contiguous amino acids and comprises the proline at residue 387. The invention further relates to isolated nucleic acid molecules encoding such proteins and polypeptides, as well as to antibodies which bind, e.g., specifically, to such proteins and polypeptides.
The invention further relates to a method of diagnosing or aiding in the diagnosis of a disorder associated with the presence of one or more of (a) a G at nucleotide position 2210 of SEQ ID NO: 1; or (b) a C at nucleotide position 1186 of SEQ ID NO: 3 in an individual. The method comprises obtaining a nucleic acid sample from the individual and determining the nucleotide present at one or more of the indicated nucleotide positions, wherein presence of one or more of (a) a G at nucleotide position 2210 of SEQ ID NO: 1; or (b) a C at nucleotide position 1186 of SEQ ID NO: 3 is indicative of increased likelihood of said disorder in the individual as compared with an appropriate control, e.g., an individual having the reference nucleotide at one or more of said positions. In a particular embodiment the disorder is a vascular disease selected from the group consisting of atherosclerosis, coronary heart or artery disease, MI, stroke, peripheral vascular diseases, venous thromboembolism and pulmonary embolism. In a preferred embodiment, the vascular disease is selected from the group consisting of CAD and MI.
The invention further relates to a method of diagnosing or aiding in the diagnosis of a disorder associated with one or more of (a) a G at nucleotide position 2210 of SEQ ID NO: 1; or (b) a C at nucleotide position 1186 of SEQ ID NO: 3 in an individual. The method comprises obtaining a nucleic acid sample from the individual and determining the nucleotide present at one or more of the indicated nucleotide positions, wherein presence of one or more of (a) an A at nucleotide position 2210 of SEQ ID NO: 1; or (b) a G at nucleotide position 1186 of SEQ ID NO: 3 is indicative of decreased likelihood of said disorder in the individual as compared with an appropriate control, e.g., an individual having the variant nucleotide at said position. In a particular embodiment the disorder is a vascular disease selected from the group consisting of atherosclerosis, coronary heart or artery disease, MI, stroke, peripheral vascular diseases, venous thromboembolism and pulmonary embolism. In a preferred embodiment, the vascular disease is selected from the group consisting of CAD and MI.
In one embodiment, the invention relates to a method for predicting the likelihood that an individual will have a vascular disease (or aiding in the diagnosis of a vascular disease), comprising the steps of obtaining a DNA sample from an individual to be assessed and determining the nucleotide present at one or more of nucleotide positions 2210 of SEQ ID NO: 1 or 1186 of SEQ ID NO: 3. The presence of the reference nucleotide at one or more of these positions indicates that the individual has a lower likelihood of having a vascular disease than an individual having the variant nucleotide at one or more of these positions, or a lower likelihood of having severe symptomology. In a particular embodiment, the individual is an individual at risk for development of a vascular disease.
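The step of determining the nucleotide present at the indicated positions can be sketched as a lookup against the two sites named above. The positions and bases (A reference and G variant at position 2210 of SEQ ID NO: 1; G reference and C variant at position 1186 of SEQ ID NO: 3) are read from the preceding paragraphs. Everything else is an assumption: the sample is presumed to be already in the same 1-based coordinates as the SEQ ID NO, the names SITES, call_allele and toy_sequence are hypothetical, and the toy sequences are placeholders rather than real TSP sequence.

```python
# Polymorphic sites as stated in the text; positions are 1-based.
SITES = {
    "TSP-1": {"position": 2210, "reference": "A", "variant": "G"},
    "TSP-4": {"position": 1186, "reference": "G", "variant": "C"},
}


def call_allele(gene: str, sample_seq: str) -> str:
    """Report which allele a sample carries at the gene's polymorphic site.

    Returns 'variant', 'reference', or 'other' (any unexpected base).
    Assumes `sample_seq` uses the same coordinates as the SEQ ID NO.
    """
    site = SITES[gene]
    base = sample_seq[site["position"] - 1].upper()
    if base == site["variant"]:
        return "variant"
    if base == site["reference"]:
        return "reference"
    return "other"


def toy_sequence(length: int, position: int, base: str) -> str:
    """Placeholder sequence of N's with one known base (illustration only)."""
    return "N" * (position - 1) + base + "N" * (length - position)


print(call_allele("TSP-1", toy_sequence(3000, 2210, "G")))
print(call_allele("TSP-4", toy_sequence(3000, 1186, "G")))
```

The function only labels the allele; relating that label to likelihood of disease is the comparison against an appropriate control described in the text.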
The invention further relates to a method of diagnosing or aiding in the diagnosis of a disorder associated with the presence of one or more of (a) a serine at amino acid position 700 of SEQ ID NO: 2; or (b) a proline at amino acid position 387 of SEQ ID NO: 4 in an individual. The method comprises obtaining a biological sample containing the TSP-1 and/or TSP-4 protein or relevant portion thereof from the individual and determining the amino acid present at one or more of the indicated amino acid positions, wherein presence of one or more of (a) a serine at amino acid position 700 of SEQ ID NO: 2; or (b) a proline at amino acid position 387 of SEQ ID NO: 4 is indicative of increased likelihood of said disorder in the individual as compared with an appropriate control, e.g., an individual having the reference amino acid at one or more of said positions.
The invention further relates to a method of diagnosing or aiding in the diagnosis of a disorder associated with one or more of (a) a serine at amino acid position 700 of SEQ ID NO: 2; or (b) a proline at amino acid position 387 of SEQ ID NO: 4 in an individual. The method comprises obtaining a biological sample containing the TSP-1 and/or TSP-4 protein or relevant portion thereof from the individual and determining the amino acid present at one or more of the indicated amino acid positions, wherein presence of one or more of (a) an asparagine at amino acid position 700 of SEQ ID NO: 2; or (b) an alanine at amino acid position 387 of SEQ ID NO: 4 is indicative of decreased likelihood of said disorder in the individual as compared with an appropriate control, e.g., an individual having the variant amino acid at one or more of said positions.
In one embodiment, the invention relates to a method for predicting the likelihood that an individual will have a vascular disease (or aiding in the diagnosis of a vascular disease), comprising the steps of obtaining a biological sample comprising the TSP-1 and/or TSP-4 protein or relevant portion thereof from an individual to be assessed and determining the amino acid present at one or more of amino acid positions 700 of SEQ ID NO: 2 or 387 of SEQ ID NO: 4. The presence of the reference amino acid at one or more of these positions indicates that the individual has a lower likelihood of having a vascular disease than an individual having the variant amino acid at one or more of these positions, or a lower likelihood of having severe symptomology. In a particular embodiment, the individual is an individual at risk for development of a vascular disease.
In another embodiment, the invention relates to pharmaceutical compositions comprising a reference TSP-1 and/or TSP-4 gene or gene product, or active portion thereof, for use in the treatment of vascular diseases. The invention further relates to the use of agonists and antagonists of TSP-1 and TSP-4 activity for use in the treatment of vascular diseases. In a particular embodiment the vascular disease is selected from the group consisting of atherosclerosis, coronary heart or artery disease, MI, stroke, peripheral vascular diseases, venous thromboembolism and pulmonary embolism. In a preferred embodiment, the vascular disease is selected from the group consisting of CAD and MI.