Autoimmune Diseases & Rheumatoid Arthritis
Autoimmune diseases are a major health issue, occurring in up to 3% of the general population (Cooper & Stroehla, 2003, Autoimmunity Rev. 2:119-125). Although the clinical phenotypes of these diseases are distinct, they share certain common elements, including geographical distributions, population frequencies, therapeutic strategies, and some clinical features which suggest potential similarities in the underlying mechanisms of these diseases. Furthermore, the co-association of multiple autoimmune diseases in the same individual or family supports the presence of common environmental and genetic factors that predispose an individual to autoimmunity (Vyse & Todd, 1996, Cell 85:311-318; Cooper & Stroehla, 2003, Autoimmunity Rev. 2:119-125; Ueda et al., 2003, Nature 423:506-511).
While the major histocompatibility complex (MHC) is a known susceptibility locus for many autoimmune diseases, a recent comparison of 23 autoimmune or inflammatory genome-wide scans or linkage studies has uncovered a clustering of non-MHC susceptibility candidate loci (Becker et al., 1998, Proc. Natl. Acad. Sci. USA 95:9979-9984; Jawaheer et al., 2001, Am. J. Human Genet. 68:927-936). Examples of autoimmune diseases are rheumatoid arthritis, type 1 diabetes, multiple sclerosis, systemic lupus erythematosus, inflammatory bowel diseases, psoriasis, thyroiditis, celiac disease, pernicious anemia, asthma, vitiligo, glomerulonephritis, Graves' disease, myocarditis, Sjogren disease, and primary systemic vasculitis.
Rheumatoid arthritis (RA) is one of the most common autoimmune inflammatory disorders, with a prevalence of 0.5-1% in most adult populations. It is found worldwide and affects all ethnic groups, although it is more common in Europe and the United States than in Asia (Abdel-Nasser et al., 1997, Semin. Arthritis Rheum. 27:123-140; Silman & Hochberg, 1993, Rheumatoid Arthritis, Epidemiology of the Rheumatic Diseases, Oxford University Press, pp. 7-68), and there is a gradient in Europe with a higher prevalence in the north (Cimmino et al., 1998, Ann. Rheum. Diseases 57:315-318). RA can occur in any age group. Onset is typically between the ages of 40 and 60 years, and the incidence increases with age until approximately 70-80 years, at which point it declines (Abdel-Nasser et al., 1997, Semin. Arthritis Rheum. 27:123-140; Silman & Hochberg, 1993, Epidemiology of the Rheumatic Diseases, Oxford University Press, pp. 7-68). RA is two to three times more common in women than in men, depending on age (Linos et al., 1980, J. Chronic Diseases 33:73-77). The observations that (i) women in the postpartum period are at increased risk for RA onset and (ii) women with RA commonly experience remission during pregnancy followed by postpartum relapse (Barrett et al., 1999, Arthritis Rheum. 42:1219-1227) suggest that hormones play a role in disease onset.
RA is a chronic, progressive disease of unknown etiology characterized by the infiltration of activated lymphocytes and macrophages into the synovial lining of the affected joint. These cells produce cytokines and degradative enzymes, which mediate inflammation and destruction of the joint architecture, often leading to permanent disability. RA is a systemic disease; extra-articular manifestations are often present and can range from relatively minor problems, such as rheumatoid nodules, to life-threatening organ disease.
Clinically, RA is a highly heterogeneous disease varying from very mild to severely disabling disease with upwards of one in 20 patients progressing to severe, erosive disease. Joint damage occurs early in disease with the greatest progression to joint abnormalities taking place during the first six years. Within three years of disease onset, as many as 70% of patients show some radiographic evidence of joint damage (Lipsky et al., 1994, Rheumatoid Arthritis, Harrison's Principles of Internal Medicine, 13th ed. New York, McGraw-Hill, Inc., pp. 1648-1655). At present, there is no cure for RA, and the joint damage is irreversible.
Although the course of RA is highly variable, most patients with clinical, persistent RA eventually develop debilitating joint damage and deformation, resulting in progressive functional limitation. Consequently, RA is considered a highly disabling disease with a considerable economic impact that some liken to that of coronary artery disease (Allaire et al., 1994, Pharmacoeconomics 6:513-522). A 1993 study in the U.S. estimated total annual direct costs of $5275 per patient with indirect costs as high as $21,000 per year (Merkesdal et al., 2001, Arthritis Rheum. 44:528-534).
RA is thought to be a complex disease precipitated by the interplay of environmental and genetic factors. Although several environmental triggers have been suggested, such as infection (Harris, 1990, N. Engl. J. Med. 322:1277-1289), immunization (Symmons and Chakravarty, 1993, Ann. Rheum. Dis. 52:843-844), diet (Shapiro et al., 1996, Epidemiology 7:256-263), and smoking (Symmons et al., 1997, Arthritis Rheum. 40:1955-1961), none have been established. A genetic component to RA susceptibility has long been indicated by data from twin and family studies. It is estimated that the concordance between monozygotic twins is in the range of 12-15% while the prevalence in siblings of RA probands is approximately 2-4%, both well above the estimated background population prevalence of 0.5-1% (Seldin et al., 1999, Arthritis Rheum. 42:1071-1079). From these data, the disease heritability has been estimated at approximately 60% (MacGregor et al., 2000, Arthritis Rheum. 43:30-37) while the relative recurrence risk for siblings (λs) of probands with RA is estimated at between 5 and 10 (Seldin et al. 1999; Jawaheer et al., 2001, Am. J. Hum. Genet. 68:927-936).
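As a rough illustration of how the sibling recurrence risk ratio relates to the prevalence figures above, λs can be computed as the prevalence in siblings of probands divided by the background population prevalence. The Python sketch below applies that definition to the rounded ranges quoted in this section. The function name is hypothetical, and the result is only a crude bound; it is not a reproduction of the cited estimates, which rest on each study's own data.

```python
def sibling_recurrence_ratio(sibling_prevalence_pct, population_prevalence_pct):
    """Return lambda_s = (prevalence in siblings of probands) / (population prevalence)."""
    if population_prevalence_pct <= 0:
        raise ValueError("population prevalence must be positive")
    return sibling_prevalence_pct / population_prevalence_pct


# Rounded ranges quoted in the text, expressed in percent.
SIBLING_RANGE = (2.0, 4.0)      # prevalence in siblings of RA probands
POPULATION_RANGE = (0.5, 1.0)   # background population prevalence

# Pair the extremes to bracket the ratio.
lowest = sibling_recurrence_ratio(SIBLING_RANGE[0], POPULATION_RANGE[1])
highest = sibling_recurrence_ratio(SIBLING_RANGE[1], POPULATION_RANGE[0])
print(f"lambda_s between {lowest:.0f} and {highest:.0f}")
```

Pairing the extremes of the rounded ranges gives a ratio between 2 and 8, which overlaps with the published estimate of between 5 and 10.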
Among the many genetic studies of small cohorts, the only genes consistently associated with RA to date are the MHC, i.e., human leukocyte antigen (HLA)-linked genes, on the short arm of human chromosome 6. It has been estimated that approximately one-third to one-half of the total genetic contribution to RA can be attributed to genes within the HLA complex (Deighton et al., 1989, Clin. Genet. 36:178-182). This observation has been supported by three recent genome-wide scans using RA-affected sibling pairs (Cornelis et al., 1998, Proc. Natl. Acad. Sci. USA 95:10746-10750; Jawaheer et al., 2001, Am. J. Hum. Genet. 68:927-936; MacKay et al., 2002, Arthritis Rheum. 46:632-639). The vast majority of HLA studies of RA have focused on a direct role of HLA-DRB1 alleles that encode a common structural element, but recent evidence suggests that the HLA association is more complex and probably involves other loci within this gene complex. Despite over 20 years of research, it is still unclear whether the implicated HLA-DRB1 alleles are involved in RA susceptibility or in severity of disease (Weyand et al., 1998, Springer Semin. Immunopathol. 20:5-22; Fries et al., 2002, Arthritis Rheum. 46:2320-2329).
The increasing availability of specific therapies that can halt disease progression has magnified the need for accurate early diagnosis of RA (Maini et al., 1999, Lancet 354:1932-1939; Lipsky et al., 2000, N. Engl. J. Med. 343:1594-1602; Weinblatt et al., 1999, N. Engl. J. Med. 340:253-259). Unfortunately, classifying an arthritic disorder such as RA is challenging because no etiological agent has been identified and no clinical or laboratory features uniquely define the disease (Sangha, 2000, Rheumatology 39 (suppl. 2):3-12). The most commonly used diagnostic criteria are those adopted by the American College of Rheumatology in 1987 (Arnett et al., 1988, Arthritis Rheum. 31:315-324), which are based on a combination of clinical, laboratory, and radiological assessments. A patient is classified as having RA if he or she satisfies at least four of the following seven criteria: (i) morning stiffness lasting at least one hour; (ii) arthritis of three or more joint areas; (iii) arthritis of hand joints; (iv) symmetric arthritis; (v) rheumatoid nodules; (vi) presence of serum rheumatoid factor (RF); and (vii) radiographic changes in hand or wrist joints. Using these criteria, a trained rheumatologist can usually diagnose RA in individuals who have had disease for more than 12 weeks (Harrison et al., 1998, J. Rheumatol. 25:2324-2330). However, these criteria are largely ineffective during the first 12 weeks of disease (Green et al., 1999, Arthritis Rheum. 42:2184-2188), a period in which irreversible joint damage has already begun. Nor can they predict which patients will develop severe erosive disease and would therefore benefit from aggressive early disease-modifying therapy.
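The four-of-seven classification rule described above can be sketched in Python as follows. This is a simplified illustration that encodes only the criteria as listed in this section; the identifiers (ACR_1987_CRITERIA, meets_acr_1987, and the short criterion keys) are hypothetical names introduced for the example, and the sketch is not a clinical tool.

```python
# Short keys for the seven criteria listed in the text (names are illustrative).
ACR_1987_CRITERIA = frozenset({
    "morning_stiffness",         # (i) lasting at least one hour
    "three_or_more_joint_areas", # (ii)
    "hand_joint_arthritis",      # (iii)
    "symmetric_arthritis",       # (iv)
    "rheumatoid_nodules",        # (v)
    "serum_rheumatoid_factor",   # (vi)
    "radiographic_changes",      # (vii) hand or wrist joints
})

REQUIRED_COUNT = 4  # "at least four of the following seven criteria"


def meets_acr_1987(findings):
    """Return True if the satisfied criteria number at least four of the seven."""
    satisfied = set(findings)
    unknown = satisfied - ACR_1987_CRITERIA
    if unknown:
        raise ValueError(f"unrecognized criteria: {sorted(unknown)}")
    return len(satisfied) >= REQUIRED_COUNT


patient = {"morning_stiffness", "hand_joint_arthritis",
           "symmetric_arthritis", "serum_rheumatoid_factor"}
print(meets_acr_1987(patient))  # four criteria satisfied
```

A count-based rule of this kind can only report whether the threshold is met at the time of assessment, which is consistent with the limitation noted above: it carries no information about early disease or about likely severity.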
Early initiation of therapy can provide considerable benefit, not only by reducing pain and inflammation but also by reducing or eliminating the loss of function that accompanies persistent RA, especially when therapy is administered prior to the occurrence of irreversible joint damage. Consequently, there is a need for novel diagnostic markers that enable the detection of RA at an early stage, or that enable the identification of individuals who are predisposed to developing RA. While genome-wide linkage scans of affected sibling pairs have revealed multiple linkage peaks (Cornelis et al., 1998, Proc. Natl. Acad. Sci. USA 95: 10746-10750; Jawaheer et al., 2001, Am. J. Hum. Genet. 68: 927-936; MacKay et al., 2002, Arthritis Rheum. 46: 632-639), specific disease-associated genetic markers have not previously been identified.
SNPs
The genomes of all organisms undergo spontaneous mutation in the course of their continuing evolution, generating variant forms of progenitor genetic sequences (Gusella, Ann. Rev. Biochem. 55, 831-854 (1986)). A variant form may confer an evolutionary advantage or disadvantage relative to a progenitor form or may be neutral. In some instances, a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form. Additionally, the effects of a variant form may be both beneficial and detrimental, depending on the circumstances. For example, a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal. In many cases, both progenitor and variant forms survive and co-exist in a species population. The coexistence of multiple forms of a genetic sequence gives rise to genetic polymorphisms, including SNPs.
Approximately 90% of all polymorphisms in the human genome are SNPs. SNPs are single base positions in DNA at which different alleles, or alternative nucleotides, exist in a population. The SNP position (interchangeably referred to herein as SNP, SNP site, SNP locus, SNP marker, or marker) is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations). An individual may be homozygous or heterozygous for an allele at each SNP position. A SNP can, in some instances, be referred to as a “cSNP” to denote that the nucleotide sequence containing the SNP is an amino acid coding sequence.
A SNP may arise from a substitution of one nucleotide for another at the polymorphic site. Substitutions can be transitions or transversions. A transition is the replacement of one purine nucleotide by another purine nucleotide, or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine, or vice versa. A SNP may also be a single base insertion or deletion variant referred to as an “indel” (Weber et al., “Human diallelic insertion/deletion polymorphisms”, Am J Hum Genet 2002 October; 71(4):854-62).
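The distinction between transitions and transversions can be expressed as a short Python sketch. The function name classify_substitution is a hypothetical name introduced for illustration; the sketch handles only single-nucleotide substitutions among the four DNA bases and does not cover indels.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a 'transition' or a 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected one of the DNA bases A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Transition: both bases are purines or both are pyrimidines.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]:
    print(f"{ref}>{alt}: {classify_substitution(ref, alt)}")
```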
A synonymous codon change, or silent mutation/SNP (terms such as “SNP”, “polymorphism”, “mutation”, “mutant”, “variation”, and “variant” are used herein interchangeably), is one that does not result in a change of amino acid due to the degeneracy of the genetic code. A substitution that changes a codon coding for one amino acid to a codon coding for a different amino acid (i.e., a non-synonymous codon change) is referred to as a missense mutation. A nonsense mutation results in a type of non-synonymous codon change in which a stop codon is formed, thereby leading to premature termination of a polypeptide chain and a truncated protein. A read-through mutation is another type of non-synonymous codon change that causes the destruction of a stop codon, thereby resulting in an extended polypeptide product. While SNPs can be bi-, tri-, or tetra-allelic, the vast majority of the SNPs are bi-allelic, and are thus often referred to as “bi-allelic markers”, or “di-allelic markers”.
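The codon-change categories defined above can be illustrated with the following Python sketch, which assumes the standard genetic code. The names CODON_TABLE and classify_codon_change are hypothetical and introduced only for this example; a change from one stop codon to another is reported as synonymous because the encoded outcome is unchanged.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon change as synonymous, missense, nonsense, or read-through."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # a stop codon is formed
    if ref_aa == "*":
        return "read-through"  # a stop codon is destroyed
    return "missense"


for ref, alt in [("GAA", "GAG"), ("GAG", "GTG"), ("TAT", "TAA"), ("TAA", "CAA")]:
    print(f"{ref}>{alt}: {classify_codon_change(ref, alt)}")
```

The second example pair, GAG to GTG, replaces glutamic acid with valine and is the substitution underlying the sickle cell mutation mentioned earlier.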
As used herein, references to SNPs and SNP genotypes include individual SNPs and/or haplotypes, which are groups of SNPs that are generally inherited together. Haplotypes can have stronger correlations with diseases or other phenotypic effects compared with individual SNPs, and therefore may provide increased diagnostic accuracy in some cases (Stephens et al. Science 293, 489-493, 20 Jul. 2001).
Causative SNPs are those SNPs that produce alterations in gene expression or in the expression, structure, and/or function of a gene product, and therefore are most predictive of a possible clinical phenotype. One such class includes SNPs falling within regions of genes encoding a polypeptide product, i.e., cSNPs. These SNPs may result in an alteration of the amino acid sequence of the polypeptide product (i.e., non-synonymous codon changes) and give rise to the expression of a defective or other variant protein. Furthermore, in the case of nonsense mutations, a SNP may lead to premature termination of a polypeptide product. Such variant products can result in a pathological condition, e.g., genetic disease. Examples of genetic diseases caused by a SNP within a coding sequence include sickle cell anemia and cystic fibrosis.
Causative SNPs do not necessarily have to occur in coding regions; they can occur in any genetic region that can ultimately affect the expression, structure, and/or activity of the protein encoded by a nucleic acid. Such genetic regions include, for example, those involved in transcription, such as transcription factor binding domains and promoter regions; those involved in transcript processing, such as intron-exon boundaries, where a SNP may cause defective splicing; and mRNA processing signal sequences, such as polyadenylation signal regions. Some SNPs that are not causative SNPs nevertheless are in close association with, and therefore segregate with, a disease-causing sequence. In this situation, the presence of a SNP correlates with the presence of, a predisposition to, or an increased risk of developing the disease. These SNPs, although not causative, are nonetheless also useful for diagnostics, disease predisposition screening, and other uses.
An association study of a SNP and a specific disorder involves determining the presence or frequency of the SNP allele in biological samples from individuals with the disorder of interest, such as rheumatoid arthritis, and comparing the information to that of controls (i.e., individuals who do not have the disorder; controls may also be referred to as “healthy” or “normal” individuals) who are preferably of similar age and race. The appropriate selection of patients and controls is important to the success of SNP association studies. Therefore, a pool of individuals with well-characterized phenotypes is extremely desirable.
A SNP may be screened in diseased tissue samples or any biological sample obtained from a diseased individual, and compared to control samples, and selected for its increased (or decreased) occurrence in a specific pathological condition, such as pathologies related to rheumatoid arthritis. Once a statistically significant association is established between one or more SNP(s) and a pathological condition (or other phenotype) of interest, then the region around the SNP can optionally be thoroughly screened to identify the causative genetic locus/sequence(s) (e.g., causative SNP/mutation, gene, regulatory region, etc.) that influences the pathological condition or phenotype. Association studies may be conducted within the general population and are not limited to studies performed on related individuals in affected families (linkage studies).
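One common way to test whether a SNP allele occurs at a different frequency in cases than in controls is a chi-square test on a 2x2 table of allele counts, together with an odds ratio. The Python sketch below illustrates that comparison. It is an assumption of this example, not a method stated in the text: the function name allelic_association is hypothetical, the test is a Pearson chi-square with one degree of freedom and no continuity correction, and the allele counts are invented purely for illustration.

```python
import math


def allelic_association(case_a, case_b, control_a, control_b):
    """Compare allele counts in cases and controls.

    case_a / control_a: counts of the allele of interest;
    case_b / control_b: counts of the other allele.
    Returns (odds_ratio, chi_square, p_value).
    """
    n = case_a + case_b + control_a + control_b
    cases, controls = case_a + case_b, control_a + control_b
    total_a, total_b = case_a + control_a, case_b + control_b
    if 0 in (cases, controls, total_a, total_b):
        raise ValueError("every row and column of the 2x2 table must be non-zero")

    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    cross_difference = case_a * control_b - case_b * control_a
    chi_square = n * cross_difference ** 2 / (cases * controls * total_a * total_b)

    # Upper-tail probability of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))

    denominator = case_b * control_a
    odds_ratio = (case_a * control_b) / denominator if denominator else math.inf
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 100 cases and 100 controls, two alleles per individual.
odds_ratio, chi_square, p_value = allelic_association(60, 140, 40, 160)
print(f"OR={odds_ratio:.2f}  chi2={chi_square:.2f}  p={p_value:.3f}")
```

With these invented counts the odds ratio is about 1.71 and the chi-square statistic is about 5.33. In a real study, the thresholds for significance would also need to account for the number of SNPs tested and for how well the cases and controls are matched.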
Clinical trials have shown that patient response to treatment with pharmaceuticals is often heterogeneous. There is a continuing need to improve pharmaceutical agent design and therapy. In that regard, SNPs can be used to identify patients most suited to therapy with particular pharmaceutical agents (this is often termed “pharmacogenomics”). Similarly, SNPs can be used to exclude patients from certain treatment due to the patient's increased likelihood of developing toxic side effects or their likelihood of not responding to the treatment. Pharmacogenomics can also be used in pharmaceutical research to assist the drug development and selection process. (Linder et al. (1997), Clinical Chemistry, 43, 254; Marshall (1997), Nature Biotechnology, 15, 1249; International Patent Application WO 97/40462, Spectra Biomedical; and Schafer et al. (1998), Nature Biotechnology, 16, 3).