The present application relates to a process for testing associations of genes and diseases, that is, which specific genes predispose to important common diseases. The invention further relates to a process for selecting a set of families to be used in testing an association between an allele and a disease.
The publications and other materials used herein to illuminate the background of the invention, and in particular cases, to provide additional details respecting its practice, are incorporated herein by reference and for convenience are numerically referenced in the following text and respectively grouped in the appended bibliography.
The determination of the association of a gene with a disease has considerable practical importance, because once a particular gene is known to predispose to a serious disease, there are established procedures for determining how that gene acts and for understanding how existing environmental causes interact with the gene to produce the disease. Just as the discovery of the polio virus and its growth in cell culture led to the highly successful polio vaccines, connecting a specific gene with a specific common disease provides the basis for preventing this disease in genetically susceptible individuals (1-3, for example). The connection of a gene with a disease is also useful for the diagnosis of persons at risk for expressing the disease.
Modern techniques of molecular biology make it possible to identify any human gene for which certain minimal information is known. Given the precision of these laboratory methods, a generally applicable, convincing and systematic process to connect genes with important common diseases ("associate" is the technically correct term) is needed, and has not been available prior to the present invention.
There are limited situations in which it is easy to detect the association of a specific gene with a particular disease. These situations involve diseases that are inherited in what is termed a "Mendelian pattern." An example of such a disease is Huntington's disease. Each person who inherits one copy of a gene for Huntington's disease will develop that disease during his or her lifetime, if he or she lives long enough. From the pattern of inheritance in families with this disorder, it was easy to determine that this disorder was a consequence of a single mutant allele at a particular gene locus. There are also diseases, like cystic fibrosis and sickle cell anemia, in which every person who has two alleles for the disease develops it. Other diseases are transmitted through genes located on the X chromosome. While there are several thousand different Mendelian inherited disorders, each of them is individually quite rare, and together they do not account for most of the known important predisposition to common diseases. Even though Mendelian disorders contribute little directly to understanding the genetics of common disorders, it is important to begin by describing the prior art with this class, since it is basic to understanding the more complex approaches to the genes for common diseases.
In the first instance, the idea that an autosomal, dominant disorder like Huntington's disease was determined by a single mutant allele at a single genetic locus in each family was inferred from the pattern of inheritance in Huntington's disease families. By now, the idea that Huntington's disease or other Mendelian disorders are each the result of a single mutant allele has been well confirmed through linkage studies and, in certain interesting cases, by identifying the mutant allele itself.
As indicated above, the genes that predispose to serious common diseases such as coronary heart disease, diabetes mellitus, cancer, mental illness, or mental retardation do not produce their clinical effects in such a way that a Mendelian pattern can be observed in families in which one of these genes is transmitted. The basic reason for this fact is that the genes that predispose to these serious common diseases do not produce the common disease in every person who carries the gene. It is known that genes predispose to breast cancer, for example, because breast cancer is more common among first-degree relatives of breast cancer patients than it is in the general population. It is also known that alcoholism or certain forms of mental illness can result from genetic predisposition because the children of alcoholics or mentally ill individuals have a higher risk of developing the same illness, even when they are adopted away from their biological parents early in infancy. Evidence of this sort is available for many common diseases. From this evidence it is known that genes are often important in predisposing to these common diseases, but from these data alone nothing can be learned about the specific genes or mechanisms involved.
The principal approach to identifying a few specific genes that predispose to common diseases has been that of "population association studies." The general concept underlying this approach is that of "allele frequency." At any gene locus in the entire genome, there may be two or more alternative versions of the gene, called alleles. For simplicity, and without any loss of generality, the case in which there are only two alternative alleles at the single locus of interest will be considered. The more common of the two alleles is designated A and the less common allele is designated a. Since the non-sex chromosomes (autosomes) are paired, each person can be homozygous for the common allele, AA, homozygous for the less common allele, aa, or heterozygous, Aa.
In any population of N individuals, there will be a total of 2N copies of this particular gene. Among these 2N alternatives, the proportion of genes that are actually allele a is the allele frequency of a. In the case of only two alternative alleles, the allele frequencies for A and a will sum to 1.00.
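The allele-counting definition above can be illustrated with a minimal sketch. The function name and the genotype counts are hypothetical and chosen only for illustration; they do not come from any study cited herein.

```python
def allele_frequency(n_AA, n_Aa, n_aa):
    """Return (freq_A, freq_a) from genotype counts at a two-allele locus.

    Each of the N individuals carries two copies of the gene, so the
    population holds 2N copies in total. Every aa individual contributes
    two copies of allele a and every Aa individual contributes one.
    """
    n_individuals = n_AA + n_Aa + n_aa
    if n_individuals == 0:
        raise ValueError("at least one individual is required")
    total_copies = 2 * n_individuals
    copies_a = 2 * n_aa + n_Aa
    freq_a = copies_a / total_copies
    # With only two alleles, the two frequencies sum to 1.00.
    return 1.0 - freq_a, freq_a


# Hypothetical population of 1000 individuals (illustrative counts only).
freq_A, freq_a = allele_frequency(n_AA=900, n_Aa=95, n_aa=5)
print(f"frequency of A: {freq_A:.4f}, frequency of a: {freq_a:.4f}")
```

In this hypothetical population there are 2000 gene copies, of which 105 are allele a.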
It is a basic fact of human population genetics that for any gene locus and pair of alleles, the allele frequency in one population is likely to differ substantially from that in another population. For example, the frequency of allele a might be 0.048 in one precinct in Cincinnati and 0.073 in another. While the population allele frequency can be determined methodically for any population and for any alleles for which there is a totally specific and sensitive test, in practice allele frequencies are only determined in specific situations, such as during the study of a specific indigenous tribe or when blood grouping or HLA typing is done for clinical purposes or as part of a defined population survey.
Population-based tests of gene-disease associations are carried out in the following way. Suppose that allele a is hypothesized to predispose to disease D. To test this hypothesis, the frequency of allele a in a population of patients with disease D is compared to that in a comparison or control population. Evidence for this hypothesized association would consist of finding a significantly higher frequency of the allele in the disease population compared to controls. The allele frequency in the control population is taken to be representative of that in the general population from which the population of individuals with disease D was selected (4).
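One common way to carry out such a comparison is a Pearson chi-square test on a two-by-two table of allele counts. The sketch below is offered under stated assumptions rather than as the method of any cited study: the function name and the counts are hypothetical, and treating the two alleles of each person as independent observations is a simplification.

```python
import math


def allele_association_test(case_a, case_total, control_a, control_total):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    case_a / control_a: copies of allele a observed in each group.
    case_total / control_total: total gene copies (2 x individuals).
    Returns (case_freq, control_freq, chi2, p_value).
    """
    table = [
        [case_a, case_total - case_a],
        [control_a, control_total - control_a],
    ]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("every row and column must contain observations")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail chi-square probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return case_a / case_total, control_a / control_total, chi2, p_value


# Hypothetical data: 500 patients and 500 controls (1000 gene copies each).
case_freq, control_freq, chi2, p_value = allele_association_test(
    case_a=80, case_total=1000, control_a=50, control_total=1000
)
print(f"cases: {case_freq:.3f}  controls: {control_freq:.3f}")
print(f"chi-square: {chi2:.2f}  p-value: {p_value:.4f}")
```

A small p-value is evidence of association only if the control population truly represents the population from which the patients were drawn, which is the assumption stated above.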
This population method for detecting important gene-disease associations has been effective in certain limited circumstances, particularly in verifying associations between specific HLA alleles (5) and common diseases, such as ankylosing spondylitis or insulin-dependent diabetes mellitus. It is widely recognized, however, that this approach is severely limited for testing many important gene-disease associations. In fact, because of these limitations, conflicting results have been obtained for a single hypothesized association like that of breast cancer with specific H-ras alleles (6, 7). The most important limitation of population-based tests of gene-disease associations is that it is very difficult to match the population with disease D to a comparison population on the important stratification variables that influence allele frequency. As pointed out above, allele frequencies can differ widely between different ethnic groups and even different socio-economic strata because of patterns of migration and mating. This variation is such a dominant source of error that it can give misleading positive or negative results. When the association is extremely strong, such as that of ankylosing spondylitis and HLA B27 (5), this limitation does not apply. Even then, it was important to confirm the association in many different ethnic groups.
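The way in which poorly matched populations can produce a misleading positive result may be seen in a small arithmetic sketch. All group names, frequencies, and counts below are hypothetical. Within each group the allele frequency is, by construction, identical in patients and controls, so the allele has no relation to the disease; the groups merely contribute unequally to the patient and control samples.

```python
# Hypothetical strata: allele a is more frequent in group_X than in
# group_Y, and the patient sample is drawn mostly from group_X while the
# control sample is drawn mostly from group_Y.
strata = {
    "group_X": {"freq_a": 0.10, "case_alleles": 800, "control_alleles": 200},
    "group_Y": {"freq_a": 0.02, "case_alleles": 200, "control_alleles": 800},
}


def pooled_frequency(strata, key):
    """Frequency of allele a after pooling all strata for one sample.

    key is "case_alleles" or "control_alleles". Within a stratum the
    same freq_a applies to both samples, i.e. no true association.
    """
    copies_a = sum(s["freq_a"] * s[key] for s in strata.values())
    total_copies = sum(s[key] for s in strata.values())
    return copies_a / total_copies


case_freq = pooled_frequency(strata, "case_alleles")
control_freq = pooled_frequency(strata, "control_alleles")
print(f"pooled frequency in patients: {case_freq:.3f}")
print(f"pooled frequency in controls: {control_freq:.3f}")
```

The pooled frequencies differ by more than a factor of two even though the allele is unrelated to the disease within either group. Reversing the sampling proportions would reverse the direction of the apparent effect, which corresponds to the misleading negative results noted above.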
The other major limitation of population-based tests of gene-disease associations is that they have relatively poor statistical power. This issue is particularly important because the sample size required to achieve a specified level of statistical power increases dramatically as the allele frequency in the general population falls. Thus, gene-disease associations where the hypothesized disease-predisposing allele has a frequency around one percent, which is the common situation, require extremely large samples, making both the matching of the disease and control groups and the replication of the study more difficult. At the present time, many important gene-disease associations are either controversial or have gone entirely untested because of the limitations of the population approach to testing these associations.
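The dependence of sample size on allele frequency can be sketched with the standard normal-approximation formula for comparing two proportions. This is an illustrative calculation under assumptions that are not part of the original text: a fixed allelic odds ratio, a two-sided significance level of 0.05, power of 0.80, equal numbers of gene copies in each group, and hypothetical function names.

```python
import math
from statistics import NormalDist


def case_allele_frequency(p_control, odds_ratio):
    """Allele frequency among patients implied by an allelic odds ratio."""
    return odds_ratio * p_control / (1.0 + p_control * (odds_ratio - 1.0))


def alleles_per_group(p_control, odds_ratio, alpha=0.05, power=0.80):
    """Approximate gene copies needed in EACH group (two-sided test).

    Uses the usual two-proportion normal approximation; the number of
    individuals per group is half the returned value.
    """
    if odds_ratio == 1.0:
        raise ValueError("odds_ratio must differ from 1")
    p_case = case_allele_frequency(p_control, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_case + p_control) / 2.0
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(
            p_case * (1.0 - p_case) + p_control * (1.0 - p_control)
        )
    ) ** 2
    return math.ceil(numerator / (p_case - p_control) ** 2)


# Assumed odds ratio of 2 at several hypothetical allele frequencies.
for p in (0.10, 0.05, 0.01):
    print(f"allele frequency {p:.2f}: "
          f"{alleles_per_group(p, odds_ratio=2.0)} gene copies per group")
```

Under these assumptions the required number of gene copies per group rises several-fold as the allele frequency falls from ten percent to one percent.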
It has been proposed that genomic mapping through linkage analysis, a wholly satisfactory procedure for Mendelian conditions, might be applied to localize genes that predispose to common non-Mendelian disorders. Currently available molecular genetic techniques have greatly enhanced the power of genetic and epidemiologic strategies used to identify specific genes that predispose to common chronic diseases. For example, the increasingly detailed genomic map of DNA polymorphisms may make it possible to adapt linkage methods, highly successful for recognized Mendelian syndromes, to map some genes for non-Mendelian chronic disorders (8). However, because of clinical and genetic heterogeneity (9), the practical usefulness of linkage analysis in this setting remains to be determined. At this time, there are no successful practical, general applications of this proposal. It is not clear, in fact, whether the assumptions under which this generalization of linkage analysis is to be carried out are realistic enough for useful results to be obtained. Of all the limitations of generalizing linkage analysis to non-Mendelian diseases, the most important is that these disorders are genetically heterogeneous: genes at many loci can predispose to the same common disorder.