The present invention concerns methods, particularly computer-based methods, along with corresponding systems and computer code products for use in conducting linkage disequilibrium studies on pedigrees containing multiple nuclear families and/or discordant sibships.
Family-based tests for linkage and allelic association, i.e., linkage disequilibrium, have received a great deal of attention in the past several years. The transmission/disequilibrium test (TDT) was proposed to test for linkage disequilibrium in family triads, each containing two parents and an affected offspring (R. Spielman et al., Am. J. Hum. Genet. 52, 506-516 (1993)). The TDT was extended to allow for multiple affected offspring while remaining a valid test of linkage disequilibrium (E. Martin et al., Am. J. Hum. Genet. 61, 439-448 (1997)). For late-onset diseases for which parents may not be available, a battery of tests using phenotypically discordant sib pairs has been proposed (D. Curtis, Ann. Hum. Genet. 61, 319-333 (1997); R. Spielman and W. Ewens, Am. J. Hum. Genet. 62, 450-458 (1998); M. Boehnke and C. Langefeld, Am. J. Hum. Genet. 62, 950-961 (1998)). Recently, the sibship disequilibrium test (SDT) was proposed to allow for the use of discordant sibships of larger size (S. Horvath and N. Laird, Am. J. Hum. Genet. 63, 1886-1897 (1998)). A limitation of these tests is that, while they remain valid tests of linkage, they are not valid tests of association if related nuclear families and/or sibships from larger pedigrees are used.
Accordingly, there remains a need for valid tests of linkage disequilibrium that employ related nuclear families and/or sibships from larger pedigrees.
Often data are available for larger pedigrees with multiple nuclear families and/or discordant sibships, and it would be desirable to have a valid test of linkage disequilibrium that can use all potentially informative data, even from extended pedigrees. With this goal, we have developed the Pedigree Disequilibrium Test (PDT) for analysis of linkage disequilibrium in general pedigrees. This test uses data from related nuclear families and discordant sibships from extended pedigrees. Furthermore, the test retains a key property of the TDT in that it is valid even when there is population substructure.
The difficulty with testing for association with related families is that genotypes of related individuals are correlated if there is linkage, even if there is no allelic association in the population. Thus, when there is linkage, it is incorrect to treat nuclear families or discordant sibships from extended pedigrees as independent when testing for association. An appropriate strategy is to base the test on a random variable measuring linkage disequilibrium for the entire pedigree rather than treating related nuclear families or sibships as if they were independent. A measure of linkage disequilibrium is defined for each triad and each discordant sib pair (DSP) within a pedigree, and the average of these quantities is the measure of linkage disequilibrium for the pedigree. It is these random variables for independent pedigrees that form the basis of the PDT.
The present invention provides a method for the analysis of linkage disequilibrium between at least one marker locus and a disease or trait locus of interest. The method comprises the steps of:
(a) providing a data set comprising a marker locus with at least two alleles for a plurality of extended pedigrees (e.g., plant pedigrees, animal pedigrees), where N is the number of unrelated extended pedigrees, at least one of said extended pedigrees containing at least one informative nuclear family or informative discordant sibship; then
(b) determining a random variable XT for each triad within a nuclear family or informative nuclear family for each allele Mi;
(c) determining a random variable XS for each DSP within a discordant sibship or informative discordant sibship for each allele Mi;
(d) determining a summary random variable D from XS and XT for each of said extended pedigrees for each allele Mi; and then
(e) determining a statistic T from each of said summary random variables D from each of said N unrelated extended pedigrees for each allele Mi, an extreme value for T indicating greater linkage disequilibrium.
A variety of different types of data can be accommodated. For example, each of said extended pedigrees can contain at least one nuclear family; each of said extended pedigrees can contain at least one discordant sibship; each of said extended pedigrees can contain at least one nuclear family and at least one informative discordant sibship; etc.
In a particular embodiment, the step (b) of determining a random variable XT for each triad within a nuclear family for each allele Mi is carried out according to the formula:
XT=(Number of Mi transmitted − Number of Mi not transmitted).
In a particular embodiment, said step (c) of determining a random variable XS for each DSP within an informative discordant sibship for each allele Mi is carried out according to the formula:
XS=(Number of Mi in affected sib − Number of Mi in unaffected sib).
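The two per-unit quantities XT and XS can be illustrated with a minimal Python sketch. The genotype encoding (a pair of allele labels per individual) and the function names `count_allele`, `x_triad`, and `x_dsp` are assumptions made for illustration and do not appear in the specification. The triad function also assumes Mendelian-consistent genotypes, so that the two alleles carried by the child are the transmitted ones and the remaining parental alleles are the non-transmitted ones.

```python
from typing import Tuple

# Hypothetical encoding: each genotype is a pair of allele labels, e.g. ("1", "2").
Genotype = Tuple[str, str]


def count_allele(genotype: Genotype, allele: str) -> int:
    """Return the number of copies (0, 1, or 2) of `allele` in a genotype."""
    return sum(1 for a in genotype if a == allele)


def x_triad(father: Genotype, mother: Genotype, child: Genotype, allele: str) -> int:
    """XT = (number of Mi transmitted) - (number of Mi not transmitted).

    Assumption: genotypes are Mendelian-consistent, so the child's alleles are
    the transmitted ones and the other parental alleles were not transmitted.
    """
    transmitted = count_allele(child, allele)
    parental = count_allele(father, allele) + count_allele(mother, allele)
    not_transmitted = parental - transmitted
    return transmitted - not_transmitted


def x_dsp(affected: Genotype, unaffected: Genotype, allele: str) -> int:
    """XS = (number of Mi in affected sib) - (number of Mi in unaffected sib)."""
    return count_allele(affected, allele) - count_allele(unaffected, allele)


# A 1/2 father and 1/1 mother with a 1/1 affected child: net one transmission of "1".
xt_example = x_triad(("1", "2"), ("1", "1"), ("1", "1"), "1")
# Affected sib 1/1 versus unaffected sib 1/2: one extra copy of "1" in the affected sib.
xs_example = x_dsp(("1", "1"), ("1", "2"), "1")
```

Note that a homozygous parent contributes equally to the transmitted and non-transmitted counts, so only heterozygous parents change the value of XT in this sketch.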
In a particular embodiment, each of the extended pedigrees contains nT triads from informative nuclear families and nS DSPs from informative discordant sibships, and the step (d) of determining a summary random variable D from XS and XT for each of said extended pedigrees for each allele Mi is carried out according to the formula:
$$D = \frac{1}{n_T + n_S}\left[\sum_{j=1}^{n_T} X_{Tj} + \sum_{j=1}^{n_S} X_{Sj}\right]$$
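The averaging in step (d) can be sketched in Python as follows. The function name `pedigree_summary_average` is hypothetical, and returning D = 0 for a pedigree with no triads or DSPs is an assumption of this sketch rather than something stated in the specification.

```python
from typing import Sequence


def pedigree_summary_average(x_triads: Sequence[float], x_dsps: Sequence[float]) -> float:
    """Average the XT and XS values of one extended pedigree into D.

    x_triads holds the nT triad values and x_dsps holds the nS DSP values.
    """
    n_units = len(x_triads) + len(x_dsps)
    if n_units == 0:
        # Assumption: a pedigree with no triads or DSPs contributes D = 0.
        return 0.0
    return (sum(x_triads) + sum(x_dsps)) / n_units


# Three triads and one DSP in the same pedigree: (1 - 1 + 1 + 1) / 4
d_example = pedigree_summary_average([1, -1, 1], [1])
```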
In a particular embodiment, the step (e) of determining a statistic T from each of said summary random variables D from each of said N unrelated extended pedigrees for each allele Mi is carried out according to formula (1):
$$T = \frac{\sum_{i=1}^{N} D_i}{\sqrt{\sum_{i=1}^{N} D_i^{2}}} \qquad (1)$$
wherein T is a disequilibrium statistic for allele Mi, an extreme value for T indicating greater linkage disequilibrium.
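A minimal Python sketch of step (e) follows. It assumes that formula (1) divides the sum of the per-pedigree values D by the square root of the sum of their squares. The function name `pdt_statistic` is hypothetical, and raising an error when every D is zero is a choice made for this sketch only.

```python
import math
from typing import Sequence


def pdt_statistic(d_values: Sequence[float]) -> float:
    """Combine the summary values D_1..D_N of N unrelated pedigrees into T.

    Assumes T = sum(D_i) / sqrt(sum(D_i ** 2)).
    """
    numerator = sum(d_values)
    sum_of_squares = sum(d * d for d in d_values)
    if sum_of_squares == 0:
        # Sketch-level choice: T is undefined when no pedigree is informative.
        raise ValueError("all D values are zero; T is undefined")
    return numerator / math.sqrt(sum_of_squares)


# Four unrelated pedigrees with made-up D values, for illustration only.
t_example = pdt_statistic([1.0, 0.5, -0.5, 1.0])
```

Because each pedigree enters only through its single value D, related nuclear families and sibships within a pedigree are never treated as independent observations.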
In a particular embodiment, the method further comprises the step of: (f) determining a global statistic T′ from each statistic T for each allele Mi, an extreme value for T′ indicating greater linkage disequilibrium.
A particular embodiment of the PDT procedure described above is referred to herein as the “PDT-Sum” statistic. In this, each of the extended pedigrees contains nT triads from informative nuclear families and nS DSPs from informative discordant sibships, and the step (d) of determining a summary random variable D from XS and XT for each of said extended pedigrees for each allele Mi is carried out according to the formula:
$$D = \sum_{j=1}^{n_T} X_{Tj} + \sum_{j=1}^{n_S} X_{Sj}$$
The nT triads from informative nuclear families and nS DSPs from informative discordant sibships are selected based upon genotype as described herein or on a criterion other than genotype as described below.
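The PDT-Sum summary differs from an averaged summary only in omitting the division by nT + nS. The following Python sketch illustrates this; the function name `pedigree_summary_sum` and the example values are hypothetical.

```python
from typing import Sequence


def pedigree_summary_sum(x_triads: Sequence[float], x_dsps: Sequence[float]) -> float:
    """PDT-Sum summary: D is the plain sum of all XT and XS values in a pedigree."""
    return sum(x_triads) + sum(x_dsps)


# A larger pedigree (three triads, one DSP) and a smaller one (a single triad).
d_large = pedigree_summary_sum([1, 1, 0], [1])
d_small = pedigree_summary_sum([1], [])
```

In this illustration the larger pedigree yields D = 3 and the smaller one D = 1, so under the sum a pedigree with more triads and DSPs can carry more weight in the statistic T than it would if its values were averaged.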
Another particular embodiment of the PDT procedure is referred to as the “PDT-Average” procedure herein. In this, each of the extended pedigrees contains nT triads from nuclear families and nS DSPs from discordant sibships, and the step (d) of determining a summary random variable D from XS and XT for each of said extended pedigrees for each allele Mi is carried out according to the formula:
$$D = \frac{1}{n_T + n_S}\left[\sum_{j=1}^{n_T} X_{Tj} + \sum_{j=1}^{n_S} X_{Sj}\right]$$
However, in the PDT-Average procedure, the nT triads from nuclear families and nS DSPs from discordant sibships are selected based upon a criterion other than genotype (including, but not limited to, criteria such as age, gender, clinical characteristic, phenotype, and random selection). As will be appreciated, using all of the nuclear families and discordant sibships, that is, the total number available, is necessarily a selection based upon a criterion other than genotype.
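One selection that does not depend on genotype is to use every DSP in a sibship. The Python sketch below enumerates them under the assumption that a DSP is formed by pairing each affected sib with each unaffected sib; that pairing rule, the genotype encoding, and the function name `all_dsp_scores` are illustrative assumptions rather than details taken from the specification.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical encoding: each genotype is a pair of allele labels, e.g. ("1", "2").
Genotype = Tuple[str, str]


def all_dsp_scores(
    affected_sibs: List[Genotype],
    unaffected_sibs: List[Genotype],
    allele: str,
) -> List[int]:
    """Return XS for every affected/unaffected pair in one discordant sibship.

    Pairs are chosen by phenotype only, so the selection ignores genotype.
    """
    scores = []
    for affected, unaffected in product(affected_sibs, unaffected_sibs):
        # XS = copies of the allele in the affected sib minus copies in the unaffected sib
        scores.append(affected.count(allele) - unaffected.count(allele))
    return scores


# Two affected sibs and one unaffected sib give two DSPs.
xs_values = all_dsp_scores([("1", "1"), ("1", "2")], [("2", "2")], "1")
```

Some of the resulting XS values may be zero, which is expected: pairs are kept regardless of whether their genotypes differ.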
The foregoing and other objects and aspects of the present invention are explained in greater detail in the drawings herein and the specification set forth below.