Over the last ten years, a major development in the genetic analysis of diseases and biological traits has been the introduction of genome-wide association studies (GWAS). A GWAS measures hundreds of thousands of genetic markers, typically single-nucleotide polymorphisms (SNPs), across the genome and provides a way to identify candidate genes related to a wide range of traits (for example, height and weight) and diseases (for example, breast cancer and asthma). Since 2005 alone, it is estimated that over 2,700 GWAS have been conducted at an average cost of $500,000 per study. Given the high cost of running a GWAS, there is clearly a great need to ensure that the information in the collected data is fully utilised.
One of the main aims of a GWAS is to identify DNA features which, if not causal, are at least statistically significantly associated with increased risk of various diseases or traits, or with increased benefit from specific treatments. The single-locus approach that has predominated in the analysis of these studies to date has yielded only modest results.
To overcome this roadblock, analysis techniques that can detect higher-order interactions among DNA features are required. Higher-order interaction analysis generally runs into the problem that the number of features measured in genomic data vastly exceeds the number of samples, the so-called "curse of dimensionality", and this calls for the development of new, powerful statistical and computational techniques.
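To give a rough sense of scale, the following sketch counts how many candidate feature sets arise when SNPs are tested singly, in pairs, and in triples. The figures are illustrative assumptions rather than values from any particular study: 500,000 SNPs stands in for "hundreds of thousands" of markers, 2,000 is a hypothetical sample size, and `interaction_count` is a helper name introduced here.

```python
from math import comb


def interaction_count(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of the given size (unordered, no repeats)."""
    return comb(n_snps, order)


# Illustrative assumptions, not figures from a specific study.
n_snps = 500_000    # assumed number of genotyped SNPs
n_samples = 2_000   # hypothetical number of individuals

for order in (1, 2, 3):
    n_tests = interaction_count(n_snps, order)
    ratio = n_tests / n_samples
    print(f"order {order}: {n_tests:,} candidate sets, "
          f"about {ratio:,.0f} per sample")
```

Under these assumptions, moving from single-locus to pairwise analysis already raises the number of candidate feature sets from 500,000 to roughly 1.25 × 10^11, which is why exhaustive testing of higher-order interactions quickly becomes statistically and computationally demanding.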