The invention relates to a method of determining the frequency of an allele within a given population or group, and in particular to a method of determining allele frequencies for single nucleotide polymorphisms (SNPs) or other mutations or genetic variations (e.g. nucleotide insertions, additions or deletions, gene, chromosome or genome duplications (or multiplications) etc.) in pooled nucleic acid samples or other samples (including single samples) which may contain allelic variants.
Individuals in populations will have genetic differences. The genetic differences may be represented as the individuals in the population having different alleles at a given locus. Alternatively, genetic differences can be related to gene, chromosome, or whole genome duplications (or other multiplications). The allele frequency describes the fraction of the population exhibiting a particular allele. Over a whole population, there may be many different alleles at a particular locus. However, where the genetic difference occurs as an alteration of a single nucleotide (single nucleotide polymorphisms or SNPs), generally only 2 alleles are present in the population, although triallelic or tetra-allelic SNPs are known. Studies of allelic association in populations are one of the most useful and powerful methods for mapping genes/mutations that contribute to disease. Such studies require the determination of the genotype (i.e. which allele is present) at one or several loci in a population. The frequency of a particular allele in a given population can be assessed, and the association of that allele with a disease or other clinical condition (e.g. predisposition to disease, therapeutic responsiveness etc.) can be studied.
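By way of illustration only, the calculation of an allele frequency from individually determined genotypes can be sketched as follows. The sketch assumes diploid individuals, each contributing two allele copies to the count, and uses a hypothetical sample of ten individuals at a biallelic A/G SNP; the function name and the data are assumptions made for the example and form no part of the method described.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: fraction} from diploid genotypes written as 'AG'."""
    counts = Counter()
    for genotype in genotypes:
        # Each diploid individual contributes two allele copies.
        counts.update(genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical sample: 4 AA, 4 AG and 2 GG individuals (20 allele copies).
sample = ["AA"] * 4 + ["AG"] * 4 + ["GG"] * 2
freqs = allele_frequencies(sample)
print(freqs)
```

In this hypothetical sample, allele A accounts for 12 of the 20 allele copies and allele G for the remaining 8.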
Single nucleotide polymorphisms (SNPs) are regularly used for genetic association studies, and consist of single nucleotide substitutions. SNPs are normally biallelic markers (i.e. there are 2 alleles present in the population), and are the markers of choice for various types of genetic analysis, because of their high frequency in the genome. SNPs are found approximately once every 100 to 1000 bases in the human genome. An SNP has a prevalence of at least 1% in a given population. Further, they are stable, having much lower mutation rates than repeat sequences, for example. The analysis of SNPs is of great importance in several disciplines within the applied genomic field. Importantly, the nucleotide sequence variations that are most likely to be responsible for the functional changes of interest will be SNPs. Such variations are therefore of great interest, and many studies directed at identifying functional SNPs contributing to (or associated with) a particular trait or disease (“phenotype”) have been performed. Thus many diseases and conditions may be associated with (or linked to) single nucleotide polymorphisms, either alone or in combination. For example, in WO 00/22166, it has been suggested that a combination of SNPs within several genes gives a polymorphic pattern which may be used to predict the likelihood of developing cardiovascular disease. Obtaining reliable and accurate data on the frequencies of a given SNP allele in a given population without testing each member of the population would have a revolutionary impact on the efficiency and cost of analysis for large population studies.
However, the frequency of other genetic mutations or variants, e.g. insertion/addition/deletion mutations and gene, chromosome or genome duplications (in the sense of any number of multiplications or repeats), and those studied in cancer genetics and chromosomal abnormality (e.g. trisomy) cases, can also be analysed by the method of the invention.
Allelic association means that across a given population, individuals who have a certain allele at one locus may have a statistically higher chance of developing a particular disease, for example. Thus, the possession of a particular allele can cause direct susceptibility to a disease. Alternatively, the possession of a particular allele may be indirectly linked to disease susceptibility via association with the “disease” allele.
Association studies attempt to find genes that influence or increase susceptibility to disease or traits in any organism. This involves determining the frequency of an allele in a population of organisms with that trait or disease and comparing the results with a control population that does not exhibit the disease or trait. Various statistical/mathematical methods are known and described in the art for assessing allele frequencies based on such studies. In order to perform large-scale association studies for single nucleotide polymorphisms, methods have included laborious and expensive genotyping of individual nucleic acid samples. Pooling of nucleic acid samples in order to obtain allele frequency information has been used to reduce the burden of genotyping individual samples. To date, most pooling investigations have centred on the use of microsatellite polymorphisms, with few methods developed for the rapid assessment of SNPs in a given population.
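As one illustration of the kind of comparison involved, allele counts from a case group and a control group can be arranged in a 2x2 table and tested for association. The sketch below uses a Pearson chi-square test with one degree of freedom, which is only one common choice among the statistical methods known in the art and is not asserted to be the method used in any cited study. The counts are hypothetical, and the function name is an assumption made for the example.

```python
import math


def allele_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 d.f., no continuity correction) on a 2x2 allele table.

    Returns (chi2, p_value). Arguments are allele copy counts, not individuals.
    """
    n = case_a + case_b + control_a + control_b
    numerator = n * (case_a * control_b - case_b * control_a) ** 2
    denominator = (
        (case_a + case_b)
        * (control_a + control_b)
        * (case_a + control_a)
        * (case_b + control_b)
    )
    chi2 = numerator / denominator
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 100 cases and 100 controls, i.e. 200 allele copies each.
chi2, p = allele_chi_square(case_a=120, case_b=80, control_a=90, control_b=110)
print(f"case frequency of allele A:    {120 / 200:.2f}")
print(f"control frequency of allele A: {90 / 200:.2f}")
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

With pooled samples the allele counts are not observed directly but estimated from the measured allele frequency and the number of individuals in the pool, so a test of this kind would additionally need to allow for measurement error in the pooled estimate.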
Studies on allele frequencies tend to rely on radiation-based methods, or gel electrophoresis, which have well-known drawbacks. A method of determining SNP allele frequency using allele-specific fluorescent probes in the Taqman® assay (Breen et al., Biotechniques 2000, 28(3) 464–470) has been developed by PE Biosystems. In this technique Taqman® probes are used to detect specific sequences in Polymerase Chain Reaction (PCR) products by employing the 5′→3′ exonuclease activity of Taq polymerase. The Taqman® probe anneals to the target sequence between the traditional forward and reverse PCR primers. The Taqman® probe is labelled with a reporter fluorophore and a quencher fluorochrome. This technique relies on the possibility of designing allele-specific probes that match the annealing temperature of the PCR primers. Moreover, the allele specificity of the probe is, in the case of SNPs, determined by one out of 17–30 bases. These restrictions make it hard to design allele-specific probes whose temperature discrimination is good enough to prevent binding to the other allele. Hence, the signal from such an assay might not always accurately represent the frequency of the probe-specific allele. A further disadvantage of this method may lie in finding assay conditions under which a mismatch results in a clearly distinguishable difference in cleavage of the reporter fluorophore between the two alleles. Further, Taqman® probes have different dyes at the 5′ and 3′ ends and are therefore costly to produce, and must be carefully designed. The Taqman® assay requires two reactions in order to measure allele frequency, using a different probe in each of the two reactions, each complementary to one of the two alleles. It would therefore be advantageous to develop a method of determining SNP allele frequencies in pooled nucleic acid in one reaction which was accurate and reliable, and which avoided the need for labels or for reliance on probe binding to the SNP site.