1. Field of the Invention
The present invention is directed generally to genetic and genomic analysis and more particularly to methods for identifying mutations and methylation patterns in diploid DNA sequence data.
2. Description of the Related Art
Genetic Variation
It is well known that certain genetic variations (also called genetic mutations or polymorphisms) can result in disease susceptibility or differential drug response in individuals. It is of great interest to identify those genetic variations in disease and drug research.
In a diploid species, each individual has two copies of each gene. These may be derived from parental genes, with one being a maternal copy and the other a paternal copy. The two copies are also known as alleles. A challenging problem is to identify the two sets of genetic variation that reside on the two respective alleles when the gametic phase of the DNA sequence data is unknown.
There are two common approaches to the problem. One approach is to use molecular methods to separate the two alleles and identify the genetic variation on each separately. Such methods include Single Molecule Dilution (SMD), Allele-Specific polymerase chain reaction (PCR), and several cloning approaches. The molecular methods, however, are complex and expensive. They do not appear to be practical for routine, large-scale testing under current technology. Another approach is to use computational methods to resolve the ambiguity in the assignment of genetic variations and derive the true alleles from data that is composed of a mixture of the two alleles.
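The ambiguity that a computational method must resolve can be illustrated with a minimal sketch. It is offered only as an example and is not the method of the present invention. The function name consistent_haplotype_pairs and the representation of unphased data as a list of per-site allele pairs are assumptions made for illustration. For a genotype with k heterozygous sites (k of at least one), the sketch enumerates the 2^(k-1) distinct pairs of alleles that could have produced the observed mixture.

```python
from itertools import product


def consistent_haplotype_pairs(genotype):
    """Enumerate allele (haplotype) pairs consistent with unphased data.

    genotype: list of (a, b) tuples, one per site, giving the two bases
    observed at that site without knowing which allele carries which.
    Returns a set of frozensets, each holding the two haplotype strings
    (or a single string when both haplotypes are identical).
    This representation is a hypothetical one chosen for illustration.
    """
    # Sites where the two observed bases differ are the ambiguous ones.
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # Try every way of assigning the bases at heterozygous sites.
    for flips in product([False, True], repeat=len(het_sites)):
        flip_at = dict(zip(het_sites, flips))
        h1 = "".join(
            b if flip_at.get(i, False) else a
            for i, (a, b) in enumerate(genotype)
        )
        h2 = "".join(
            a if flip_at.get(i, False) else b
            for i, (a, b) in enumerate(genotype)
        )
        # A pair is unordered: (h1, h2) and (h2, h1) are the same answer.
        pairs.add(frozenset((h1, h2)))
    return pairs


# Two heterozygous sites (first and third) and one homozygous site.
example = consistent_haplotype_pairs([("A", "G"), ("C", "C"), ("T", "A")])
```

In this example the data is equally consistent with the allele pair ACT and GCA and with the allele pair ACA and GCT, so additional information is needed to determine the true alleles.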
DNA Methylation
With regard to DNA methylation, it has been known that methylation of cytosine-rich regions of DNA is involved in gene silencing. Differential methylation was found to be a key element in the transcriptional regulation of genes. Experiments have shown that methylation of the so-called CpG island of a gene inhibits transcription. It is believed that methylation either directly inhibits the binding of transcription factors, or that methylcytosine-binding proteins interact with other structural compounds, thereby making the DNA inaccessible to transcription factors. Methylation status has been shown to be associated with disease. For example, the development of certain cancers in mammals was found to be accompanied by genome-wide demethylation and local hypermethylation of tumor suppressor genes. As a result, it is of great interest to obtain the methylation pattern in DNA, in addition to the genetic variation, in studying disease susceptibility and associated treatments.
Since the introduction of the Bisulfite Genomic Sequencing technique, it has become possible to study small amounts of DNA and to identify methylation patterns with single-base resolution. This technique selectively deaminates unmethylated cytosine to uracil in the DNA strand by treatment of the DNA with a solution of sodium bisulfite. PCR amplification of such bisulfite-treated DNA replicates the uracils as thymines. After PCR, methylation patterns can be identified by a comparison of the PCR-amplified sequence with the original sequence. As used herein, bisulfite-treated DNA refers to DNA that has been treated with a bisulfite solution and PCR amplified to replace all unmethylated cytosines with thymines.
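The comparison described above can be sketched as follows for a single strand of known phase. This is a minimal illustration and not the method of the present invention. The function names bisulfite_convert and call_methylation are hypothetical, and the sketch assumes that the original and treated sequences have equal length and contain no underlying genetic variation.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Every unmethylated C is read as T after treatment and amplification;
    a C whose 0-based index is in methylated_positions is left as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(original, converted):
    """Compare the original and bisulfite-treated sequences base by base.

    Returns a dict mapping the index of each C in the original sequence
    to True (methylated, still C) or False (unmethylated, now T).
    Assumes equal length and no genetic variation between the sequences.
    """
    if len(original) != len(converted):
        raise ValueError("sequences must have the same length")
    calls = {}
    for i, (orig_base, conv_base) in enumerate(zip(original, converted)):
        if orig_base != "C":
            continue
        if conv_base == "C":
            calls[i] = True
        elif conv_base == "T":
            calls[i] = False
        else:
            # Neither C nor T: not explained by bisulfite conversion alone.
            raise ValueError(f"unexpected base {conv_base!r} at position {i}")
    return calls


# Cytosines at indices 1, 4 and 5; those at 1 and 5 are methylated.
original = "ACGTCCGA"
treated = bisulfite_convert(original, {1, 5})
calls = call_methylation(original, treated)
```

In this example the cytosine at index 4 is converted to thymine and is therefore called unmethylated, while the cytosines at indices 1 and 5 are retained and called methylated. The sketch raises an error when a position cannot be explained by conversion alone, which is the situation that arises when the data contains genetic variation.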
Methylation patterns can be readily identified by this technique if the gametic phase is known for the underlying DNA sequence data. However, commonly available sequence data usually has two confounding characteristics: (1) the gametic phase is unknown (i.e., each allele cannot be assigned to a particular parent), and (2) the data contains underlying genetic variation (e.g., insertion or deletion of specific nucleotide bases relative to a reference sequence). For such data, applying the Bisulfite Genomic Sequencing technique alone may not be sufficient to resolve the actual methylation patterns. Solutions are needed which can resolve unknown phase sequence data and account for the underlying genetic variation in identifying methylation patterns.
Therefore, it can be appreciated that there is a significant need for techniques to identify gametic phase and genetic mutations. The present invention provides this and other advantages as will be apparent from the following detailed description and the accompanying figures.