DNA accessibility, together with chromatin regulation and genome methylation, plays a key role in regulating transcriptional events, including those that can promote tumor growth. Regions where DNA is not tightly wrapped around nucleosomes can be detected as DNase I hypersensitivity (DHS) sites. These regions make the DNA sequence accessible to other DNA-binding proteins, including a wide range of transcription factors (TFs). DHS sites are cell-type specific and play a crucial role in determining cell-type-selective transcriptional events.
Furthermore, genome-wide association studies (GWAS) have revealed that the vast majority of genetic variants significantly associated with many diseases and traits are located in non-coding regions. Well over half of these non-coding single nucleotide polymorphisms (SNPs) affect DHS sites. Variable access to DNA regulatory elements therefore plays a key role in normal cell development and also in the altered expression profiles associated with disease states.
However, it remains challenging to understand how DNA sequence affects the transcriptional regulation of gene expression, particularly in non-coding regions of the genome.
Neural network models have been developed to predict DNA accessibility in multiple cell types. The goal is to go beyond genome-wide association studies and gain deeper insight into how changes in DNA sequence affect transcriptional regulation. In principle, these models make it possible to explore how mutations affect DNA accessibility and transcriptional regulation.
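A common way to use such a model for exploring mutations is to compare its prediction on a reference sequence with its prediction on a sequence carrying a substitution. Repeating this comparison for every possible single-nucleotide change is often called in-silico mutagenesis. The sketch below illustrates the idea. It makes several assumptions that do not come from this document. `toy_accessibility` is a hypothetical stand-in for a trained network: a logistic score over a one-hot encoded sequence, with random weights. The sequence and helper names are illustrative only.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix in A, C, G, T order."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]

def toy_accessibility(x, weights, bias=0.0):
    """Hypothetical stand-in for a trained network: logistic score of a linear model."""
    return 1.0 / (1.0 + np.exp(-(np.sum(x * weights) + bias)))

def variant_effect(seq, pos, alt, weights):
    """Change in predicted accessibility when seq[pos] is replaced by the base alt."""
    ref_score = toy_accessibility(one_hot(seq), weights)
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    alt_score = toy_accessibility(one_hot(alt_seq), weights)
    return alt_score - ref_score

def mutagenesis_scan(seq, weights):
    """Effect of every single-nucleotide substitution; reference bases are left at 0."""
    effects = np.zeros((len(seq), 4))
    for pos in range(len(seq)):
        for j, base in enumerate(BASES):
            if base != seq[pos]:
                effects[pos, j] = variant_effect(seq, pos, base, weights)
    return effects

# Illustrative inputs only: random (untrained) weights sized to the sequence length.
rng = np.random.default_rng(0)
seq = "ACGTTGCA"
weights = rng.normal(size=(len(seq), 4))
scan = mutagenesis_scan(seq, weights)
print(scan.shape)  # (8, 4)
```

With a real model, positive entries in `scan` would mark substitutions predicted to increase accessibility, and negative entries would mark substitutions predicted to decrease it. This toy linear scorer only demonstrates the mechanics of the comparison.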
One common issue limits the broad applicability of neural networks for predicting DNA accessibility: many of the underlying biological mechanisms, such as DHS sites, are cell-type specific. Existing neural network models have addressed this issue in one of two ways. Some train a separate model for each cell type. Others have a single model output multiple cell-type-specific (multi-task) predictions. However, both approaches make it difficult to apply current neural network models to new data, and both limit their integration into broader-scope pathway models. There therefore remains a need for a neural network solution that overcomes the barrier to broad applicability posed by cell-type-specific phenomena.
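The two existing approaches described above can be contrasted in a small sketch. This is an illustration under stated assumptions, not a description of any specific published model. The class names, cell-type labels (`cell_A`, and so on), layer sizes, and random untrained weights are all hypothetical. The point is structural. The per-cell-type approach needs its own model for every cell type. The multi-task approach has a fixed output head with one unit per training cell type. In both cases, a cell type that was not part of training has no prediction.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class PerCellTypeModels:
    """Approach 1 (hypothetical sketch): an independent model per cell type."""
    def __init__(self, cell_types, seq_len, rng):
        # Each "model" here is just a weight matrix over the one-hot sequence.
        self.models = {c: rng.normal(size=(seq_len, 4)) for c in cell_types}

    def predict(self, seq, cell_type):
        # Raises KeyError for a cell type that has no model of its own.
        return float(sigmoid(np.sum(one_hot(seq) * self.models[cell_type])))

class MultiTaskModel:
    """Approach 2 (hypothetical sketch): shared features plus a fixed multi-output head."""
    def __init__(self, cell_types, seq_len, n_features, rng):
        self.cell_types = list(cell_types)
        self.shared = rng.normal(size=(seq_len * 4, n_features))
        # One output column per training cell type; the width is fixed at construction.
        self.head = rng.normal(size=(n_features, len(self.cell_types)))

    def predict(self, seq):
        features = np.tanh(one_hot(seq).reshape(-1) @ self.shared)
        return dict(zip(self.cell_types, sigmoid(features @ self.head)))

rng = np.random.default_rng(1)
cells = ["cell_A", "cell_B", "cell_C"]  # hypothetical training cell types
per = PerCellTypeModels(cells, seq_len=6, rng=rng)
multi = MultiTaskModel(cells, seq_len=6, n_features=8, rng=rng)
preds = multi.predict("ACGTAC")
print(sorted(preds))  # ['cell_A', 'cell_B', 'cell_C']
```

In this sketch, supporting a new cell type such as `cell_D` would mean adding and training another entry in `PerCellTypeModels`, or widening and retraining the head of `MultiTaskModel`. That is the practical form of the limitation described above.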