A large body of information has been obtained about the state of the transcriptome, chromatin modifications, and CpG methylation in various cell types (Nguyen, et al., Nucleic Acids Research, 29:4598-4606 (2001), Hahn, et al., PloS One 6, e18844 (2011), Smallwood, et al., Nat Genet, 43:811-814 (2011), Negrotto, et al., Leukemia, 26:244-254 (2012), Alelu-Paz, et al., J Signal Transduct, 2012:956958 (2012)). However, better technologies for the analysis of single cells (and of low numbers of cells) are still desired.
Chromatin
The maintenance of chromatin architecture is a dynamic and complex process. Broadly speaking, chromatin can be present in either an open state (accessible to transcription factors and other proteins), or in a compacted state. Compact chromatin is often associated with silencing of genes and resistance to DNase I digestion (Francastel, et al., Molecular Cell Biology, 1:137-143 (2000), Teif, et al., Nucleic Acids Research, 37:5641-5655 (2009)). Chromatin remodeling plays a role in regulating gene expression and in several biological processes, such as DNA replication and repair, apoptosis, development and pluripotency (Wu, et al., The Journal of Biological Chemistry, 272:28171-28174 (1997), Clapier, et al., Annual Review of Biochemistry, 78:273-304 (2009)). Genome wide chromosome conformation studies (HiC) indicate that, at the megabase level, cellular chromatin can be partitioned into large blocks of relatively open or closed chromatin (van Berkum, et al., Journal of Visualized Experiments, JoVE 39, pii: 1869 (2010), Nagano, et al., Nature, 502:59-64 (2013)). At a finer scale, nuclear lamin associated chromatin may be in a closed configuration (Zullo, et al., Cell, 149:1474-1487 (2012)).
Studies of the distribution of histone modifications with regard to chromatin architectures on a genome wide scale have led to several generalizations (Wang, et al., Trends Mol Med, 13:363-372 (2007), Bell, et al., Nat Rev Genet, 12:554-564 (2011), Zhu, et al., Cell, 152:642-654 (2013), Song, et al., Genome Research, 21:1757-1767 (2011), Geiman, et al., J Cell Biochem, 87:117-125 (2002)). Methylation of specific lysine residues such as K9 and K27 in H3 is associated with compaction of chromatin, which prevents binding of transcription factors (TFs) to the DNA and leads to gene repression. In contrast, histone acetylation relaxes chromatin condensation and exposes DNA for TF binding, leading to increased gene expression, and trimethylation of other lysine residues on histone H3 (K4 and K36) is associated with actively transcribed genes. The distribution of these patterns of histone modifications is partly cell type-specific, with marked differences between, for example, freshly isolated cells and cells maintained in tissue culture, or between ES cells and differentiated somatic cells. There are also regions of DNA devoid of commonly studied histone modifications.
DNase I hypersensitive sites (DHS) in chromatin represent open chromatin sites where canonical nucleosomes are displaced, particularly by other sequence specific DNA binding proteins. They were first mapped over 30 years ago and identified as stable marks of cell differentiation (Weintraub, et al., Science, 193:848-856 (1976)). In addition to transcriptome, methylome and chromatin immunoprecipitation (ChIP) studies, with the advent of next generation sequencing (NGS), DHS analysis has been revitalized, refined, and expanded to a whole genome scale (Crawford, et al., Genome Research, 16:123-131 (2006), Ling, et al., Methods in Molecular Biology, 977:13-19 (2013)). The results of the current generation of DHS studies have helped map promoters and enhancers acting in particular cell types as well as reveal a plethora of potential regulatory regions of unknown function (Degner, et al., Nature, 482:390-394 (2012); Mercer, et al., Nat Genet, 45:852-859 (2013); Apostolou, et al., Nature, 502:462-71 (2013)). A single cell type may have hundreds of thousands of DHS and there is a considerable degree of cell type specificity in the location of these DHS.
Previous efforts to reveal chromatin structure (or chromatin conformation) based on different properties of chromatin have been performed on relatively large populations of cells (Auerbach, et al., PNAS, 106:14926-14931 (2009); Henikoff, et al., Genome Research, 19:460-469 (2009)). The data were obtained by averaging over sites that are heterogeneously distributed among many different cells, which confounds interpretation of the results.
Nuclease resistant DNA sequences, referred to herein as DHRS (DNase I Hyper-Resistant Sites), reflect chromatin maintained in an inactive state. Individual segments of condensed DNA have been isolated and characterized physically (Wang, et al., The Journal of Biological Chemistry, 279:55401-55410 (2004)). DHRS may be involved in active processes for suppressing a gene (Stauffer, et al., J Cell Sci, 114:2383-2393 (2001); Martin, et al., FASEB J, 24:1066-1072 (2010); Burgess-Beusse, et al., PNAS, 99(4):16433-16437 (2002)). Some DHRS overlap sites of CpG hypermethylation and gene silencing, although DNA methylation in the body of a gene may be associated with active expression rather than silencing (Prioleau, et al., The EMBO Journal, 18:4035-4048 (1999), Costello, et al., Nat Genet, 24:132-138 (2000), Kashiwagi, et al., Nucleic Acids Research, 39:874-888 (2011), Jursch, et al., Mob DNA, 4:15 (2013)).
Not all parts of the genome can be simply categorized as DHS or DHRS. More specifically, DHRS are not merely sites lacking DNase I hypersensitivity, but are sites of DNase I hyper-resistance that exhibit specific characteristics. Accordingly, previous efforts to map chromatin sites based on DHS analysis alone are generally insufficient. No genome wide, high resolution study of the distribution of condensed nuclease resistant chromatin regions has been reported, and the direct study of the genomic distribution of compacted chromatin is a relatively unexplored field. Therefore, there remains a need for improved methods of identifying sites of closed chromatin and DHRS.
Methylation
Methylation of cytosine in CpG sequences is an important epigenomic modification, which is involved in regulating many cellular processes (Jones, et al., Science, 293: 1068-1070 (2001)). The promoters of more than half of all genes are embedded in CpG islands, and methylation of the islands correlates strongly with gene silencing. Aberrant methylation has been shown to correlate with a number of disease processes affecting embryonic and later development. Examples include uniparental disomy for chromosomes 6 and 7 (Russell Silver syndrome), chromosome 11 (Beckwith-Wiedemann syndrome), chromosome 14, chromosome 15 (Prader-Willi and Angelman syndromes (Schimmenti, et al., Genetics in Medicine, 13: 1006-1010 (2011))), chromosome 16, and chromosome 20 (Eroglu, et al., Seminars in reproductive medicine, 30: 92-104 (2012); Binder, et al., Clinical endocrinology & metabolism, 25(1):153-60 (2011); Moreira-Pinto, et al., Fetal Pediatr Pathol., 31(6):448-52 (2012)). Methylation screening in newborns may also detect exposure of the fetus in utero to harmful environmental factors such as smoking, stress, and toxic chemicals (arsenic, polycyclic aromatic hydrocarbons).
Abnormal methylation is a marker for mutations that silence genes. Trinucleotide expansions, which are not well detected by short-read, high-throughput sequencing, often result in gene silencing through promoter methylation. For example, examining the CpG islands of the Fragile X gene and others may be an alternative method of identifying this type of mutation (Sheridan, et al., PLoS One, 6(10):e26203 (2011)). As an exploratory study, cataloguing global methylation in phenotypically characterized newborns could identify aberrant patterns that reflect additional, currently unrecognized genetic or epigenetic disorders.
Several methods have been applied to the analysis of global cytosine methylation in the human genome. Methylation-sensitive restriction enzymes (MSREs) have been used to map the methylation status of an informative subset of CpG clusters (Estecio, et al., Genome Res, 17, 1529-1536 (2007); Shann, et al., Genome Res, 18, 791-801 (2008)). DNA immunoprecipitation with methyl C binding proteins (MeCP2 or MBD) (Fuks, et al., The Journal of Biological Chemistry, 278, 4035-4040 (2003); Kangaspeska, et al., Nature, 452, 112-115 (2008)), and antibody capture of the methylated-C containing DNA fragments or methylated DNA immunoprecipitation (MeDIP) (Weber, et al., Nature Genetics, 39: 457-466 (2007); Koga, et al., Genome Res, 19, 1462-1470 (2009)) have also been widely applied. Other studies utilizing MeDIP (Pelizzola, et al., Genome Res, 18, 1652-1659 (2008)), MSRE (Yasukochi, et al., PNAS, 107, 3704-3709 (2010)) and MBD to analyze CpG methylation patterns indicate that none of these methods confidently determines whether a given CpG site is methylated. Furthermore, each of these methods requires relatively large amounts of DNA.
A popular method for genome wide DNA methylation (methylC) analysis is bisulfite sequencing, in which bisulfite treatment deaminates unmethylated cytosines and the sequence of the treated DNA is then compared with that of the untreated DNA. Genome wide methylC-seq covers all Cs in a genome but requires several lanes on a HiSeq2000 to evaluate one sample with sufficient depth. It is not financially practical as a clinical test using current technology.
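The comparison underlying bisulfite sequencing can be illustrated with a minimal in silico sketch. It is not a description of any particular pipeline. It assumes a single forward-strand read, perfect conversion, and no sequencing errors, and it writes the deaminated base directly as T (in practice the deamination product is uracil, which is read as T after amplification). The function names and the toy sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment of one strand.

    Unmethylated C is read out as T; methylated C is protected and stays C.
    Assumption: conversion is complete and error free.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted_read):
    """Compare the untreated reference with the converted read.

    Returns {position: True/False} for every reference C that is
    observed as C (methylated) or T (unmethylated) in the read.
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference, converted_read)):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = True
        elif obs == "T":
            calls[i] = False
    return calls


# Toy example (hypothetical sequence): Cs at 0-based positions 1, 4, 5, 9.
reference = "ACGTCCGATCGA"
methylated = {1, 9}  # assumed methylated cytosines

read = bisulfite_convert(reference, methylated)
calls = call_methylation(reference, read)
print(read)
print(calls)
```

Because every C in the reference is interrogated, the depth needed per sample scales with the number of cytosines in the genome, which is consistent with the cost concern noted above.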
Alternatively, reduced representation bisulfite sequencing (RRBS) detects most of the CpGs in the CpG islands and promoters at a cost of about 2% of full methylC-seq (Gu, et al., Nat Protoc, 6, 468-48 (2011)). The drawback is that conventional RRBS, like methylC-seq, requires genomic DNA of both high quantity and high quality. Deamination must be done on the input DNA rather than on amplified samples, so that methylation marks are not lost during amplification. This procedure involves too many steps, with too much potential for DNA loss, to be applicable to single cells using current methodology.
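The reduction in sequencing burden can be illustrated with a small in silico digest. This sketch assumes the MspI-based form of RRBS, in which DNA is cut at CCGG sites (C^CGG) and fragments are then size selected; the text above does not name the enzyme or a size window, so both are assumptions here, and the function names and toy sequence are hypothetical.

```python
import re


def mspi_digest(seq):
    """Cut a sequence at every CCGG site, after the first C (C^CGG)."""
    # A lookahead is used so that overlapping sites would also be found.
    cut_points = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    bounds = [0] + cut_points + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy example (hypothetical sequence with two CCGG sites).
seq = "AACCGGTTTTCCGGAA"
fragments = mspi_digest(seq)
selected = size_select(fragments, 5, 10)  # toy window, not a protocol value
print(fragments)
print(selected)
```

Each internal fragment begins with CGG, so every retained fragment carries at least one CpG at its end. This is one way to see why a small fraction of the genome can still cover many CpG-rich regions.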
Because conventional methods rely on large quantities of genomic DNA, analysis of the genomic distribution of CpG methylation typically relies on pooled DNA from many cells. Studies indicate that dramatic changes in DNA methylation occur during germ cell formation and early development of the fertilized egg (Dobbs, et al., PloS one 8, e66230 (2013), Smith, et al., Nature, 484:339-344 (2012)). Differences in methylation patterns of somatic tissues are more restricted (Chen, et al., The Journal of Biological Chemistry, 286:18347-18353 (2011)). Methylation also increases in aging hematopoietic stem cells, and may contribute to the aging phenotype (Bocker, et al., Blood 117, e182-189 (2011), Hodges, et al., Molecular Cell, 44:17-28 (2011), Hogart, et al., Genome Research, 22:1407-1418 (2012), Beerman, et al., Cell Stem Cell, 12:413-425 (2013)).
However, most of this information is derived from tissues or organs that are composed of a mixture of cell types. Even when cell lines are examined, it is unusual to separate cells according to the stage of the cell cycle or to take account of potential circadian effects on gene expression. Therefore, the results of these studies are typically an average of values for a large, heterogeneous cell population, and may not accurately reflect the state of any homogeneous subpopulation or of individual single cells. This is especially true for histone modification studies, including ChIP-seq, and for DHS studies, which usually require millions of cells. The most sensitive protocol for ChIP-seq (Adli, et al., Nature Methods, 7:615-618 (2010), Adli, et al., Nature Protocols, 6:1656-1668 (2011)) needs no fewer than 10,000 cells, and has not yet been widely applied.
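The effect of population averaging can be made concrete with a toy calculation. The sketch below uses invented data, not measurements: each cell is represented as a list of binary methylation states (1 = methylated, 0 = unmethylated) at four hypothetical CpG sites. The two populations give the same pooled value even though their single-cell compositions differ completely.

```python
def per_cell_fraction(cell):
    """Fraction of methylated sites within one cell."""
    return sum(cell) / len(cell)


def bulk_fraction(cells):
    """Value a pooled assay would report: methylated calls over all calls."""
    total_sites = sum(len(cell) for cell in cells)
    methylated = sum(sum(cell) for cell in cells)
    return methylated / total_sites


# Hypothetical population 1: two distinct subpopulations of 50 cells each.
mixed = [[1, 1, 1, 1]] * 50 + [[0, 0, 0, 0]] * 50
# Hypothetical population 2: 100 identical, half-methylated cells.
uniform = [[1, 0, 1, 0]] * 100

bulk_mixed = bulk_fraction(mixed)
bulk_uniform = bulk_fraction(uniform)

# Distinct per-cell values reveal what the pooled average hides.
states_mixed = sorted({per_cell_fraction(c) for c in mixed})
states_uniform = sorted({per_cell_fraction(c) for c in uniform})

print(bulk_mixed, bulk_uniform)
print(states_mixed, states_uniform)
```

In this example the pooled value of 0.5 does not correspond to the state of any individual cell in the first population, which is the situation described above for heterogeneous samples.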
Recently, efforts have been focused on global transcription analyses of single cells (Tang, et al., Nature Protocols, 5:516-535 (2010); Islam, et al., Genome Research, 21:1160-1167 (2011); Hashimshony, et al., Cell Rep, 2:666-673 (2012); Yan, et al., Nature Structural & Molecular Biology, 20:1131-1139 (2013); Farlik, et al., Cell Reports, 10(8):1386-97 (2015)). These methods have confirmed heterogeneity in the types of cells present in what had previously been presumed to be relatively homogeneous cell preparations (Sasagawa, et al., Genome Biol, 14:R31 (2013)), in the distribution of splice isoforms among cells (Shalek, et al., Nature, 498:236-240 (2013)), and in the response of cells to various stimuli. In some cases, such as hematopoietic multipotential precursors, the heterogeneity is remarkably extensive, requiring a new level of description for lineage differentiation (Gibbs, et al., Blood, 117:4226-4233 (2011), Mills, et al., Blood, 122:2047-2051 (2013)). Despite recent efforts, there remains a lack of suitable methods for determining DNA methylation at the single cell level (Guo, et al., Genome Res., 23(12):2126-35 (2013)).
Accordingly, improved methods for analyzing chromatin architecture and methylation status, particularly in small quantities of cells and in single cells, are needed.
Therefore, it is an object of the invention to provide sensitive methods for identifying sites of closed chromatin and/or DNase I Hyper-Resistant Sites (DHRS) in the genome of cells.
It is also an object of the invention to provide sensitive methods for determining if CpG-rich regions such as CpG islands and CpG island shores in the genome of cells are methylated or unmethylated.
It is also an object of the invention to provide methods for identifying differentially methylated regions (DMR) and determining if they are methylated or unmethylated.
It is a further object of the invention to provide improved methods of sequencing DNA at single nucleotide resolution after bisulfite conversion.
It is a further object of the invention to provide methods for analysis of chromatin and methylation status that are suitable for use on limited genomic DNA, for example DNA from a few cells or even a single cell.
It is another object of the invention to provide methods for reducing or preventing random or non-specific strand breakage, damage, or loss of genomic DNA that can occur when genomic DNA is isolated, accessed, or processed from cells, particularly small quantities of cells.
It is a further object of the invention to employ the improved methods of isolating, accessing, and/or processing genomic DNA in methods that include amplifying genomic sequences.
It is a further object of the invention to provide methods that can be carried out partially or completely in a single tube.