Studies on mammalian nuclear architecture aim to understand how 2 meters of DNA is folded into a nucleus 10 μm across, while allowing accurate expression of the genes that specify the cell type, and how this is faithfully propagated during each cell cycle. Progress in this field has largely come from microscopy studies, which revealed that genomes are non-randomly arranged in the nuclear space. For example, densely packed heterochromatin is separated from more open euchromatin, and chromosomes occupy distinct territories in the nuclear space. An intricate relationship exists between nuclear positioning and transcriptional activity. Although transcription occurs throughout the nuclear interior, active genes that cluster on chromosomes preferentially locate at the edge or outside of their chromosome territory. Individual genes may migrate upon changes in their transcription status, as measured against relatively large nuclear landmarks such as chromosome territories, centromeres or the nuclear periphery. Moreover, actively transcribed genes tens of megabases apart on the chromosome can come together in the nucleus, as demonstrated recently by fluorescence in situ hybridization (FISH) for the β-globin locus and a few other selected genes. Besides transcription, genomic organisation is associated with the coordination of replication, recombination, the probability of loci to translocate (which can lead to malignancies), and the setting and resetting of epigenetic programs. Based on these observations it is thought that the architectural organisation of DNA in the cell nucleus is a key contributor to genomic function.
Different assays have been developed to allow an insight into the spatial organisation of genomic loci in vivo. One assay, called RNA-TRAP (Carter et al. (2002) Nat. Genet. 32, 623), involves targeting of horseradish peroxidase (HRP) to nascent RNA transcripts, followed by quantitation of HRP-catalysed biotin deposition on nearby chromatin.
Another assay that has been developed is called chromosome conformation capture (3C) technology, which provides a tool to study the structural organisation of a genomic region. 3C technology involves quantitative PCR-analysis of cross-linking frequencies between two given DNA restriction fragments, which gives a measure of their proximity in the nuclear space (see FIG. 1). Originally developed to analyse the conformation of chromosomes in yeast (Dekker et al., 2002), this technology has been adapted to investigate the relationship between gene expression and chromatin folding at intricate mammalian gene clusters (see, for example, Tolhuis et al., 2002; Palstra et al., 2003; and Drissen et al., 2004). Briefly, 3C technology involves in vivo formaldehyde cross-linking of cells and nuclear digestion of chromatin with a restriction enzyme, followed by ligation of DNA fragments that were cross-linked into one complex. Ligation products are then quantified by PCR. The PCR amplification step requires the knowledge of the sequence information for each of the DNA fragments that are to be amplified. Thus, 3C technology provides a measure of interaction frequencies between selected DNA fragments.
3C technology was developed to identify interacting elements between selected parts of the genome, and it requires the design of primers for all restriction fragments analysed. Recently, new strategies have been developed that allow the entire genome to be screened in an unbiased manner for DNA segments that physically interact with a DNA fragment of choice. They are based on 3C technology and are collectively referred to as ‘4C technology’. 4C technology depends on the selective ligation of cross-linked DNA fragments to a restriction fragment of choice (the ‘bait’). In 4C technology, all the DNA fragments captured by the bait in the population of cells are simultaneously amplified via inverse PCR, using two bait-specific primers that amplify from circularized ligation products.
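The inverse PCR step on a circularized ligation product can be illustrated with a minimal in-silico sketch. It assumes an idealised circle made of one bait fragment joined to one captured fragment, exact primer matches, and a single top-strand representation; the function names and the toy sequences are hypothetical and are not taken from the cited work.

```python
# Minimal sketch (assumption: idealised circle, exact primer matches).
# All names and sequences here are hypothetical illustrations.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverse_pcr_product(circle: str, fwd_primer: str, rev_primer: str):
    """Return the top-strand product amplified from a circular template.

    circle     : top strand of the circle, opened at an arbitrary position
    fwd_primer : primer identical to the top strand (5'->3')
    rev_primer : primer identical to the bottom strand (5'->3')
    Returns None if either primer does not anneal.
    """
    doubled = circle + circle  # unroll the circle so the read can pass the join
    start = doubled.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)  # binding site seen on the top strand
    site = doubled.find(rev_site, start + len(fwd_primer))
    if site == -1:
        return None
    end = site + len(rev_site)
    if end - start > len(circle):
        return None  # would run more than once around the circle
    return doubled[start:end]


# Toy bait with outward-facing primer sites at its two ends.
bait = "CTGACGTA" + "TTTT" + "GGATCCAT"
captured = "CCCGGGAAACCC"
circle = bait + captured
product = inverse_pcr_product(circle, "GGATCCAT", reverse_complement("CTGACGTA"))
```

In this toy case the product runs from the forward primer, through the captured fragment, to the reverse primer site, so the captured sequence is recovered without any primer having been designed against it.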
Essentially two strategies can be pursued to obtain these DNA circles. One strategy relies on the formation of circles during the standard 3C ligation step, i.e. while the DNA is still cross-linked (Zhao et al. (2006) Nat Genet 38, 1341-7). Here, circle formation requires both ends of the bait fragment to be ligated to both ends of a captured restriction fragment. If multiple restriction fragments are cross-linked together, circles may still be formed but they can contain more than one captured fragment and will therefore be larger. After de-crosslinking, captured DNA fragments are directly amplified by inverse PCR, using bait-specific primers facing outwards. Restriction enzymes recognizing four or six basepairs can be used in this set-up. Four-cutters are preferred in this method though, since they produce smaller restriction fragments (average size 256 bp, versus ~4 kb for six-cutters) and linear PCR amplification of the captured DNA fragments requires that the average product size is small. Essentially, this method therefore comprises the steps of: (a) providing a sample of cross-linked DNA; (b) digesting the cross-linked DNA with a primary restriction enzyme, such as a 4 bp or a 5 bp cutter; (c) ligating the cross-linked nucleotide sequences; (d) reversing the cross-linking; and (e) amplifying the one or more nucleotide sequences of interest using at least two oligonucleotide primers, wherein each primer hybridises to the DNA sequences that flank the nucleotide sequences of interest. The amplified sequence(s) can be hybridised to an array in order to assist in determining the frequency of interaction between the DNA sequences.
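The fragment sizes quoted above follow from simple arithmetic, sketched here under the assumption of a random sequence with equal base frequencies (real genomes deviate from this). A recognition site of n basepairs is then expected once every 4^n basepairs. The function name is hypothetical.

```python
def expected_fragment_size(site_length: int, alphabet_size: int = 4) -> int:
    """Expected spacing (bp) between recognition sites of the given length,
    assuming a random sequence with equal base frequencies."""
    return alphabet_size ** site_length


four_cutter_bp = expected_fragment_size(4)  # 4**4
six_cutter_bp = expected_fragment_size(6)   # 4**6
print(four_cutter_bp, six_cutter_bp)        # prints "256 4096"
```

This gives 256 bp for a four-cutter and 4096 bp, i.e. roughly 4 kb, for a six-cutter.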
The second strategy advantageously relies on the formation of DNA circles after the chromatin has been de-cross-linked, as is described herein and in our co-pending application WO2007/004057. As described therein, 4C technology allows an unbiased genome-wide search for DNA fragments that interact with a locus of choice. Briefly, 3C analysis is performed as usual, but omitting the PCR step. The 3C template contains a target sequence or ‘bait’ (e.g. a restriction fragment of choice that encompasses a selected gene) ligated to many different nucleotide sequences of interest (representing this gene's genomic environment). The template is cleaved by another, secondary, restriction enzyme and subsequently religated to form small DNA circles. Advantageously, the one or more nucleotide sequences of interest that are ligated to the target nucleotide sequence are amplified using at least two oligonucleotide primers, wherein at least one primer hybridises to the target sequence. The second primer preferably also hybridises to the target sequence, such that both primers flank the nucleotide sequence of interest. Alternatively, the second primer hybridises to an adapter sequence that is ligated to the secondary restriction site, such that the two primers flank the nucleotide sequence of interest. Typically, this yields a pattern of PCR fragments that is highly reproducible between independent amplification reactions and specific for a given tissue. HindIII and DpnII may be used as primary and secondary restriction enzymes. Next, the amplified fragments may be labeled and optionally hybridised to an array, typically against a control sample containing genomic DNA digested with the same combination of restriction enzymes. 3C technology has therefore been modified such that all nucleotide sequences of interest that interact with a target nucleotide sequence are amplified.
Practically, this means that instead of performing an amplification reaction with primers that are specific for the fragments that one wishes to analyse, an amplification is performed using oligonucleotide primer(s) which hybridise to a DNA sequence that flanks the nucleotide sequences of interest. Advantageously, 4C is not biased by the choice of PCR primers included in the amplification step and can therefore be used to search the complete genome for interacting DNA elements.
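The successive use of a primary and a secondary restriction enzyme can be illustrated with a minimal in-silico digestion sketch. It assumes a linear top-strand sequence and exact recognition sites (HindIII, A^AGCTT; DpnII, ^GATC), and it ignores cross-linking, ligation and the overhangs left on the bottom strand; the function names and the toy sequence are hypothetical.

```python
import re

# Recognition sequence and top-strand cut offset within the site.
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # cuts A^AGCTT
    "DpnII": ("GATC", 0),      # cuts ^GATC
}


def cut_positions(seq: str, enzyme: str) -> list:
    """Return top-strand cut positions for the named enzyme."""
    site, offset = ENZYMES[enzyme]
    # A lookahead is used so that overlapping sites would also be found.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def digest(seq: str, enzyme: str) -> list:
    """Split a linear sequence at every cut position, dropping empty pieces."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Toy template: two HindIII sites with one DpnII site between them.
template = "CCCAAGCTTGGGGATCTTTAAGCTTAAA"
primary = digest(template, "HindIII")
secondary = [piece for frag in primary for piece in digest(frag, "DpnII")]
```

Here the primary digest yields three fragments, and the secondary digest shortens the middle one, mirroring how the secondary enzyme trims the 3C template before religation into small circles.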
There is an important need for high-throughput technology that can systematically screen the whole genome in an unbiased manner for DNA loci that contact each other in the nuclear space.
Moreover, there is a need for improvements in such technologies which permit the simultaneous analysis of multiple interactions occurring with multiple sequences in the genome, and for analysing the genome for insertions, deletions, translocations, inversions and rearrangements which take place at unknown locations and which may be associated with a disease.
The present invention seeks to provide improvements in 3C and 4C technology and techniques related thereto.