In recent decades, micro- and nanofluidic devices and methods have been developed for integrating, miniaturising and automating numerous laboratory tasks. Furthermore, owing to the characteristic length scales involved, analysis tasks that were not previously feasible have been made possible.
Considerable efforts have been directed to providing a reliable, rapid and affordable analysis of very long macromolecules, such as single DNA molecules, including amplification and/or sequencing steps. However, the maximum fragment lengths analysed by existing methods are typically limited to about 35-1000 base pairs, as compared to the length of bacterial DNA of about 1-10 million base pairs, and to at least 50 million base pairs for a complete human DNA molecule.
A recent article on “Single molecule linear analysis of DNA in nano-channel labelled with sequence specific fluorescent probes”, published in 2010 in Nucleic Acids Research by S. K. Das et al., discloses a nanofluidic method of analysing DNA molecules. The DNA molecules analysed all have a length of less than 200 kilo base pairs. A further article on nanofluidic analysis of DNA by Reisner and co-workers, published in Proceedings of the National Academy of Sciences of the USA, vol. 107, p. 13294, 2010, discloses a method for analysing DNA applicable to long DNA molecules, where “long” refers to a length of about 100 kilo base pairs.
It is one of the merits of the present invention to recognise that the main factor limiting the length of the fragments analysed in a nanofluidic system lies in the sample preparation and transfer steps. The principal reason identified for this limitation is the fragility of isolated DNA molecules or similar long macromolecules under the shearing forces acting on them. According to the present invention, the sample preparation and transfer steps are thus identified as critical for increasing the fragment length that can be analysed in micro- and nanofluidic systems, and eventually for being able to process, e.g. for sequencing or amplification, a complete isolated DNA molecule or similar long macromolecule.
A chromosome prior to replication comprises a single length of DNA. The ability to visualize the DNA from each chromosome, from one end to the other, would enable the native long-range organization of the genome and its variation between homologous chromosomes and between individuals to be investigated. Entropic confinement in nano-channels/grooves, as demonstrated for bacteriophage genomes (<200 kbp in length), forces DNA into an extended conformation co-linear with the information encoded therein. However, to linearize whole large genomes (e.g. human), directly from source without cloning, two problems must be considered. First, during extraction or loading into a device, genomic DNA can become fragmented due to shear forces. Second, DNA tends to form folded, globular states in solution rather than the extended conformation. Although methods for mapping sequence motifs and patterns on linearized DNA have been developed (Neely et al., Chem. Sci. 2010; Xiao et al., Nucleic Acids Res. 35: e16, 2007; Reisner et al., PNAS 107(30):13294-9), new approaches are needed to handle, if not whole chromosomal lengths of human DNA, then portions of chromosomes that are large enough to span the haplotype blocks and much of the structural variation found in large diploid genomes.
Moreover, genome analysis methods with minimal sample preparation are needed. Direct single molecule analysis of genomic DNA can achieve this; recently a whole genome has been sequenced using single molecule technology (Pushkarev et al., Nature Biotechnology 27: 847). Even so, in current methods, DNA extraction is done off-chip, and the DNA handling (e.g. pipetting) reduces the size of the genome fragments through fragmentation by shearing.
There is a pronounced need for single molecule analysis of long macromolecules. For example, there are an estimated 200 cell types in the human body. However, not all cells within a seemingly homogeneous population of a given cell type are necessarily alike. Stochastic expression at the gene and protein level is well documented. Stochastic effects lead to widely differing responses to stimuli: fast, slow, extreme or subdued. Ensemble analysis of cell populations masks the variation that is clearly evident when individual cells from a population are analysed.
There is substantial heterogeneity between cells in a tumour biopsy, including differences in chromosome number (aneuploidy), mutational profiles, methylation profiles and expression at the RNA and protein level. Analysis of single cells within tumours is important for understanding tumour pathology and is expected to contribute to cancer diagnosis, staging, and prognosis. Biopsies may contain on the order of 10,000 cells. Systematic, high throughput and preferably automatable analysis is therefore needed to address the population cell by cell.
In addition, single cell analysis is important for genetic diagnosis, particularly for pre-implantation genetic analysis, which in the future may require analysis of more than one or a few genes, as the scientific community makes more and more connections between genotype and phenotype.
In many cases sample material is limiting, for example when working from archived material, or when analysing fetal material in a mother's circulating blood, or shed tumour cells or metastatic cells in circulating blood. In these cases better methods are needed for analysis of single cells, a few cells or a small amount of material. In the case of material in circulating blood, the task may be compared to finding a needle in a haystack, because the target material is a small fraction of a complex sample.
The genome and its epigenetic modifications can be analysed by modern genomic methods, the most comprehensive approach being complete genome sequencing. However, despite the emergence of technologies that have increased throughput and spectacularly lowered sequencing cost, a number of bottlenecks remain that serve as barriers to the effective translation of genomic knowledge. Although much attention has been given to the throughput and cost of the sequencing process itself, the same cannot be said of the preparation of the sample for sequencing. A first bottleneck is that sequencing technologies require days of upfront sample preparation. A second bottleneck is that upfront sample processing increases further when the goal is to sequence selected parts of the genome. A third bottleneck arises because all the existing technologies produce short sequence reads, and genome assembly thus relies on comparing reads to the reference genome. But since the reference sequence is a composite of several genomes, such comparisons do not reveal the phenotypically significant structural variation that exists between individual genomes (rearrangements, copy number, translocations, inversions).
As mentioned above, it is one of the merits of the present invention to recognise that the main factor limiting the length of the fragments analysed in a micro- and/or nanofluidic system lies in the sample preparation and transfer steps.
With this insight in mind, the object of the present invention is to provide an improved technique for preparing long macromolecules for subsequent processing in a micro- and/or nanofluidic device, which overcomes the problems of the prior art or at least provides an alternative.