Since the NIH initiated the project to sequence the whole human genome at the end of the 1990s, sequencing technology has evolved rapidly. In particular, since the introduction of second-generation sequencing machines in 2005, the costs of sequencing have been reduced by a factor of 10, to around 1 million US $ per human genome at the beginning of 2008. The sequencing industry is now aiming to reduce the costs of DNA sequencing even further, with the goal of reaching around 1000 US $ per human genome in the near future. Based on these prospects and expectations, DNA sequencing, in particular the sequencing of genomic DNA, will become a crucial clinical and diagnostic tool, which may be employed for the analysis of genetic variations, the detection of diseases or the elucidation of a predisposition for diseases, in particular for the diagnosis of cancer or the detection of an inclination to develop cancer. The key application of clinical DNA sequencing will, however, not be the sequencing of whole genomes, but rather the re-sequencing of relevant genomic portions or genes known to be involved in the etiology of diseases.
A prerequisite for such an approach is the efficient isolation of the target DNA to be sequenced. Typically, complex eukaryotic genomes like the human genome are too large to be explored without complexity reduction based, e.g., on the direct amplification of specific sequences by PCR methods, including short PCR and long PCR, or on fosmid library construction, BAC library construction, TAR cloning or selector technology.
An alternative to the above-mentioned procedures for reducing the complexity of genomic DNA is microarray-based genomic selection (MGS), which has been developed to isolate user-defined unique genomic sequences from complex eukaryotic genomes (WO 2008/097887). This method encompasses the physical shearing of genomic DNA to create random fragments with an average size of around 300 bp, the end-repair of the fragments, their ligation to unique adaptors with complementary T nucleotide overhangs, the hybridization of the fragments to a high-density oligonucleotide microarray of complementary sequences identified from a reference genome sequence, the subsequent elution of the fragments and their amplification via PCR using the adaptor sequences (WO 2008/097887).
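The shearing and selection steps described above can be illustrated in silico. The following is a minimal sketch only, not the protocol of WO 2008/097887: the function names `shear` and `capture`, the fragment size spread of 50 bp, the `min_overlap` threshold and the example coordinates are all hypothetical choices made for illustration, and real hybridization is not an all-or-nothing overlap test.

```python
import random


def shear(genome_length, mean_size=300, seed=0):
    """Cut the interval [0, genome_length) into consecutive fragments.

    Fragment sizes are drawn around mean_size (assumed spread: 50 bp).
    """
    rng = random.Random(seed)
    fragments = []
    start = 0
    while start < genome_length:
        size = max(1, int(rng.gauss(mean_size, 50)))
        end = min(start + size, genome_length)
        fragments.append((start, end))
        start = end
    return fragments


def capture(fragments, targets, min_overlap=60):
    """Keep fragments that overlap a target interval by >= min_overlap bases.

    This overlap test is a crude stand-in for hybridization to capture probes.
    """
    captured = []
    for f_start, f_end in fragments:
        for t_start, t_end in targets:
            overlap = min(f_end, t_end) - max(f_start, t_start)
            if overlap >= min_overlap:
                captured.append((f_start, f_end))
                break
    return captured


# Hypothetical example: a 100 kb region with two target intervals.
genome_length = 100_000
targets = [(10_000, 12_000), (50_000, 50_500)]
fragments = shear(genome_length)
captured = capture(fragments, targets)
print(len(fragments), "fragments,", len(captured), "captured")
```

Only a small fraction of the fragments is retained, which is the complexity reduction the method aims at; end-repair, adaptor ligation, elution and PCR are not modeled here.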
However, with current MGS protocols only about 80% to 90% of the target regions can be recovered. Thus, 10% to 20% of the target sequences are missing, and several other regions may be covered only at a low level, which may impede the reliable discovery of mutations in re-sequencing approaches. A hitherto unrecognized problem, which may explain the difficulty of recovering the target regions quantitatively, is the fact that in typical MGS hybridization mixtures both complementary strands of the genomic DNA are present in high copy numbers, which favors back-hybridization to the complementary strand over binding to the capture probes.
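The two shortcomings named above, missing target sequences and low-level coverage, can be quantified separately. The following is a minimal sketch under stated assumptions: the function `coverage_summary`, the half-open interval convention, the `min_depth` threshold of 10 and the toy reads are hypothetical and serve only to show how the two figures differ.

```python
def coverage_summary(targets, reads, min_depth=10):
    """Summarize how well reads cover the target bases.

    targets, reads: lists of half-open (start, end) intervals on one sequence.
    min_depth: assumed depth below which a base counts as poorly covered.
    """
    # Start every target base at depth 0.
    depth = {}
    for t_start, t_end in targets:
        for pos in range(t_start, t_end):
            depth[pos] = 0

    # Count each read once for every target base it spans.
    for r_start, r_end in reads:
        for pos in range(r_start, r_end):
            if pos in depth:
                depth[pos] += 1

    total = len(depth)
    covered = sum(1 for d in depth.values() if d > 0)
    well_covered = sum(1 for d in depth.values() if d >= min_depth)
    return {
        "fraction_covered": covered / total,
        "fraction_at_min_depth": well_covered / total,
    }


# Toy data: a 100 bp target; 12 reads on the first half, 3 on the next 30 bp.
targets = [(0, 100)]
reads = [(0, 50)] * 12 + [(50, 80)] * 3
summary = coverage_summary(targets, reads, min_depth=10)
print(summary)
```

In this toy case 80% of the target bases are recovered at all, but only 50% reach the assumed depth threshold, which mirrors the distinction drawn above between missing regions and regions covered only at a low level.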
Hence, there is a need for an improved enrichment method for target molecules, in particular target DNA molecules such as genomic nucleic acids, which allows an efficient, reliable and quantitative recovery of the target molecules.