The invention relates generally to methods of whole genome analysis, and more particularly relates to methods for ascertaining information about the nucleotide composition of a region of a DNA molecule bounded by a nick site and a termination site. The ascertained nucleotide composition can be compared to nucleotide composition data from a reference genome so that differences, if any, from a comparable region of the reference genome can be identified without the need to re-sequence the region.
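For illustration only, the comparison step described above can be sketched as follows. This is a minimal sketch, not the claimed method: it assumes the composition of the region has already been ascertained as per-base counts, that the reference is available as a plain string, and that the nick and termination sites are 0-based, half-open coordinates. The function names, sample sequence and sample counts are hypothetical.

```python
from collections import Counter


def reference_composition(reference: str, nick_site: int, termination_site: int) -> Counter:
    """Count A, C, G and T in reference[nick_site:termination_site].

    Assumption: coordinates are 0-based and half-open.
    """
    region = reference[nick_site:termination_site].upper()
    return Counter(base for base in region if base in "ACGT")


def composition_difference(measured: dict, expected: Counter) -> dict:
    """Return measured minus expected counts, only for bases that differ."""
    return {
        base: measured.get(base, 0) - expected.get(base, 0)
        for base in "ACGT"
        if measured.get(base, 0) != expected.get(base, 0)
    }


# Hypothetical reference fragment and hypothetical measured counts.
reference = "ACGTACGTTTGACCA"
expected = reference_composition(reference, 4, 12)   # region "ACGTTTGA"
measured = {"A": 2, "C": 1, "G": 1, "T": 4}

diff = composition_difference(measured, expected)
print(diff)  # one fewer G and one more T than the reference region
```

An empty result would indicate that the composition of the region matches the reference. A non-empty result flags the region as differing without identifying the position of the difference.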
The human genome harbors many types of genetic aberrations, including polymorphisms at the single-nucleotide and structural levels, as well as large-scale events, particularly those associated with cancer. Some aberrations, such as single nucleotide polymorphisms (SNPs), involve only a single nucleotide and may fall within exons or introns of genes, or within the intergenic regions between genes. SNPs within a coding sequence do not necessarily change the amino acid sequence of the protein encoded by a gene, due to degeneracy of the genetic code. Even so, SNPs can serve as markers for disease and for nucleotide position. In addition, SNPs located outside protein-coding regions may still have consequences for gene splicing or transcription factor binding.
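The effect of degeneracy on coding SNPs can be illustrated with a short sketch. It assumes the standard genetic code (written here in the conventional T, C, A, G codon order) and a SNP given as a position within a single codon. The function name and the example codons are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change within one codon.

    Assumption: position is 0-based within the three-base codon.
    """
    mutated = codon[:position] + alt_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


# GAA and GAG both encode glutamate (E); GAC encodes aspartate (D).
print(classify_snp("GAA", 2, "G"))  # synonymous
print(classify_snp("GAA", 2, "C"))  # nonsynonymous
```

A change at the third codon position often leaves the encoded amino acid unchanged, which is why a SNP within a coding sequence does not necessarily alter the protein.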
Other genetic aberrations involve more than a single nucleotide and can therefore affect the intermediate-scale structure of DNA. Examples of these genetic aberrations include, but are not limited to, amplifications, insertions, deletions, inversions and rearrangements. Typically, these aberrations are analyzed using classical cytogenetics and, more recently, using data acquired from high-density oligonucleotide arrays.
However, there is a continuing need for rapid, simple, comprehensive and cost-effective methods of whole genome analysis that leverage, or complement, emerging sequencing systems.