The draft sequence of the human genome was published in 2001 by the Human Genome Consortium (Nature 409, issue 6822) and Celera Genomics (Science 291, 1304-1351), marking an important advance in genetics with broad significance for society. Capitalizing on this investment and realizing the potential of the Human Genome Project can provide a better understanding of genetic variation and its role in disease.
It has been estimated that any two copies of the human genome differ from one another by as little as 0.1%. In other words, there are roughly three million variants among the approximately three billion bases that make up the human genome, or one variant every 1,000 bases. Because such variation affects disease susceptibility and drug response, it is advantageous to identify the genetic factors that contribute to biological variation. DNA sequencing is a fundamental tool for screening genes for disease-associated mutations. High-throughput, high-accuracy sequencing methods are therefore beneficial for screening the complete genome sequence of an animal to identify unique nucleic acid sequences that may indicate the presence of physiological or pathological conditions.
DNA sequencing of large and complex genomes is currently limited by cost. Accurately sequencing a human genome to a depth of 15× coverage requires generating at least 45 billion bases of sequence (15 × roughly three billion bases). Even with highly parallel sequencing technologies that produce reads hundreds of base pairs long, many hundreds of millions of sequencing reads are typically obtained in parallel. These reads may be recorded on microscopy-based platforms, which can involve consecutively capturing many thousands of images on an imaging device such as a CCD camera with a finite number of pixels. To maximize the rate of sequencing output, efforts have been made to increase the number of bases sequenced per image (i.e., the ratio of bases/pixels). In general, array techniques that rely on the random distribution of features can suffer from a low ratio of bases/pixels because of a high number of dark pixels with no features (for example, if features are too sparse), a high number of pixels carrying multiple overlapping features of different sequence (if features are too dense), or both (owing to the random nature of feature placement). Imaging pixels are used more efficiently if the features on the surface are tightly packed, non-overlapping, and of similar size and intensity to each other.
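The throughput and pixel-usage arguments above can be made concrete with some simple arithmetic. The Python sketch below is illustrative only. The function names are hypothetical, the 300 bp read length is an assumed value, and the occupancy model is a simplifying assumption not taken from this disclosure: each feature lands independently and uniformly at random and falls in exactly one pixel-sized cell, so the number of features per cell follows a Poisson distribution. Under that model the fraction of cells holding exactly one feature can never exceed 1/e (about 37%), which illustrates why random placement leaves many pixels either dark or shared.

```python
import math


def bases_required(genome_size_bp: float, coverage: float) -> float:
    """Total bases of sequence needed to reach a mean coverage depth."""
    return genome_size_bp * coverage


def reads_required(total_bases: float, read_length_bp: int) -> int:
    """Number of fixed-length reads needed to supply total_bases (rounded up)."""
    return math.ceil(total_bases / read_length_bp)


def poisson_pixel_occupancy(features_per_pixel: float) -> dict:
    """Fractions of pixel-sized cells that are empty, singly occupied,
    or multiply occupied.

    Simplifying assumption (hypothetical model): features land independently
    and uniformly at random and each falls in exactly one cell, so the count
    per cell is Poisson-distributed with mean features_per_pixel.
    """
    lam = features_per_pixel
    empty = math.exp(-lam)             # P(0 features): dark pixel
    single = lam * math.exp(-lam)      # P(1 feature): usable pixel
    multiple = 1.0 - empty - single    # P(>=2 features): overlapping signals
    return {"empty": empty, "single": single, "multiple": multiple}


if __name__ == "__main__":
    total = bases_required(3e9, 15)      # ~3 billion bp genome at 15x coverage
    reads = reads_required(total, 300)   # assumed 300 bp reads -> 150000000 reads
    print(f"{total:.3g} bases, {reads:,} reads")

    # Scan mean feature densities to find the one maximizing singly occupied cells.
    densities = [i / 100 for i in range(1, 301)]
    best = max(densities, key=lambda d: poisson_pixel_occupancy(d)["single"])
    occ = poisson_pixel_occupancy(best)
    print(f"best density {best}: " + ", ".join(f"{k}={v:.1%}" for k, v in occ.items()))
```

Under these assumptions the best achievable density is one feature per cell on average, where roughly 37% of cells are still dark and about 26% carry overlapping features; tightly packed, non-overlapping features of similar size would avoid both losses.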
The present invention provides methods of fabricating arrays of features that avoid the low ratios of bases/pixels associated with many array fabrication methods while exploiting the advantages of random feature fabrication. Thus, embodiments of the presently disclosed invention provide, for example, ease and low cost of array fabrication, an increase in the amount of data generated using any of a variety of high-throughput imaging platforms, and other related advantages.