The advent of nucleic acid microarray technology makes it possible to build an array of millions of nucleic acid sequences in a very small area, for example on a microscope slide (e.g., U.S. Pat. Nos. 6,375,903 and 5,143,854). Initially, such arrays were created by spotting pre-synthesized DNA sequences onto slides. However, the construction of maskless array synthesizers (MAS) as described in U.S. Pat. No. 6,375,903 now allows for the in situ synthesis of oligonucleotide sequences directly on the slide itself.
Using a MAS instrument, the selection of oligonucleotide sequences to be constructed on the microarray is under software control such that it is now possible to create individually customized arrays based on the particular needs of an investigator. In general, MAS-based oligonucleotide microarray synthesis technology allows for the parallel synthesis of over 4 million unique oligonucleotide features in a very small area of a standard microscope slide. With the availability of the entire genomes of hundreds of organisms, for which a reference sequence has generally been deposited into a public database, microarrays have been used to perform sequence analysis on nucleic acids isolated from a myriad of organisms.
Nucleic acid microarray technology has been applied to many areas of research and diagnostics, such as gene expression and discovery, mutation detection, allelic and evolutionary sequence comparison, genome mapping, drug discovery, and more. Many applications require searching across the entire human genome for genetic variants and mutations that may, for example, underlie human diseases. In the case of complex diseases, these searches generally identify a single nucleotide polymorphism (SNP) or set of SNPs associated with one or more diseases. Identifying such SNPs has proven to be an arduous, time-consuming, and costly task, frequently requiring the resequencing of large regions of genomic DNA, usually greater than 100 kilobases (kb), from affected individuals and/or tissue samples in order to find a single base change or to identify all sequence variants.
The genome is typically too complex to be studied as a whole, and techniques must be used to reduce its complexity. One solution is to deplete certain types of abundant sequences from a DNA sample, as described in U.S. Pat. No. 6,013,440. Alternatives employ methods and compositions for enriching genomic sequences as described, for example, in Albert et al. (2007, Nat. Meth., 4:903-5, Epub 2007 Oct. 14) and Okou et al. (2007, Nat. Meth. 4:907-9, Epub 2007 Oct. 14). Albert et al. disclose an alternative that is both cost-effective and rapid, reducing the complexity of a genomic sample in a user-defined way to allow for further processing and analysis.
However, it is equally important to be able to enrich target sequences uniformly over the targeted region(s). If enrichment is not uniform, some target sequences will be captured disproportionately compared to other target sequences, thereby compromising downstream applications that depend on an approximately uniform distribution of targeted sequences. Hodges et al. (2007, Nat. Genet. 39:1522-1527, Epub 2007 Nov. 4) noted that a critical parameter in microarray capture was biased target capture, which greatly affects sequence coverage depth. However, Hodges et al. offered no path forward, other than to say that probe redistribution to compensate for biased capture would necessarily introduce other types of biases that would lead to problems with downstream applications, for example sequencing applications.
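By way of illustration only, the effect of biased capture on coverage depth can be summarized numerically. The following minimal sketch assumes that per-position read depths over the targeted region(s) are already available as a list of numbers. The function name `coverage_uniformity`, the two-fold window, and the example depth values are hypothetical and are not taken from any of the cited references.

```python
from statistics import mean, pstdev


def coverage_uniformity(depths, fold=2.0):
    """Summarize how evenly sequence coverage is spread over targeted positions.

    depths: per-position (or per-target) read depths, assumed precomputed.
    fold:   hypothetical window; positions within mean/fold..mean*fold
            are counted as approximately uniformly captured.
    """
    if not depths:
        raise ValueError("depths must be non-empty")
    avg = mean(depths)
    if avg == 0:
        raise ValueError("mean depth is zero; nothing was captured")
    # Coefficient of variation: lower values indicate more uniform capture.
    cv = pstdev(depths) / avg
    low, high = avg / fold, avg * fold
    within = sum(low <= d <= high for d in depths) / len(depths)
    return {"mean": avg, "cv": cv, "fraction_within_fold": within}


# Illustrative (made-up) depths with the same mean but different uniformity.
uniform = coverage_uniformity([48, 52, 50, 49, 51, 50])
biased = coverage_uniformity([5, 120, 8, 150, 10, 7])
```

In this sketch both example sets have the same mean depth, yet the biased set leaves most positions far below the mean while a few are heavily over-represented, which is the situation that hampers downstream sequencing applications.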
As such, what are needed are methods and compositions that provide uniform capture, and hence uniform representation, of targets during capture and enrichment of targeted sequences in a microarray format. Conversely, an investigator might also require a deliberate non-uniformity of capture, for example if the investigator envisions targeting exons in preference to intergenic regions. Such methods would provide maximum data utility to investigators in their endeavors to understand and identify, for example, causes of disease and associated therapeutic treatments.