With the completion of the sequencing of the human genome, the scientific community is in a position to begin studying the relationship between genetics and disease in earnest. Several groups have already embarked on the first stage of such studies, a comprehensive mapping of the SNPs and genetic markers in the human genome (Collins et al, Nature (2003) 422:835-847). Using this information, a genome-wide scan of SNPs in a population can identify potentially interesting regions of the genome associated with a particular disease (Botstein et al, Nat Genet (2003) 33:228-237). However, such results will contain a high incidence of false positives, due to the multiple-testing problem, and a high incidence of false negatives, due to the weak correlation between single SNPs and a given disease. Therefore, after this stage, it becomes necessary to reexamine the identified regions with a finer-grained mapping of genetic features to confirm the previously established relationships. In these studies, the power to detect correlations is greatly increased by comparing haplotypes of the case studies, rather than just studying SNPs (Douglas et al, Nat Genet (2001) 28:361-364).
Determination of genetic haplotypes is difficult in heterozygous diploid organisms. The technologies currently in broad use for sequencing studies are based on bulk studies of PCR products. Because these technologies genotype products that are derived from a combination of both chromosomes, they report which alleles are present at each SNP but not which chromosome carries each allele; hence the individual's haplotype cannot be resolved at loci where the subject is heterozygous.
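The ambiguity described above can be made concrete with a short sketch. The Python below is an illustration only, not part of any cited method: it assumes a hypothetical representation in which each SNP genotype is a two-character string of unordered alleles, and it enumerates every pair of haplotypes consistent with a bulk (unphased) genotype.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of two-character strings, one per SNP, giving the
    two alleles observed at that SNP (a hypothetical format for this sketch).
    """
    # Only heterozygous loci contribute to the phase ambiguity.
    het = [i for i, g in enumerate(genotype) if g[0] != g[1]]
    pairs = set()
    for choice in product((0, 1), repeat=len(het)):
        hap1 = [g[0] for g in genotype]
        hap2 = [g[1] for g in genotype]
        for idx, flip in zip(het, choice):
            if flip:
                hap1[idx], hap2[idx] = hap2[idx], hap1[idx]
        # A frozenset makes the pair unordered, so mirror images collapse.
        pairs.add(frozenset(("".join(hap1), "".join(hap2))))
    return pairs

if __name__ == "__main__":
    for pair in compatible_haplotype_pairs(["AG", "CC", "TC"]):
        print(sorted(pair))
```

With k heterozygous loci (k at least 1) there are 2^(k-1) candidate pairs, so a bulk genotype with even a handful of heterozygous SNPs leaves many possible haplotypes.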
Some groups have circumvented this problem by physically separating the chromosomes prior to PCR (Patil et al, Science (2001) 294:1719-1723; Douglas et al, 2001), by using allele-specific PCR to amplify only one of the parent chromosomes in a heterozygous individual (Michalatos-Beloin et al, Nucl. Acids. Res. (1996) 24:4841-4843), or by single-molecule PCR (Ding and Cantor, Proc Natl Acad Sci USA (2003) 100:7449-7453). However, these cloning techniques are laborious, and the PCR-based methods can amplify only short DNA fragments, which limits their usefulness for high-throughput haplotyping. Another group has shown that labeled single DNA molecules can be imaged by atomic force microscopy (AFM) (Woolley et al, Nat Biotechnol (2000) 18:760-763), but this approach requires sophisticated and expensive instrumentation not readily available to most laboratories. Other investigators have analyzed individual, allele-specifically labeled polynucleotides using capillary flow past fluorescence detectors (Goodwin et al, Curr Pharm Biotechnol (2004) 5:271-278); however, haplotypes defined by more than two SNPs must be identified by repeatedly typing pairs of SNPs.
Over the past decade, technological advances have allowed biophysicists to study biological systems on a molecule-by-molecule basis, giving them the unprecedented capacity to resolve properties of complex systems that are obscured when measurements are averaged over the entire ensemble. One approach to measuring properties of single molecules is through fluorescence. For instance, we have recently demonstrated the ability to localize single fluorescent molecules with very high accuracy (approximately 1.5 nm) with half-second time resolution over the course of several minutes. We refer to this technique as Fluorescence Imaging with One Nanometer Accuracy (FIONA), and have used it to investigate the processive walking of the myosin V (Yildiz et al, Science (2003) 300:2061-2065) and kinesin (Yildiz et al, Science (2004) 303:676-678) molecular motors labeled with Cy3. We have also shown that we can achieve similar results with a variety of different types of dyes (Snyder et al, Biophys J. (2004) 87:1776-1783; Park, H., Hanson, G., Duff, S. & Selvin, P. (2004) Journal of Microscopy) on both proteins and DNA, making FIONA a highly versatile technique.
The image of a single fluorescent molecule (often called its “point spread function”, or PSF) will have a width (w), dictated by the Rayleigh diffraction limit, of λ/(2×NA), where λ is the wavelength of the emitted light, and NA is the numerical aperture of the optical system. However, the centroid of the PSF can be determined much more accurately than this. This is actually quite an intuitive result: the position of the peak of a mountain can be determined with great precision, to within a distance much smaller than the mountain's width. In fluorescence imaging, the centroid of the PSF can be determined to within approximately w/√N, where N is the number of collected photons (Thompson et al, Biophys. J. (2002) 82:2775-2783).
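The w/√N scaling can be checked numerically. The sketch below is a simplified illustration under stated assumptions: the PSF is treated as a one-dimensional Gaussian whose standard deviation plays the role of w, background and pixelation are ignored, and the values of w and N are illustrative rather than taken from any measurement.

```python
import math
import random
import statistics

def localization_precision(width_nm, n_photons):
    """Approximate standard error of the PSF centroid, w / sqrt(N)."""
    return width_nm / math.sqrt(n_photons)

def simulate_centroid_scatter(width_nm, n_photons, n_trials, seed=0):
    """Scatter of the estimated centroid over repeated simulated images.

    Each trial draws `n_photons` photon positions from a 1-D Gaussian PSF
    centered at 0 and takes their mean as the centroid estimate.
    """
    rng = random.Random(seed)
    centroids = []
    for _ in range(n_trials):
        total = 0.0
        for _ in range(n_photons):
            total += rng.gauss(0.0, width_nm)
        centroids.append(total / n_photons)
    return statistics.pstdev(centroids)

if __name__ == "__main__":
    w, n = 200.0, 2500  # illustrative values only
    print("predicted:", localization_precision(w, n))
    print("simulated:", simulate_centroid_scatter(w, n, n_trials=200))
```

The simulated scatter should land close to the predicted value, and quadrupling the photon count should halve it.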
In FIONA, a molecule labeled with a single dye is illuminated using total internal reflection microscopy (Axelrod, Methods Cell Biol (1989) 30:245-70), the photons emitted by the dye molecule are collected by a high-numerical-aperture oil objective, and the molecule is imaged using a high-speed, back-thinned, cooled CCD camera. The images are captured with no dead time between images, creating a continuous “movie” of the molecule. Each image frame is then fit to a Gaussian distribution to determine the position of the centroid. By actively deoxygenating solutions using a glucose oxidase/catalase “cocktail” and by suppressing dye blinking with appropriate buffer conditions, we can collect approximately 10,000 photons in a half-second integration from a single molecule. For red light (PSF width w≈150 nm), this means we can localize molecules to within approximately 1.5 nm (150 nm/√10,000), as stated above.
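As a rough illustration of the centroid-fitting step, the sketch below locates a Gaussian spot in a small synthetic frame to sub-pixel precision. It is not the fitting procedure used in the cited work: in place of a full nonlinear least-squares Gaussian fit, it uses a simpler stand-in, a parabola through the logarithms of the three samples around the peak of each marginal profile, which is exact only for a noiseless, background-free Gaussian. The function names and the synthetic frame are hypothetical.

```python
import math

def gaussian_peak_1d(profile):
    """Sub-pixel peak position of a 1-D profile with strictly positive values.

    The log of a Gaussian is a parabola, so a parabola through the
    log-intensities of the brightest interior sample and its two
    neighbors yields the peak position.
    """
    i = max(range(1, len(profile) - 1), key=lambda k: profile[k])
    la, lb, lc = (math.log(profile[k]) for k in (i - 1, i, i + 1))
    return i + 0.5 * (la - lc) / (la - 2.0 * lb + lc)

def localize(frame):
    """Return (x, y) of the spot center, in pixels, from a 2-D frame."""
    # Marginals of a separable 2-D Gaussian are 1-D Gaussians.
    column_sums = [sum(row[x] for row in frame) for x in range(len(frame[0]))]
    row_sums = [sum(row) for row in frame]
    return gaussian_peak_1d(column_sums), gaussian_peak_1d(row_sums)

def synthetic_frame(size, mu_x, mu_y, sigma, amplitude=1000.0):
    """Noiseless Gaussian spot sampled at integer pixel positions."""
    return [[amplitude * math.exp(-((x - mu_x) ** 2 + (y - mu_y) ** 2)
                                  / (2.0 * sigma ** 2))
             for x in range(size)]
            for y in range(size)]

if __name__ == "__main__":
    frame = synthetic_frame(15, mu_x=7.3, mu_y=6.6, sigma=1.5)
    print(localize(frame))
```

Real frames contain shot noise and background, which is why a fit over all pixels of the spot would be preferred in practice; the sketch only shows that the center of a spot several pixels wide can be placed far more finely than one pixel.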
By taking advantage of the time resolution available to FIONA and the quantal photobleaching of single dye molecules, we have shown the ability to resolve distances between single dye molecules of the same color down to 10 nm (Gordon et al, PNAS (2004) 101:6462-6465). FIONA can also be used to distinguish single molecules of different colors by accurately determining their PSF widths, which should be proportional to their emission wavelengths, according to the Rayleigh diffraction limit discussed above.
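One simplified way to see why stepwise photobleaching helps is sketched below. This is an illustrative model, not the analysis of the cited paper: it assumes that, while both dyes emit, the fitted centroid is the photon-weighted average of the two dye positions, and that after one dye bleaches the fitted centroid is the position of the survivor. All names and numbers are hypothetical.

```python
import math

def recover_bleached_position(centroid_both, photons_both,
                              centroid_survivor, photons_survivor):
    """Position of the dye that bleached, from before/after centroids.

    Assumes the two-dye centroid is the photon-weighted mean of the two
    positions, so the bleached dye's position follows by subtraction.
    """
    photons_bleached = photons_both - photons_survivor
    if photons_bleached <= 0:
        raise ValueError("photon count must drop when a dye bleaches")
    return tuple(
        (photons_both * c - photons_survivor * s) / photons_bleached
        for c, s in zip(centroid_both, centroid_survivor)
    )

def separation(p, q):
    """Euclidean distance between two (x, y) positions, in the same units."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

if __name__ == "__main__":
    # Hypothetical example: dyes at (0, 0) and (10, 0) nm, equally bright.
    survivor = (10.0, 0.0)
    bleached = recover_bleached_position((5.0, 0.0), 10000, survivor, 5000)
    print(bleached, separation(bleached, survivor))
```

Because each centroid is known to a few nanometers, the separation obtained this way can be far smaller than the PSF width, even though the two dyes are never resolved as separate spots in a single frame.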
We disclose a cost-effective, high-throughput system for haplotyping based on single-molecule technologies in which isolated, individual polynucleotide molecules from diploid organisms are labeled allele-specifically with target-specific hybridization probes. Individual labels at each target allele are optically detected, and a barcode representation of the polynucleotide is formed where the alleles and their relative positions are represented. Barcoding polynucleotides according to our invention facilitates a variety of analyses, including genotyping, sequencing and haplotyping.
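The barcode idea can be sketched as a small data structure. The example below is a hypothetical illustration, not the disclosed implementation: it assumes that each detected label has already been localized along the molecule and matched to a probe, and that a lookup table (here the invented `PROBE_TABLE`) maps each probe to a SNP and an allele. Molecule orientation is not handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    position_nm: float  # position of the detected label along the molecule
    probe: str          # hypothetical identifier of the hybridized probe

# Hypothetical mapping from probe identifier to (SNP name, allele).
PROBE_TABLE = {
    "p1_A": ("SNP1", "A"), "p1_G": ("SNP1", "G"),
    "p2_C": ("SNP2", "C"), "p2_T": ("SNP2", "T"),
    "p3_T": ("SNP3", "T"), "p3_C": ("SNP3", "C"),
}

def build_barcode(labels, probe_table):
    """Ordered list of (SNP, allele, offset_nm) for one molecule.

    Offsets are measured from the first label, so the barcode records
    relative rather than absolute positions.
    """
    ordered = sorted(labels, key=lambda lab: lab.position_nm)
    if not ordered:
        return []
    origin = ordered[0].position_nm
    return [probe_table[lab.probe] + (lab.position_nm - origin,)
            for lab in ordered]

def haplotype_string(barcode):
    """Concatenate the alleles of a barcode in positional order."""
    return "".join(allele for _, allele, _ in barcode)

if __name__ == "__main__":
    molecule = [Label(820.0, "p3_T"), Label(100.0, "p1_A"), Label(450.0, "p2_C")]
    barcode = build_barcode(molecule, PROBE_TABLE)
    print(barcode)
    print(haplotype_string(barcode))
```

Because every label on a barcode comes from the same physical molecule, the alleles it lists are in phase by construction, which is the property that bulk genotyping cannot provide.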