1. Field of the Disclosure
The present disclosure relates to a high throughput automated single molecule image collection and processing system that requires minimal or limited initial user input. Optical images of single molecules and fragments elongated and fixed within microfluidic channels can be automatically collected, maintaining correct focus, and the images prepared for further data processing. A computer-based analysis can be performed on each image thereby obviating the problem of uneven illumination in fluorescence microscopy, and providing an overall robust imaging operation. Embodiments described herein are thus useful in studies of any macromolecules such as DNA, RNA and proteins.
2. Description of the Related Art
Modern biology, particularly molecular biology, has focused itself in large part on understanding the structure, function, and interactions of essential macromolecules in living organisms such as nucleic acids and proteins. For decades, researchers have developed effective techniques, experimental protocols, and in vitro, in vivo, or in situ models to study these molecules. Knowledge has been accumulating relating to the physical and chemical traits of proteins and nucleic acids, their primary, secondary, and tertiary structures, their roles in various biochemical reactions or metabolic and regulatory pathways, the antagonistic or synergistic interactions among them, and the on and off controls as well as up and down regulations placed upon them in the intercellular environment. The advance in new technologies and the emergence of interdisciplinary sciences in recent years offer new approaches and additional tools for researchers to uncover unknowns in the mechanisms of nucleic acid and protein functions.
The evolving fields of genomics and proteomics are only two examples of such new fields that provide insight into the studies of biomolecules such as DNA, RNA and protein. New technology platforms such as DNA microarrays and protein chips and new modeling paradigms such as computer simulations also promise to be effective in elucidating protein, DNA and RNA characteristics and functions. Single molecule optical mapping is another such effective approach for close and direct analysis of single molecules. See, U.S. Pat. No. 6,294,136, the disclosure of which is fully incorporated herein by reference. The data generated from these studies—e.g., by manipulating and observing single molecules—constitutes single molecule data. The single molecule data thus comprise, among other things, single molecule images, physical characteristics such as the length, shape and sequence, and restriction maps of single molecules. Single molecule data provide new insights into the structure and function of genomes and their constitutive functional units.
Images of single molecules represent a primary part of single molecule datasets. These images are rich with information regarding the identity and structure of biological matter at the single molecule level. It is however a challenge to devise practical ways to extract meaningful data from large datasets of molecular images. Bulk samples have conventionally been analyzed by simple averaging, dispensing with rigorous statistical analysis. However, proper statistical analysis, necessary for the accurate assessment of physical, chemical and biochemical quantities, requires larger datasets, and it has remained intrinsically difficult to generate these datasets in single molecule studies due to image analysis and file management issues. To fully benefit from the usefulness of the single molecule data in studying nucleic acids and proteins, it is essential to meaningfully process these images and derive quality image data.
Effective methods and systems are thus needed to accurately extract information from molecules and their structures using image data. For example, a large number of images may be acquired in the course of a typical optical mapping experiment. To extract useful knowledge from these images, effective systems are needed for researchers to evaluate the images, to characterize DNA molecules of interest, to assemble, where appropriate, the selected fragments thereby generating longer fragments or intact DNA molecules, and to validate the assemblies against established data for the molecule of interest. This is particularly relevant in the context of building genome-wide maps by optical mapping, as demonstrated with the ~25 Mb P. falciparum genome (Lai et al., Nature Genetics 23:309-313, 1999).
In the Lai et al. publication, the P. falciparum DNA, consisting of 14 chromosomes ranging in size from 0.6-3.5 Mb, was treated with either NheI or BamHI and mounted on optical mapping surfaces. Lambda bacteriophage DNA was co-mounted and digested in parallel to serve as a sizing standard and to estimate enzyme cutting efficiencies. Images of molecules were collected and restriction fragments marked, and maps of fragments were assembled or “contiged” into a map of the entire genome. Using NheI, 944 molecules were mapped with an average molecule length of 588 kb, corresponding to 23-fold coverage; 1116 molecules were mapped using BamHI with an average molecule length of 666 kb, corresponding to 31-fold coverage (Id at FIG. 3). Thus, each single-enzyme optical map was derived from many overlapping fragments from single molecules. Data were assembled into 14 contigs, each one corresponding to a chromosome; the chromosomes were tentatively numbered 1, the smallest, through 14, the largest.
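By way of illustration only, the fold coverage figures cited above can be approximated as the number of mapped molecules multiplied by the average molecule length and divided by the genome size. The sketch below assumes that the average molecule lengths are 588 kb and 666 kb (consistent with chromosomes of 0.6-3.5 Mb) and uses the rounded ~25 Mb genome size; the function name and constant are hypothetical and do not appear in the cited publication.

```python
# Illustrative sketch only: fold coverage = (molecules * mean length) / genome size.
# GENOME_SIZE_KB uses the rounded ~25 Mb figure from the text (an assumption).
GENOME_SIZE_KB = 25_000


def fold_coverage(num_molecules, mean_length_kb, genome_size_kb=GENOME_SIZE_KB):
    """Return the approximate fold coverage of a genome by mapped molecules."""
    return num_molecules * mean_length_kb / genome_size_kb


if __name__ == "__main__":
    # Molecule counts from the text; mean lengths assumed to be in kb.
    print("NheI :", round(fold_coverage(944, 588), 1))
    print("BamHI:", round(fold_coverage(1116, 666), 1))
```

With the rounded genome size, this yields approximately 22-fold and 30-fold coverage, close to the reported 23-fold and 31-fold values; the small difference presumably reflects the more precise genome size used in the publication.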
Various strategies were applied to determine the chromosome identity of each contig. Restriction maps of chromosomes 2 and 3 were generated in silico and compared to the optical map; the remaining chromosomes lacked significant sequence information. Chromosomes 1, 4 and 14 were identified based on size. Pulsed field gel-purified chromosomes were used as a substrate for optical mapping, and their maps aligned with a specific contig in the consensus map. Finally, for chromosomes 3, 10 and 13, chromosome-specific YAC clones were used. The resulting maps were aligned with specific contigs in the consensus map (Id at FIG. 4). Thus, in this experiment multi-enzyme maps were generated by first constructing single enzyme maps which were then oriented and linked with one another. For a number of chromosomes that are similar in size, such as chromosomes 5-9, there are many possible orientations of the maps. Such maps may be linked together by a series of double digestions, by the use of available sequence information, by mapping of YACs which are located at one end of the chromosome, or by Southern blotting.
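As a minimal sketch of how a restriction map may be generated in silico from sequence data, the following example locates enzyme recognition sites in a linear sequence and reports the resulting fragment lengths. It assumes the standard recognition sequences G^CTAGC for NheI and G^GATCC for BamHI; the function name, the table of sites, and the toy sequences are hypothetical, and the example is not the procedure used by Lai et al.

```python
# Illustrative sketch only: in silico digestion of a linear DNA sequence.
# Each entry maps an enzyme to (recognition site, cut offset within the site).
SITES = {
    "NheI": ("GCTAGC", 1),   # G^CTAGC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def digest(sequence, enzyme):
    """Return ordered fragment lengths (in bases) after complete digestion."""
    site, offset = SITES[enzyme]
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)      # position of the cut on the top strand
        pos = seq.find(site, pos + 1)  # continue searching past this site
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(digest("AAAGGATCCTTTT", "BamHI"))
```

The ordered list of fragment lengths is the form in which a sequence-derived map could be compared against the ordered fragment sizes of an optical map.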
In short, optical mapping is a powerful tool used to construct genome-wide maps. The data so generated may be used subsequently in other analyses related to the molecules of interest, for example, the construction of restriction maps and the validation of DNA sequence data. There is accordingly a need for systems for visualizing, annotating, aligning and assembling single molecule fragments. Such systems should enable a user to effectively process single molecule images thereby generating useful single molecule data; such systems should also enable the user to validate the resulting data in light of the established knowledge related to the molecules of interest. Robustness in handling large image datasets is desired, as is rapid user response.
A prior system related to the present disclosure relied on scale and angle values that were stored within the system. Images were correlated to determine precise alignment by comparing “bright spots” in the images, a very slow process that entailed identifying the bright regions in each successive overlapping region, all in “image space.”
Although Laplacian filter algorithms have been used previously in automatic focusing applications (E. Krotkov. Focusing. International Journal of Computer Vision. 1 (3):223-237, 1997; N. Ng Kuang Chern, et al. Practical issues in pixel-based autofocusing for machine vision. Proceedings of the 2001 IEEE International Conference on Robotics and Automation. Seoul, Korea, May 21-26, 2001; J. Krautsky, et al. A new wavelet-based measure of image focus. Pattern Recognition Letters 23:1785-1794, 2002), they were not optimized for the purpose of imaging single molecules in an optical mapping application and were not available in a code library form that could be used in this laboratory. This may be because different types of specimens (cells, DNA, etc.) each present their own set of automatic focusing challenges, making a robust general-purpose automatic focus algorithm impractical. Moreover, most cameras are sold independently of microscopes, and vendors are not aware of the type of translation gear necessary for various applications. Thus, innovative solutions applying the most current technology to the automatic focus concept were necessary; the system according to the present disclosure integrates cameras, translation equipment and software, which together are not available as a package for this particular application. An example of this is the “tiling” step, which is uniquely designed to solve the specific problem of automatically focusing “out of focal plane bright fluorescent objects.” Recently, Zeiss offered an automatic focusing routine that works solely with a Hamamatsu camera; however, this system remains inadequate for an optical mapping application such as the one described herein. Zeiss focusing hardware also appears to relate only to intensity focusing.
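For context, a generic Laplacian-based focus measure of the kind discussed in the cited literature may be sketched as follows. This is a minimal illustration only, assuming grayscale images represented as lists of rows of pixel intensities; it is not the automatic focusing method of the present disclosure and does not implement the “tiling” step. The function names and the toy image stack are hypothetical.

```python
# Illustrative sketch only: a generic Laplacian focus measure.
# Sharper images have stronger local intensity changes and therefore
# a larger sum of squared Laplacian responses.


def laplacian_focus_score(image):
    """Sum of squared 4-neighbor Laplacian responses over interior pixels."""
    rows, cols = len(image), len(image[0])
    score = 0.0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            lap = (image[r - 1][c] + image[r + 1][c]
                   + image[r][c - 1] + image[r][c + 1]
                   - 4 * image[r][c])
            score += lap * lap
    return score


def best_focus_index(stack):
    """Return the index of the image in a focal stack with the highest score."""
    scores = [laplacian_focus_score(img) for img in stack]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 5x5 frames: a point-like object in focus, the same signal spread
    # over a 3x3 area (defocused), and a featureless frame.
    sharp = [[0] * 5 for _ in range(5)]
    sharp[2][2] = 9
    blurred = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
               for r in range(5)]
    flat = [[0] * 5 for _ in range(5)]
    print(best_focus_index([blurred, sharp, flat]))
```

In such a scheme, frames would be acquired at successive focal positions and the position with the highest score selected; a measure of this general kind may respond poorly to bright fluorescent objects lying outside the focal plane, which is the problem the “tiling” step is said to address.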
In summary, the present disclosure describes a novel, automated solution to a single molecule optical mapping application.