This invention relates to the sequencing of DNA. More specifically, this invention is a method of mapping the relative positions of specific sequences of nucleic acid using scanning probe microscopy.
The human genome project is arguably the largest and most important scientific collaboration in history. Of more importance is the fact that the human genome project is just the beginning of the genome revolution. It is generally accepted that once the sequence of a genome is known, it can be “mined” for information that will be invaluable in deriving useful products such as new drugs, genetic medicines, improved animal and plant produce, and a host of others. While current methods are adequate for the human genome project to reach its projected completion date early in this millennium, there is ample room for improvements in technologies that would facilitate genome mapping efforts.
While a small number of important genomes are under analysis or have been fully sequenced, current methods are costly and limited in their speed. The genomes of a wide variety of health-related and agriculturally relevant organisms remain to be explored. Using current methodology to repeat the effort spent on the human genome for every animal and plant that remains to be studied would be laborious and extremely time-consuming. It is therefore essential that technological improvements to current genome analysis methods be developed and implemented to aid in this undertaking.
Current Technology
The initial goal of all genomic projects is to acquire the highest quality sequence data for the genome being studied. This is accomplished by determining the nucleotide sequence of fragments of the genome, and then assembling these sequence fragments into the complete genome sequence. There are no methods in existence for direct sequencing of an entire genome greater than a few thousand base pairs in a single experiment.
The current method for sequencing a genome involves first digesting the genome with a restriction endonuclease. The resulting fragments are then subcloned into a variety of vectors including, but not limited to, plasmids, phage vectors, bacterial artificial chromosomes (BACs), and yeast artificial chromosomes (YACs). These fragments are still too large for direct sequencing, and must be further fragmented. The process of re-assembly of all the sequence information represented in these fragments is a formidable task. Current methods of genome analysis split the DNA (deoxyribonucleic acid) into many sub-genomic DNA fragments. These fragments are assembled into contiguous arrays known as “contigs.” There are two general prior art approaches to forming these contigs.
One prior art method used to form contigs is to identify nucleotide sequences by creating “restriction maps” of DNA fragments. These maps can serve to identify genomic fragments and also to identify the overlaps between fragments. A restriction map is a DNA profile that demarcates the positions of target sites for sequence-specific restriction endonucleases along the length of the DNA. These maps are generated by digestion of the DNA with a restriction endonuclease and display of the digestion products by electrophoretic separation on a gel matrix, usually agarose or polyacrylamide. One advantage of this process is that it clearly defines which members of a large population or “library” of gene fragments still need to be sequenced, thereby eliminating undesirable redundancy of effort. Furthermore, once each fragment has been mapped, the maps themselves can be used to determine the order of the fragments in the original sample, which facilitates their sequential assembly into contigs. This process provides fragment size information, but must be repeated several times with a number of variations to allow deduction of the restriction fragment order in a large DNA sample. A need exists for a method that will reduce the effort, time and expense of the above method of nucleotide sequence mapping.
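The limitation noted above, that a gel digest yields fragment sizes but not fragment order, can be illustrated with a minimal sketch. It assumes a linear DNA molecule and uses the EcoRI recognition site GAATTC, cut after the first base, as an example enzyme; the function name `restriction_map` and the toy sequence are hypothetical and are not part of the prior art methods described.

```python
def restriction_map(sequence, site="GAATTC", cut_offset=1):
    """Return (cut_positions, fragment_lengths) for a linear DNA sequence.

    site and cut_offset default to EcoRI (G^AATTC); both are illustrative.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        # The enzyme cuts cut_offset bases into the recognition site.
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragment boundaries are the two ends of the molecule plus every cut.
    boundaries = [0] + cuts + [len(sequence)]
    fragments = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return cuts, fragments


# Hypothetical toy sequence containing two EcoRI sites.
toy = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
cuts, fragments = restriction_map(toy)

print("cut positions (the restriction map):", cuts)
print("fragment lengths in order along the DNA:", fragments)
# A gel separates fragments by size, so the positional order is lost.
print("what a gel reports (sizes only):", sorted(fragments, reverse=True))
```

The cut positions are the restriction map itself, whereas the sorted size list is all that a single electrophoretic separation provides. This is why the conventional process must be repeated with variations before the fragment order can be deduced.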
Other methods for characterizing genomic fragments also exist. For example, one common method known in the art as PCR footprinting uses defined sets of short oligonucleotide primers and generates a diagnostic set of PCR fragments from each genomic piece.
The second general prior art approach to genome analysis is to “shotgun” sequence randomly selected fragments and attempt to assemble them into the continuous genome sequence by locating sequence overlaps. This requires a large degree of redundancy in the sequencing effort. It is necessary to sequence many-fold more DNA than is contained in a single genome to ensure that as many of the genes as possible have been included in the effort. While this approach works for small genomes, the requirement for redundancy of effort, coupled with the extremely low probability of obtaining sequence information for every gene in a genomic library, limits its utility. A need exists for a method that reduces the effort necessary to create these genomic libraries.
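The redundancy requirement can be made concrete with a rough estimate. The sketch below assumes an idealized model that is not part of the text above: reads are placed uniformly at random, so that at c-fold coverage (total bases sequenced divided by genome length) the chance that a given base is missed is approximately exp(-c). The function names and the example genome size are hypothetical.

```python
import math


def fraction_covered(fold_coverage):
    """Expected fraction of the genome covered by at least one read.

    Assumes reads land uniformly at random (a Poisson approximation).
    """
    return 1.0 - math.exp(-fold_coverage)


def expected_uncovered_bases(genome_length, fold_coverage):
    """Expected number of bases that no read covers, under the same model."""
    return genome_length * math.exp(-fold_coverage)


# Hypothetical genome of 3 billion base pairs.
genome_length = 3_000_000_000
for c in (1, 2, 5, 10):
    covered = fraction_covered(c)
    missed = expected_uncovered_bases(genome_length, c)
    print(f"{c:>2}-fold coverage: {covered:.5f} covered, about {missed:,.0f} bases missed")
```

Under these assumptions, one-fold coverage leaves roughly a third of the genome unsequenced, and each further reduction in the missed fraction costs another multiple of the genome in sequencing effort. Real libraries deviate from this model, for example through cloning bias, so the figures are indicative only.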
Both of these methods are facilitated by the use of physical markers to help identify the specific nucleotide sequence and produce a genomic map. The physical markers used can be produced in a variety of ways and with a wide range of precision. The markers can be genetic loci deduced from classical genetic approaches (e.g., genetic crosses and relative proximity analysis) or from more direct methods such as fluorescence in situ hybridization (FISH) analysis. The former process is laborious and can be time-consuming, especially in the case of slow-growing organisms or organisms for which the genetic manipulation tools are rudimentary at best. The latter process requires that prior knowledge about the sequence of the genes under scrutiny be available.
It must be noted that for mapping a genome, it is necessary to have two libraries, each constructed using a different restriction endonuclease. This way, the fragments in the two libraries will overlap (since the two different restriction endonucleases cut the genome at different locations). Thus, by mapping the two libraries, and comparing the results, regions of overlap are discovered and this determines the physical order of the fragments in the genome. These fragments can then be sequenced and the entire genomic sequence determined.
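The role of the second library can be sketched as follows. The example assumes, purely for illustration, that the cut positions of both enzymes on a toy genome are already known, which is exactly the information a real mapping effort has to work out. All names and coordinates are hypothetical.

```python
def fragments_from_cuts(genome_length, cuts):
    """Return fragments as (start, end) intervals for the given cut positions."""
    bounds = [0] + sorted(cuts) + [genome_length]
    return list(zip(bounds, bounds[1:]))


def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def bridging_fragments(library_a, library_b):
    """Find library B fragments that span the junction of adjacent A fragments."""
    bridges = []
    for i in range(len(library_a) - 1):
        left, right = library_a[i], library_a[i + 1]
        for frag in library_b:
            if overlaps(left, frag) and overlaps(right, frag):
                bridges.append((left, right, frag))
    return bridges


# Hypothetical 100-base genome cut by two different enzymes.
library_a = fragments_from_cuts(100, [30, 70])
library_b = fragments_from_cuts(100, [20, 50, 90])

for left, right, frag in bridging_fragments(library_a, library_b):
    print(f"B fragment {frag} links A fragments {left} and {right}")
```

Each fragment from the second library that overlaps two fragments of the first establishes that those two are neighbors in the genome. A single library cannot supply this evidence, because its fragments meet end to end without overlapping.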
Gene Fragment Polymorphisms (GFPs)
In many cases it is of interest to compare DNA sequences from two sources. For example, in DNA “fingerprinting” applications one can use small variations in the sequence of DNA to determine the probability that a particular piece of DNA is derived from a given source. One method to do this is to compare the positions of target sites for restriction endonucleases, which cut DNA in a site-specific fashion. If small changes have occurred in the defined DNA sequence from two sources, it is likely that the restriction endonuclease site map will reflect this, either by the gain or the loss of one or more sites. These changes are referred to as restriction fragment length polymorphisms, or RFLPs. RFLPs are a subset of all types of gene fragment polymorphisms, or GFPs. RFLP analysis is usually carried out by the conventional method described above: a restriction endonuclease digestion, followed by gel electrophoresis and Southern blotting. A need exists for a method of analyzing these GFPs that would reduce the time and labor involved, as well as the expenditure on reagents required by these steps.
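The effect of a small sequence change on a fragment pattern can be illustrated with a minimal sketch. It assumes linear DNA and the EcoRI site GAATTC, cut after the first base, as an example enzyme; the two toy sequences differ by a single base that destroys one site. The function name and both sequences are hypothetical.

```python
def fragment_lengths(sequence, site="GAATTC", cut_offset=1):
    """Return the sorted fragment lengths from digesting a linear sequence."""
    sequence = sequence.upper()
    bounds = [0]
    start = sequence.find(site)
    while start != -1:
        bounds.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds.append(len(sequence))
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


# Hypothetical sequences from two sources; source_b carries a single base
# change (GAATTC -> GAGTTC) that removes the second recognition site.
source_a = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
source_b = "AAA" + "GAATTC" + "CCCCC" + "GAGTTC" + "TT"

pattern_a = fragment_lengths(source_a)
pattern_b = fragment_lengths(source_b)

print("source A fragments:", pattern_a)
print("source B fragments:", pattern_b)
# Fragment sizes present in only one of the two patterns mark the polymorphism.
print("differing sizes:", sorted(set(pattern_a) ^ set(pattern_b)))
```

The loss of one site merges two fragments into a single longer one, so the two sources give different size patterns even though their sequences differ at only one position.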
Functional Sequence Mapping
A large portion of genomic DNA does not encode active genes. In addition, a significant portion of the functional component of a gene is never transcribed into RNA or used to construct a protein. However, these regulatory regions of genes are critical for expression of the gene product and play key roles as, for example, targets for new drugs that regulate levels of gene expression. To discover which regions are functional and which are not, with regard to gene activity, it is often necessary to carry out a large number of studies with large populations of subfragments of the genome. This practice can take years of redundant, laborious, and expensive work.
Scanning Probe Microscopy and Atomic Force Microscopy
A scanning probe microscope (SPM) utilizes a probe which is scanned over a surface. The interaction between the probe and surface is detected, recorded, and displayed. If the probe is small and kept very close to the surface, the resolution of the SPM can be very high, even on the atomic scale in some cases. There is a wide variety of SPM instruments capable of detecting optical, electronic, conductive, and other properties. One form of SPM, the atomic force microscope (AFM), is an ultra-sensitive force transduction system. In the AFM, a sharp tip is situated at the end of a flexible cantilever and scanned over a sample surface. While scanning, the cantilever is deflected by the net sum of the attractive and repulsive forces between the tip and sample. If the spring constant of the cantilever is known, the net interaction force can be accurately determined from the deflection of the cantilever. The deflection of the cantilever is usually measured by the reflection of a focused laser beam from the back of the cantilever onto a split photodiode, constituting an “optical lever” or “beam deflection” mechanism. Other methods for the detection of cantilever deflection include interferometry and piezoelectric strain gauges. The first AFMs recorded only the vertical displacements of the cantilever. More recent methods involve resonating the tip and allowing only transient contact, or in some cases no contact at all, between it and the sample. Plots of tip displacement or resonance changes as it traverses a sample surface are used to generate topographic images. Such images have revealed the 3D structure of a wide variety of sample types including material, chemical and biological specimens. Some examples of the latter include DNA, proteins, chromatin, chromosomes, ion channels, and even living cells.
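The relation between cantilever deflection and interaction force can be sketched as follows, assuming the cantilever behaves as a linear (Hookean) spring for small deflections, so that the force magnitude equals the spring constant multiplied by the deflection. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def cantilever_force(spring_constant_n_per_m, deflection_m):
    """Return the force magnitude in newtons, assuming a linear spring (F = k * d)."""
    return spring_constant_n_per_m * deflection_m


# Hypothetical values: a soft cantilever (0.1 N/m) deflected by 2 nanometers.
spring_constant = 0.1   # newtons per meter
deflection = 2e-9       # meters

force_newtons = cantilever_force(spring_constant, deflection)
force_piconewtons = force_newtons * 1e12

print(f"force: {force_newtons:.3e} N ({force_piconewtons:.1f} pN)")
```

With these assumed values the force is on the order of a few hundred piconewtons, which indicates why nanometer-scale deflection measurements give access to very small forces.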
In addition to its imaging capabilities, the AFM can directly sense and measure forces in the micronewton (10^-6 N) to piconewton (10^-12 N) range. Thus, the AFM can measure forces between molecular pairs, and even within single molecules. Moreover, the AFM can measure a wide variety of other forces and phenomena, such as magnetic fields, thermal gradients and viscoelasticity. This ability can be exploited to map force fields on a sample surface, and to reveal with high resolution the location and magnitude of these fields, as in, for example, localizing magnetic microparticles tethered to biomolecular complexes of interest.