The present invention generally relates to novel methods for isolating, characterizing and mapping genetic markers in polynucleotide sequences. More particularly, the present invention provides methods for mapping genetic material using Type-IIs restriction endonucleases. The methods herein described result in the “capturing” and determination of specific oligonucleotide sequences located adjacent to Type-IIs restriction sites. The resulting sequences are useful as effective markers for use in genetic mapping, screening and manipulation.
The relationship between structure and function of macromolecules is of fundamental importance in the understanding of biological systems. These relationships are important to understanding, for example, the functions of enzymes, structural proteins and signalling proteins, ways in which cells communicate with each other, as well as mechanisms of cellular control and metabolic feedback.
Genetic information is critical to the continuation of life processes. Life is substantially information-based, and its genetic content controls the growth and reproduction of the organism and its complements. The amino acid sequences of polypeptides, which are critical features of all living systems, are encoded by the genetic material of the cell. Further, the properties of these polypeptides, e.g., as enzymes, functional proteins, and structural proteins, are determined by the sequence of amino acids which makes them up. As structure and function are integrally related, many biological functions may be explained by elucidating the underlying structural features which provide those functions, and these structures are determined by the underlying genetic information in the form of polynucleotide sequences. Further, in addition to encoding polypeptides, polynucleotide sequences also can be involved in control and regulation of gene expression. It therefore follows that the determination of the make-up of this genetic information has achieved significant scientific importance.
Physical maps of genomic DNA assist in establishing the relationship between genetic loci and the DNA fragments which carry these loci in a clone library. Physical maps include “hard” maps, which are overlapping cloned DNA fragments (“contigs”) ordered as they are found in the genome of origin, and “soft” maps, which consist of long range restriction enzyme and cytogenetic maps (Stefton and Goodfellow, 1992). In the latter case, the combination of rare cutting restriction endonucleases (e.g., NotI) and pulsed-field gel electrophoresis allows for the large scale mapping of genomic DNAs. These methods provide a low resolution or top down approach to genomic mapping.
A bottom up approach is exemplified by construction of contiguous or “contig” maps. Initial attempts to construct contig maps for the human genome have been based upon ordering inserts cloned into cosmids. More recent studies have utilized yeast artificial chromosomes (YACs), which allow for cloning larger inserts. The construction of contig maps requires that many clones be examined (4-5 genome equivalents) in order to assure that sufficient overlap between clones is achieved. Currently, four approaches are used to identify overlapping sequences.
The first method is restriction enzyme fingerprinting. This method involves the electrophoretic sizing of restriction enzyme generated DNA fragments for each clone and establishing a criterion for clone overlap based on the similarity of the fragment sets produced for each clone. The sensitivity and specificity of this approach have been improved by labelling of fragments using ligation and end-filling techniques. The detection of repetitive sequence elements (e.g., [GT]n) has also been employed to provide characteristic markers.
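The overlap criterion used in fingerprinting can be illustrated with a minimal sketch. The function names, the 3% sizing tolerance and the 0.5 threshold below are assumptions chosen for illustration only and are not taken from any published fingerprinting protocol.

```python
def shared_fragment_fraction(sizes_a, sizes_b, tolerance=0.03):
    """Fraction of the smaller fingerprint matched in the other clone.

    Two fragments are treated as the same band when their sizes differ by
    no more than `tolerance` (relative), an assumed gel sizing error.
    """
    if not sizes_a or not sizes_b:
        return 0.0
    remaining = sorted(sizes_b)
    matches = 0
    for size in sorted(sizes_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                matches += 1
                del remaining[i]  # each band may be matched only once
                break
    return matches / min(len(sizes_a), len(sizes_b))


def clones_overlap(sizes_a, sizes_b, threshold=0.5):
    """Hypothetical overlap call: enough bands in common."""
    return shared_fragment_fraction(sizes_a, sizes_b) >= threshold


clone_a = [1200, 3400, 560, 7800]
clone_b = [1210, 560, 9100, 3380, 450]
print(shared_fragment_fraction(clone_a, clone_b))
```

In practice the threshold would be set from the expected rate of coincidental band matches, which this sketch does not model.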
The second method generally employed in mapping applications is the binary scoring method. This method involves the immobilization of members of a clone library to filters and hybridization with sets of oligonucleotide probes. Several mathematical models have been developed to avoid the need for large numbers of the probe sets which are designed to detect the overlap regions.
A third method is the Sequence Tagged Site (“STS”) method. This method employs PCR techniques and gel analysis to generate DNA products whose lengths characterize them as being related to common regions of sequence that are present in overlapping clones. The sequence of the primer pairs and the characteristic distance between them provide sufficient information to establish a single copy landmark (SCL), which is analogous to single copy probes that are unique in the entire genome.
A fourth method uses cross-hybridizing libraries. This method involves the immobilization of two or more pools of cosmid libraries followed by cross-hybridization experiments between pairs of the libraries. This cross-hybridization demonstrates shared cloned sequences between the library pairs. See, e.g., Kupfer, et al., (1995) Genomics 27:90-100.
Although each of these methods is capable of generating useful physical maps of genomic DNA, they each involve complex series of reaction steps including multiple independent synthesis, labelling and detection procedures.
Traditional restriction endonuclease mapping techniques, such as those described above, typically utilize restriction enzyme recognition/cleavage sites as genetic markers. These methods generally employ Type-II restriction endonucleases, e.g., EcoRI, HindIII and BamHI, which will typically recognize specific palindromic nucleotide sequences, or restriction sites, within the polynucleotide sequence to be mapped, and cleave the sequence at that site. The restriction fragments which result from the cleavage of separate fragments of the polynucleotide (i.e., from a prior digestion) are then separated by size. Overlap is shown where restriction fragments of the same size appear from Type-II endonuclease digestion of separate polynucleotide fragments.
Type-IIs endonucleases, on the other hand, generally recognize non-palindromic sequences. Further, these endonucleases generally cleave outside of their recognition site, thus producing overhangs of ambiguous base pairs. Szybalski, 1985, Gene 40:169-173. Additionally, as a result of its non-palindromic recognition sequence, a Type-IIs endonuclease will generate more markers per kb than a comparable Type-II endonuclease, e.g., approximately twice as many.
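The approximately twofold difference in marker density can be reproduced with a short calculation. The sketch below assumes a random sequence with equal base frequencies, and counts a non-palindromic site in both orientations because it may lie on either strand. EcoRI (GAATTC) is named above; the BsaI site (GGTCTC) is used here only as an illustrative Type-IIs site of the same length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A site is palindromic when it reads the same on both strands."""
    return site == reverse_complement(site)


def expected_sites_per_kb(site):
    """Expected occurrences per 1000 bp of random, equal-frequency sequence.

    A palindromic site is the same word on both strands, so it is counted
    once; a non-palindromic site can occur in two orientations.
    """
    orientations = 1 if is_palindromic(site) else 2
    return 1000 * orientations / 4 ** len(site)


ecori = expected_sites_per_kb("GAATTC")  # Type-II, palindromic
bsai = expected_sites_per_kb("GGTCTC")   # Type-IIs, non-palindromic
print(ecori, bsai, bsai / ecori)
```

Real genomes depart from equal base frequencies, so observed densities will differ from these expectations.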
The use of Type-IIs endonucleases in mapping genomic markers has been described in, e.g., Brenner, et al., P.N.A.S. 86:8902-8906 (1989). The methods described involved cleavage of genomic DNA with a Type-IIs endonuclease, followed by polymerization with a mixture of the four deoxynucleotides as well as one of the four specific fluorescently labelled dideoxynucleotides (ddA, ddT, ddG or ddC). Each successive unpaired nucleotide within the overhang of the Type-IIs cleavage site would be filled by either a normal nucleotide or the labelled dideoxynucleotide. Where the latter occurred, polymerization stopped. Thus, the polymerization reaction yields an array of double stranded fluorescent DNA fragments of slightly different sizes. By reading the fragment sizes from smallest to largest across the four nucleotide groups, one can determine the specific sequence of the overhang. However, this method can be time consuming and yields only the sequence of the overhang region.
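The read-out step of such a fill-in approach can be sketched as follows. The input format is an assumption made for illustration: each dideoxynucleotide reaction is represented by the extension lengths (in bases) at which labelled fragments were observed. The result is the filled-in strand, which is complementary to the overhang template.

```python
def read_filled_in_strand(terminated_lengths):
    """Reconstruct the filled-in bases from chain-terminated fragment lengths.

    `terminated_lengths` maps each dideoxy base ("A", "C", "G", "T") to the
    extension lengths observed in that reaction (hypothetical input format).
    """
    base_at = {}
    for base, lengths in terminated_lengths.items():
        for length in lengths:
            if length in base_at:
                raise ValueError(f"two bases called at extension length {length}")
            base_at[length] = base
    # Reading the fragments from smallest to largest gives the base order.
    return "".join(base_at[length] for length in sorted(base_at))


reactions = {"A": [1, 3], "C": [2], "G": [4], "T": []}
print(read_filled_in_strand(reactions))
```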
Oligonucleotide probes have long been used to detect complementary nucleic acid sequences in a nucleic acid of interest (the “target” nucleic acid). In some assay formats, the oligonucleotide probe is tethered, i.e., by covalent attachment, to a solid support, and arrays of oligonucleotide probes immobilized on solid supports have been used to detect specific nucleic acid sequences in a target nucleic acid. See, e.g., U.S. patent application Ser. No. 08/082,937, filed Jun. 25, 1993, which is incorporated herein by reference. Others have proposed the use of large numbers of oligonucleotide probes to provide the complete nucleic acid sequence of a target nucleic acid but failed to provide an enabling method for using arrays of immobilized probes for this purpose. See U.S. Pat. Nos. 5,202,231 and 5,002,867.
The development of VLSIPS™ (Very Large Scale Immobilized Polymer Synthesis) technology has provided methods for making very large combinations of oligonucleotide probes in very small arrays. See U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, each of which is incorporated herein by reference in its entirety for all purposes. U.S. patent application Ser. No. 08/082,937, incorporated above, also describes methods for making arrays of oligonucleotide probes that can be used to provide the complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific nucleotide sequence.
The construction of genetic linkage maps and the development of physical maps are essential steps on the pathway to determining the complete nucleotide sequence of the human or other genomes. Present methods used to construct these maps rely upon information obtained from a range of technologies including gel-based electrophoresis, hybridization, polymerase chain reaction (PCR) and chromosome banding. These methods, while providing useful mapping information, are very time consuming when applied to very large genome fragments or other nucleic acids. There is therefore a need to provide improved methods for the identification and correlation of genetic markers on a nucleic acid which can be used to rapidly generate genomic maps. The present invention meets these and other needs.
The present invention provides methods for identifying specific oligonucleotide sequences using Type-IIs endonucleases in sequential order to capture the ambiguous sequences adjacent to the Type-IIs recognition sites. These ambiguous sequences can then be probed sequentially with probes specific for the various combinations of possible ambiguous base pair sequences. By determining which probe hybridizes with an ambiguous sequence, that sequence is thus determined. Further, because that sequence is adjacent to a specific Type-IIs cleavage site, that portion of the sequence is also known. This contiguous sequence is useful as a marker sequence in mapping genomic libraries.
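The logic of determining an ambiguous sequence from the probe that hybridizes to it can be sketched as below. Hybridization is modelled as a caller-supplied yes/no function and only perfect complementarity is considered; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def all_probes(length):
    """Every possible probe of the given length (4**length sequences)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def deduce_sequence(hybridizes, length):
    """Infer the captured sequence from the single probe that hybridizes.

    `hybridizes` stands in for the experimental read-out: it takes a probe
    sequence and returns True when a signal is observed.
    """
    hits = [probe for probe in all_probes(length) if hybridizes(probe)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one hybridizing probe, got {len(hits)}")
    # The captured strand is the reverse complement of the probe that bound.
    return reverse_complement(hits[0])


hidden = "AGTC"  # unknown to the experimenter; used only to simulate the signal
signal = lambda probe: probe == reverse_complement(hidden)
print(len(all_probes(4)), deduce_sequence(signal, 4))
```

A four-base ambiguous sequence therefore requires 256 probes, and each additional base multiplies the probe count by four.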
In one embodiment, the present invention provides a method of identifying sequences in a polynucleotide sequence. The method comprises cleaving the polynucleotide sequence with a first type-IIs endonuclease. A first adapter sequence, having a recognition site for a second type-IIs endonuclease, is ligated to the polynucleotide sequence cleaved in the first cleaving step. The polynucleotide sequence resulting from the first ligating step is cleaved with the second type-IIs endonuclease, and a second adapter sequence is ligated to the polynucleotide sequence cleaved in the second cleaving step. The sequence of nucleotides of the polynucleotide sequence between the first and second adapter sequences is then determined.
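The cleave, ligate, cleave, ligate sequence of steps can be pictured with a greatly simplified single-strand sketch. It is not a model of the claimed chemistry: strand geometry and overhang polarity are ignored, only forward-orientation sites are scanned, and the adapter sequences and the `second_reach` parameter are hypothetical. The example site and offset resemble FokI (GGATG, cutting 9 bases downstream on the recognition strand), chosen purely for illustration.

```python
def capture_tags(sequence, first_site, first_offset, second_reach):
    """Collect the bases immediately downstream of each first-enzyme cut.

    `first_offset` is the distance from the end of the recognition site to
    the cut; `second_reach` is how far the second enzyme, bound to the
    ligated adapter, is assumed to cut into the captured DNA.
    """
    tags = []
    start = sequence.find(first_site)
    while start != -1:
        cut = start + len(first_site) + first_offset
        tag = sequence[cut:cut + second_reach]
        if len(tag) == second_reach:  # skip sites too close to the end
            tags.append(tag)
        start = sequence.find(first_site, start + 1)
    return tags


def between_adapters(construct, adapter1, adapter2):
    """Return the sequence flanked by the first and second adapters."""
    begin = construct.index(adapter1) + len(adapter1)
    end = construct.index(adapter2, begin)
    return construct[begin:end]


genome = "TTGGATGAAACCCGGGTTTACGTACGTAAGG"
adapter1, adapter2 = "AAAAGCAGC", "CCTGCAGG"  # hypothetical adapter sequences
for tag in capture_tags(genome, "GGATG", 9, 4):
    construct = adapter1 + tag + adapter2
    print(between_adapters(construct, adapter1, adapter2))
```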
In another embodiment, the present invention provides a method of generating an ordered map of a library of genomic fragments. The method comprises identifying sequences in each of the genomic fragments in the library, as described above. The identified sequences in each fragment are compared with the sequences identified in each other fragment to obtain a level of correlation between each fragment and each other fragment. The fragments are then ordered according to their level of correlation.
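One way to picture ordering fragments by their level of correlation is sketched below. The correlation measure (shared markers divided by total distinct markers) and the greedy chaining strategy are assumptions made for illustration; the text does not prescribe a particular measure or ordering algorithm.

```python
def correlation(markers_a, markers_b):
    """Shared markers as a fraction of all distinct markers in the pair."""
    union = markers_a | markers_b
    return len(markers_a & markers_b) / len(union) if union else 0.0


def order_fragments(markers):
    """Greedy ordering: start at a likely end, then follow the best neighbour.

    `markers` maps a fragment name to the set of marker sequences
    identified in that fragment.
    """
    names = sorted(markers)
    if not names:
        return []

    def total(name):
        return sum(correlation(markers[name], markers[other])
                   for other in names if other != name)

    # The fragment least correlated with all others is taken as one end.
    current = min(names, key=total)
    order = [current]
    remaining = [n for n in names if n != current]
    while remaining:
        best = max(remaining, key=lambda n: correlation(markers[current], markers[n]))
        order.append(best)
        remaining.remove(best)
        current = best
    return order


library = {
    "C": {"m3", "m4"},
    "A": {"m1", "m2"},
    "D": {"m4", "m5"},
    "B": {"m2", "m3"},
}
print(order_fragments(library))
```

A greedy chain of this kind can be misled by repeated markers or gaps in coverage, so it serves only as a starting point for a real map.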
In a further embodiment, the present invention provides a method of identifying polymorphisms in a target polynucleotide sequence. The method comprises identifying sequences in a wild-type polynucleotide sequence, according to the methods described above. The identifying step is repeated on the target polynucleotide sequence. The differences in the sequences identified in each of the identifying steps are determined, the differences being indicative of a polymorphism.
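The comparison step can be sketched as a set difference between the markers identified in the wild-type and target sequences. The function name and the marker strings are hypothetical.

```python
def marker_differences(wild_type_markers, target_markers):
    """Markers present in only one of the two sequences.

    Either kind of difference is treated as indicating a candidate
    polymorphism at or near the corresponding recognition site.
    """
    wild_type, target = set(wild_type_markers), set(target_markers)
    return {
        "only_in_wild_type": sorted(wild_type - target),
        "only_in_target": sorted(target - wild_type),
    }


wild_type = ["GGATGTTTA", "GGATGACGT", "GGATGCCAA"]
target = ["GGATGTTTA", "GGATGACGA", "GGATGCCAA"]
print(marker_differences(wild_type, target))
```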
In still another embodiment, the present invention provides a method of identifying a source of a biological sample. The method comprises identifying a plurality of sequences in a polynucleotide sequence derived from the sample, according to the methods described herein. The plurality of sequences identified in the identifying step are compared with a plurality of sequences identically identified from a polynucleotide derived from a known source. The identity of the plurality of sequences identified from the sample with the plurality of sequences identified from the known source is indicative that the sample was derived from the known source.
In an additional embodiment, the present invention provides a method of determining a relative location of a target nucleotide sequence on a polynucleotide. The method comprises generating an ordered map of the polynucleotide according to the methods described herein. The polynucleotide is fragmented. The fragment which includes the target nucleotide sequence is then determined, and a marker on the fragment is correlated with a marker on the ordered map to identify the approximate location of the target nucleotide sequence on the polynucleotide.
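The correlation of a fragment marker with the ordered map can be sketched as follows. The data layout is an assumption for illustration: the ordered map is a list of marker names in map order, and each fragment is stored with its sequence and the set of markers identified on it.

```python
def locate_target(ordered_map, fragments, target):
    """Approximate the map interval of a target sequence.

    Returns (fragment_name, (first_index, last_index)), where the indices
    are positions in `ordered_map` of the markers carried by the fragment
    containing `target`, or None when no marked fragment contains it.
    """
    for name, (sequence, fragment_markers) in fragments.items():
        if target in sequence:
            positions = [i for i, marker in enumerate(ordered_map)
                         if marker in fragment_markers]
            if positions:
                return name, (min(positions), max(positions))
    return None


ordered_map = ["m1", "m2", "m3", "m4"]
fragments = {
    "f1": ("AAACCC", {"m1"}),
    "f2": ("GGGTTTACG", {"m3", "m4"}),
}
print(locate_target(ordered_map, fragments, "TTTA"))
```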