It is known in the art of molecular biology that a nucleic acid fragment lying between two identified and unique primer sequences can be amplified using the polymerase chain reaction (PCR) or modifications of the PCR. PCR avoids conventional molecular cloning techniques, which require that conveniently located restriction endonuclease cleavage sites exist in the nucleic acid. One identified shortcoming of PCR is that fragments greater than about 40 kilobase pairs between the PCR primers are only weakly amplified. It has also been difficult to obtain meaningful sequence data from large genomic fragments, particularly when such fragments are resistant to traditional cloning methods. Thus, the art is seeking new methods to obtain the nucleic acid sequences of long, uncharacterized regions of genetic material.
Efforts to amplify a specific DNA cleavage fragment from a population of such fragments have included methods that involve cleaving the DNA using Class IIS enzymes or interrupted palindrome enzymes to form fragments having non-specific terminal 5' or 3' overhangs of various lengths (generally 2 to 5 bases). Smith, D. R., PCR Methods and Applications 2:21-27, Cold Spring Harbor Laboratory Press (1992); Unrau, P. and K. Deugau, Gene 145:163-169 (1994); U.S. Pat. No. 5,508,169 (Deugau et al.); Zheleznaya, L. A. et al., Biochemistry (Moscow) 60:1037-1043 (1995). Class IIS enzymes cleave DNA asymmetrically at precise distances from their recognition sequences. Interrupted palindrome ("IP") enzymes cleave symmetrically between a pair of interrupted palindromic binding sites. To amplify the products of such cleavages, nucleic acid indexing linkers have been annealed and ligated to fragments generated by Class IIS or IP cleavage. These linkers contain PCR primer sites as well as protruding single strands that are complementary to the cohesive ends of Class IIS or IP cleavage sites (rather than to recognition sequences).
The overhangs vary in base composition, and are determined by the locations of the enzymes' cleavage sites in a genome. The base composition and sequence of the overhang created after cleavage with a Class IIS or IP enzyme cannot be predicted because the sites at which those enzymes cleave DNA are determined by spatial relationship to the recognition sequence, but are not sequence-determined. In the methods described by Smith, Unrau, Deugau and Zheleznaya, the unique cleavage sites generated by Class IIS and IP enzymes provided an effectively random sequence by which fragments could be indexed. However, that is not the case with the more popular Class II enzymes, which cleave within their recognition sites and generate predictable, identical sticky ends on each restriction fragment. Also, Unrau's method employs temperatures that result in illegitimate base pairing as well as problems with primer dimers, where indexing fragments anneal with one another rather than with the target DNA.
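By way of illustration only, the contrast between the two enzyme classes can be sketched in Python. The enzymes used here are assumptions chosen for the example and are not named in the references above: FokI stands in for a Class IIS enzyme (recognition site GGATG, cutting 9 bases downstream on the top strand and 13 on the bottom strand), and EcoRI stands in for an enzyme that cleaves within its recognition site (GAATTC). The function names and test sequences are hypothetical, and only sites on the top strand are considered.

```python
# Illustrative sketch only; FokI and EcoRI are example enzymes chosen for
# this sketch, and the helper names below are hypothetical.

def class_iis_overhangs(seq, site="GGATG", top_cut=9, bottom_cut=13):
    """Return the 5' overhangs on the downstream fragments produced by a
    FokI-like enzyme, scanning the top strand only.

    The overhang is defined by distance from the recognition site, so its
    sequence is whatever bases happen to lie at that distance.
    """
    overhangs = []
    start = seq.find(site)
    while start != -1:
        site_end = start + len(site)
        lo = site_end + top_cut      # top-strand cut position
        hi = site_end + bottom_cut   # bottom-strand cut position
        if hi <= len(seq):           # skip sites too close to the end
            overhangs.append(seq[lo:hi])
        start = seq.find(site, start + 1)
    return overhangs


def within_site_overhangs(seq, site="GAATTC", cut_offset=1):
    """Return the 5' overhangs produced by an EcoRI-like enzyme that cuts
    inside its palindromic site (G^AATTC); every overhang is identical."""
    overhangs = []
    start = seq.find(site)
    while start != -1:
        lo = start + cut_offset
        hi = start + len(site) - cut_offset
        overhangs.append(seq[lo:hi])
        start = seq.find(site, start + 1)
    return overhangs


# Two FokI-like sites followed by different flanking sequence.
iis_seq = ("TTGGATG" + "ACGTACGTA" + "CCTG" + "AAAA"
           + "GGATG" + "TTTTTTTTT" + "GACT" + "CC")
# Two EcoRI-like sites in unrelated flanking sequence.
ii_seq = "AAGAATTCGGGAATTCTT"

iis_result = class_iis_overhangs(iis_seq)
ii_result = within_site_overhangs(ii_seq)

# A 4-base overhang of arbitrary sequence can take any of 4**4 forms.
possible_four_base_overhangs = 4 ** 4
```

In this sketch the FokI-like cuts yield a different overhang at each site, which is the property the indexing linkers exploit, whereas the EcoRI-like cuts yield the same overhang every time.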
What is desired is an indexing system that relies upon fragments not generated by Class IIS or IP enzymes, and which offers improved amplification specificity.