The desire to understand the genetic basis of disease and a host of other physiological states associated with different patterns of gene expression has led to the development of several approaches to large-scale analysis of DNA, e.g. Adams et al, Editors, Automated DNA Sequencing and Analysis (Academic Press, New York, 1994). Current techniques for analyzing gene expression patterns include large-scale sequencing, differential display, indexing schemes, subtraction hybridization, hybridization with solid phase arrays of cDNAs or oligonucleotides, and numerous DNA fingerprinting techniques, e.g. Liang et al, Science, 257: 967-971 (1992); Erlander et al, International patent application PCT/US94/13041; McClelland et al, U.S. Pat. No. 5,437,975; Unrau et al, Gene, 145: 163-169 (1994); Schena et al, Science, 270: 467-469 (1995); Velculescu et al, Science, 270: 484-486 (1995); and the like.
An important subclass of such techniques employs double stranded oligonucleotide adaptors to classify populations of polynucleotides and/or to identify nucleotides at the termini of polynucleotides, e.g. Unrau et al (cited above) and U.S. Pat. No. 5,508,169; Sibson, International applications PCT/GB93/01452 and PCT/GB95/00109; Cantor, U.S. Pat. No. 5,503,980; and Brenner, International application PCT/US95/03678 and U.S. Pat. No. 5,552,278. Such adaptors typically have protruding strands which permit specific hybridization and ligation to polynucleotides having complementary ends. Identification or classification is effected by carrying out such reactions in separate vessels or by providing labels which identify one or more nucleotides in the protruding strand of the ligated adaptor.
In these techniques, special problems arise in dealing with either polynucleotide ends or adaptors that are capable of self-ligation, such as that illustrated in FIG. 1, where the four-nucleotide protruding strands of the anchored polynucleotides are complementary to one another. When self-ligation occurs, the protruding strands of either the adaptors or the target polynucleotides are no longer available for analysis or processing. This, in turn, leads to the loss or disappearance of signals generated in response to correct ligations of adaptors to target polynucleotides. The self-ligation problem is especially acute when identical target polynucleotides are anchored to a solid phase support. In this situation, the local concentration of ends capable of self-ligation is typically very high relative to that of double stranded adaptors, thereby making self-ligation the favored reaction whenever complementary sequences are present. As illustrated in FIG. 1, complementary sequences form a palindromic duplex upon hybridization. Since the probability of a palindromic 4-mer occurring in a random sequence is the same as the probability of a repeated pair of nucleotides (16 of the 256 possible 4-mers, or 6.25%), adaptor-based methods for de novo sequencing have a high expectation of failure after a few cycles because of self-ligation. When this occurs, further analysis of the polynucleotide becomes impossible.
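The 6.25% figure can be checked by direct enumeration. The following Python sketch is illustrative only and is not part of the cited methods: it counts the four-nucleotide sequences that equal their own reverse complement, and then estimates how quickly the chance of meeting at least one such protruding strand grows with the number of cycles. The helper names (`reverse_complement`, `is_self_complementary`, `p_at_least_one`) are hypothetical, and the cumulative estimate assumes uniformly random, independent bases and one independent four-nucleotide protruding strand exposed per cycle.

```python
from itertools import product

# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_self_complementary(seq):
    """True if seq is palindromic in the duplex sense, i.e. equal to its
    own reverse complement, so two identical ends can anneal to each other."""
    return seq == reverse_complement(seq)


# Enumerate all 4**4 = 256 possible four-nucleotide protruding strands.
all_4mers = ["".join(p) for p in product("ACGT", repeat=4)]
palindromic = [s for s in all_4mers if is_self_complementary(s)]

# Fraction of random 4-mers that can self-ligate.
p_palindrome = len(palindromic) / len(all_4mers)


def p_at_least_one(n_cycles, p=p_palindrome):
    """Probability of at least one self-complementary protruding strand in
    n_cycles cycles. ASSUMPTION: each cycle exposes an independent, uniformly
    random 4-mer; real sequences need not behave this way."""
    return 1 - (1 - p) ** n_cycles


if __name__ == "__main__":
    print(len(palindromic), "of", len(all_4mers), "4-mers are palindromic")
    print("per-cycle probability:", p_palindrome)
    for n in (1, 5, 10, 11, 20):
        print(n, "cycles:", round(p_at_least_one(n), 3))
```

Under these assumptions, the chance of having encountered at least one self-complementary protruding strand passes one half at the eleventh cycle, which is consistent with the expectation of failure after a few cycles noted above.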
In view of the increasing importance of adaptor-based techniques in nucleic acid sequence analysis, the availability of methods and materials for overcoming the self-ligation problem would be highly desirable.