1. Field of the Invention
This invention relates to a method for constructing oligonucleotide concatamer libraries by rolling circle replication of a single-stranded oligonucleotide. An oligonucleotide concatamer is defined herein as the structure formed by concatenation of unit-sized oligonucleotide components. Concatenation is the process of linking multiple subunits into a tandem series or chain, as occurs during replication of genomic subunits of phage lambda.
Natural genes and proteins often contain tandemly repeated sequence motifs that dramatically increase physiological specificity and activity. Given the selective value of such repeats, it is likely that several different mechanisms have been responsible for their generation. One mechanism that has been shown to generate relatively long tandem repeats (in the kilobase range) is rolling circle replication. In this patent application, we demonstrate that rolling circle synthesis in a simple enzymatic system can produce tandem repeats of monomers as short as 34 bp. These observations provide a facile means for constructing libraries of repeated motifs for use in "in vitro evolution" experiments designed to select molecules with defined biological or chemical properties.
2. Description of the Related Art
Analysis of naturally occurring macromolecular sequences has revealed repetitive structure at a variety of levels (1, 2). Particularly relevant to gene expression and replication are sets of short sequence motifs that often occur in multiple copies around promoter/enhancer regions and replication origins (3). The repetition of motifs within a control region has been shown in many cases to allow individual trans-acting factors to exert additive and/or cooperative effects; this design can improve the specificity of a control mechanism by increasing the signal of appropriate activity while decreasing the possibility of fortuitous inappropriate activity (4). Requirements for repeated sequence motifs have also been found in characterizing the activities of specific RNA (5-8) and protein (9-11) functions.
In investigating structure-function relationships in vitro and in vivo, several researchers have used strategies that involve the production of a large library of random sequences followed by selection for sequences with a given property (12). These schemes can produce experimentally useful reagents and provide a wealth of information about sequence requirements for the selected activity (e.g., refs. 13-15). Application of such a selection strategy depends on the ability to produce large libraries of random sequences, efficient selection procedures, and appropriate means for recovering and characterizing the selected molecules. Frequently, the techniques available for selecting or screening molecules are insufficient to find active sequences. In particular, if several tandem copies of a functional segment are required for activity, then recovering an active sequence from a random pool becomes increasingly difficult.
To circumvent the insufficiency of available selection techniques for many interesting biological and biochemical activities, we sought to produce libraries of random repeated sequences: pools of molecules in which each member contains tandem repeats of a different sequence element. The potential usefulness of such concatamer libraries can be illustrated by calculating the probability that a given 8-base-pair element will occur independently in three positions in a single random 60-mer sequence (≈1 in 250 million). If we replace the random 60-mers with a library of trimerized random 20-mers, then this probability improves by ≈5 orders of magnitude.
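The text does not state how these figures were derived. The following Python sketch shows one reading that reproduces them to within rounding; it is an illustration under stated assumptions, not the applicants' calculation. The assumptions are that all four bases are equally likely, that the element may match in either orientation (hence the factor of 2), and that the three occurrences in the 60-mer are treated as independent, with overlap between positions ignored. The function and variable names are hypothetical.

```python
import math

ALPHABET_SIZE = 4  # A, C, G, T, assumed equally likely


def p_motif_in_random_seq(seq_len, motif_len, both_orientations=True):
    """Approximate chance that one specific motif occurs in a random sequence.

    Multiplies the number of start positions available to the motif by the
    chance of a match at any single position. Overlap effects are ignored.
    """
    windows = seq_len - motif_len + 1
    # Assumption: a match in either orientation counts as an occurrence.
    orientations = 2 if both_orientations else 1
    return orientations * windows / ALPHABET_SIZE ** motif_len


# Random 60-mer: the element must turn up three separate times,
# treated here as three independent events.
p_random_60mer = p_motif_in_random_seq(60, 8) ** 3

# Trimerized random 20-mer: a single occurrence in the 20-mer unit
# is automatically present in all three copies.
p_trimer = p_motif_in_random_seq(20, 8)

improvement = math.log10(p_trimer / p_random_60mer)

print(f"random 60-mer:     about 1 in {1 / p_random_60mer:,.0f}")
print(f"trimerized 20-mer: about 1 in {1 / p_trimer:,.0f}")
print(f"improvement:       about {improvement:.1f} orders of magnitude")
```

Under these assumptions the random 60-mer comes out at roughly 1 in 240 million and the improvement at close to five orders of magnitude, consistent with the approximate values quoted above. Counting only one orientation, or allowing the element to span repeat junctions in the trimer, shifts the numbers somewhat but not the overall conclusion.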
We considered several different methods for constructing concatamer libraries. Chemical synthesis can be used to generate random pools of DNA sequence oligomers (12), but straightforward ligation to concatenate elements from these pools would not produce the desired result, since there is no means to ensure that ligation joins molecules of the same sequence. A straightforward method for generation of a small library would be to separately synthesize and concatenate a number of separate oligonucleotides (16); unfortunately, this would be cumbersome and expensive for large libraries.
As a more general procedure we chose a scheme based on the rolling circle replication used by many plasmids and viruses (17, 18). Rolling circle replication is a mode of replication in which a replication fork proceeds around a circular template for an indefinite number of revolutions. The nucleic acid strand newly synthesized in each revolution displaces the strand synthesized in the previous revolution, giving a tail containing a linear series of sequences complementary to the circular template strand.
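The sequence relationship described above, a linear tail made of repeated copies complementary to the circular template, can be sketched in a few lines of Python. This is a purely illustrative string model: it ignores primers, enzymes, and the position at which synthesis starts, and the function names and example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def rolling_circle_product(circle, revolutions):
    """Model the strand made by copying a circular template repeatedly.

    `circle` is the template written 5'->3' and opened at an arbitrary
    position. Each revolution adds one unit complementary to the template,
    so the product is that unit repeated in tandem.
    """
    if revolutions < 1:
        raise ValueError("revolutions must be at least 1")
    unit = reverse_complement(circle)
    return unit * revolutions


# Hypothetical 5-nt template, copied for three revolutions.
template = "AAGCT"
product = rolling_circle_product(template, 3)
print(product)
```

In this model the product length is simply the template length multiplied by the number of revolutions. Where the real product begins within the repeat depends on the priming site, which the sketch does not represent.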
Rolling circle replication involves two simultaneous processes: (i) DNA polymerase must synthesize sequences complementary to a circular template. (ii) As this replication proceeds, some mechanism must unwind the parental duplex to allow the polymerase to advance. Models for physiological rolling circle replication generally involve a template that is predominantly double stranded, with a helicase or single-strand DNA binding activity preceding the polymerase to allow replication to continue (17, 18). Characterized rolling circle replication mechanisms have been found to operate on templates on the order of kilobases and larger (18). Rolling circle replication of templates smaller than 100 bp by previously described mechanisms would be considered unlikely, since formation of very short double-stranded circles would be topologically obstructed (19). Although there was no precedent, we chose to examine the ability of predominantly single-stranded circles to act as templates for rolling circle synthesis.