There have been two different approaches to the construction of random peptide libraries. According to one approach, peptides have been chemically synthesized in vitro in several formats. For example, Fodor, S., et al., 1991, Science 251: 767-773, describes the use of complex instrumentation, photochemistry and computerized inventory control to synthesize a known array of short peptides on a single microscope slide. Houghten, R., et al., 1991, Nature 354: 84-86, describes mixtures of free hexapeptides in which the first and second residues of each peptide were individually and specifically defined. Lam, K., et al., 1991, Nature 354: 82-84, describes a "one bead, one peptide" approach in which a solid-phase split synthesis scheme produced a library of peptides in which each bead in the collection carried a single, random sequence of amino acid residues immobilized on it. For the most part, the chemical synthetic systems have been directed to the generation of arrays of short peptides, generally fewer than about 10 amino acids and more particularly about 6-8 amino acids. Identifying an active peptide requires direct amino acid sequencing, alone or in combination with complex record keeping of the peptide synthesis schemes. According to a second approach using recombinant DNA techniques, peptides have been expressed in vivo as either soluble fusion proteins or viral capsid fusion proteins. The second approach is discussed briefly below.
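The "one bead, one peptide" property follows from the split synthesis scheme itself: during each coupling cycle every bead sits in exactly one reaction vessel, so it accumulates exactly one sequence. The Python sketch below simulates this under simplifying assumptions (beads distributed at random and with equal probability among 20 vessels, 100% coupling yield); the function name `split_and_pool` is hypothetical and the sketch does not reproduce the published protocol.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def split_and_pool(n_beads, length, seed=0):
    """Toy split synthesis: returns the sequence built on each bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(length):
        # Split: distribute every bead at random into one of 20 vessels.
        vessels = {aa: [] for aa in AMINO_ACIDS}
        for bead in range(n_beads):
            vessels[rng.choice(AMINO_ACIDS)].append(bead)
        # Couple: all beads in a vessel receive that vessel's residue.
        for aa, members in vessels.items():
            for bead in members:
                beads[bead] += aa
        # Pool: the next cycle starts again from the full set of beads.
    return beads


library = split_and_pool(n_beads=10_000, length=6)
print(len(set(library)), "distinct hexapeptides on", len(library), "beads")
print("possible hexapeptides:", len(AMINO_ACIDS) ** 6)
```

Each bead ends up with one hexapeptide, while the bead collection as a whole samples the 20⁶ = 6.4×10⁷ possible sequences; identifying which sequence sits on an active bead is the step that requires direct sequencing.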
A number of peptide libraries according to the second approach have used the M13 phage. M13 is a filamentous bacteriophage that has been a workhorse in molecular biology laboratories for the past 20 years. The viral particles consist of five different capsid proteins and one copy of the viral genome, a single-stranded circular DNA molecule. Once the M13 DNA has been introduced into a host cell such as E. coli, it is converted into double-stranded, circular DNA. The viral DNA carries a second origin of replication that is used to generate the single-stranded DNA found in the viral particles. During viral morphogenesis, the single-stranded DNA and the viral proteins assemble in an ordered fashion, and the viral particles are extruded from cells in a process much like secretion. Unlike many other bacteriophages (e.g., λ), M13 is neither lysogenic nor lytic; once infected, cells chronically release virus. This feature leads to high titers of virus in infected cultures, on the order of 10¹² pfu/ml.
The genome of the M13 phage is approximately 6,400 nucleotides in length and has been completely sequenced. The minor capsid protein pIII (protein III) is responsible for infection of bacteria. In E. coli, the pilin protein encoded by the F factor interacts with pIII and is responsible for phage uptake. Hence, all E. coli hosts for M13 are considered male, because they carry the F factor. Several investigators have determined from mutational analysis that the 406-amino-acid pIII capsid protein has two domains. The C-terminus anchors the protein to the viral coat, while portions of the N-terminus of pIII are essential for interaction with the E. coli pilin protein (Crissman, J. W. and Smith, G. P., 1984, Virology 132: 445-455). Although the N-terminus of the pIII protein has been shown to be necessary for viral infection, the extreme N-terminus of the mature protein does tolerate alterations. In 1985, George Smith published experiments reporting the use of the pIII protein of bacteriophage M13 as an experimental system for expressing a heterologous protein on the viral coat surface (Smith, G. P., 1985, Science 228: 1315-1317). It was later recognized, independently by two groups, that the M13 pIII gene display system could be useful for mapping antibody epitopes. De la Cruz, V., et al. (1988, J. Biol. Chem. 263: 4318-4322) cloned and expressed segments of the cDNA encoding the Plasmodium falciparum surface coat protein in gene III, and the recombinant phage were tested for immunoreactivity with a polyclonal antibody. Parmley, S. F. and Smith, G. P. (1988, Gene 73: 305-318) cloned and expressed segments of the E. coli β-galactosidase gene in gene III and identified recombinants carrying the epitope of an anti-β-galactosidase monoclonal antibody.
The latter authors also described a process termed "biopanning", in which mixtures of recombinant phage were incubated with biotinylated monoclonal antibodies, and phage-antibody complexes could be specifically recovered with streptavidin-coated plastic plates.
In 1989, Parmley, S. F. and Smith, G. P. (1989, Adv. Exp. Med. Biol. 251: 215-218) suggested that short, synthetic DNA segments cloned into the pIII gene might represent a library of epitopes. These authors reasoned that, since linear epitopes are often about 6 amino acids in length, it should be possible to use a random recombinant DNA library expressing all possible hexapeptides to isolate epitopes that bind to antibodies.
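The arithmetic behind "all possible hexapeptides" is worth making explicit. There are 20⁶ = 6.4×10⁷ hexapeptides, and if every peptide were equally likely to be cloned, the expected fraction observed in N independent clones would be 1 - exp(-N/20⁶). The Python sketch below inverts that relation; it is an idealization that ignores codon bias and stop codons, and the function names are hypothetical.

```python
import math


def peptide_diversity(length, alphabet=20):
    """Number of distinct peptides of a given length."""
    return alphabet ** length


def clones_for_coverage(diversity, coverage):
    """Clones needed for the expected observed fraction to reach `coverage`,
    assuming every sequence is equally likely (Poisson sampling)."""
    return -diversity * math.log1p(-coverage)


hexapeptides = peptide_diversity(6)
print(hexapeptides)  # 64000000
print(f"{clones_for_coverage(hexapeptides, 0.95):.2e}")
```

Under this idealization, 95% coverage of hexapeptides needs roughly 1.9×10⁸ clones, i.e. about three clones per possible peptide.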
Scott and Smith (Scott, J. K. and Smith, G. P., 1990, Science 249: 386-390) describe the construction and expression of an "epitope library" of hexapeptides on the surface of filamentous phage. The library was made by inserting a BglI-digested 33-base-pair oligonucleotide into the SfiI-digested replicative form (RF) of fUSE5, a derivative of phage fd-tet. The 33-base-pair fragment contained a random or "degenerate" coding sequence (NNK)₆, where N represents G, A, T or C and K represents G or T. The authors stated that the library consisted of 2×10⁸ recombinants expressing 4×10⁷ different hexapeptides; theoretically, this library expressed 69% of the 6.4×10⁷ (20⁶) possible peptides. Cwirla et al. (Cwirla, S. E., et al., 1990, Proc. Natl. Acad. Sci. U.S.A. 87: 6378-6382) also described a similar library of hexapeptides expressed as pIII fusions on filamentous phage. WO91/19818, published Dec. 26, 1991 by Dower and Cwirla, describes a similar library of pentameric to octameric random amino acid sequences.
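The 69% figure is lower than one might expect from 2×10⁸ clones sampling 6.4×10⁷ peptides because NNK codons are not spread evenly over the amino acids: Leu, Arg and Ser have three codons each, five amino acids have two, and twelve have only one, so peptides rich in single-codon residues are under-sampled. The Python sketch below estimates expected coverage assuming every clone carries six NNK codons drawn uniformly at random and clones sample independently (Poisson). The function names are hypothetical, and this is not necessarily how Scott and Smith computed their figure.

```python
import itertools
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), TABLE)}
IUPAC = {"N": "ACGT", "K": "GT", "S": "CG"}  # degenerate base symbols


def codon_usage(scheme):
    """Map each amino acid ('*' = stop) to the number of scheme codons encoding it."""
    codons = ["".join(c) for c in itertools.product(*(IUPAC[b] for b in scheme))]
    return Counter(CODE[c] for c in codons), len(codons)


def expected_coverage(n_clones, length, scheme="NNK"):
    """Expected fraction (and number) of all 20**length peptides present at least
    once, if each clone carries `length` codons drawn uniformly from `scheme`."""
    usage, n_codons = codon_usage(scheme)
    # classes[k] = how many amino acids are encoded by exactly k codons.
    classes = Counter(k for aa, k in usage.items() if aa != "*")
    # weights[w] = number of peptides whose per-position codon counts multiply to w;
    # each such peptide is encoded by w of the n_codons**length possible inserts.
    weights = Counter({1: 1})
    for _ in range(length):
        nxt = Counter()
        for w, count in weights.items():
            for k, n_aa in classes.items():
                nxt[w * k] += count * n_aa
        weights = nxt
    lam = n_clones / n_codons ** length  # expected clones per specific insert
    seen = sum(count * -math.expm1(-lam * w) for w, count in weights.items())
    return seen / sum(weights.values()), seen


fraction, distinct = expected_coverage(2e8, 6)
print(f"{fraction:.1%} of 20**6 peptides, ~{distinct:.1e} distinct")
```

With 2×10⁸ clones the estimate comes out close to 69%, or roughly 4.4×10⁷ distinct hexapeptides, in line with the numbers the authors report.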
Devlin et al., 1990, Science 249: 404-406, describes a library of random peptides of about 15 residues generated using an (NNS) coding scheme for oligonucleotide synthesis, in which S is G or C.
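The NNS scheme of Devlin et al. and the NNK scheme of Scott and Smith can be compared directly by expanding each degenerate codon and translating it with the standard genetic code. The Python sketch below does this; `summarize` is a hypothetical helper, and the only inputs are the standard codon table and the IUPAC meanings of N, K and S.

```python
import itertools
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), TABLE)}
IUPAC = {"N": "ACGT", "K": "GT", "S": "CG"}  # degenerate base symbols


def summarize(scheme):
    """Codon count, amino-acid coverage and stop codons of a degenerate codon."""
    codons = ["".join(c) for c in itertools.product(*(IUPAC[b] for b in scheme))]
    usage = Counter(CODE[c] for c in codons if CODE[c] != "*")
    return {
        "codons": len(codons),
        "amino_acids": len(usage),
        "stops": sorted(c for c in codons if CODE[c] == "*"),
        "max/min codons per aa": max(usage.values()) // min(usage.values()),
    }


for scheme in ("NNK", "NNS"):
    print(scheme, summarize(scheme))
```

Both schemes come out the same in these respects: 32 codons covering all 20 amino acids, TAG as the only stop codon, and a threefold spread in the number of codons per amino acid.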
Christian and colleagues have described a phage display library expressing decapeptides (Christian, R. B., et al., 1992, J. Mol. Biol. 227: 711-718). The starting DNA was generated from an oligonucleotide comprising the degenerate codons [NN(G/T)]₁₀ and a self-complementary 3' terminus. By forming a hairpin, this sequence creates a self-priming replication site that T4 DNA polymerase could use to generate the complementary strand. The double-stranded DNA was cleaved at the SfiI sites at the 5' terminus and in the hairpin for cloning into the fUSE5 vector described by Scott and Smith, supra.
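The self-priming step can be illustrated at the sequence level. The Python sketch below assumes a hypothetical oligonucleotide layout 5'-insert-stem-loop-revcomp(stem)-3', in which the 3' terminus folds back onto the upstream stem and the polymerase extends it across the degenerate insert. The stem and loop sequences and the function names are illustrative assumptions, not the published oligonucleotide, and SfiI cleavage is not modeled.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_prime_extend(oligo, stem, loop):
    """Fill in a hairpin-primed oligo laid out as 5'-insert-stem-loop-revcomp(stem)-3'.

    The 3' revcomp(stem) pairs back onto `stem`, and the polymerase extends that
    3' end across the insert, appending revcomp(insert).
    """
    tail = stem + loop + revcomp(stem)
    if not oligo.endswith(tail):
        raise ValueError("oligo does not end in the expected hairpin")
    insert = oligo[: -len(tail)]
    return oligo + revcomp(insert)


rng = random.Random(1)
# Ten random NN(G/T) codons, as in the degenerate region described above.
insert = "".join(rng.choice("ACGT") + rng.choice("ACGT") + rng.choice("GT") for _ in range(10))
STEM, LOOP = "GCGGCCAC", "TTTT"  # hypothetical hairpin, not the published sequence
oligo = insert + STEM + LOOP + revcomp(STEM)
product = self_prime_extend(oligo, STEM, LOOP)
half = len(insert) + len(STEM)
# The strand folds into a duplex: its 3' half is the reverse complement of its 5' half.
assert product[-half:] == revcomp(product[:half])
print(product)
```

Because the product is a single strand that pairs with itself, one oligonucleotide species is enough to yield double-stranded degenerate DNA without a separate primer.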
Other investigators have used other viral capsid proteins to display peptides encoded by non-viral DNA on the surface of phage particles. The protein pVIII is the major viral capsid protein and interacts through its C-terminus with the single-stranded DNA of M13 viral particles. It is 50 amino acids long and is present in approximately 2,700 copies per particle. The N-terminus of the protein is exposed and tolerates insertions, although large inserts have been reported to disrupt the assembly of pVIII fusion proteins into viral particles (Cesareni, G., 1992, FEBS Lett. 307: 66-70). To minimize the negative effect of pVIII fusion proteins, a phagemid system has been utilized: bacterial cells carrying the phagemid are infected with helper phage and secrete viral particles that carry a mixture of wild-type and fusion pVIII capsid molecules. Gene VIII has thus also served as a site for expressing peptides on the surface of M13 viral particles. Four- and six-amino-acid sequences corresponding to different segments of the Plasmodium falciparum major surface antigen have been cloned and expressed in the comparable gene of the filamentous bacteriophage fd (Greenwood, J., et al., 1991, J. Mol. Biol. 220: 821-827).
Lenstra (1992, J. Immunol. Meth. 152: 149-157) describes the construction of a library by a laborious process in which oligonucleotides of about 17 or 23 degenerate bases, each with an 8-nucleotide palindromic sequence at its 3' end, were annealed to express random hexa- or octapeptides as fusion proteins with β-galactosidase in a bacterial expression vector. The DNA was then converted into double-stranded form with Klenow DNA polymerase, blunt-end ligated into a vector, and then released as HindIII fragments. These fragments were cloned into an expression vector at the C-terminus of a truncated β-galactosidase to generate 10⁷ recombinants. Colonies were lysed, blotted onto nitrocellulose filters (10⁴ per filter) and screened for immunoreactivity with several different monoclonal antibodies. A number of clones were isolated by repeated rounds of screening and sequenced.
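The annealing and fill-in steps can be sketched at the sequence level. Because a palindrome is its own reverse complement, the 3' ends of two library oligonucleotides can pair with each other, and each paired 3' end can then be extended by the polymerase using the other oligonucleotide's 5' overhang as template. The Python sketch below assumes this intermolecular pairing; the palindrome `ACGCGCGT` and the function names are hypothetical, and the later blunt-end ligation and HindIII steps are not modeled.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal_and_fill(oligo_a, oligo_b, overlap=8):
    """Pair two oligos through a shared 3'-terminal palindrome, then fill in.

    Each 3' end is extended along the other oligo's 5' overhang, so each strand
    gains the reverse complement of the other oligo's degenerate region.
    """
    pal = oligo_a[-overlap:]
    if oligo_b[-overlap:] != pal or revcomp(pal) != pal:
        raise ValueError("oligos must share a palindromic 3' end")
    top = oligo_a + revcomp(oligo_b[:-overlap])
    bottom = oligo_b + revcomp(oligo_a[:-overlap])
    return top, bottom


PALINDROME = "ACGCGCGT"  # hypothetical 8-nt palindrome, chosen only for illustration
rng = random.Random(7)


def random_oligo(n_degenerate):
    """Hypothetical library oligo: n random bases followed by the 3' palindrome."""
    return "".join(rng.choice("ACGT") for _ in range(n_degenerate)) + PALINDROME


a, b = random_oligo(23), random_oligo(23)
top, bottom = anneal_and_fill(a, b)
assert bottom == revcomp(top)  # the filled-in product is fully double-stranded
print(len(a), len(top))  # 31 54
```

Note that each duplex joins two independently drawn degenerate regions, one on either side of the palindrome.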
Unlike the methods discussed above for generating peptide libraries, which have been suggested for identifying peptides having binding affinity for a chosen ligand, the present scheme for the synthesis and assembly of oligonucleotides provides oligonucleotides encoding unpredicted amino acid sequences that are longer than those of any prior conventional library.
Contrary to the conventional teaching in the art that inserted oligonucleotides should be kept short, preferably encoding fewer than 15 and most preferably about 6-8 amino acids, the present inventors have found not only that libraries encoding more than about 22 amino acids can be constructed, but also that such libraries can be advantageously screened to identify TSARs, proteins, polypeptides and/or peptides having binding specificity for a variety of ligands.
Additionally, the greater length of the inserted synthetic oligonucleotides of the present libraries may allow secondary and/or tertiary structure to develop, both in the potential binding proteins/peptides and in sequences flanking the actual binding portion of the peptide's binding domain. Such complex structures are not feasible when only short oligonucleotides are used.
As understood in the art, there is a need to reduce the frequency of TAG (stop) codons in the oligonucleotides expressed by a peptide library. Those skilled in the art would expect to solve this problem by using hosts carrying suppressor tRNA genes. Contrary to this conventional teaching, however, the present inventors have surprisingly discovered that suppression may not be 100% efficient in preventing termination at stop codons in an oligonucleotide coding for a random peptide. This problem becomes very serious when expressing longer oligonucleotides encoding random peptides. The present invention effectively and efficiently minimizes the negative impact of this problem on the generation of a useful library.
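Why the problem grows with insert length can be quantified under a simple model. The Python sketch below assumes NNK- or NNS-type codons (32 codons, one of them TAG), each drawn uniformly and independently, and a hypothetical per-TAG read-through probability `suppression`; `full_length_fraction` is a hypothetical name, and the sketch illustrates the problem only, not the remedy of the present invention.

```python
def full_length_fraction(n_codons, suppression=0.0, stops=1, codons=32):
    """Fraction of random inserts read through to the end.

    Assumes each codon is drawn uniformly from a scheme with `codons` codons of
    which `stops` are TAG, and that each TAG is read through independently with
    probability `suppression`.
    """
    p_terminate = (stops / codons) * (1.0 - suppression)
    return (1.0 - p_terminate) ** n_codons


for suppression in (0.0, 0.5, 0.9):
    row = [f"{full_length_fraction(n, suppression):.2f}" for n in (6, 15, 22, 36)]
    print(f"suppression {suppression:.0%}:", " ".join(row))
```

With no suppression, about half of 22-codon inserts already carry at least one TAG, and for any suppression efficiency below 100% the full-length fraction still decays geometrically with insert length.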
Citation or identification of any reference in Section 2 of this application shall not be construed as an admission that such reference is available as prior art to the present invention.