Ligands for macromolecular receptors can be identified by screening diverse collections of peptides produced through either molecular biological or synthetic chemical techniques. Recombinant peptide libraries have been generated by inserting degenerate oligonucleotides into genes encoding capsid proteins of filamentous bacteriophage and the DNA-binding protein Lac I. See Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA 87: 6378-6382; Scott & Smith, 1990, Science 249: 386-390; Devlin et al., 1990, Science 249: 404-406; Cull et al., 1992, Proc. Natl. Acad. Sci. USA 89: 1865-1869; and PCT publication Nos. WO 91/17271, WO 91/19818, and WO 93/08278, each of which is incorporated herein by reference. These random libraries may contain more than 10^9 different peptides, each fused to a larger protein sequence that is physically linked to the genetic material encoding it. Such libraries are efficiently screened for interaction with a receptor by several rounds of affinity purification, the selected exposition or display vectors being amplified in E. coli and the DNA of individual clones sequenced to reveal the identity of the peptide responsible for receptor binding. See also PCT publication Nos. WO 91/05058 and WO 92/02536.
Chemical approaches to generating peptide or other molecular libraries are not limited to syntheses using just the 20 genetically coded amino acids. By expanding the building block set to include unnatural amino acids and other molecular building blocks, the accessible sequence and structural diversity is dramatically increased. In several of the strategies described for creating synthetic molecular libraries, the reaction products are spatially segregated and the identity of individual library members is unambiguously defined by the nature of the synthesis. See Geysen et al., 1984, Proc. Natl. Acad. Sci. USA 81: 3998-4002; Geysen et al., 1986, in Synthetic Peptides as Antigens; Ciba Foundation Symposium 119, eds. Porter, R. & Wheelan, J. (Wiley, New York) pp. 131-146; Fodor et al., 1991, Science 251: 767-773; U.S. Pat. No. 5,143,854; and PCT patent publication Nos. WO 84/03564; 86/00991; 86/06487; 90/15070; and 92/10092, each of which is incorporated herein by reference.
Libraries of more than 30 million soluble peptides have been prepared by the "tea-bag" method of multiple peptide synthesis. See Houghten, 1985, Proc. Natl. Acad. Sci. USA 82: 5131-5135; and U.S. Pat. No. 4,631,211, each of which is incorporated herein by reference. Each library is synthesized and screened as degenerate peptide mixtures in which individual amino acids within the sequence are explicitly defined. An iterative process of screening (e.g. in a competition binding assay) and resynthesis is used to fractionate these mixtures and define the most active peptides within the library. See Houghten et al., 1991, Nature 354: 84-86; Pinilla et al., 1992, Peptide Research 5: 351-358; Blake, J. & Litzi-Davis, 1992, Bioconjugate Chem. 3: 510-513; and PCT patent publication No. WO 92/09300, each of which is incorporated herein by reference.
Using the split-synthesis protocol of Furka et al., 1988, Abstr. 14th Int. Congr. Biochem., Prague, Czech. 5: 47 (see also Furka et al., 1991, Int. J. Peptide Protein Res. 37: 487-493; and Sebestyen et al., 1993, Bioorg. Med. Chem. Lett. 3: 413-418), Lam and coworkers have prepared libraries containing ~10^6 peptides attached to 100-200 µm diameter resin beads. See Lam et al., 1991, Nature 354: 82-84; Lam et al., 1993, Bioorg. Med. Chem. Lett. 3: 419-424; and PCT patent publication No. WO 92/00091, each of which is incorporated herein by reference. The bead library is screened by incubation with a labelled receptor: beads binding to the receptor are identified by visual inspection and are selected with the aid of a micromanipulator. Each bead contains 50-200 pmol of a single peptide sequence, which may be determined directly either by Edman degradation or by mass spectrometry analysis. In principle, one could create libraries of greater diversity using this approach by reducing the dimensions of the beads. The sensitivity of peptide sequencing techniques is limited to ~1 pmol, however, placing a clear limitation on the scope of direct peptide sequencing analysis. Moreover, neither analytical method provides for straightforward and unambiguous sequence analysis when the library building block set is expanded to include D- or other non-natural amino acids or other chemical building blocks.
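The split-synthesis procedure and the bead-size limitation described above can be illustrated with a minimal Python sketch. It is not taken from the cited publications. The function names, the round-robin dealing of beads into portions, and in particular the assumption that peptide loading scales with bead volume (diameter cubed) are illustrative assumptions only.

```python
import random


def split_and_pool(num_beads, building_blocks, num_steps, seed=0):
    """Simulate split synthesis; return one sequence (a tuple) per bead."""
    rng = random.Random(seed)
    beads = [[] for _ in range(num_beads)]
    for _ in range(num_steps):
        rng.shuffle(beads)  # pool all beads and mix them
        for i, bead in enumerate(beads):
            # Split: dealing beads round-robin gives near-equal portions,
            # and each portion is coupled to exactly one building block,
            # so every bead ends up carrying a single sequence.
            bead.append(building_blocks[i % len(building_blocks)])
    return [tuple(bead) for bead in beads]


def max_diversity(num_building_blocks, num_steps):
    """Upper bound on the number of distinct sequences in the library."""
    return num_building_blocks ** num_steps


def min_decodable_diameter_um(ref_diameter_um, ref_loading_pmol,
                              detection_limit_pmol):
    """Smallest bead diameter that still carries detection_limit_pmol.

    ASSUMPTION (not from the source): loading scales with bead volume,
    i.e. with the cube of the diameter.
    """
    ratio = detection_limit_pmol / ref_loading_pmol
    return ref_diameter_um * ratio ** (1.0 / 3.0)


if __name__ == "__main__":
    library = split_and_pool(1000, ["A", "G", "L", "F"], num_steps=3)
    print(len(set(library)), "distinct sequences, at most", max_diversity(4, 3))
    print(round(min_decodable_diameter_um(100, 50, 1), 1), "um")
```

Under the stated volume-scaling assumption, a 100 µm bead carrying 50 pmol could be shrunk only to roughly 27 µm before its loading falls to the ~1 pmol sequencing limit, which is the constraint the paragraph above describes.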
High-throughput screening of collections of chemically synthesized molecules and of natural products (such as microbial fermentation broths) has traditionally played a central role in the search for lead compounds for the development of new pharmacological agents. The remarkable surge of interest in combinatorial chemistry and the associated technologies for generating and evaluating molecular diversity represent significant milestones in the evolution of this paradigm of drug discovery. See Pavia et al., 1993, Bioorg. Med. Chem. Lett. 3: 387-396, incorporated herein by reference. To date, peptide chemistry has been the principal vehicle for exploring the utility of combinatorial methods in ligand identification. See Jung & Beck-Sickinger, 1992, Angew. Chem. Int. Ed. Engl. 31: 367-383, incorporated herein by reference. This may be ascribed to the availability of a large and structurally diverse range of amino acid monomers, a relatively generic, high-yielding solid phase coupling chemistry, and the synergy with biological approaches for generating recombinant peptide libraries. Moreover, the potent and specific biological activities of many low molecular weight peptides make these molecules attractive starting points for therapeutic drug discovery. See Hirschmann, 1991, Angew. Chem. Int. Ed. Engl. 30: 1278-1301, and Wiley & Rich, 1993, Med. Res. Rev. 13: 327-384, each of which is incorporated herein by reference. Unfavorable pharmacokinetic properties such as poor oral bioavailability and rapid clearance in vivo have, however, limited the more widespread development of peptidic compounds as drugs. This realization has recently inspired workers to extend the concepts of combinatorial organic synthesis beyond peptide chemistry to create libraries of known pharmacophores like benzodiazepines (see Bunin & Ellman, 1992, J. Amer. Chem. Soc. 114: 10997-10998, incorporated herein by reference) as well as polymeric molecules such as oligomeric N-substituted glycines ("peptoids") and oligocarbamates. See Simon et al., 1992, Proc. Natl. Acad. Sci. USA 89: 9367-9371; Zuckermann et al., 1992, J. Amer. Chem. Soc. 114: 10646-10647; and Cho et al., 1993, Science 261: 1303-1305, each of which is incorporated herein by reference.
Despite the great value that large libraries of molecules can have for identifying useful compounds or improving the properties of a lead compound, the difficulties of screening such libraries, particularly large libraries, have limited the impact that access to such libraries should have had on reducing the costs of, e.g., drug discovery and development. Consequently, the development of methods for generating and screening libraries of molecules in which each member of the library is tagged with a unique identifier tag to facilitate identification of compounds (see PCT patent publication No. WO 93/06121, incorporated herein by reference; see also U.S. patent application Ser. Nos. 946,239, filed Sep. 16, 1992, and 762,522, filed Sep. 18, 1991, supra) met with great enthusiasm. In the method, products of a chemical synthesis procedure, typically a combinatorial synthesis on resin beads, are explicitly specified by attachment of an identifier tag to the beads coincident with each coupling or other product-generating reaction step in the synthesis. Each tag specifies what happened in a reaction step of interest, e.g., which amino acid monomer was coupled in a particular step of a peptide synthesis procedure. The structure or identity of a compound, e.g., the sequence of a peptide, on any bead can be deduced by reading the set of tags on that bead. Ideally, such tags have a high information content, are amenable to very high sensitivity detection and decoding, and are stable to reagents used in the synthesis. The concept of an oligonucleotide-encoded chemical synthesis was also proposed by Brenner and Lerner, 1992, Proc. Natl. Acad. Sci. USA 89: 5181-5183, incorporated herein by reference.
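The bookkeeping behind the tagging method described above can be sketched in Python as follows. This is a minimal illustration and not the procedure of the cited publications. The tag labels ("T1" to "T4"), the four-monomer set, the `Bead` class, and the random assignment of monomers to beads are all hypothetical.

```python
import random

# Hypothetical tag alphabet: one identifier tag per monomer (illustrative only).
TAG_FOR_MONOMER = {"Gly": "T1", "Ala": "T2", "Leu": "T3", "Phe": "T4"}
MONOMER_FOR_TAG = {tag: monomer for monomer, tag in TAG_FOR_MONOMER.items()}


class Bead:
    """A bead carrying a growing compound and the tags recording each step."""

    def __init__(self):
        self.compound = []  # monomers coupled so far, in order
        self.tags = []      # identifier tags attached so far, in order

    def couple(self, monomer):
        # Each coupling step is accompanied by attachment of the matching tag.
        self.compound.append(monomer)
        self.tags.append(TAG_FOR_MONOMER[monomer])


def synthesize(num_beads, num_steps, seed=0):
    """Build a small tagged library with a random monomer per bead per step."""
    rng = random.Random(seed)
    monomers = list(TAG_FOR_MONOMER)
    beads = [Bead() for _ in range(num_beads)]
    for _ in range(num_steps):
        for bead in beads:
            bead.couple(rng.choice(monomers))
    return beads


def decode(tags):
    """Deduce the compound on a bead by reading only its tags."""
    return [MONOMER_FOR_TAG[tag] for tag in tags]


if __name__ == "__main__":
    bead = synthesize(num_beads=5, num_steps=4)[0]
    print(bead.tags, "->", decode(bead.tags))
```

The point of the sketch is that `decode` never inspects `bead.compound`: the compound is identified from the tags alone, so the compound itself need not be sequenceable.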
The encoding method has been employed to show that, starting with an orthogonally differentiated diamine linker, parallel combinatorial synthesis can be used to generate a library of soluble chimeric peptides comprising a "binding" strand and a "coding" strand. See Kerr et al., 1993, J. Amer. Chem. Soc. 115: 2529-2531, incorporated herein by reference. The coupling of either natural or unnatural amino acid monomers to the binding strand was recorded by building an amino acid code composed of four L-amino acids on the "coding" strand. Compounds were selected from equimolar peptide mixtures by affinity purification on a receptor and were resolved by HPLC. The sequence of the coding strand of individual purified molecules was then determined by Edman degradation to reveal the structure of the binding strand. An analogous peptidic coding scheme was also recently reported by Nikolaiev et al., 1993, Peptide Research 6: 161-170.
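The following Python sketch shows one way a four-residue coding alphabet could record a larger set of binding-strand monomers. It assumes fixed-length codewords, which the source does not specify, and the particular coding residues and function names are hypothetical. With four coding residues, a codeword of length k can distinguish 4^k monomers.

```python
# Hypothetical choice of four coding residues (the source does not name them).
CODING_RESIDUES = ["Gly", "Ala", "Leu", "Phe"]
BASE = len(CODING_RESIDUES)


def codeword_length(num_monomers):
    """Smallest k such that BASE**k codewords cover num_monomers."""
    k = 1
    while BASE ** k < num_monomers:
        k += 1
    return k


def encode(monomer_index, length):
    """Write monomer_index as a fixed-length base-4 word of coding residues."""
    if not 0 <= monomer_index < BASE ** length:
        raise ValueError("monomer index does not fit in the codeword length")
    digits = []
    for _ in range(length):
        monomer_index, digit = divmod(monomer_index, BASE)
        digits.append(CODING_RESIDUES[digit])
    return digits[::-1]  # most significant residue first


def decode_codeword(codeword):
    """Recover the monomer index from a sequenced codeword."""
    value = 0
    for residue in codeword:
        value = value * BASE + CODING_RESIDUES.index(residue)
    return value


if __name__ == "__main__":
    k = codeword_length(20)  # e.g. a 20-monomer binding-strand set
    print(k, encode(6, k), decode_codeword(encode(6, k)))
```

Because only the coding strand is sequenced, the binding strand may contain unnatural monomers that Edman degradation could not identify directly.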
Constraints on the sensitivity and throughput of the Edman procedure will ultimately restrict the scope of this aspect of the encoding method to analyzing libraries of limited diversity. The use of oligonucleotide tags offers greater promise, but improved methods for synthesizing oligonucleotide-tagged molecular libraries are needed. Moreover, there remains a need for alternate methodology for synthesizing and screening very large tagged molecular libraries. The present invention meets these and other needs.