This invention relates to determining whether or not there is an active compound in a compound library. This invention further relates to determining the identity of an active compound present in a library of compounds.
Various procedures have been proposed for identifying compounds in a library that are active with respect to a specified target. Such efforts have generally been directed to identifying nucleic acids or peptides that bind to a protein or other target molecule.
For example, one procedure for identifying a nucleic acid in a nucleic acid library that binds to a target molecule involves contacting the library with the target molecule in a reiterative procedure that includes amplification. Thus, for example, Kinzler & Vogelstein (Nucleic Acids Res. 17, 3645, 1989) and Tuerk and Gold (Science 249, 505, 1990) have disclosed an in vitro procedure in which the nucleic acids in the library are exposed to target molecules under competitive binding conditions. Those nucleic acids capable of binding to the target molecule are recovered in preference to those that are not bound, and the recovered, active nucleic acids are subjected to an amplification procedure such as the polymerase chain reaction (PCR). The amplified nucleic acids are re-exposed to the target, isolated and then amplified once again. The value of the amplification step is that with each iteration of the procedure the target is exposed to a mixture of nucleic acids that is progressively enriched for molecules that bind to the target with the highest affinity. This is considered to be a combinatorial approach because it is the property of the entire active nucleic acid molecule that contributes to its ability to bind to the target and leads to its selection to the exclusion of other molecules in the library that bind to the target with lower affinity. The procedure avoids the signal (active nucleic acids)-to-noise (non-active nucleic acids) problem through reiteration. In the early iterations, the amount of bound, highly active nucleic acid may be very small compared with the background of inactive nucleic acids that are bound or otherwise recovered, but repeated selection and amplification eventually make the active nucleic acids abundant enough to be easily measured and identified. For these reasons, very large libraries can be used, and even single molecules of highly active members can be identified.
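The enrichment argument in the preceding paragraph can be illustrated with a deterministic toy model. This is a minimal sketch, not the Kinzler & Vogelstein or Tuerk and Gold protocol: it assumes, hypothetically, that each class of molecules has a fixed per-round recovery fraction and that amplification is ideal and unbiased, restoring the pool to a fixed size after every round. The class `LibraryClass`, the functions `select_and_amplify` and `fraction`, and all numbers are illustrative assumptions.

```python
"""Toy model of reiterative selection with amplification (hypothetical numbers)."""
from dataclasses import dataclass


@dataclass
class LibraryClass:
    name: str
    copies: float    # copy number of this class in the pool
    recovery: float  # assumed fraction recovered per round of target exposure


def select_and_amplify(pool: list[LibraryClass], pool_size: float) -> list[LibraryClass]:
    """One round: keep the recovered molecules, then amplify back to pool_size."""
    recovered = [LibraryClass(c.name, c.copies * c.recovery, c.recovery) for c in pool]
    total = sum(c.copies for c in recovered)
    scale = pool_size / total  # idealized, unbiased amplification standing in for PCR
    return [LibraryClass(c.name, c.copies * scale, c.recovery) for c in recovered]


def fraction(pool: list[LibraryClass], name: str) -> float:
    """Fraction of the pool made up of the named class."""
    total = sum(c.copies for c in pool)
    return sum(c.copies for c in pool if c.name == name) / total


if __name__ == "__main__":
    pool_size = 1e13
    # One highly active molecule hidden among ~1e13 inactive ones (assumed values).
    pool = [
        LibraryClass("active", 1.0, 0.5),
        LibraryClass("background", pool_size - 1.0, 0.001),
    ]
    for round_no in range(1, 8):
        pool = select_and_amplify(pool, pool_size)
        print(f"round {round_no}: active fraction = {fraction(pool, 'active'):.3g}")
```

Under these assumed values the active molecule has a 500-fold recovery advantage per round, so it is still a negligible fraction of the pool after the first round but becomes the majority of the pool by about the fifth round. This is the sense in which reiteration plus amplification overcomes the signal-to-noise limitation.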
The reiterative procedure, taken as a whole and coupled with amplification, has a low enough background and eventually a high enough signal from active nucleic acids to overcome the usual signal-to-noise limitations that restrict the useful complexity of most screening procedures. While this is impressive, this class of procedures dramatically limits the chemical diversity allowed in the libraries. In practice, only nucleic acids or slightly modified nucleic acids can be used, because these methods rely on amplification and on discrimination by biological methods both to complete the reiteration and to identify the active members of the library directly and accurately at the end of the procedure.
Screening procedures for peptide libraries are not reiterative, because in vitro techniques such as PCR that can amplify peptides are not currently available. Accordingly, different strategies have been implemented. In one such approach, peptides of random sequence are displayed on bacteriophage, the phage are contacted with a target, those phage that interact with the target are isolated and recloned, and the sequences encoding the active peptides are determined (Devlin et al., Science 249, 404, 1990; Scott and Smith, Science 249, 386, 1990). In another approach, referred to by some as "Encoded Synthetic Libraries" (ESL) (Dower et al., patent WO 93/06121; Brenner & Lerner, Proc. Natl. Acad. Sci. U.S. 89, 5381, 1992; Needels et al., Proc. Natl. Acad. Sci. U.S. 90, 10700, 1993), peptides are bound to nucleic acids, either directly or via a bead, in such a way that each time an amino acid is added to a growing peptide, one or more nucleotides encoding that amino acid are added orthogonally to the bead or to the peptide. The advantage of the ESL approach is that high-complexity synthetic peptide libraries can be assayed in a combinatorial fashion. All screening strategies that use codes to identify the active members of a library indirectly (such screens have been described for other kinds of synthetic libraries in addition to peptides) are designed to allow combinatorial selection from libraries to identify rare, highly active library members. The signal-to-noise problem is overcome by brute-force approaches, such as using cell-sorting technology to examine individual beads, each carrying multiple copies of a single library member and its corresponding code. In this sense, such procedures are, in effect, methods for rapidly screening large numbers of compounds one by one with advanced technology, as opposed to screening pools or mixtures simultaneously.
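The parallel growth of peptide and code in the ESL approach can be sketched as follows. This is a minimal illustration under stated assumptions: the trinucleotide `CODE_TABLE` is hypothetical (it is neither the genetic code nor a code from the cited works), only five amino acids are used, and the chemistry, amplification and sequencing of the tag are abstracted away. The functions `synthesize_bead` and `decode` are illustrative names.

```python
import random

# Hypothetical code table: each amino acid building block is paired with a
# unique trinucleotide added to the bead's tag in the same synthesis step.
CODE_TABLE = {"Ala": "AAC", "Gly": "GTT", "His": "CAG", "Leu": "TGC", "Val": "GCA"}
DECODE_TABLE = {codon: aa for aa, codon in CODE_TABLE.items()}


def synthesize_bead(n_steps: int, rng: random.Random) -> tuple[list[str], str]:
    """Split-and-pool on one bead: each step couples a residue and its codon."""
    peptide: list[str] = []
    tag = ""
    for _ in range(n_steps):
        aa = rng.choice(sorted(CODE_TABLE))  # the portion this bead was split into
        peptide.append(aa)                   # extend the peptide
        tag += CODE_TABLE[aa]                # extend the code in parallel
    return peptide, tag


def decode(tag: str) -> list[str]:
    """Read the tag three bases at a time to recover the peptide sequence."""
    return [DECODE_TABLE[tag[i:i + 3]] for i in range(0, len(tag), 3)]


if __name__ == "__main__":
    rng = random.Random(0)
    peptide, tag = synthesize_bead(6, rng)
    print(peptide, tag, decode(tag) == peptide)
```

In the real procedures the tag on an active bead is amplified and sequenced; here `decode` simply stands in for that step, showing why the structure of a compound can be inferred even when only one "bead's worth" of it is recovered.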
The coding technology facilitates the identification of the active compounds because in such methods the amount of material isolated is usually very small (i.e., one "bead's worth" of peptide). By using an amplifiable nucleic acid code, the direct identification of the code (and, therefore, the indirect identification of the active compound) is possible. There are, however, limitations to these kinds of technologies, including (1) the need for complex chemistry to couple the library members to their respective codes, (2) the need in some embodiments to have the peptides attached to a comparatively large bead which, even though it might be relatively inert, might nevertheless impede interaction between peptide and target, (3) possible interaction between the target and the nucleic acid code or between peptides and codes and (4) in some of these procedures the compounds of the library are not in solution so that the selection conditions are dissimilar from those in which an active compound would normally be expected to function, namely binding in solution to the target.
There are serial approaches for screening peptide libraries (Houghten et al., Nature 354, 84, 1991) that do not have the complications of encoded-library methodologies. Typically, a library consists of pools of peptides each containing the same number of amino acids; for example, the library might contain 400 pools of hexameric peptides such that each pool has one of the 20 amino acids in each of the first and second positions of the peptides (thus 20 × 20 = 400 pools) and the remaining 4 positions of the peptides are random in every pool. The target is contacted with each of the 400 pools and each pool is measured for activity; for every active pool (for example, ala.his.x.x.x.x), 20 new sub-pools are synthesized in which the first two amino acids of the active pool are conserved (i.e., ala.his), the third amino acid is fixed (i.e., ala.his.gly.x.x.x in the first pool, ala.his.ala.x.x.x in the second pool and so on) and the fourth, fifth and sixth positions are randomized, so that the 20 pools are distinguished by the identity of the amino acid in the third position of the peptide. This serial "unrandomization" procedure is continued until active peptides have been selected in which all six positions are identified. Such a procedure has also been described for oligonucleotides (Ecker et al., patent WO 93/04204). This kind of procedure avoids the chemical limitations of the oligonucleotide procedures that use amplification (i.e., the library members do not need to be amplified), as well as the chemical complications of the encoded-library procedures, but it replaces these limitations with the signal-to-noise problem, because such procedures are essentially serial pooling strategies that do not effectively address it.
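The serial "unrandomization" loop can be sketched as below. To keep the enumeration small, this sketch assumes a hypothetical four-letter alphabet and tetrameric peptides instead of 20 amino acids and hexamers, and it assumes that the measured activity of a pool equals the mean activity of its members. The `activity` function, and the names `pool_activity` and `unrandomize`, are illustrative assumptions rather than part of the cited procedures.

```python
from itertools import product
from statistics import mean
from typing import Callable

Peptide = tuple[str, ...]
Activity = Callable[[Peptide], float]


def pool_activity(prefix: Peptide, alphabet: str, length: int, activity: Activity) -> float:
    """Assumed pool readout: mean activity over all members sharing the fixed prefix."""
    free = length - len(prefix)
    return mean(activity(prefix + tail) for tail in product(alphabet, repeat=free))


def unrandomize(alphabet: str, length: int, activity: Activity, n_fixed_first: int = 2) -> Peptide:
    """Fix positions one at a time, always keeping the best-scoring pool."""
    def score(p: Peptide) -> float:
        return pool_activity(p, alphabet, length, activity)

    # First round: one pool per combination of the first n_fixed_first residues.
    pools = [tuple(p) for p in product(alphabet, repeat=n_fixed_first)]
    best = max(pools, key=score)
    # Later rounds: conserve the winning prefix and fix one more position.
    while len(best) < length:
        best = max((best + (aa,) for aa in alphabet), key=score)
    return best


if __name__ == "__main__":
    target = ("H", "G", "L", "A")  # hypothetical optimal sequence

    def activity(p: Peptide) -> float:
        # Hypothetical, position-additive activity: one unit per matching residue.
        return float(sum(a == b for a, b in zip(p, target)))

    print(unrandomize("AGHL", 4, activity))
```

With a position-additive activity like this one, the procedure recovers the optimal sequence, because every pool containing the best residue at a position also has the best average. The next paragraph explains why this stops being true when activity is concentrated in rare members.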
Therefore, these approaches are additive rather than combinatorial: pools containing more abundant but less active subclasses of compounds tend to be chosen, whereas pools containing rarer, more active compounds are likely to be ignored. Thus, in the example given above, the pool with the best average activity might be ala.his.x.x.x.x, yet other pools (e.g., val.leu.x.x.x.x) might contain individual members more active than any member of the selected pool, and these would not be identified.
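The difference between selecting pools by their average activity and finding the single most active member can be made concrete with a small numeric sketch. The pool sizes (100 members rather than the 20^4 members of a real hexamer pool) and all activity values are hypothetical, chosen only to reproduce the situation described above.

```python
from statistics import mean

# Hypothetical member activities (arbitrary units) for two pools.
POOLS = {
    "ala.his.x.x.x.x": [5.0] * 99 + [6.0],    # many moderately active members
    "val.leu.x.x.x.x": [0.1] * 99 + [100.0],  # one rare, highly active member
}


def select_by_pool_mean(pools: dict[str, list[float]]) -> str:
    """Additive screening: pick the pool whose average activity is highest."""
    return max(pools, key=lambda name: mean(pools[name]))


def pool_of_best_member(pools: dict[str, list[float]]) -> str:
    """What a combinatorial screen should find: the pool holding the best member."""
    return max(pools, key=lambda name: max(pools[name]))


if __name__ == "__main__":
    for name, members in POOLS.items():
        print(name, "mean:", round(mean(members), 3), "best:", max(members))
    print("chosen by mean:", select_by_pool_mean(POOLS))
    print("contains best member:", pool_of_best_member(POOLS))
```

Here ala.his.x.x.x.x has a mean of about 5.0 against about 1.1 for val.leu.x.x.x.x, so averaging selects it, even though its best member (6.0) is far less active than the rare member (100.0) in the rejected pool.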
In summary, there are three common kinds of procedures for screening synthetic libraries: (1) For oligonucleotides, there are reiterative procedures that include amplification of the nucleic acids. These procedures are combinatorial and minimize the signal-to-noise problem. Without amplification, the direct identification of the active compounds present in large libraries by these procedures would not be possible because the amount of material recovered from the reiterative procedure would be too low to use in known procedures for identifying nucleic acid sequences. Because of the requirement for amplification, however, these procedures are restricted to a very limited set of compounds, namely nucleic acids that can be amplified. (2) Coding procedures are used to facilitate identification of non-amplifiable compounds via amplification of nucleic acid codes that are associated with, or bound to, each compound of a library. These procedures are combinatorial to the extent that the signal-to-noise problem is reduced by the use of screening procedures that minimize the noise and maximize the signal generated by even a single active library member. Such procedures, which usually involve physical separation of a solid support to recover copies of those individual compounds that are active, result in low yields of material, but the amplification of the code allows direct identification by known methods of nucleic acid sequencing. These procedures are generally more cumbersome, do not allow sampling in solution phase, and present challenges in chemistry that stem from the need to have a unique nucleic acid code attached to each compound in the library. (3) Serial "unrandomization" procedures are used to screen synthetic libraries for active compounds. 
These procedures are very flexible with regard to the chemistry that can be used and avoid some of the complications of the other procedures, but they are not combinatorial because they do not address the signal-to-noise problem. Therefore, procedures (1) and (2) limit the kinds of libraries that can be screened for technical (chemical) reasons, whereas procedure (3) is likely to yield relatively abundant, less active compounds in preference to the rarer, most active compounds. A procedure that enabled the use of a wide variety of chemistries and synthetic libraries in solution, as in procedure (3), but that was also truly combinatorial, effectively minimizing or eliminating the signal-to-noise problem, as in procedures (1) and (2), would be a very powerful screening approach. The invention described below provides one such procedure.