The burgeoning cost of drug discovery has led to the ongoing search for new methods of screening greater chemical space as inexpensively as possible to find molecules with greater potency and little to no toxicity. Combinatorial chemistry approaches in the 1980s were originally heralded as being methods to transcend the drug discovery paradigm, but largely failed due to insufficient library sizes and inadequate methods of deconvolution. Recently, the use of DNA-displayed combinatorial libraries of small molecules has created a new paradigm shift for the screening of therapeutic lead compounds.
Morgan et al. (U.S. Patent Application Publication No. 2007/0224607, hereby incorporated by reference) identifies the major challenges in the use of DNA-displayed combinatorial approaches in drug discovery: (1) the synthesis of libraries of sufficient complexity and (2) the identification of molecules that are active in the screens used. In addition, Morgan et al. states that the greater the degree of complexity of a library, i.e., the number of distinct structures present in the library, the greater the probability that the library contains molecules with the activity of interest. Thus, the chemistry employed in library synthesis must be capable of producing vast numbers of compounds within a reasonable time frame. This approach has been generally successful at identifying molecules with diverse chemotypes and high affinity. However, a number of issues have surfaced with respect to generating libraries of enormous complexity and evaluating the sequencing output on the scale that has been described. For example, purification of a library following multiple chemical transformations (usually 3 or 4 steps) and biological transformations (e.g., enzymatic ligation of DNA tags) is cumbersome and results in a significant amount of “noise” in the library, due either to incomplete synthesis of molecules or to mis-tagging during the ligation step. Furthermore, the amount of sequencing that is required to interrogate selected populations is striking, usually requiring “next-generation” sequencing methods. This sequencing burden arises because sophisticated genetic tagging schemes embedded in the DNA portion of the library, together with bioinformatics algorithms for analyzing the “next-generation” sequencing output, are required to sift through the noise and identify hits in the library.
As a result, even with these methodologies, the sequencing is still not advanced enough to fully capture the diversity of sequences (representing both real hits and “noise”) from a given screen.
DNA display of combinatorial small molecule libraries relies on multistep, split-and-pool synthesis of the library, coupled to enzymatic addition of DNA tags that encode both the synthetic step and building block used. Several (e.g., 3 or 4) synthetic steps are typically carried out and encoded, and these include diversity positions (described herein as A, B, and C (FIG. 1)), such as those formed by coupling building blocks with, e.g., amine or carboxylate functional groups onto a chemical scaffold that displays the attached building blocks in defined orientations. One example of a scaffold (S) that is often used in combinatorial libraries is a triazine moiety, which can be orthogonally derivatized in three positions about its ring structure.
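By way of illustration only, the encoding scheme described above can be sketched in a few lines of Python. The sketch assumes an idealized library in which every coupling and every ligation succeeds; the building-block names (A1, B1, etc.), the four-base tag sequences, and the function name split_and_pool are hypothetical and are not taken from Morgan et al. or from any actual library. It shows that the library complexity is the product of the number of building blocks at each diversity position, and that the concatenated tags identify the building blocks attached to the scaffold (S).

```python
from itertools import product

# Hypothetical building blocks and their encoding DNA tags at each
# diversity position (A, B, C). Names and sequences are illustrative only.
BUILDING_BLOCKS = {
    "A": {"A1": "ACGT", "A2": "TGCA"},
    "B": {"B1": "GGAA", "B2": "CCTT", "B3": "AATT"},
    "C": {"C1": "GATC", "C2": "CTAG"},
}


def split_and_pool(positions):
    """Map each concatenated DNA tag to the compound it encodes.

    Idealized assumption: every split, coupling, ligation, and pooling
    step is complete, so there are no side products and no mis-tagging.
    """
    library = {}
    per_position = [list(positions[p].items()) for p in ("A", "B", "C")]
    for combo in product(*per_position):
        compound = "S-" + "-".join(name for name, _ in combo)
        tag = "".join(seq for _, seq in combo)
        library[tag] = compound
    return library


library = split_and_pool(BUILDING_BLOCKS)

# Library complexity is the product of the building-block counts: 2 * 3 * 2.
print(len(library))

# Decoding a sequenced tag recovers the building blocks on the scaffold.
print(library["ACGTGGAAGATC"])
```

In a real library, incomplete reactions and tag cross-contamination break this one-to-one correspondence between tag and compound, which is the source of the “noise” discussed herein.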
The process of library formation can be time consuming, and products are often inefficiently purified. As a result, unknown reactions may occur that create unwanted and/or unknown molecules attached to the DNA. Furthermore, incomplete purification of the library can result in tags cross-contaminating during the ligation steps, resulting in mis-tagging. The end result for screening and sequencing hits from the library is that massively parallel sequencing has to be employed due to the inherent “noise” of DNAs that are either attached to unintended molecules (e.g., unreacted or side products) or mis-tagged. Thus, the efficiency of sequencing is lost.
In some instances, an initiator oligonucleotide, from which the small molecule library is built, contains a primer-binding region for polymerase amplification (e.g., PCR) in the form of a covalently-closed, double-stranded oligonucleotide. This construct is very problematic for performing polymerase reactions, owing to the difficulty of melting the duplex and allowing a primer oligonucleotide to bind and initiate polymerization, which results in an inefficient reaction, reducing yield by 10- to 1000-fold or more.
There exists a need for a more step-wise approach to screening and identifying small molecules that have greater potency and little to no toxicity.