Methods such as virtual screening are currently used to identify sets of compounds with potential biological activity to serve as starting points for drug discovery. Starting with a large set of chemical structures in electronic format, these methods identify a small subset with a higher probability of interacting with the system of interest by applying a series of filters to the initial set, including but not limited to 2-D similarity to a query molecule, 3-D similarity to a query molecule, chemical property limits such as Lipinski's rule of 5, and docking to an experimental or theoretical biomolecular target structure. Because it is practically impossible to enumerate and filter all possible real-world structures, there is a continuing need to limit the size of the initial structure set and to provide new methods for identifying a manageable subset of the overall chemistry space so that virtual screening and target assay screening can be performed.
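The filter cascade described above can be illustrated with a minimal sketch. It assumes each compound is already reduced to a few precomputed properties and a set of structural feature identifiers; the Compound record, the screen function, and the 0.5 similarity cutoff are hypothetical names and values chosen for illustration, and the rule-of-5 check is applied in a strict form that allows no violations. Docking and 3-D similarity are omitted.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Compound:
    # Hypothetical record; real screening would compute these from a structure.
    name: str
    mol_weight: float          # daltons
    logp: float                # octanol-water partition coefficient
    h_donors: int              # hydrogen-bond donors
    h_acceptors: int           # hydrogen-bond acceptors
    features: FrozenSet[int]   # stand-in for a 2-D fingerprint


def passes_rule_of_five(c: Compound) -> bool:
    """Strict form of Lipinski's rule of 5: no violations allowed."""
    return (c.mol_weight <= 500 and c.logp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two feature sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(library: Iterable[Compound],
           filters: List[Callable[[Compound], bool]]) -> List[Compound]:
    """Apply each filter in turn, keeping only the compounds that pass."""
    survivors = list(library)
    for keep in filters:
        survivors = [c for c in survivors if keep(c)]
    return survivors


# Illustrative data only; not taken from any real library.
query = frozenset({1, 2, 3, 4})
library = [
    Compound("a", 320.0, 2.1, 2, 5, frozenset({1, 2, 3, 9})),
    Compound("b", 710.0, 6.3, 7, 12, frozenset({1, 2, 3, 4})),
    Compound("c", 280.0, 1.0, 1, 3, frozenset({7, 8})),
]
hits = screen(library, [
    passes_rule_of_five,
    lambda c: tanimoto(c.features, query) >= 0.5,  # assumed cutoff
])
print([c.name for c in hits])
```

In this toy library, compound b fails the property limits and compound c fails the similarity cutoff, so only compound a survives. Ordering cheap filters before expensive ones keeps the cost of the later stages manageable.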
We have previously described the concept of the SYNTHEVERSE™ (Bioblocks, Inc., San Diego, Calif.) chemistry space (see, e.g., Virtually screening the Syntheverse: Finding new leads from synthetically feasible libraries; Lemmen, et al., Abstracts of Papers, 241st ACS National Meeting & Exposition, Anaheim, Calif., United States, Mar. 27-31, 2011 (2011), COMP-11), which is the collection of all compounds that can be made by current synthetic methods. Every compound contained in the SYNTHEVERSE™ is the product of at least one synthetic scheme and is connected to at least one set of starting materials that could be used to make it. By nature, this subset is still impractically large, and it grows continuously as new synthetic methods are discovered.
A given implementation of the SYNTHEVERSE™ can be biased towards the desired endpoint for the structures; for example, an implementation designed to contain compounds with potential biological activity can be designed to avoid functionality known to cause problems in biological systems. Alternatively, an implementation of the SYNTHEVERSE™ can be biased toward a particular class or classes of final product structures, for example, compounds that can be constructed from commercially available starting materials. Alternatively, an implementation of the SYNTHEVERSE™ can be biased towards a specific use of the product compounds; for example, an implementation designed to contain compounds with potential high-temperature superconductivity will contain potential products containing multiple metal ions. However, a biased SYNTHEVERSE™ is still too large to enumerate; even a two-step reaction sequence with limited starting materials can generate millions of potential products, and the number of products grows multiplicatively with the number of starting materials available at each step and exponentially with the number of reaction steps.
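The scale of this growth can be seen with a short calculation. The sketch below assumes an idealized linear sequence in which each reaction step introduces one new reagent, every reagent combination reacts, and each combination yields exactly one product; the pool size of 200 reagents is a hypothetical figure chosen for illustration.

```python
from math import prod
from typing import Sequence


def library_size(reagents_per_position: Sequence[int]) -> int:
    """Number of products when every combination of reagents is allowed.

    A sequence of n steps that each add one new reagent has n + 1 reagent
    positions, so the product count is the product of the pool sizes.
    """
    return prod(reagents_per_position)


# Hypothetical pools of 200 reagents per position.
two_step = library_size([200, 200, 200])         # 8,000,000 products
three_step = library_size([200, 200, 200, 200])  # 1,600,000,000 products
print(two_step, three_step)
```

Under these assumptions, adding a single further step multiplies the count by the size of the new reagent pool, which is why full enumeration quickly becomes impractical.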
Known methods include screening a SYNTHEVERSE™ chemistry space by FEATURETREES™ (BioSolveIT GmbH, Sankt Augustin, Germany) similarity to identify compounds similar to a query, using as inputs a known compound with high activity against at least one biological target and an available crystal structure. The structures are encoded as fragments and built stepwise into a set of final structures, choosing those that are similar to the query. A practically sized set of product molecules, for example 10,000, similar to the original query in size and complexity, is then filtered through a virtual screening protocol to identify compounds with the potential to interact with the target of the original query.
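The stepwise, similarity-guided construction described above can be sketched as a simple beam search. This is not the FEATURETREES™ algorithm: as a stand-in, it assumes each fragment is a set of feature labels and scores partial structures by Tanimoto similarity to the query. The function build_similar, the beam_width parameter, and the fragment labels are all hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Features = FrozenSet[str]


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto similarity of two feature sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_similar(query: Features,
                  fragment_pools: Sequence[Sequence[Features]],
                  beam_width: int) -> List[Tuple[Features, ...]]:
    """Grow structures one fragment position at a time.

    After each extension, only the beam_width partial structures most
    similar to the query are kept, so the full space is never enumerated.
    """
    partials: List[Tuple[Features, ...]] = [()]
    for pool in fragment_pools:
        extended = [p + (frag,) for p in partials for frag in pool]
        extended.sort(
            key=lambda p: tanimoto(frozenset().union(*p), query),
            reverse=True,
        )
        partials = extended[:beam_width]
    return partials


# Illustrative fragment pools; labels are placeholders, not real encodings.
query = frozenset({"aryl", "amide", "amine"})
pools = [
    [frozenset({"aryl"}), frozenset({"alkyl"})],
    [frozenset({"amide"}), frozenset({"ester"})],
    [frozenset({"amine"}), frozenset({"nitro"})],
]
best = build_similar(query, pools, beam_width=2)
print(best[0])
```

With a beam width of 2, only four candidates are scored at each of the later steps instead of all eight full combinations, and the top-ranked result is the aryl, amide, amine combination that matches the query exactly. A wider beam trades more computation for a lower risk of discarding a partial structure that would have led to a good final product.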
Fragment-Based Lead Discovery is a set of recent methods that provide alternative starting points (e.g., for lead compounds) for drug discovery. Compared to the products of either virtual or target assay screening, the identified compounds are both less complex and more efficient at binding their target. As such, they serve as higher-quality starting points for the identification of drug candidates with the properties required for safe and effective use in humans. Application of cycles of synthetic expansion, modification, and biological assay feedback generates compounds with sufficient potency to be used as lead compounds for a traditional drug discovery effort.