The present invention relates to a method for constructing virtual libraries of molecules and screening these libraries for the existence of a predefined structural motif and, in particular, to identifying molecules that meet the constraints imposed by a pharmacophore.
A classical approach to the problem of drug-lead optimization, the so-called "lead explosion" method, involves making large numbers of slightly modified analogs of the drug-lead compound. While in some cases this can yield analogs with greatly increased binding affinity for the desired target, the major drawback of this method is that it produces a set of highly similar molecules. Thus, if the original lead compound fails at a later stage of the drug development process for reasons not directly related to its target-binding capability (for example, problems of solubility, toxicity, or bioavailability), there is a good chance that most of the second-generation analogs will fail likewise.
A more recent approach that overcomes this drawback involves identifying, from a set of lead compounds, an "active structural motif" (in particular, but not limited to, a pharmacophore) and using computer algorithms to search a database of compounds for its presence. The result of this search may be a set of diverse molecules that display the predefined structural motif [Y. C. Martin, 3D Database Searching in Drug Design, J. Med. Chem. 35, 2145 (1992); A. C. Good and J. S. Mason, Three Dimensional Structure Database Searches, Reviews in Comp. Chem. 7, 67 (1996)].
The pharmacophore has proven to be a highly valuable concept in drug discovery and drug-lead optimization. A pharmacophore is defined as a distinct three-dimensional (3D) arrangement of chemical groups essential for biological activity. Since a pharmaceutically active molecule must interact with one or more molecular structures within the body of the subject in order to be effective, and its desired functional properties derive from these interactions, each active compound must contain a distinct arrangement of chemical groups that enables these interactions to occur. These chemical groups, commonly termed descriptor centers, can be represented by (a) an atom or group of atoms; (b) pseudo-atoms, for example the center of a ring or the center of mass of a molecule; or (c) vectors, for example atomic pairs, electron lone-pair directions, or the normal to a plane. The ability to design pharmaceutically useful molecules according to a pharmacophore, or to identify them in large databases, would therefore be highly effective both in drug discovery and in drug-lead optimization.
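As an illustration of descriptor types (b) and (c), a ring-center pseudo-atom and a ring-normal vector can be computed directly from atomic coordinates. The Python sketch below is a minimal example, assuming coordinates in angstroms with ring atoms listed in ring order; the function names are hypothetical, and the normal is computed with Newell's method, one of several standard ways to obtain a polygon normal.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def centroid(points: List[Vec]) -> Vec:
    """Pseudo-atom at the geometric center of a group of atoms, e.g. a ring."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)

def ring_normal(ring: List[Vec]) -> Vec:
    """Unit normal to a ring (Newell's method); atoms must be listed in ring order."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)

# Idealized benzene ring in the xy-plane (C-C = 1.39 angstrom), listed counterclockwise
benzene = [(1.39 * math.cos(k * math.pi / 3), 1.39 * math.sin(k * math.pi / 3), 0.0)
           for k in range(6)]
print(centroid(benzene))     # approximately (0, 0, 0)
print(ring_normal(benzene))  # approximately (0, 0, 1)
```

Reversing the atom order flips the sign of the normal, so a vector descriptor of this kind should be compared up to sign unless the ring orientation is fixed by convention.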
The pharmacophore can be constructed either directly or indirectly. In the direct method, the pharmacophore descriptor centers are inferred from the X-ray or NMR structure of a receptor-ligand complex, or from a shape-complementarity analysis of the receptor binding site. In the indirect method, the structure of the receptor is unknown, and the pharmacophore descriptor centers are therefore inferred by overlaying the three-dimensional conformations of active compounds and finding the common, overlapping functional groups.
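In the indirect method, once the active conformations have been overlaid, the common features can be found by a simple tolerance match. The Python sketch below assumes the superposition has already been done (no alignment is performed here), represents each active as a list of hypothetical (feature type, coordinate) pairs, and uses an arbitrary 1.0 Å tolerance; it illustrates the idea only and is not the procedure of the present invention.

```python
import math
from typing import List, Tuple

Feature = Tuple[str, Tuple[float, float, float]]  # (chemical type, xyz)

def common_features(actives: List[List[Feature]], tol: float = 1.0) -> List[Feature]:
    """Features of the first (reference) active that are matched, within `tol`,
    by a feature of the same type in every other active.
    Assumes all conformations were already overlaid in one coordinate frame."""
    reference, others = actives[0], actives[1:]
    return [(ftype, pos) for ftype, pos in reference
            if all(any(t == ftype and math.dist(pos, p) <= tol for t, p in other)
                   for other in others)]

a1 = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)), ("aromatic", (0.0, 4.0, 0.0))]
a2 = [("donor", (0.3, 0.0, 0.0)), ("acceptor", (3.0, 0.5, 0.0)), ("hydrophobic", (0.0, 4.0, 0.0))]
a3 = [("donor", (0.0, 0.2, 0.0)), ("acceptor", (2.8, 0.0, 0.0)), ("aromatic", (0.0, 4.1, 0.0))]
print([t for t, _ in common_features([a1, a2, a3])])  # ['donor', 'acceptor']
```

The aromatic feature is dropped because the second active has no aromatic group near that position; the surviving donor and acceptor positions would become candidate pharmacophore descriptor centers.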
The databases that are virtually screened may be databases of existing compounds (commercially or publicly available, or corporate), or virtual databases that exist solely on the computer. In both cases, these databases commonly contain on the order of tens to hundreds of thousands of molecules. This size limitation, particularly for virtual databases, commonly stems from the disk space needed to store the library and the speed of the algorithms available to scan it.
Yet databases in this size range cover only a small subset of chemical space. For example, a library of 100 peptidomimetic scaffolds, each with 6 side-chain attachment points, combined with the 18 natural amino acid side chains other than glycine and proline, can potentially give 3×10^9 different molecules (100 × 18^6), well beyond the size that can currently be screened on available computers in a reasonable amount of time. Furthermore, most pharmaceutically interesting molecules are flexible, which adds another level of complexity to the problem. Some methods deal with flexible molecules by constrained optimization, yet these are computationally expensive, because the optimization adds a demanding overhead to the database search itself. For example, with the "template-forcing" method, in which an attempt is made to force each analog to fit the desired conformation, the databases that can be virtually screened within a reasonable amount of time are on the order of 10^5 compounds. An alternative approach is to represent each flexible molecule as a set of discrete conformations. Thus, in the above example, if each of the 18 side chains is represented by a rotamer library of 10 conformations, the result is a database of 3×10^15 entities, representing all possible discrete conformations of the 3×10^9 different molecules. Since with currently available tools it is not feasible to virtually screen even the smaller library of 3×10^9 molecules, a tool that enables the construction and screening of libraries in this size range within a reasonable time is of high practical value.
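The library sizes quoted above follow from simple counting, assuming every attachment point can carry any of the 18 side chains independently and every side chain contributes 10 rotamers:

```python
scaffolds = 100
attachment_points = 6
side_chains = 18          # natural amino acids other than glycine and proline
rotamers_per_side_chain = 10

# one side chain chosen independently at each attachment point
molecules = scaffolds * side_chains ** attachment_points
# one rotamer of one side chain chosen independently at each attachment point
conformers = scaffolds * (side_chains * rotamers_per_side_chain) ** attachment_points

print(f"{molecules:.1e}")   # 3.4e+09
print(f"{conformers:.1e}")  # 3.4e+15
```

The rotamer expansion multiplies the molecule count by 10^6, that is, the number of rotamers per side chain raised to the number of attachment points.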
The need to scan an extraordinarily large number of entities in a search space also arises in the field of protein sequence design, where a large number of sequence combinations must be evaluated in searching for the one best suited to a particular structure. One method that has been applied to this problem, the Dead-End Elimination algorithm, is related to the present invention in that it uses a library of discrete conformations for each amino acid side chain and defines mathematical criteria for eliminating the vast majority of combinatorial possibilities without explicitly enumerating them. This algorithm has been successfully applied to protein design [B. I. Dahiyat and S. L. Mayo, De Novo Protein Design: Fully Automated Sequence Selection, Science 278, 82 (1997); PCT application No. WO 98/47089].
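For reference, the classical Dead-End Elimination criterion (due to Desmet and co-workers) removes rotamer r at position i when some other rotamer t at the same position is better even when r gets its best-case partners and t its worst-case partners: E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s). The Python sketch below applies that criterion iteratively to hypothetical self- and pair-energy tables; it illustrates DEE as used in protein design and is not the filtering scheme of the present invention.

```python
from typing import Dict, List, Set, Tuple

def dead_end_elimination(self_e: List[List[float]],
                         pair_e: Dict[Tuple[int, int], List[List[float]]]) -> List[Set[int]]:
    """self_e[i][r]: energy of rotamer r at position i.
    pair_e[(i, j)][r][s], i < j: interaction energy of rotamer r at i with rotamer s at j.
    Returns the surviving rotamer indices at each position."""
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]

    def pair(i: int, r: int, j: int, s: int) -> float:
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def bound(i: int, r: int, pick) -> float:
        # self energy plus the min (best case) or max (worst case) partner at every other position
        return self_e[i][r] + sum(pick(pair(i, r, j, s) for s in alive[j])
                                  for j in range(n) if j != i)

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                best_r = bound(i, r, min)
                if any(t != r and best_r > bound(i, t, max) for t in alive[i]):
                    alive[i].discard(r)
                    changed = True
    return alive

self_e = [[0.0, 5.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.0, 1.0]]}
print(dead_end_elimination(self_e, pair_e))  # [{0}, {0}]
```

Because the inequality is strict and eliminated rotamers are removed from later bounds, at least one rotamer always survives at every position.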
Thus, there is a widely recognized need for, and it would be highly advantageous to have, a method for constructing very large virtual databases of molecules which are potentially pharmaceutically useful, and for screening these molecules for the existence of a pharmacophore, representing the desired interactions of the useful molecule with one or more structures in the body of the subject.
The present invention features a method for constructing a potentially pharmaceutically useful molecule that contains a desired pharmacophore, associated with a specific biological activity.
One aspect of the present invention is a method for constructing a virtual combinatorial library (VCL), which is a set of abstract super-structures, none of which is a physically realizable entity. Each super-structure features a single chemical scaffold holding all possible substituents at all possible substituent attachment points concurrently. The VCL thus represents a set of physically realizable discrete conformations of a defined set of molecular entities. This set may be very large: a small number of such super-structures can represent more than 10^20 different three-dimensional structures.
Another aspect of the present invention is a method for virtually screening this library for the existence of molecules that display a desired, predefined molecular structure, and in particular a pharmacophore.
The VCL is constructed from a virtual library of scaffolds, for example constrained peptidomimetic backbones, and a virtual library of substituents that can be placed at each of a set of predefined attachment positions on each scaffold. The VCL is constructed by placing all rotamers from the virtual library of substituents onto each of the attachment points on each scaffold, concurrently.
The scaffold library is a set of molecules described by three-dimensional coordinates, with a predefined set of attachment points onto which substituents may be chemically attached. The library of substituents is described by all physically realizable conformations of each of the chemical entities that can be chemically connected to the scaffold at said attachment points.
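One way to hold a super-structure in memory is to keep, for each attachment point of a scaffold, the full list of rotamers placed there. The Python sketch below assumes the rotamers have already been transformed into each attachment point's frame (the placement step is not shown), and all class, field and scaffold names are hypothetical. It also makes the counting explicit: a super-structure stands for the product of the rotamer counts over its attachment points.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[str, Coord]  # (element, xyz)

@dataclass
class Rotamer:
    substituent: str   # e.g. "Phe"
    atoms: List[Atom]  # conformation already placed in the scaffold frame

@dataclass
class Scaffold:
    name: str
    atoms: List[Atom]
    attachment_points: List[int]  # indices of scaffold atoms that carry a substituent

@dataclass
class SuperStructure:
    scaffold: Scaffold
    # attachment-point index -> every rotamer of every substituent, held concurrently
    slots: Dict[int, List[Rotamer]] = field(default_factory=dict)

    def n_structures(self) -> int:
        """Discrete 3D structures represented: one rotamer chosen per attachment point."""
        return prod(len(rots) for rots in self.slots.values())

def build_super_structure(scaffold: Scaffold,
                          placed: Dict[int, List[Rotamer]]) -> SuperStructure:
    """Attach all pre-placed rotamers to every attachment point at once.
    No clash or chemistry checks are made: the result is deliberately not a
    physically realizable molecule."""
    return SuperStructure(scaffold, {ap: list(placed[ap]) for ap in scaffold.attachment_points})

# Hypothetical scaffold with 6 attachment points, 18 side chains x 10 rotamers at each
scaffold = Scaffold("hypothetical-cyclic-scaffold", atoms=[], attachment_points=[0, 1, 2, 3, 4, 5])
rotamers = [Rotamer(f"aa{k // 10}", []) for k in range(180)]
ss = build_super_structure(scaffold, {ap: rotamers for ap in scaffold.attachment_points})
print(ss.n_structures())  # 34012224000000
```

With 180 rotamers at each of 6 attachment points, one super-structure stands for 180^6 ≈ 3.4×10^13 discrete structures while storing only 6 × 180 rotamer entries, which is why a small number of super-structures can represent very large libraries.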
The active structural motif designated herein as the pharmacophore is used to screen the VCL of super-structures in order to identify the combinations of scaffolds and substituents which meet the constraints imposed by the pharmacophore.
The screening process involves the application of a series of filters of increasing complexity, in order to eliminate the substituents that are incompatible with the pharmacophore. Following the filtration process the method produces, from all combinations of scaffolds and substituents that remain, a molecule or molecules which display the desired pharmacophore.
The scaffolds can be any molecule containing at least one attachment point, and that can be represented by a set of discrete conformations. As a non-limiting example, constrained peptide backbones can be used as scaffolds, and amino acid side chains can be represented by rotamer libraries, which include all energetically favorable conformations of each amino acid.
The screening of the VCL is performed by iterative application of a series of filters of increasing complexity, which efficiently identify and eliminate scaffolds and substituent rotamers that are incompatible with the desired pharmacophore, thereby eventually identifying those scaffolds and substituent rotamers that satisfy the chemical and geometric requirements of the desired pharmacophore. These requirements are defined by a series of pharmacophore parameters. An illustrative example of such parameters may be, but is not limited to, a matrix of all pair-wise distances between all pharmacophore descriptors. Thus, by discretizing the conformational space of both scaffolds and substituents, and by applying this series of filters, the method of the present invention is able to scan very large virtual libraries, representing more than 10^20 three-dimensional structures, within reasonable computer time.
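A pharmacophore defined by a pair-wise distance matrix can be sketched as follows, with two filters of increasing cost: a type check on a single descriptor and a distance check on a pair of descriptors. The Python class, the descriptor type names and the 0.5 Å tolerance are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Descriptor = Tuple[str, Tuple[float, float, float]]  # (type, xyz)

def distance_matrix(points):
    """All pair-wise distances between descriptor positions."""
    return [[math.dist(p, q) for q in points] for p in points]

class Pharmacophore:
    def __init__(self, descriptors: List[Descriptor], tol: float = 0.5):
        self.types = [t for t, _ in descriptors]
        self.dmat = distance_matrix([p for _, p in descriptors])
        self.tol = tol

    def type_filter(self, d: Descriptor) -> bool:
        """Cheapest filter: the descriptor type occurs somewhere in the pharmacophore."""
        return d[0] in self.types

    def pair_filter(self, d1: Descriptor, d2: Descriptor) -> bool:
        """Pair-wise filter: some pharmacophore pair (i, j) has matching types
        and a distance within tol of the distance between d1 and d2."""
        r = math.dist(d1[1], d2[1])
        n = len(self.types)
        return any(self.types[i] == d1[0] and self.types[j] == d2[0]
                   and abs(self.dmat[i][j] - r) <= self.tol
                   for i in range(n) for j in range(n) if i != j)

ph = Pharmacophore([("donor", (0.0, 0.0, 0.0)),
                    ("acceptor", (5.0, 0.0, 0.0)),
                    ("aromatic", (0.0, 6.0, 0.0))])
print(ph.type_filter(("cation", (1.0, 1.0, 1.0))))                                # False
print(ph.pair_filter(("donor", (1.0, 1.0, 1.0)), ("acceptor", (1.0, 1.0, 6.2))))  # True
print(ph.pair_filter(("donor", (0.0, 0.0, 0.0)), ("aromatic", (0.0, 9.0, 0.0))))  # False
```

Because a distance matrix is invariant to rotation and translation, these checks do not require the candidate molecule to be aligned to the pharmacophore first.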
According to the present invention there is provided a method for identifying at least one molecule satisfying the constraints imposed by a pharmacophore, each of these constraints being defined by at least one pharmacophore parameter, the steps of the method being performed by a data processor, the method comprising the steps of:
(a) providing a first virtual library of at least one scaffold, each scaffold containing at least one attachment point for a substituent and a set of three-dimensional coordinates for each atom of said scaffold;
(b) providing a second virtual library of a plurality of substituents, each of said plurality of substituents being described by a set of physically realizable discrete conformations, wherein each conformation of each of said plurality of substituents is a rotamer;
(c) concurrently adding all rotamers from said second virtual library to each attachment point of each scaffold to form the super-structures;
(d) assigning pharmacophore descriptors to all possible atoms or groups of atoms on the rotamers and on the scaffolds;
(e) applying a series of filters to each super-structure to test the compatibility of each rotamer in the super-structure with a pharmacophore parameter;
(f) eliminating any rotamer that cannot exist in at least one combination compatible with all pharmacophore parameters;
(g) constructing molecules from combinations of remaining rotamers and scaffolds, and selecting those combinations that display the pharmacophore.
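Steps (c) through (g) can be sketched in Python under strong simplifying assumptions, all hypothetical: each rotamer carries exactly one descriptor, the attachment point that must display each pharmacophore descriptor is known in advance, and the only pharmacophore parameters are pair-wise distances with a fixed tolerance. The elimination loop removes any rotamer that has no compatible partner at some other attachment point. Such a rotamer cannot appear in any fully compatible combination, so this is a conservative, pair-wise form of step (f); the surviving combinations are then checked exhaustively as in step (g).

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Descriptor = Tuple[str, Tuple[float, float, float]]  # (type, xyz) carried by a rotamer

def screen(slots: Dict[int, List[Descriptor]],
           required: Dict[int, str],
           target: Dict[Tuple[int, int], float],
           tol: float = 0.5) -> List[Dict[int, int]]:
    """slots: attachment point -> descriptor of each rotamer placed there (step c).
    required: attachment point -> pharmacophore descriptor type it must display (assumed known).
    target: (a, b) with a < b -> required distance between the two descriptors."""
    # (e) cheapest filter first: keep rotamers whose descriptor type is right
    alive = {a: {r for r, (t, _) in enumerate(slots[a]) if t == required[a]} for a in required}

    def ok(a: int, r: int, b: int, s: int) -> bool:
        key = (a, b) if a < b else (b, a)
        return abs(math.dist(slots[a][r][1], slots[b][s][1]) - target[key]) <= tol

    # (f) drop a rotamer with no compatible partner at some other slot; repeat until stable
    changed = True
    while changed:
        changed = False
        for a in alive:
            for r in list(alive[a]):
                if any(not any(ok(a, r, b, s) for s in alive[b]) for b in alive if b != a):
                    alive[a].discard(r)
                    changed = True

    # (g) enumerate surviving combinations and keep those satisfying every pair
    keys = sorted(alive)
    hits = []
    for combo in product(*(sorted(alive[a]) for a in keys)):
        choice = dict(zip(keys, combo))
        if all(ok(a, choice[a], b, choice[b]) for i, a in enumerate(keys) for b in keys[i + 1:]):
            hits.append(choice)
    return hits

slots = {0: [("donor", (0.0, 0.0, 0.0)), ("acceptor", (0.0, 0.0, 0.0)), ("donor", (0.0, 0.0, 3.0))],
         1: [("acceptor", (5.0, 0.0, 0.0)), ("acceptor", (9.0, 0.0, 0.0))],
         2: [("aromatic", (0.0, 6.0, 0.0)), ("aromatic", (0.0, 6.0, 3.0))]}
required = {0: "donor", 1: "acceptor", 2: "aromatic"}
target = {(0, 1): 5.0, (0, 2): 6.0, (1, 2): math.sqrt(61.0)}
print(screen(slots, required, target))  # [{0: 0, 1: 0, 2: 0}]
```

In this example the type filter removes one rotamer, the pair-wise loop removes three more, and only a single combination is left to enumerate, which illustrates why cheap filters applied first keep the final enumeration small.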
The method of the present invention can be described as a plurality of instructions being performed by a data processor, such that the method of the present invention can be implemented in hardware, software, firmware or a combination thereof. As software, the present invention can be implemented in any suitable programming language that is compatible with the computer hardware and operating system which is performing the instructions, and could easily be selected by one of ordinary skill in the art. Examples of such suitable programming languages include, but are not limited to, Fortran, C and C++.