1. Field of the Invention
This disclosure relates to the field of computer-aided molecule design, and in particular, to the use of shape similarity and electrostatic complementarity of molecules, as characterized using ray-tracing techniques, to rapidly compare compounds with each other and ligands with receptors.
2. Description of the Related Art
It is generally presumed in biology that a compound having a particular effect on certain biological processes is biologically active because of its ability to interact with a complementary receptor. This principle guides drug design, chemical warfare, nutrition, toxicology, and the understanding of the biochemical processes of disease and normal bodily function. Biological activity is further presumed to be due at least in part to the molecule's shape. Simply put, a square peg will fit in a square hole while a round peg will fit in a round hole; understanding and directing biological activity is then a matter of locating the correctly shaped peg (i.e., ligand) for the correctly shaped hole (i.e., receptor). Because of this principle of complementarity between a molecule and a receptor site, it is generally believed that molecules having a shape known to be complementary to the receptor site, or a shape similar to that of a known biologically-active molecule, will have a higher probability of being biologically active with that receptor.
Biological activity between receptors and ligands is also believed to depend in part on electrostatic complementarity: that is, whether the hydrogen bond donors of the ligand are positioned near hydrogen bond acceptors of the receptor, and vice versa. In such an arrangement, electrical charges of opposite sign are brought into close proximity, facilitating biological activity. The receptor and ligand are not believed to need identical electrostatic potentials, but simply potentials of opposite sign at interacting positions. Similarly, molecules believed to have similar biological activity are believed to require only charges of the same sign (i.e., positive or negative) at corresponding positions. A molecule's polarity and its spatial distribution are therefore useful information in understanding and creating biological activity.
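The sign-matching criterion described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the ligand and receptor (or two ligands) have already been reduced to lists of electrostatic potential values at paired interaction positions; how those positions are chosen and paired is outside the scope of this sketch, and the function names and input values are illustrative only. Consistent with the premise that only the sign matters, magnitudes are ignored.

```python
from typing import Sequence


def sign(value: float) -> int:
    """Return +1, -1, or 0 according to the sign of a potential value."""
    return (value > 0) - (value < 0)


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    # Both sequences must describe the same set of paired positions.
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("expected two non-empty sequences of equal length")


def complementarity_fraction(ligand: Sequence[float], receptor: Sequence[float]) -> float:
    """Fraction of paired positions where ligand and receptor potentials have
    opposite signs. Magnitudes are ignored; zero-valued positions never count."""
    _check(ligand, receptor)
    hits = sum(1 for a, b in zip(ligand, receptor) if sign(a) * sign(b) == -1)
    return hits / len(ligand)


def similarity_fraction(mol_a: Sequence[float], mol_b: Sequence[float]) -> float:
    """Fraction of corresponding positions where two molecules share the same
    nonzero sign, used here as a proxy for similar biological activity."""
    _check(mol_a, mol_b)
    hits = sum(1 for a, b in zip(mol_a, mol_b) if sign(a) * sign(b) == 1)
    return hits / len(mol_a)


if __name__ == "__main__":
    # Made-up potentials at four paired positions (arbitrary units).
    ligand = [0.42, -0.31, 0.05, -0.12]
    receptor = [-0.20, 0.55, -0.01, 0.30]
    analog = [0.10, -0.90, -0.20, -0.05]
    print(complementarity_fraction(ligand, receptor))  # 1.0
    print(similarity_fraction(ligand, analog))  # 0.75
```

Complementarity rewards opposite signs, while similarity rewards matching signs; the two scores use the same comparison with the sign of the product flipped.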
In ligand-based drug design and other biochemical uses of complementarity, the underlying assumption is that a particular known biologically-active compound is complementary in shape and electrostatic potential to the desired target receptor, and that this complementarity is responsible for the ligand drug's active effect. Therefore, in ligand-based design, the researcher attempts to locate other compounds having a shape and electrostatic potential similar to those of the known biologically-active compound, for potential use as drugs. In receptor-based design, the structure of the target receptor is generally already known in atomic detail, and the goal is to identify compounds that would be complementary to, and biologically active with, the receptor. Receptor-based design also requires the binding-site cavity to be defined at atomic-level detail. Both of these design methodologies therefore rely on an understanding of the shape and electrostatic potential of a particular compound, or of the shape of a particular receptor, to locate new molecules having a similar or complementary shape and electrostatic potential.
A number of methods have been devised for determining an indicator of the shape and electrostatic potential of a molecule or receptor, which can then be compared with those of other molecules of known shape and electrostatic potential to search for similarities or complements. Current automated programs for comparing receptors and ligands implicitly represent molecular shape through molecular mechanics energy calculations, on the principle that only molecules able to adopt a shape complementary to a target receptor will yield a favorable binding energy. However, these methods have all met with limited success in accurately describing the shape of molecules or receptors. This limitation is due in part to their general reliance on specifying chemical structure. Such reliance on structure is imprecise, because many compounds with radically different composition, structure, and polarity may meet the requirements for biological activity. As a result, investigators using such structure-based methods must generate multiple queries, largely on the basis of imprecise chemical intuition.
Many current methods for designing biologically active molecules require too much information to generate a molecule's shape. Large amounts of input information demand undesirable amounts of calculation and input time, and also limit a program's use to investigators with specialized knowledge. One such computer program is Comparative Molecular Field Analysis, or COMFA™, a product of Tripos, Inc. In COMFA™, the van der Waals and electrostatic fields of molecules are sampled over a grid superimposed on the molecule or receptor site. The values of these fields at the grid points are then used as descriptors in a regression model. COMFA™ thus incorporates both molecular shape and polarity. While the COMFA™ method can generally narrow a range of compounds to those that are broadly similar, it has distinct problems in application. First, COMFA™ uses an undesirably large amount of explicit information to encode shape, involving many grid points and geometric constraints. Second, COMFA™ requires a grid to be overlaid on the molecule, which raises questions of accuracy owing to the effects of grid spacing and of the orientation of the molecules being compared.
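To make the grid-sampling idea and its cost concrete, the following is a simplified sketch of sampling an electrostatic field on a superimposed grid. It is not the COMFA™ implementation: it assumes point partial charges, an unscaled Coulomb sum (no dielectric or unit constants), a +1 probe, an arbitrary clipping cap, and omits the steric field and the regression step entirely. The atom format, function names, and toy coordinates are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Hypothetical atom record: (x, y, z, partial charge); coordinates in angstroms.
Atom = Tuple[float, float, float, float]
Point = Tuple[float, float, float]


def grid_points(atoms: Sequence[Atom], spacing: float = 2.0, padding: float = 4.0) -> List[Point]:
    """Lattice points of an axis-aligned box enclosing the atoms plus `padding`."""
    axes = []
    for k in range(3):
        coords = [atom[k] for atom in atoms]
        lo, hi = min(coords) - padding, max(coords) + padding
        count = int(math.floor((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(count)])
    return list(itertools.product(*axes))


def coulomb_descriptors(atoms: Sequence[Atom], points: Sequence[Point], cap: float = 30.0) -> List[float]:
    """Unscaled Coulomb potential (sum of q / r) felt by a +1 probe at each point,
    clipped to [-cap, cap] so points very close to an atom do not dominate."""
    values = []
    for p in points:
        v = 0.0
        for x, y, z, q in atoms:
            r = max(math.dist(p, (x, y, z)), 1e-6)  # avoid division by zero
            v += q / r
        values.append(max(-cap, min(cap, v)))
    return values


if __name__ == "__main__":
    # Toy two-atom "molecule" with opposite partial charges.
    toy = [(0.0, 0.0, 0.0, 0.4), (1.2, 0.0, 0.0, -0.4)]
    coarse = grid_points(toy, spacing=2.0)
    fine = grid_points(toy, spacing=1.0)
    print(len(coarse), len(fine))  # 125 810
```

Even for two atoms, halving the grid spacing raises the descriptor count from 125 to 810, and because the box is axis-aligned, rotating the molecule before sampling changes which field values land on which grid points. Both effects correspond to the spacing and orientation concerns noted above.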
Other ligand-based methods include the various methods for defining pharmacophore models. These implicitly represent ligand shape by incorporating some collection of hydrogen bond acceptors and donors and regions of steric bulk, and imposing inter-group distance constraints on them. Conveying shape through such three-dimensional geometric constraints thus also requires a large amount of input information.
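A pharmacophore model of the kind described above can be sketched as a list of required feature types plus (minimum, maximum) distance ranges between them. The following minimal sketch assumes a single rigid conformer whose features (donors, acceptors, hydrophobic regions) have already been perceived and placed; the feature labels, distance ranges, and function name are hypothetical. It tests every assignment of conformer features to model slots, which is one reason screening with such queries becomes expensive.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]
Feature = Tuple[str, Point]  # (feature type, coordinates in angstroms)


def fits_pharmacophore(
    features: Sequence[Feature],
    model_types: Sequence[str],
    constraints: Dict[Tuple[int, int], Tuple[float, float]],
) -> bool:
    """Return True if some assignment of distinct conformer features to the
    model's slots matches every slot's type and every (min, max) distance range."""
    for combo in itertools.permutations(range(len(features)), len(model_types)):
        # Each chosen feature must have the type the model slot requires.
        if any(features[k][0] != t for k, t in zip(combo, model_types)):
            continue
        # Every inter-group distance constraint must hold for this assignment.
        if all(
            lo <= math.dist(features[combo[i]][1], features[combo[j]][1]) <= hi
            for (i, j), (lo, hi) in constraints.items()
        ):
            return True
    return False


if __name__ == "__main__":
    model = ["donor", "acceptor", "hydrophobe"]
    limits = {(0, 1): (2.5, 3.5), (0, 2): (4.0, 6.0), (1, 2): (3.0, 5.0)}
    conformer = [
        ("acceptor", (3.0, 0.0, 0.0)),
        ("donor", (0.0, 0.0, 0.0)),
        ("hydrophobe", (3.0, 4.0, 0.0)),
    ]
    print(fits_pharmacophore(conformer, model, limits))  # True
```

The number of candidate assignments grows combinatorially with the number of perceived features, and in practice the test must be repeated for many conformers and orientations of each library compound.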
Other approaches attempt to compute topological descriptors of molecules, starting from the chemical structure or the wave function. These descriptors, too, often reflect molecular shape directly. Further, methods based on chemical fingerprints generally also carry implicit shape information, since only a restricted family of compounds will be compatible with the information contained in the fingerprint.
Receptor-based design strategies generally involve an explicit representation of shape derived from an atomic resolution structure of the active site. For example, UCSF DOCK methods pack the active site with spheres, producing an efficient representation of the volume available to accommodate a ligand, and combine this with positions of hydrogen bond acceptors and donors. Other docking algorithms such as FLOG, GOLD and FlexiDock use an all-atom representation of the active site, and thus represent its geometry in fine detail. Pharmacophore-like models can also be devised for receptors, and these include shape information in the same way as ligand-based models.
Thus, all of these ligand- and receptor-based programs rely undesirably on representations of structure and/or require an undue amount of input information. It is therefore desirable for programs that design biologically active molecules to do so without relying on structure or massive amounts of input information. In addition, the information produced by current methods is difficult to encode efficiently and to use in database searching. Current methods such as those described above demand considerable computation, involving energy or distance-geometry calculations, reliance on data-rich explicit shape representations, and, oftentimes, manual alignment of ligands and receptors. For example, COMFA™ and pharmacophore methods use a large amount of explicit information to encode shape, involving many grid points or geometric constraints. Current systems match a compound to a receptor site or pharmacophore via some form of computational simulation, such as a genetic algorithm, a Monte Carlo method, or another technique for randomly generating orientations and configurations of the ligand. COMFA™ also requires a subjective alignment of the series of molecules, and is limited to scanning around 150 compounds. Further, scanning a chemical library with a pharmacophore query involves a significant amount of computation, since each molecule must be repositioned and flexed to determine whether it can fit the model. Similarly, receptor-based strategies require many packed spheres and/or atom positions to encode the shape of the active site, and rely on scanning a chemical library with a docking program that performs many detailed calculations for each compound considered. These calculations can involve molecular mechanics computations or, at the very least, so-called “bump checks” to test shape compatibility between receptor and ligand.
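A bump check of the kind mentioned above can be sketched as a pairwise overlap test between ligand and receptor atoms treated as van der Waals spheres. This is a generic illustration, not the check used by any particular docking program; the atom format, the function name, and the 0.5 Å overlap tolerance are assumptions chosen for the example.

```python
import math
from typing import Sequence, Tuple

# Hypothetical atom record: (x, y, z, van der Waals radius), all in angstroms.
Sphere = Tuple[float, float, float, float]


def has_bump(ligand: Sequence[Sphere], receptor: Sequence[Sphere], tolerance: float = 0.5) -> bool:
    """Return True if any ligand atom overlaps any receptor atom by more than
    `tolerance`, i.e. their center distance is below r_lig + r_rec - tolerance."""
    for lx, ly, lz, lr in ligand:
        for rx, ry, rz, rr in receptor:
            if math.dist((lx, ly, lz), (rx, ry, rz)) < lr + rr - tolerance:
                return True
    return False


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0, 1.7)]
    print(has_bump(ligand, [(3.0, 0.0, 0.0, 1.5)]))  # False: 3.0 is not below 2.7
    print(has_bump(ligand, [(2.0, 0.0, 0.0, 1.5)]))  # True: 2.0 is below 2.7
```

In the worst case a single call performs one distance evaluation per ligand-receptor atom pair, and a docking run repeats it for every trial pose of every compound, which illustrates how even this cheapest form of shape test accumulates cost over a library.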
All of these computations take undesirably large amounts of time and processor resources. They also require an undesirable amount of training in chemical structure, the docking process, and many other areas, and this training consumes often-scarce financial resources. While improved efficiency in search methods has made these approaches usable for screening existing chemical libraries, there is no practical upper limit on the number of compounds that researchers would like to compare in the pursuit of new biologically active compounds. Further, the faster and more effectively a database can be searched, the sooner a new and useful compound can generally be discovered.
It is therefore desirable to have a method that can rapidly compare the shapes of compounds in large databases with each other, or with a receptor site, with minimal computation and without reliance on chemical structure, explicit 3D representations of shape, or actual ligand-receptor docking. It is further desirable that such a method require no special training in chemical structure or docking. Early steps toward such a methodology are described in Zauhar et al., Issues and Applications in Toxicology and Risk Assessment Meeting, April 2001; and Zauhar et al., ACS National Meeting, August 1999, the disclosures of which are both incorporated herein by reference.
Current methods are also insufficiently precise for larger molecules and for molecules with a variety of functional groups. This is because the attributes of different parts of a large molecule tend to cancel each other out in an analysis of the entire molecule, resulting in a featureless distribution that merely reflects the molecule's rough overall geometry. It is therefore desirable for a comparison method to remain accurate when applied to large molecules and to molecules with a variety of functional groups.