Field of the Invention
This invention relates generally to a computer implemented drug discovery method. More specifically, the disclosed method permits a user to specify a three dimensional representation of a template molecule, which may be derived from binding data, crystallographic data, modeling data, or any other source, and to align additional molecules to that template in order to generate a CoMFA (comparative molecular field analysis) QSAR.
Description of Related Art
U.S. Pat. No. 5,025,388 and U.S. Pat. No. 5,307,287 introduced Comparative Molecular Field Analysis (CoMFA), a three-dimensional quantitative structure activity relationship (3D QSAR) technique. The CoMFA technique permits a quantitative correlation of the observed activities of several molecules active in the same biological assay to the shape characteristics of those molecules. In CoMFA, each molecule in the activity series is aligned in a three dimensional grid, and its shape is characterized by the steric and electrostatic interaction energies between a probe and the atoms of the molecule at each grid point. The interaction energies are associated with the observed/measured activity of the molecule in a CoMFA table, and a partial least squares (PLS) statistical analysis with validation is performed.
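The shape characterization described above can be illustrated with a minimal sketch. It assumes a Lennard-Jones 6-12 form for the steric term, a Coulomb form for the electrostatic term, a conversion constant of 332 (kcal·Å/mol per squared elementary charge), and a 30 kcal/mol energy cutoff; none of these specifics is stated in this section, and the function names and the atom tuple layout are hypothetical.

```python
import math
from itertools import product


def grid_points(lo, hi, step):
    """Regular 3D lattice from lo to hi (inclusive) along each axis."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return list(product(axis, axis, axis))


def field_descriptors(atoms, points, cutoff=30.0, probe_charge=1.0):
    """Return (steric, electrostatic) energy lists, one value per grid point.

    atoms: list of (x, y, z, partial_charge, r_min, epsilon) tuples.
    The functional forms and the cutoff are assumptions of this sketch.
    """
    steric, electro = [], []
    for p in points:
        s = e = 0.0
        for x, y, z, q, r_min, eps in atoms:
            # Guard against a grid point sitting exactly on an atom.
            r = max(math.dist(p, (x, y, z)), 1e-3)
            ratio = r_min / r
            s += eps * (ratio ** 12 - 2 * ratio ** 6)  # assumed 6-12 steric term
            e += 332.0 * probe_charge * q / r          # assumed Coulomb term
        # Truncate so grid points inside the molecule do not dominate.
        steric.append(min(s, cutoff))
        electro.append(max(-cutoff, min(e, cutoff)))
    return steric, electro
```

Under these assumptions, one row of the CoMFA table for a molecule would be the concatenation of the two returned lists together with the measured activity of that molecule.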
The resulting analysis provides a coefficient for each grid location term in the table that reflects that position's contribution to the observed activity. Using these coefficients, it is possible to identify and observe those volumes of the molecule (arrangements of atoms) associated with either increased or decreased activity. Based on the identified coefficients, it is also possible to estimate the likely biological activity of a molecule for which no activity has yet been determined in an assay.
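As a hedged illustration of how such coefficients might be used, the following sketch treats the fitted model as linear in the grid-point descriptors. The PLS fit itself is not shown, and the function names, the intercept parameter, and the ranking by absolute coefficient are assumptions made for the example.

```python
def predict_activity(coefs, intercept, descriptors):
    """Estimate activity as intercept + sum of coefficient * field value."""
    if len(coefs) != len(descriptors):
        raise ValueError("one coefficient is required per grid-point term")
    return intercept + sum(c * d for c, d in zip(coefs, descriptors))


def strongest_positions(coefs, top=3):
    """Indices of the grid-point terms with the largest absolute coefficients."""
    order = sorted(range(len(coefs)), key=lambda i: abs(coefs[i]), reverse=True)
    return order[:top]
```

The sign of a coefficient at a highly ranked position indicates whether a stronger field value there is associated with increased or decreased activity.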
However, CoMFA requires great care in the selection of molecular conformations and in the proper alignment of the series of molecules. Nevertheless, the technique demonstrated the power of utilizing three dimensional shape descriptors in molecular analysis, and CoMFA has become a fundamental method in computational chemistry.
Since the introduction of CoMFA many different procedures have been utilized to align the molecules of the activity series in the three dimensional grid. Proper alignment among all the molecules in the activity series is extremely important since any misalignment results in differences in the steric and electrostatic interaction energy shape descriptors that would not be related to the actual shape characteristics responsible for activity. Ideally, the molecules in the activity series could be aligned to an x-ray crystallographic structure determined for one of the molecules. However, for many active molecules, such a structure is not available. Alternatively, the series could be aligned to a conformation derived from a proposed binding site structure. The difficulties with whole molecule alignments have led to the development of alternative methods of examining and relating differences in molecular structure to activity.
Alternate 3D representations of molecular fragments, such as topomerically aligned fragments, have been developed and successfully employed. These approaches fragment the molecules in an activity series at acyclic bonds, topomerically align the fragments, characterize the shape of the fragments using the steric and electrostatic interaction energies as in CoMFA, and compare that shape to the shape of fragments (similarly aligned and characterized by steric and electrostatic interaction energies) derived from libraries of molecular compounds. In particular, the 3D QSAR technique known as Topomeric CoMFA has been highly successful when used in conjunction with molecular fragments derived from a combinatorial library. It has been discovered that the Topomeric CoMFA approach could be extended to searching, and deriving predicted activities from, molecular fragments generated from large, commercially obtainable library assemblages of molecules that do not derive from combinatorial syntheses and that come from many different sources and syntheses, some known, some unknown. These libraries may, and typically do, contain natural products. Fragmenting at all the acyclic bonds in these molecules produces much greater shape variation, and a much larger number of molecular fragments, than fragmenting the molecules of a combinatorial library. This approach has been taught in U.S. patent application Ser. No. 12/045,511 using a fragmentation on-the-fly technique first taught in U.S. Pat. No. 7,330,793.
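Fragmentation at an acyclic bond can be sketched as follows. The sketch assumes a molecule is given as a list of bonded atom-index pairs and treats a bond as acyclic when removing it disconnects its two atoms; the function names are hypothetical, and any further restrictions the cited documents place on which bonds are cut are not modeled.

```python
def _reachable(bonds, start, skip):
    """Atoms reachable from start without crossing the bond in skip."""
    adj = {}
    for u, v in bonds:
        if {u, v} == set(skip):
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_acyclic_bond(bonds, a, b):
    """True if bond a-b lies in no ring, i.e. cutting it separates a from b."""
    return b not in _reachable(bonds, a, (a, b))


def fragment_at(bonds, a, b):
    """Cut acyclic bond a-b and return the atom sets on each side."""
    if not is_acyclic_bond(bonds, a, b):
        raise ValueError("bond is part of a ring")
    return _reachable(bonds, a, (a, b)), _reachable(bonds, b, (a, b))
```

For a six-membered ring (atoms 0 to 5) carrying a two-atom chain (atoms 6 and 7) on atom 0, cutting the bond between atoms 0 and 6 yields the ring as one fragment and the chain as the other, while every ring bond is rejected.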
However, as noted in U.S. Pat. No. 7,329,222, the rule-based (topomeric) procedure for aligning molecular fragments that lies at the heart of the Topomeric CoMFA methodology is not always applicable and may result in 3D fragment conformations that do not approximate those assumed by the fragment in an active molecule. Importantly, there are many cases where it is believed that an alternative geometric alignment, based on knowledge about receptor site geometry gleaned from other sources, such as x-ray studies or ligand binding, might be more useful in computing a 3D QSAR such as CoMFA. Alternatively, it may be desirable to seek alignments that overlay fragments from two or more structurally non-congeneric sets that may, for example, be known to bind to the same receptor. To handle these situations, an alternative alignment method was devised which could align such structurally varied fragments to some user specified geometry or geometries. This alignment method supplants the topomeric alignment method used with fragments in U.S. Pat. No. 7,330,793 and U.S. patent application Ser. No. 12/045,511. However, construction and use of the CoMFA data table proceeds as taught in those documents.
U.S. Pat. No. 8,504,302 teaches a new alignment method which permits alignment of molecular fragments to one or more user supplied templates that specify the types and three dimensional positions of all the atoms in one or more molecular fragments. Fragmentation of the query molecules that comprise the activity set, as well as of the molecules examined in the database libraries, is performed as taught in the cited patent documents. The fragment template to which alignment is made need not come from any fragment derived from a molecule in a congeneric series but only from a template molecule selected by the user. However, any fragment from a congeneric series could be used as a template fragment as well. In U.S. Pat. No. 8,504,302, reference to the template or template atoms means the externally specified 3D arrangement of atoms and their types. Reference to the candidate or candidate atoms means the arrangement of atoms and their types found in the fragments derived from the molecules in a congeneric series. In the method, atom-by-atom matches (identical atoms) between the template fragment and the candidate fragment are identified by serial/sequential traversals that start at the fragment root and end wherever no more matches exist along any given branch. To align the candidate fragment, the coordinates of the matching template atoms are assigned to those atoms in the candidate fragment (excepting partial matches within rings) that match atoms in the template fragment. Once the common alignment is established, a useful CoMFA analysis may be performed.
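The root-outward matching and coordinate assignment described above can be illustrated with a simplified sketch. It assumes each fragment is an acyclic tree stored as a dictionary of atoms with "type", "xyz", and "children" entries, pairs child atoms greedily by atom type, and leaves unmatched candidate atoms at their original coordinates; ring handling is omitted, and the data layout and function names are hypothetical rather than those of the cited patent.

```python
def match_atoms(template, candidate, t_root, c_root):
    """Map candidate atom ids to template atom ids, walking out from the roots.

    A branch stops being followed as soon as no same-type partner exists.
    """
    mapping = {}
    stack = [(t_root, c_root)]
    while stack:
        t, c = stack.pop()
        if template[t]["type"] != candidate[c]["type"]:
            continue  # only reachable for a mismatched root pair
        mapping[c] = t
        unused = list(template[t]["children"])
        for cc in candidate[c]["children"]:
            for tc in unused:
                if template[tc]["type"] == candidate[cc]["type"]:
                    unused.remove(tc)  # each template atom is used once
                    stack.append((tc, cc))
                    break
    return mapping


def align_to_template(template, candidate, t_root, c_root):
    """Return candidate coordinates with matched atoms moved onto the template."""
    mapping = match_atoms(template, candidate, t_root, c_root)
    return {
        c: template[mapping[c]]["xyz"] if c in mapping else atom["xyz"]
        for c, atom in candidate.items()
    }
```

Greedy pairing by atom type is the simplest choice that respects the stopping rule; a fuller treatment would need a tie-breaking rule when several same-type branches are available.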