A. Field of the Invention
This invention relates generally to a computer-implemented method for expanding the range and diversity of synthesizable chemical structures that can be searched for three-dimensional shapes similar to those of molecular fragments of known pharmacologically interesting molecules. More specifically, a forward synthetic method is described that utilizes recursive application of established organic chemical reactions to derive synthons from available reagents. The generated synthons are characterized with a molecular structural descriptor possessing a neighborhood property and searched for three-dimensional shape similarity to molecular fragments derived from query molecules. Identified synthons can be assembled into molecules possessing the same three-dimensional shape as the pharmacological molecule of interest.
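The recursive forward application of reactions described above can be pictured with a minimal sketch. It is only an illustration under stated assumptions: structures are represented by toy text labels rather than real chemical encodings, and the names `make_rewrite` and `enumerate_synthons` are hypothetical and do not come from the described method.

```python
def make_rewrite(old, new):
    """Build a toy 'reaction' that rewrites one functional-group label.

    Hypothetical helper: a real system would match substructures, not text.
    """
    def reaction(structure):
        if old in structure:
            return structure.replace(old, new, 1)
        return None  # reaction does not apply to this structure
    return reaction


def enumerate_synthons(reagents, reactions, max_depth):
    """Apply every reaction to every new structure, up to max_depth rounds."""
    seen = set(reagents)
    frontier = list(reagents)
    for _ in range(max_depth):
        next_frontier = []
        for structure in frontier:
            for reaction in reactions:
                product = reaction(structure)
                if product is not None and product not in seen:
                    seen.add(product)
                    next_frontier.append(product)
        frontier = next_frontier  # only newly derived structures are re-expanded
    return seen


# Toy labels only: nitro reduction followed by amine acetylation.
reactions = [make_rewrite("NO2", "NH2"), make_rewrite("NH2", "NHAc")]
synthons = enumerate_synthons(["Ph-NO2"], reactions, max_depth=2)
```

Each round expands only the structures derived in the previous round, so the depth limit bounds how many reaction steps separate a synthon from an available reagent.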
B. Description of Related Art
1. Computer-Aided Synthesis:
“Pharmacological chemical space,” the set of distinct structures with a molecular weight less than approximately 1000 Daltons in which the bonds to every atom obey standard valency rules, numbers over 10^40 molecules, according to one serious estimate (Weininger) [1]. Most of these structures will never be synthetically available, either because the structure is too energetically unfavorable compared with its accessible decompositions or isomerizations, or more often because the structure is too difficult to synthesize compared to its potential benefits. The decision to synthesize any particular structure results from an assessment that the potential value of the structure is likely to exceed the costs of the attempt. Throughout this patent document, the term “costs” is intended to reference not only monetary expense, but also the extent of time required, the level of effort required, and the consequences of diverting money, time, and effort away from other possible projects. Several computer-aided organic synthesis projects have had the estimation of such costs as their goal. Given a desired structure, the question was asked: “how might this desirable (target) structure be synthesized and what is the cost?”
The original such project, LHASA, begun in the late 1960's, introduced “retrosynthetic analysis” as the methodology for achieving this goal. As the name implies, retrosynthetic analysis proceeds in the opposite direction from actual laboratory synthesis, beginning by identifying the most promising building blocks (“synthons”) and chemical reactions for producing the desirable end “target” structure, and then reapplying the same approach to each of the resulting synthons. Conceptually, retrosynthetic analysis was a huge success, having become the major teaching paradigm for synthetic organic chemistry and winning a Nobel Prize in chemistry for its originator, E. J. Corey. A recent article by Todd [2] traces the history and development of the field of computer-aided synthesis. LHASA was developed further over the years, including extensions to deal with forward synthetic enzymatic reactions, but it was never envisaged as a means to generate synthons that would not be utilized in a reaction sequence directed at a specific synthetic target. To date, there has not been a perceived need in the prior art to use computer-aided synthesis to generate a variety of synthons with the object of achieving structures with a broader range of three-dimensional shapes. Not surprisingly, computer-aided organic synthesis projects subsequent to LHASA have operated retrosynthetically, their only use of starting materials being as input to answer the question: “have we reached the desired synthetic target yet?”
2. Shape Based Comparison:
a. Molecules Viewed as Assemblies of Parts:
In pharmaceutical drug development, the situation frequently arises where it is desirable to make some alteration to a lead compound. The alteration may be simple or it might necessitate the replacement of significant parts of the molecule. In order to retain the biological specificity of the lead compound, any replacement part should have a similar three dimensional shape. When comparing the three dimensional shape of one molecule to another, in computational chemistry it is convenient to view the molecules as assemblies of constituent parts. Typically, a molecule is viewed as an assembly of fragments where the fragments are derived by severing bonds within the molecule in a consistent manner. Fragments are a useful way to deconstruct the three dimensional shape of molecules so that similarly shaped parts may be identified. Similarity of shape of the whole molecules (such as a lead compound and a possible alternate compound) can then be determined by comparing the shapes of the individual molecular fragments. A simple example would be two molecules A and B which could each be fragmented into two roughly similar parts: A1 and A2, and B1 and B2. The shape of A1 would be compared to the shape of B1. The shape of A2 would be compared to the shape of B2. Alternatively, rather than comparing fragments of two molecules, it may be desirable to compare fragments from a lead compound with molecular structures (fragments) derived from available reagents or which could be independently synthesized.
Computer representations of the fragmented parts of a lead molecule, or of fragments derived from available reagents or that could be independently synthesized, each retain an open valence where they were or could be “attached” to form a whole molecule. This approach has two major advantages. First, the open valence provides a reference point that enables fragments to be commonly aligned. Alignment is necessary since shape similarity implies that the atoms of each fragment occupy similar positions in three dimensional space. Second, if the compared fragment does not share the same shape as the lead compound derived fragment (within some measure of similarity), it is unlikely that the substitution of the compared fragment for the fragment from the lead compound would produce an active compound. As will be seen below, this simple dissimilarity searching criterion can eliminate large numbers of possible fragments very quickly and permits very high search speeds through enormous fragment databases.
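The idea of deriving fragments by severing a bond, with each fragment retaining an open valence at the cut, can be sketched as follows. This is a minimal illustration, assuming a molecule is given simply as a list of bonded atom-label pairs; the function name `sever_bond` and the atom labels are hypothetical and are not taken from any referenced software.

```python
from collections import deque


def sever_bond(bonds, cut):
    """Split a molecule graph into two fragments by removing one bond.

    bonds: iterable of (atom_a, atom_b) label pairs (assumed representation).
    cut:   the (atom_a, atom_b) bond to sever.
    Returns two (atom_set, attachment_atom) tuples; the attachment atom is
    the one left with an open valence where the bond was severed.
    """
    a, b = cut
    adjacency = {a: set(), b: set()}
    for u, v in bonds:
        if {u, v} == {a, b}:
            continue  # leave the severed bond out of the graph
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    def component(start):
        # Breadth-first walk collecting every atom still connected to start.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    side_a = component(a)
    if b in side_a:
        raise ValueError("bond lies in a ring; severing it yields one piece")
    return (side_a, a), (component(b), b)
```

Because each fragment records its attachment atom, fragments from different molecules share a common reference point for alignment, which is the first advantage noted above.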
b. Advances in Validating Metrics:
While the goal of shape comparison existed in the prior art and a variety of shape descriptors were tried, no method was known that could validate whether a descriptor (molecular structural metric) described the three dimensional molecular shape in a manner that was biologically relevant. In this environment shape comparisons could be performed, but one could not know if the results were meaningful. Over the past several years a new method of metric validation for use in drug discovery has become available, which has opened up the large universe of possible organic chemical compounds to searching for useful and/or improved variations of pharmacological compounds. The fundamental keys to unlocking this possibility were disclosed in U.S. Pat. No. 6,185,506. The first key was the development of a methodology (the “Patterson Plot”) for determining whether a molecular structural descriptor (metric) was “valid”, that is, whether it described molecules in such a way that the descriptor values properly reflected the likely biological activity of the molecules. Of course, validity in this context reflects a high probability, not a certainty. Once a validation methodology was known, available descriptors were evaluated and generally found wanting.
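The notion of validity described above can be pictured with a simplified sketch. It is not the procedure of U.S. Pat. No. 6,185,506; it assumes, purely for illustration, a scalar descriptor value per molecule and treats as a violation any pair of molecules that is close in descriptor value yet far apart in measured activity. The function names and thresholds are hypothetical.

```python
from itertools import combinations


def pairwise_differences(descriptors, activities):
    """Return (descriptor difference, activity difference) for every pair."""
    if len(descriptors) != len(activities):
        raise ValueError("one activity value is required per descriptor value")
    pairs = []
    for i, j in combinations(range(len(descriptors)), 2):
        pairs.append((abs(descriptors[i] - descriptors[j]),
                      abs(activities[i] - activities[j])))
    return pairs


def neighborhood_violations(pairs, max_descriptor_diff, max_activity_diff):
    """Pairs that look alike to the descriptor but differ widely in activity."""
    return [pair for pair in pairs
            if pair[0] <= max_descriptor_diff and pair[1] > max_activity_diff]
```

Under this simplification, a descriptor for which such violations are rare across many data sets would behave as a valid metric in the probabilistic sense noted above.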
c. Development of Topomer Metric:
The second key was the development of a valid descriptor that properly reflected the three dimensional shape of molecular parts. As noted, a valid molecular structural shape descriptor had been an elusive goal of pharmacological research since it was understood that the three-dimensional shape of a ligand molecule had a great deal to do with the molecule's ability to bind to a receptor in a lock and key type arrangement. Previous work had focused on trying to determine which, of the thousands of possible conformations that a small molecule can attain, was likely to be the conformation that was critical for interaction with a receptor. Most methodological development in the 3D modeling of chemical structures for pharmacological research was aimed, quite understandably, toward greater physicochemical realism. In some approaches, statistical shape averages were employed, in others gross shape estimates were employed, while in others key pharmacophoric features were emphasized. Some successful approaches employed knowledge gained from x-ray structures of ligand-protein binding. However, the real world physicochemical reality is that biologically interesting molecules exhibit an intractable multiplicity of shapes and states. In practical applications, the various summarizations employed were so approximate as to perhaps be self-defeating. In addition, no general methodology was applicable across the broad range of chemical structures and activities.
The answer to discovering a valid descriptor that properly reflected the three dimensional shape of molecular parts turned out to be counterintuitive. The answer was not to determine the most likely conformations of molecules but to pursue an alternative goal: consistency of alignment, rather than realism, in the positioning of similar structural features into similar regions of an arbitrary geometric space. The development of topomers, molecular fragments (molecular structures having an open valence [attachment bond] at at least one position) aligned according to a deterministic set of rules that produce absolute configuration, conformation, and orientation, was taught in U.S. Pat. No. 6,185,506 and further extended in U.S. Pat. No. 6,240,374.
The method of defining the shape of the topomers so that useful shape comparisons could be made was adapted from the CoMFA procedure described in U.S. Pat. Nos. 5,025,388 and 5,307,287. In CoMFA, the steric fields surrounding molecules were demonstrated to be an effective and realistic determinant of the molecular shape of the molecules under consideration. Using a similar approach, steric fields around topomerically aligned molecule fragments were demonstrated to form a validated molecular structural descriptor. The disclosures of U.S. Pat. No. 5,025,388, U.S. Pat. No. 5,307,287, U.S. Pat. No. 6,185,506, and U.S. Pat. No. 6,240,374 and their attached software appendices are incorporated into this patent document as if fully set forth herein.
d. Topomer Based Shape Comparison:
The use of the steric fields around topomerically aligned fragments as a molecular structural descriptor (metric) permits the shapes of the fragments to be compared. Since the metric is valid, similar topomeric shapes of two fragments imply a high likelihood of similar biological activity. Shape searching using the steric fields of topomerically aligned fragments initially advanced the development of combinatorial libraries (U.S. Pat. No. 6,185,506). Subsequently, it was realized that the shapes of fragments derived from reagents used in combinatorial syntheses could be precomputed and stored in a virtual library along with other information such as the chemical reactions in which the reagents participated (U.S. Pat. No. 6,240,374). Given a query molecule with a known activity that could be broken into fragments, the shapes of those fragments (topomerically aligned and characterized with their steric fields) could be searched in the virtual library of molecular fragments to identify other chemical structures of similar shape that could be substituted for the query fragments and would have a high likelihood of having a similar biological activity. Topomer searching finds biologically equivalent structures. More recently, it has been shown [3] that topomerically aligned fragments can be utilized in the CoMFA methodology to generate a valid CoMFA. The techniques of the present invention extend even further the usefulness of topomers.
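A search of a precomputed virtual library of the kind described above can be sketched as follows. The sketch assumes that each fragment's steric field is stored as a list of values sampled at the same lattice points and that shape difference is measured as the root sum of squared field differences; the library layout, the function names, and the threshold are assumptions made for illustration only.

```python
import math


def field_distance(field_a, field_b):
    """Root sum of squared differences between two steric field vectors."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same lattice")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(field_a, field_b)))


def search_library(query_field, library, max_distance):
    """Return library fragments whose shape lies within max_distance.

    library: dict mapping a fragment name to a dict with a precomputed
             'field' and the 'reactions' the parent reagent takes part in
             (hypothetical layout).
    Returns (distance, name, reactions) tuples, closest shape first.
    """
    hits = []
    for name, entry in library.items():
        distance = field_distance(query_field, entry["field"])
        if distance <= max_distance:  # dissimilar fragments are dropped here
            hits.append((distance, name, entry["reactions"]))
    hits.sort(key=lambda hit: hit[0])
    return hits
```

Because the fields are computed once and stored, each query costs only one distance calculation per library entry, which is consistent with the high search speeds through large fragment databases mentioned earlier.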