This invention relates to a method for predicting the properties of a chemical compound from its structure, by which it is possible to design a new chemical structure having a desired property.
Today, the industrial need to find an effective molecular design technique is growing rapidly. The probability of success in the conventional screening process to discover a useful compound is extremely low. Even the task of selecting the best compound in a given homologous group requires an astronomical number of trials. .alpha.-Naphthoic acid, for example, has seven replaceable positions as shown below. ##STR1##
The number of possible compounds obtained by filling these positions with 20 commonly used substituents amounts to 20.sup.7 (=1.28.times.10.sup.9). Considering that the number of compounds registered each year in Chemical Abstracts is on the order of 10.sup.5, the time and money required for this task are prohibitive. In reality, experience, intuition, ease of synthesis and various kinds of mechanistic knowledge allow some selection rather than a random approach, but the predictability of activity from chemical structure is generally very poor.
Any practical property of a compound, sometimes more aptly called its activity, is multivariate in nature. This recognition has led many people to apply multivariate analysis techniques to problems of structure-activity analysis, but at present no universal technique is known.
To classify the prior art of this field, it is customary to follow Cramer's example (R. D. Cramer, et al., Chemical Society Reviews 3, 273 (1974)): (1) lead-generation techniques and (2) lead-optimization techniques. The former group of techniques attempts to predict a new lead compound from known results. At present, there is no industrially dependable prior art in this field. The latter group aims only at predicting the best compound among the homologs after a lead has been discovered. Although some successful cases have been reported, they naturally lack universal applicability (Y. C. Young, Journal of Medicinal Chemistry, 24, 230 (1981)).
The common feature of the prior art is that it is method-oriented rather than problem-oriented. Models developed elsewhere to explain certain phenomena have been applied rather mechanically to more complex systems. The following comment by one of the experts in this field describes the state of the art (R. Cramer, Chemical Technology, 744 (1980)): "Let's not emulate the drunk who searched for his key under lamppost, where he could see, rather than the dark corner, where he lost it!"
This invention contains two new elements: (1) use of a control chart to define the target, and (2) use of the principle of equilibrium as a powerful means of finding the cause-effect relationship. Unlike much of the prior art, this process is not designed to "explain" the given phenomena; it relies instead on a well-established engineering art of problem-solving, namely the construction and use of control charts (W. A. Shewhart, "Economic Control of Quality of Manufactured Product", Chap. XX, P. 301, D. van Nostrand Co., Inc., New York 1931). Such an engineering art is based on the assumption that a desirable outcome is the consequence of selecting optimum causes. It tacitly implies that an engineer has to search for causes when the cause-effect relations are not readily apparent. The application of the control chart technique to molecular design will not be complete unless a technique of cause-finding is established. In the following section I will describe this part of the art in detail.
A practical property or activity of a compound is measured on a particular scale suited to the objective. Such a practical measure, such as LD.sub.50 or ED.sub.50, is usually broken down into several elementary properties in an attempt to define the cause-effect relationships. These elementary properties, such as acidity, lipophilicity, electronegativity and so on, are less complex, are well-defined and have a universal meaning.
Because these elementary properties themselves are not generally predictable from chemical structures, such an approach has an intrinsic limitation as a universal technique of molecular design. To eliminate this limitation, some people have attempted to correlate a practical property directly with structural parameters (B. R. Kowalski & C. F. Bender, Journal of the American Chemical Society, 94, 5632 (1972); A. J. Stuper and P. C. Jurs, ibid., 97, 182 (1974)). Although such an approach has the advantage of offering direct structure-activity relationships, the choice of structural parameters is rather arbitrary and, because of this, the approach has not met with significant success. For such an approach to be successful, the arbitrariness of the choice of parameters must be minimized by the introduction of some new principle. I have done this by introducing "the principle of balance".
Any practically useful compound should have a certain balance of elementary properties. Because the practical working environments of a compound are generally complex, a compound of high activity is expected to satisfy more than one requirement. According to this principle, the activity will decrease if the desirable balance of elementary properties is displaced. This principle has been borne out in practice in a variety of ways. The art of the control chart, as mentioned earlier, seeks the optimum ranges of causes to obtain a desired effect, and it has worked well. In the biological field, the invariance of the partition coefficients of highly active compounds, in spite of the variety of their structures, has been well recognized (C. Hansch, Chemical Technology, 120 (1977)).
Because an elementary property of a compound is, in turn, related to certain structural features, a highly active compound should possess a certain equilibrium of structural features. The control chart technique is the best way to express such a structural equilibrium with a certain allowance. Although the concept of the control chart is a product of engineering wisdom, the idea can be safely applied to biological problems. Homeostasis, a principle of equilibrium, should control the requirements of biologically active compounds. As long as the cause-effect relationships are expressed by the control chart, this process has no limitation in its applicability.
A thorough examination of structural parameters and the correlation coefficients among them is the essential preparatory step to construct a reliable control chart. Once the chart is constructed, the structures to be designed should fall within the control limit of the chart, just as the reaction temperatures and pressures should be kept within the optimal ranges to obtain a desired product.
The control limit can be shown either on paper or in terms of Mahalanobis' generalized distance from the center of the desirable zone. The former method is advantageous when a particular compound outside the limit is to be modified to obtain a higher activity, because the deviation from the target area can be grasped visually, but it is naturally limited in the number of dimensions it can display. Mahalanobis' generalized distance is a convenient scale for sorting out hopeful candidates even in the case of a multivariate control chart.
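The distance-based form of the control limit can be illustrated with a minimal sketch. It assumes only two structural parameters, so that the inverse of the covariance matrix can be written out by hand; the function name `mahalanobis_2d`, the parameter values and the candidate labels are hypothetical and are not taken from this specification.

```python
import math

def mahalanobis_2d(point, group):
    """Mahalanobis distance of `point` from the center of `group`.

    `group` is a list of (x_i, x_j) pairs for the most active compounds.
    Only the two-parameter case is handled in this sketch.
    """
    n = len(group)
    mx = sum(p[0] for p in group) / n
    my = sum(p[1] for p in group) / n
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in group) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in group) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in group) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] S^-1 [dx dy]^T, expanded for a 2x2 matrix S.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))

# Hypothetical parameter pairs for the most active group.
active = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

# Hypothetical candidate structures, ranked by distance from the center.
candidates = {"A": (3.5, 7.0), "B": (3.5, 3.0)}
ranked = sorted(candidates, key=lambda name: mahalanobis_2d(candidates[name], active))
print(ranked)
```

Candidate A follows the trend of the active group and therefore lies close to its center, whereas candidate B has the same first parameter but lies far from the center once the correlation between the two parameters is taken into account. This is the reason a correlation-aware distance is preferable to a simple per-parameter range when sorting candidates.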
In practice, this process involves the following steps:
(1) A set of compounds of known structures and activities is grouped by activity level. Each compound is then converted to a series of numerals in a predetermined format. Each numeral describes one aspect of the structure and is called a structural parameter. These values are fed into a computer for the processing described below. This coding process is not particularly new, except for the choice of the parameters.
(2) Using the data set prepared above, the correlation coefficients of all combinations of two parameters are calculated and compared among the different activity groups. There are nearly always combinations of parameters that give very high correlation coefficients for the most active group, whereas those for the other groups are significantly lower. This monopoly of high correlation by the most active group indicates that the combination represents one of the structural equilibria required for the highest activity. When the structural parameters x.sub.i and x.sub.j form such a combination, an estimated equilibrium constant a.sub.ij is expressed by a.sub.ij =(x.sub.j -b.sub.ij)/x.sub.i, wherein b.sub.ij is a correction factor. This equilibrium equation is derived from the regression line x.sub.j =a.sub.ij x.sub.i +b.sub.ij, which is obtained for the group of compounds with the desired property. The value (x.sub.j -b.sub.ij)/x.sub.i for the most active group is nearly constant around the value a.sub.ij, while the values for the other groups vary widely.
The value a.sub.ij is the estimated equilibrium constant of the two structural parameters x.sub.i and x.sub.j, and reflects a certain equilibrium of elementary properties required for the highest activity. This simple process sheds new light on the problem and, in fact, creates a new structural parameter. This process has not been tried before.
(3) Using all the parameters prepared in (1) and (2), the control charts are produced. This is nothing but plotting the data, either on paper or in n-dimensional space with the aid of a computer, and determining from the plots the structural outer limit of the most active group. Discriminating power is the only criterion for comparing several control charts.
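Step (2) can be sketched as follows, under stated assumptions. The group labels "high" and "low", the parameter values and the helper names `pearson`, `regression_line` and `equilibrium_ratio` are hypothetical; only the regression line x.sub.j =a.sub.ij x.sub.i +b.sub.ij and the ratio (x.sub.j -b.sub.ij)/x.sub.i are taken from the text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def regression_line(xs, ys):
    """Least-squares fit x_j = a * x_i + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def equilibrium_ratio(x_i, x_j, b):
    """The value (x_j - b_ij) / x_i for one compound."""
    return (x_j - b) / x_i

# Hypothetical values of two structural parameters (x_i, x_j) per activity group.
groups = {
    "high": ([1.0, 2.0, 3.0, 4.0, 5.0], [2.6, 4.4, 6.6, 8.4, 10.5]),
    "low": ([1.0, 2.0, 3.0, 4.0, 5.0], [9.0, 2.0, 7.5, 1.0, 6.0]),
}

# Correlation coefficient of the parameter pair within each group.
r = {name: pearson(xi, xj) for name, (xi, xj) in groups.items()}

# Regression line fitted to the most active group only.
a_ij, b_ij = regression_line(*groups["high"])

# The new structural parameter, evaluated for every compound in every group.
ratios = {
    name: [equilibrium_ratio(xi, xj, b_ij) for xi, xj in zip(*cols)]
    for name, cols in groups.items()
}
spread = {name: max(v) - min(v) for name, v in ratios.items()}
print(r, a_ij, b_ij, spread)
```

With these illustrative numbers the pair is highly correlated only in the "high" group, and the ratio stays close to a.sub.ij for that group while varying widely for the other. In a real data set the same calculation would be repeated for every pair of parameters, and a parameter x.sub.i that can be zero would need separate handling.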
Thus, the construction and selection of the control charts are complete. The process is repeated, however, whenever additional data become available or better structural parameters are suggested.
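The selection among control charts in step (3) might be sketched as below. Two assumptions here are not stated in this specification: the control limits are taken as the mean of the most active group plus or minus three standard deviations, following the usual Shewhart convention, and discriminating power is scored as the fraction of compounds placed on the correct side of the limits. The chart names and all values are hypothetical.

```python
from statistics import mean, stdev

def control_limits(active_values, k=3.0):
    """Lower and upper control limits: mean +/- k standard deviations (assumed rule)."""
    center = mean(active_values)
    half_width = k * stdev(active_values)
    return center - half_width, center + half_width

def discrimination_power(active_values, other_values, k=3.0):
    """Fraction of compounds classified correctly by the limits (assumed score)."""
    low, high = control_limits(active_values, k)
    inside_active = sum(low <= v <= high for v in active_values)
    outside_other = sum(not (low <= v <= high) for v in other_values)
    return (inside_active + outside_other) / (len(active_values) + len(other_values))

# Hypothetical one-parameter charts: (most active group, all other compounds).
charts = {
    "equilibrium_ratio": ([2.04, 1.92, 2.01, 1.96, 1.99],
                          [8.44, 0.72, 2.05, 0.11, 1.09]),
    "single_parameter": ([10.0, 12.0, 11.0, 13.0, 9.0],
                         [11.0, 14.5, 10.0, 12.0, 3.0]),
}

scores = {name: discrimination_power(a, o) for name, (a, o) in charts.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

When new data or new structural parameters are added, the same scoring can simply be rerun over the enlarged set of charts, which corresponds to the repetition described above.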