1. Field of the Invention
The invention relates to methods and systems for constructing models to predict molecular activity, and further relates to a model for predicting protein binding.
2. Description of the Related Art
Drugs may bind to a variety of components in the blood, including albumin, α1-acid glycoprotein (AAG), lipoproteins, immunoglobulins, sex hormone binding globulins, and erythrocytes. Drugs that are ≥90% protein bound in human serum are generally considered “highly bound”. Some foreknowledge of the protein binding characteristics of a molecule would help provide a better estimate of the pharmacodynamics and pharmacokinetics of the molecule. Clearance depends significantly upon volume of distribution, which in turn depends upon the fraction of the drug in plasma that is unbound, fu. Highly protein bound drugs, having low fu, have lower free concentrations because the drug-protein complex cannot diffuse to reach the receptor, and this lowers the pharmacodynamic response. For example, the fu of phenytoin is more useful than the total plasma concentration of phenytoin for discriminating toxic responses to that drug. In addition, hepatic extraction is directly proportional to fu. Glomerular filtration by the kidney does not occur for highly protein bound drugs, as the drug-protein complexes are too large to be filtered. The maximum oral bioavailability is directly related to the hepatic extraction ratio, which is in part dependent upon fu. Disease states that cause a significant drop in serum albumin, e.g., nephrotic syndrome, in which serum albumin concentrations are halved, cause a corresponding 2-fold decrease in the half-life of clofibrate. Competitive displacement of a drug from its protein binding site by other highly protein bound drugs has been theorized to lead to adverse events due to the increase in plasma concentration of the displaced drug or altered pharmacokinetics.
A variety of techniques have been developed for predicting protein binding. Lipophilicity has been repeatedly found to be a significant factor in protein binding. This is not surprising, because some lipophilic character is usually required for interaction at receptor sites in proteins. Thus, a high logP (the logarithm of the octanol-water partition coefficient) has been found to be associated with high protein binding. Another predictive model is based on performing structural comparisons between a molecule with unknown behavior and a set of “marker molecules” having known behavior. A method of this type is known as the LLC hashkey method. The hashkey method randomly selects a relatively small set of molecules (20–200) to produce a molecular representation of the entire chemical space of interest. Similarities derived from 3-D molecular surface properties are computed between each molecule of interest and the chosen hashkey molecules, and properties are predicted using the resulting hashkey vectors and some form of computational model, e.g., a neural network or a k-nearest-neighbor (KNN) model.
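The general flow of a hashkey-style approach, as described above, can be illustrated with a minimal sketch. This is not the published hashkey implementation: the `similarity` function is a toy stand-in for a 3-D molecular surface similarity, the three-number descriptors and the "percent bound" values are made-up illustrative data, and all function names are hypothetical.

```python
import math
import random


def similarity(a, b):
    """Toy stand-in for a 3-D surface similarity (assumption, not the real measure).

    Returns 1.0 for identical descriptors and decreases toward 0 with distance.
    """
    return 1.0 / (1.0 + math.dist(a, b))


def choose_hashkeys(molecules, n_keys, seed=0):
    """Randomly select a small set of marker ("hashkey") molecules."""
    rng = random.Random(seed)
    return rng.sample(molecules, n_keys)


def hashkey_vector(mol, hashkeys):
    """Represent a molecule by its similarity to each hashkey molecule."""
    return [similarity(mol, key) for key in hashkeys]


def knn_predict(query_vec, train_vecs, train_values, k=3):
    """Predict a property as the mean over the k nearest hashkey vectors."""
    order = sorted(
        range(len(train_vecs)),
        key=lambda i: math.dist(query_vec, train_vecs[i]),
    )
    nearest = order[:k]
    return sum(train_values[i] for i in nearest) / len(nearest)


# Made-up training data: 50 hypothetical molecules with 3 descriptors each,
# and an invented "percent bound" value in [0, 100] for each one.
rng = random.Random(42)
train_mols = [(rng.random(), rng.random(), rng.random()) for _ in range(50)]
train_bound = [100.0 * m[0] for m in train_mols]

# 20 hashkeys, the low end of the 20-200 range mentioned in the text.
hashkeys = choose_hashkeys(train_mols, n_keys=20)
train_vecs = [hashkey_vector(m, hashkeys) for m in train_mols]

# Predict for a hypothetical molecule with unknown behavior.
query = (0.5, 0.5, 0.5)
prediction = knn_predict(hashkey_vector(query, hashkeys), train_vecs, train_bound)
```

The sketch uses KNN only because it is the simpler of the two model types named in the text; a neural network could be trained on the same hashkey vectors.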
Historically, efforts to build predictive models for protein binding have been only partially successful. Austel and Kutter reviewed 39 structure/activity prediction models for protein binding and concluded that the models “have shown that within a series of closely related compounds protein binding increases with lipophilicity. Differences between individual structural types are not well explained and cannot be predicted.” (Austel, V.; Kutter, E. Absorption, Distribution, and Metabolism of Drugs. In Quantitative Structure-Activity Relationships of Drugs; Topliss, J. G., Ed.; Academic Press: New York, 1983, pp 437–496.) What is needed is a more accurate model for predicting molecular behavior such as protein binding.