Why certain chemical structures and not others are present in nature has been a recurring question raised by scientists since the first organic natural products were characterized. Of equal interest has been elucidating what structural features within any given class of organic molecules are responsible for biological activity. Historically, the lack of satisfactory answers to both questions has relegated the development of biologically active molecules either to serendipity or to exhaustive synthesis and biological testing of large numbers of compounds. This frustration is particularly evident in the pharmaceutical industry where the development of drug agonists and antagonists is often time consuming, tedious and expensive.
This picture is beginning to change as more information is derived from modern molecular modeling techniques, including characterization of the active sites in enzymes and the ligand binding sites in receptors. Over the past 15 years, another approach has emerged based upon a series of discoveries made with molecular models, wherein biologically active small molecules have been found to possess complementary stereochemical relationships with gene structure. This approach was first described in U.S. Pat. No. 4,461,619 to Hendry, et al., which is incorporated herein by reference. This simple molecular modeling technology was developed from observations first reported in 1977 of structural relationships between small molecules and nucleic acids, as described by Hendry, et al., J. Steroid Biochem. Molec. Biol. 42:659-670 (1992); Copland, et al., J. Steroid Biochem. Molec. Biol. 46:451-462 (1993); Hendry and Mahesh, J. Steroid Biochem. Molec. Biol. 41:647-651 (1992); Witham and Hendry, J. Theor. Biol. 155:55-67 (1992); Hendry and Mahesh, J. Steroid Biochem. Molec. Biol. 39:133-146 (1991); Hendry, J. Steroid Biochem. 31:493-523 (1988); Lehner, et al., Molec. Endocrinol. 1:377-387 (1987); Hendry, et al., J. Steroid Biochem. 24:843-852 (1986); Uberoi, et al., Steroids 45:325-340 (1985); Bransome, et al., J. Theor. Biol. 112:97-108 (1985); Hendry, et al., Proc. Natl. Acad. Sci. USA 78:7440-7444 (1981); and Hendry, et al., Perspect. Biol. Med. 27:623-651 (1984), all of which are incorporated herein by reference.
The essential ingredient of all genes is a single, well defined polymer, deoxyribonucleic acid (DNA). DNA is a remarkably uncomplicated molecule composed of recurring sugar-phosphate units attached to one of four possible bases: adenine (A), thymine (T), cytosine (C) or guanine (G). The simplicity of gene structure is further evident in the Watson and Crick base pairing scheme of double-stranded DNA (A with T and C with G), and the helical chirality (handedness) dictated by the absolute configuration of the sugar D-deoxyribose. Gene structure could conceivably be composed of many other chemical units, for example, other sugar stereoisomers such as L-deoxyribose or sugar homologs related to D-glucose.
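The Watson and Crick pairing scheme described above (A with T and C with G) can be illustrated with a minimal Python sketch. The names `PAIRS`, `complement`, and `reverse_complement` are illustrative assumptions and are not taken from the referenced work; the sketch assumes a strand is written as a string containing only the letters A, T, C, and G.

```python
# Watson-Crick pairing rules as stated in the text: A with T, C with G.
# PAIRS, complement and reverse_complement are hypothetical names
# chosen for this illustration only.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'-to-3' direction.

    The two strands of double-stranded DNA run antiparallel, so the
    partner strand is the complement read in reverse order.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATCG"))
    print(reverse_complement("ATCG"))
```

A base outside the four-letter alphabet raises a `KeyError`, which is a deliberate choice here so that malformed input is not silently paired.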
The products of gene structure, proteins, are also simple, ubiquitous molecules. Nature limits the structure of proteins by constructing them from only twenty basic units, the amino acids; protein chirality is constrained by the absolute L-configuration of the amino acids. As in the case of nucleic acid subunits, a wide range of structural alternatives is possible for protein amino acids. Examples include changes in the chirality of a given amino acid side chain (e.g., D-isoleucine), rearrangements in the pattern of atoms (e.g., the t-butyl isomer of isoleucine) or the addition of atoms (e.g., pipecolic acid, a homologue of proline).
Structural constraints are also evident in the stereochemistry of low molecular weight natural products. Particularly conspicuous are limitations imposed by nature on the number, size, shape, elemental composition, and chirality of biologically active small molecules. For example, the pervasive neurotransmitters histamine and serotonin are unique in that alternative structures with changes in the position or composition of heteroatoms and/or ring patterns generally do not exist in nature. Similarly, many small molecular weight hormones are few in number, have recurring structural patterns and possess a single absolute chirality.
The source of the pervasive occurrence of physicochemical constraints on the structure of naturally occurring small molecules lies directly in the structure of the proteins which govern both their biosynthesis and bioactivity, i.e., enzymes and receptors, respectively. Ultimately, however, this stereochemical information is contained in the genes. According to the basic tenets of molecular biology, the information in DNA is replicated with remarkable precision and fidelity into newly synthesized DNA. It is also transcribed into RNA and subsequently translated into protein.
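The flow of information from DNA to RNA to protein can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the input is the coding (sense) strand written 5' to 3', the standard genetic code applies, reading starts at the first base, and translation stops at the first stop codon. The names `CODON_TABLE`, `transcribe`, and `translate` are hypothetical and do not come from the referenced work.

```python
# Standard genetic code in compact form: codons are enumerated with
# bases in the order T, C, A, G at each of the three positions, and
# AMINO gives the one-letter amino acid ("*" = stop) for each codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA from its first base until the first stop codon."""
    dna = mrna.upper().replace("U", "T")  # the table is keyed on DNA letters
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))
```

Information passes in one direction only in this sketch: nothing in `translate` feeds back into the DNA sequence.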
This scenario, however, presents an apparent paradox. While the genetic template ultimately directs which proteins and small molecules are synthesized, as well as which proteins and small molecules will interact with each other, the unidirectional flow of genetic information during translation suggests that DNA structure performs this function without recognizing the structure of the small molecule. With few exceptions, such as certain antibiotics which bind directly to DNA and block transcription, small molecules are not considered to recognize or interact with the genetic template. Moreover, the structures of the molecules that are biosynthesized are thought to be unrelated to the structure of the genes.
In the initial search for structural relationships between biologically active natural products and DNA, it became apparent that the two-dimensional structures of DNA base pairs were analogous to many classes of small molecules, including gibberellic acid, a phytohormone; benzo[a]pyrene oxide, a carcinogen; the prostaglandin PGE2; morphine, a narcotic; estradiol, a hormone; riboflavin, vitamin B2; serotonin, a neurotransmitter; and actinomycin, an antibiotic. In addition to similarities in size and shape, numerous small molecules contained donor/acceptor functional groups at locations where hydrogen bonds occurred between the base pairs. When overlaid on the base pairs, some compounds, such as the plant hormone gibberellic acid, the steroid hormone estradiol, and prostaglandins, contained heteroatoms separated by internuclear distances similar to that of phosphate oxygens on adjacent strands of double-stranded DNA. This was particularly evident in functional groups attached at the 3 and 17β positions of the steroids.
Using three-dimensional Corey-Pauling-Koltun (CPK) space-filling models, it became apparent that there were spaces between base pairs in partially unwound DNA that could accommodate a variety of small molecules. For example, estradiol could be inserted between base pairs in DNA, and the hydroxyl groups at the 3 and 17β positions of estradiol were positioned such that they could form hydrogen bonds to phosphate oxygens on adjacent strands of DNA. Other steroids, including testosterone and progesterone, were also capable of stereochemical insertion between base pairs. In each case, complementary donor/acceptor linkages could be formed and the steroid conformed well to the topography of the double helix. Attempts to insert any of the non-naturally occurring steroid enantiomers into DNA resulted in a poor fit, in that donor/acceptor linkages were strained or could not form, and/or the overall shape of the molecules was incompatible with the helical topography of the DNA.
Certain synthetic compounds with hormonal activity can also be accommodated within the DNA; in many cases, the fit of synthetic compounds such as diethylstilbestrol mimicked that of the natural hormone. In addition to mammalian steroids, prostaglandins, the insect hormone ecdysone and several phytohormones were also capable of stereochemical insertion and "recognition" by the double helix. In the case of the plant hormone gibberellic acid, four stereospecific hydrogen bonds could be formed to donor/acceptor positions on the DNA. As with the steroids, only the naturally occurring enantiomer of gibberellic acid conformed to the topography of the double helix.
One conclusion drawn from these studies is that certain chemical shapes, coupled with heteroatom positioning compatible with that of the phosphate backbone of DNA and hydrogen bond positions of the base-pair template, potentiate partial or complete recognition between biologically active molecules and DNA.
While it was possible to form complexes between DNA and a variety of molecules, amino acids did not initially show any clear accommodation to the space between base pairs. Certain compounds derived from amino acids, for example, neurotransmitters, fit into related sites.
These relationships have been described as a stereochemical logic associated with gene structure. The stereochemical logic is defined as those unique features of nucleic acid structure which ultimately dictate constraints on molecular structure, function, metabolism, and biologic activity.
The use of molecular modeling as a tool to study organic structure has dramatically increased due to the advent of computer graphics. Not only is it possible to view molecules on computer screens in three dimensions but it is also feasible to examine the interactions of ligands with various macromolecules such as enzymes and receptors, as reviewed by Borman, Chem. Eng. News 70:18-26 (1992). An almost baffling array of software and hardware is now available and virtually all major pharmaceutical companies have computer modeling groups which are devoted to drug design.
Modern methods of drug design include studies which focus on the binding of a molecule to a protein such as a polypeptide ligand for a receptor, or a steroid such as an estrogen or progesterone for a receptor. Similarly, drugs can be designed based upon the interaction of substrates with various enzymes. For the most part, however, binding sites in proteins have been difficult to characterize. There are many situations where other mechanisms must be involved to explain the feedback between protein regulation and regulation of gene expression.
What is needed is a method for accurately predicting the biological activity of a given compound. The method should be easy to perform and should be able to predict both agonist and antagonist activity.