Proteins are linear polymers built from 20 different naturally occurring amino acids. The particular linear sequence of amino acid residues in a protein defines the protein's primary structure. In its natural environment, a protein folds into a three-dimensional structure determined by its primary structure and by the chemical and electronic interactions between the protein's individual amino acid constituents and the surrounding aqueous environment, which can include other biomolecules and cellular structures in addition to water.
Studies of known three-dimensional structures have identified a number of characteristic patterns that appear to be particularly stable and therefore recur within folded proteins. These patterns, which include α-helices, β-sheets, and turns, among others, form through chemical interactions (chiefly hydrogen bonds between backbone groups) among different amino acids in the protein and are referred to as the protein's secondary structure.
The α-helices, β-sheets, turns, and other elements that make up the protein's secondary structure, together with the interactions between those elements, determine the protein's tertiary (three-dimensional) structure. Because a protein's biological properties depend directly on its tertiary structure, understanding that structure is crucial to understanding the protein's biological role and to applying that understanding in areas such as the treatment of disease and the design and development of new pharmaceuticals.
A protein's primary structure can be readily determined using known methods, for example by identifying the amino acids encoded by the protein's known genetic (nucleotide) sequence. Similarly, once the primary structure is known, established techniques make it relatively easy to identify the protein's secondary structure.
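To illustrate how a primary structure follows from a nucleotide sequence, the following is a minimal sketch that translates a DNA or mRNA coding sequence into its amino acid sequence. It assumes the standard genetic code (NCBI translation table 1), a reading frame starting at the first base, and a sequence that is already a spliced coding region; alternative genetic codes, introns, and post-translational modifications are ignored. The names `CODON_TABLE` and `translate` are hypothetical, chosen for this illustration.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in the order T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid code.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_sequence: str) -> str:
    """Translate a coding sequence (frame 0) into a one-letter protein sequence.

    Assumption: the input is a DNA or mRNA coding region; 'U' is treated as 'T'.
    Translation stops at the first stop codon, and a trailing partial codon
    (fewer than three bases) is ignored.
    """
    seq = coding_sequence.upper().replace("U", "T")
    protein = []
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"invalid codon {codon!r} at position {start}")
        residue = CODON_TABLE[codon]
        if residue == "*":  # stop codon ends the polypeptide chain
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` reads the codons ATG, GCC, and TAA, giving methionine, alanine, and a stop, so the result is `"MA"`. Building the table from the ordered codon string keeps all 64 assignments in one compact, checkable line rather than a long literal dictionary.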
Determining a protein's tertiary structure is more difficult. For some proteins, it is possible to determine tertiary structure through such techniques as x-ray crystallography, or spectroscopic methods such as fluorescence and nuclear magnetic resonance (NMR) studies. However, these techniques can be time-consuming and expensive, and not all proteins are equally amenable to structural examination by these methods.
Integral membrane (transmembrane) proteins are one such class. Integral membrane proteins are encoded by an estimated 20-30% of genes in humans and other forms of life (Wallin and von Heijne, 1998) and play important roles in processes as diverse as ion translocation, electron transfer, and transduction of extracellular signals. One of the most important classes of transmembrane (TM) proteins is the G-protein-coupled receptor (GPCR) superfamily, whose members, upon activation by extracellular signals, initiate an intracellular chemical signaling cascade that transduces, propagates, and amplifies those signals. GPCRs are involved in cell communication and in mediating senses such as vision, smell, taste, and pain. The extracellular signals that trigger this transduction are usually chemical, but for the opsin family the signal is visible light (electromagnetic radiation). Malfunctions in GPCRs play a role in diseases such as ulcers, allergies, migraine, anxiety, psychosis, nocturnal heartburn, hypertension, asthma, prostatic hypertrophy, congestive heart failure, Parkinson's disease, schizophrenia, and glaucoma (Wilson and Bergsma, 2000). Indeed, although GPCR genes make up only 3-4% of the human genome (Schöneberg et al., 2002), the GPCR superfamily is one of the most important families of drug targets.
GPCRs share a predicted seven-transmembrane-helix structure and the ability to activate a G-protein in response to ligand binding. Their natural ligands range from peptide and non-peptide neurotransmitters, hormones, and growth factors to odorants and light. The members of the GPCR superfamily that act through heterotrimeric G-proteins have been classified into six clans (see below). Within a class of GPCRs (for example, adrenergic receptors) there are often several subtypes (nine, in the case of adrenergic receptors), all responding to the same endogenous ligand (epinephrine and norepinephrine for adrenergic receptors) but having very different functions in various cells. In addition, many different types of GPCRs are similar enough that they are affected by the agonists or antagonists of other types (e.g., among adrenergic, dopamine, serotonin, and histamine receptors), often leading to undesirable side effects. This makes it difficult to develop drugs selective for a particular subtype without side effects resulting from cross-reactivity with other subtypes. To design such subtype-specific drugs, it is essential to use structure-based methods.
Extensive protein sequence analyses of certain GPCRs have revealed a common topology consisting of a membrane-spanning seven-helix bundle, which is believed to accommodate the binding site for low-molecular-weight ligands. However, although much effort has been put into elucidating the structure of GPCRs, only a very small number of complete 3D structures of transmembrane proteins have been determined experimentally (e.g., bacteriorhodopsin and bovine rhodopsin). In fact, no atomic-level structure is available for any human GPCR. Consequently, designing subtype-specific drugs for GPCR targets is a tedious, empirical process that often yields drugs with undesirable side effects.
The difficulty in determining three-dimensional structures for GPCRs lies in growing crystals of these membrane-bound proteins that are of high enough quality to yield high-resolution x-ray diffraction data, and in applying NMR to determine structure in such membrane-bound systems. For globular proteins, there have been significant advances in predicting three-dimensional structures by using sequence homology to families of known structures (Marti-Renom et al., 2000); however, this approach is not practical for GPCRs, because a high-resolution crystal structure is available for only one GPCR, bovine rhodopsin, which has low homology (<35%) to most GPCRs of pharmacological interest. Thus, there is a need for modeling techniques that predict the structure and functional characteristics of the members of this class of proteins at the molecular level and, as a first step, for techniques that predict structural elements such as transmembrane regions.