Introduction to Cytochrome P450
Cytochrome P450s (CYP450) form a very large and complex gene superfamily of hemeproteins that metabolise physiologically important compounds in many species of microorganisms, plants and animals. Cytochrome P450s are important in the oxidative, peroxidative and reductive metabolism of numerous and diverse endogenous compounds such as steroids, bile acids, fatty acids, prostaglandins, leukotrienes, retinoids and lipids. Many of these enzymes also metabolise a wide range of xenobiotics including drugs, environmental compounds and pollutants. Their involvement in drug metabolism is extensive: it is estimated that 50% of all known drugs are affected in some way by the action of CYP450 enzymes. Significant resources are employed by the pharmaceutical industry to optimise drug candidates in order to avoid detrimental interactions with the CYP450 enzymes. A further complication is that these enzymes exhibit different tissue distributions and polymorphisms between individuals and ethnic populations.
Most mammalian P450s are located in the liver, but other organs and tissues have high concentrations of certain cytochrome P450s, including the intestinal wall, lung, kidney, adrenal cortex and nasal epithelium. Mammals have about 50 unique CYP450 genes and each family member is 45–55 kDa in size and contains a heme moiety that catalyses a two-electron activation of oxygen. The source of electrons may be used to classify CYP450s. Those that receive electrons in a three-protein chain, in which electrons flow from a flavin adenine dinucleotide (FAD)-containing reductase to an iron-sulphur protein and then to P450, belong to the group of class I P450s, which includes most of the bacterial enzymes. Class II P450s receive electrons from a reductase containing both FAD and flavin mononucleotide (FMN), and comprise the microsomal P450s that are the main culprits of drug metabolism. The mammalian microsomal cytochrome P450s are integral membrane proteins anchored by an N-terminal transmembrane-spanning α-helix. They are inserted in the membrane of the endoplasmic reticulum by a short, highly hydrophobic N-terminal segment that acts as a non-cleavable signal sequence for insertion into the membrane. The remainder of the mammalian cytochrome P450 protein is a globular structure that protrudes into the cytoplasmic space. Hence, the bulk of the enzyme faces the cytoplasmic surface of the lipid bilayer. P450s require other membranous enzymatic components for activity, including the flavoprotein NADPH-cytochrome P450 oxidoreductase and, in some cases, cytochrome b5. A single cytochrome P450 oxidoreductase supports the activity of all the mammalian microsomal enzymes by interacting directly with the P450s and transferring the required two electrons from NADPH. Cytochrome P450s are able to incorporate one of the two oxygen atoms of an O2 molecule into a broad variety of substrates, with concomitant reduction of the other oxygen atom by two electrons to H2O.
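The oxygen insertion described above can be summarised as an overall stoichiometry. The equation below is the generally accepted monooxygenase reaction written for a generic substrate; the symbols RH (substrate) and ROH (hydroxylated product) are introduced here for illustration and do not appear in the text above.

```latex
\mathrm{RH} + \mathrm{O_2} + \mathrm{NADPH} + \mathrm{H^+}
\;\longrightarrow\;
\mathrm{ROH} + \mathrm{H_2O} + \mathrm{NADP^+}
```

One oxygen atom of O2 ends up in the product ROH and the other in water, and the two electrons consumed are those supplied by NADPH through the reductase.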
Cytochrome P450s are known to catalyse hydroxylations, epoxidations, N-, S- and O-dealkylations, N-oxidations, sulfoxidations, dehalogenations and other reactions.
The genes of the P450 superfamily have been categorized by Nelson et al. (Pharmacogenetics, 6; 1–42, 1996), who proposed a systematic nomenclature for the family members. This nomenclature is used widely in the art, and is adopted herein. Nelson et al. provide cross-references to sequence database entries for P450 sequences.
Homo sapiens has 17 cytochrome P450 gene families and 42 subfamilies that total more than 50 sequenced isoforms. Cytochrome P450s from families 1, 2 and 3 constitute the major pathways for drug metabolism. Many drugs rely on hepatic metabolism by cytochrome P450s for clearance from the circulation and for pharmacological inactivation. Conversely, some drugs have to be converted in the body to their pharmacologically active metabolites by P450s. Many promising lead compounds are terminated in the development phase due to their interaction with one or more P450s. One of the greatest problems in drug discovery is the prediction of the role of cytochrome P450s on the metabolism or modification of drug leads. Early detection of metabolic problems associated with a chemical lead series is of paramount importance for the pharmaceutical industry. Obtaining crystal structures of the main human drug metabolising cytochrome P450s would be highly valuable for drug design, as this would provide detailed information on how P450 enzymes recognize drug molecules and the mode of drug binding. This in turn would allow drug companies to develop strategies to modify metabolic clearance and decrease the attrition rates of compounds in development.
The major human CYP450 isoforms involved in drug metabolism are CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4. The level of sequence identity between these family members ranges from about 20–80%, with much of the variability within the residues involved in substrate recognition. CYP450 enzymes are also present in bacteria and much of the understanding of substrate recognition is derived from crystal structures obtained of bacterial CYP450 enzymes.
CYP3A is both the most abundant and most clinically significant subfamily of cytochrome P450 enzymes. The CYP3A subfamily has four human isoforms, 3A4, 3A5, 3A7 and 3A43, CYP3A4 being the most commonly associated with drug interactions. The CYP3A isoforms make up approximately 50% of the liver's total cytochrome P450 and are widely expressed throughout the gastrointestinal tract, kidneys and lungs and therefore are ultimately responsible for the majority of first-pass metabolism. This is important as increases or decreases in first-pass metabolism can have the effect of administering a much smaller or larger dose of drug than usual. More than 150 drugs are known substrates of CYP3A4, including many of the opiate analgesics, steroids, antiarrhythmic agents, tricyclic antidepressants, calcium-channel blockers and macrolide antibiotics. Although several substrates show age-dependent reductions in elimination, the enzyme itself does not appear to be altered. CYP3A4 is important in the metabolism of many drugs including cyclosporine, codeine, tamoxifen, lovastatin, and many more, and endogenous compounds such as testosterone, estradiol and cortisol. Ketoconazole, itraconazole, erythromycin, clarithromycin, diltiazem, fluvoxamine, nefazodone, and dihydroxybergamottin and various substances found in grapefruit juice, green tea and other foods are potent inhibitors of CYP3A4 and are known to be responsible for many drug interactions. These interactions can have serious clinical consequences.
Background to Crystallisation
It is well known in the art of protein chemistry that crystallising a protein is a chancy and difficult process without any clear expectation of success. It is now evident that protein crystallization is the main hurdle in protein structure determination. For this reason, protein crystallization has become a research subject in and of itself, and is not simply an extension of the protein crystallographer's laboratory. There are many references which describe the difficulties associated with growing protein crystals, for example Kierzek, A. M. and Zielenkiewicz, P., (2001), Biophysical Chemistry, 91, 1–20, Models of protein crystal growth, and Wiencek, J. M. (1999) Annu. Rev. Biomed. Eng., 1, 505–534, New Strategies for crystal growth.
It is commonly held that crystallization of protein molecules from solution is the major obstacle in the process of determining protein structures. The reasons for this are many; proteins are complex molecules, and the delicate balance involving specific and non-specific interactions with other protein molecules and small molecules in solution, is difficult to predict.
Each protein crystallizes under a unique set of conditions, which cannot be predicted in advance. Simply supersaturating the protein to bring it out of solution may not work; the result would, in most cases, be an amorphous precipitate. Many precipitating agents are used. Common ones are different salts and polyethylene glycols, but others are known. In addition, additives such as metals and detergents can be added to modulate the behaviour of the protein in solution. Many kits are available (e.g. from Hampton Research) which attempt to cover as many parameters in crystallization space as possible, but in many cases these are just a starting point from which to optimise crystalline precipitates and crystals that are unsuitable for diffraction analysis. Successful crystallization is aided by a knowledge of the protein's behaviour in terms of solubility, dependence on metal ions for correct folding or activity, interactions with other molecules and any other information that is available. Even so, crystallization of proteins is often regarded as a time-consuming process, whereby subsequent experiments build on observations of past trials.
In cases where protein crystals are obtained, these are not necessarily always suitable for diffraction analysis; they may be limited in resolution, and it may subsequently be difficult to improve them to the point at which they will diffract to the resolution required for analysis. Limited resolution in a crystal can be due to several things. It may be due to intrinsic mobility of the protein within the crystal, which can be difficult to overcome, even with other crystal forms. It may be due to high solvent content within the crystal, which consequently results in weak scattering. Alternatively, it could be due to defects within the crystal lattice which mean that the diffracted x-rays will not be completely in phase from unit to unit within the lattice. Any one of these or a combination of these could mean that the crystals are not suitable for structure determination.
Some proteins never crystallize, and after a reasonable attempt it is necessary to examine the protein itself and consider whether it is possible to make individual domains, different N- or C-terminal truncations, or point mutations. It is often hard to predict how a protein could be re-engineered in such a manner as to improve crystallisability. Our understanding of crystallisation mechanisms is still incomplete, and the factors of protein structure which are involved in crystallisation are poorly understood.
Determination of Protein Structure
A mathematical operation termed a Fourier transform relates the diffraction pattern observed from a crystal to the molecular structure of the protein comprising the crystal. A Fourier transform may be considered to be a summation of sine and cosine waves, each with a defined amplitude and phase. Thus, in theory, it is possible to calculate the electron density associated with a protein structure by carrying out an inverse Fourier transform on the diffraction data. This, however, requires amplitude and phase information to be extracted from the diffraction data. Amplitude information may be obtained by analysing the intensities of the spots within a diffraction pattern. The conventional methods for recording diffraction data do, however, mean that any phase information is lost. This “phase information” must be recovered in some way, and the loss of this information represents the “crystallographic phase problem”. The phase information necessary for carrying out the inverse Fourier transform can be obtained via a variety of methods. If a structure of the protein already exists, a set of theoretical amplitudes and phases may be calculated using the protein model, and the theoretical phases then combined with the experimentally derived amplitudes. An electron density map may then be calculated and the protein structure observed.
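The role of the phases can be illustrated with a small numerical sketch. The example below is a toy, one-dimensional model written with NumPy and is not taken from the text above: the "atoms", their positions and weights, and the names `gaussian_atom`, `density` and `no_phase_map` are all assumptions made for illustration. It shows that an inverse Fourier transform recovers the density when both amplitudes and phases are available, and fails when only amplitudes are used.

```python
import numpy as np

# Toy 1D unit cell sampled on a regular grid (illustrative values only).
n_points = 64
x = np.arange(n_points) / n_points


def gaussian_atom(center, weight, width=0.02):
    """Return a periodic Gaussian peak standing in for one atom."""
    offset = np.abs(x - center)
    distance = np.minimum(offset, 1.0 - offset)  # wrap around the cell edge
    return weight * np.exp(-0.5 * (distance / width) ** 2)


# Hypothetical "electron density" made of three atoms of different weight.
density = (
    gaussian_atom(0.20, 6.0)
    + gaussian_atom(0.45, 8.0)
    + gaussian_atom(0.70, 26.0)
)

# Forward transform: complex structure factors with amplitude and phase.
structure_factors = np.fft.fft(density)
amplitudes = np.abs(structure_factors)   # measurable from spot intensities
phases = np.angle(structure_factors)     # lost in a diffraction experiment

# Inverse transform with amplitudes AND phases recovers the density.
recovered = np.fft.ifft(amplitudes * np.exp(1j * phases)).real

# Inverse transform with amplitudes only (all phases set to zero) does not.
no_phase_map = np.fft.ifft(amplitudes).real

max_error_with_phases = np.max(np.abs(recovered - density))
max_error_without_phases = np.max(np.abs(no_phase_map - density))

print("max error with phases:   ", max_error_with_phases)
print("max error without phases:", max_error_without_phases)
```

The first error is at the level of floating-point rounding, whereas the second is of the same order as the density itself, which is the phase problem in miniature.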
If there is no known structure of the protein then alternative methods for obtaining phases must be explored. One method is multiple isomorphous replacement (MIR). This relies on soaking “heavy atom” compounds (e.g. compounds of platinum, uranium or mercury) into the crystals and observing how their incorporation into the crystals modifies the spot intensities observed in the diffraction pattern. This method relies on the heavy atoms being incorporated into the protein at a finite number of defined sites. It is a pre-requisite of an isomorphous replacement experiment that the heavy atom soaked crystals remain isomorphous. That is, there should be no appreciable alterations in the physical characteristics of the protein crystal (e.g. perturbations to crystallographic cell dimensions, or significant loss of resolution). Perturbations to the physical properties of the crystal are termed non-isomorphisms and prevent this type of experiment being successfully completed. Successful isomorphous incorporation of heavy atoms into a protein crystal results in the intensities of the spots within the diffraction pattern obtained from the crystal being modified, as compared to the data collected from an identical, unsoaked (native) crystal. The diffraction data obtained from a successful isomorphous replacement experiment are termed a “derivative” dataset. By mathematically analysing the “native” and “derivative” datasets it is possible to extract preliminary phase information from the datasets. This phase information, when combined with the experimentally obtained amplitudes from the native dataset, enables an electron density map of the unknown protein molecule to be calculated using the Fourier transform method.
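The mathematical analysis of native and derivative datasets mentioned above rests on a standard vector relation, sketched here in conventional textbook notation. The symbols are not used in the text above and are introduced as assumptions: F_P is the structure factor of the native protein, F_H that of the heavy atoms alone, F_PH that of the derivative, and α denotes a phase angle.

```latex
\mathbf{F}_{PH} = \mathbf{F}_{P} + \mathbf{F}_{H}
\qquad\Longrightarrow\qquad
|F_{PH}|^{2} = |F_{P}|^{2} + |F_{H}|^{2}
             + 2\,|F_{P}|\,|F_{H}|\cos(\alpha_{P} - \alpha_{H})

\cos(\alpha_{P} - \alpha_{H})
  = \frac{|F_{PH}|^{2} - |F_{P}|^{2} - |F_{H}|^{2}}{2\,|F_{P}|\,|F_{H}|}
```

Once the heavy atom sites are located, F_H is known, and the measured amplitudes give the cosine of the phase difference. Because a cosine fixes an angle only up to its sign, a single derivative leaves two possible values for the protein phase, which is one reason more than one derivative is normally used.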
An alternative method for obtaining phase information for a protein of unknown structure is to perform a multi-wavelength anomalous dispersion (MAD) experiment. This relies on the absorption of X-rays by electrons at certain characteristic X-ray wavelengths. Different elements have different characteristic absorption edges. Anomalous scattering by atoms within a protein will modify the diffraction pattern obtained from the protein crystal. Thus if a protein contains atoms which are capable of anomalous scattering a diffraction dataset (anomalous dataset) may be collected at an X-ray wavelength at which this anomalous scattering is maximal. By altering the X-ray wavelength to a value at which there is no anomalous scattering a native dataset may then be collected. Similarly to the MIR case, by mathematically processing the anomalous and native datasets the phase information necessary for the calculation of an electron density map may be determined. The most usual way to introduce anomalous scatterers into a protein is to replace the sulphur containing methionine amino acid residues with selenium containing seleno-methionine residues. This is done by generating recombinant protein that is isolated from cells grown on growth media that contain seleno-methionine. Selenium is capable of anomalously scattering X-rays and may thus be used for a MAD experiment. Further methods for phase determination such as single isomorphous replacement (SIR), single isomorphous replacement anomalous scattering (SIRAS) and direct methods exist, but the principles behind them are similar to MIR and MAD.
The final method generally available for the calculation of the phases necessary for the determination of an unknown protein structure is molecular replacement. This method relies upon the assumption that proteins with similar amino acid sequences (primary sequences) will have a similar fold and three-dimensional structure (tertiary structure). Proteins related by amino acid sequence are termed homologous proteins. If an X-ray diffraction dataset has been collected from a crystal whose protein structure is not known, but a structure has been determined for a homologous protein, then molecular replacement can be attempted. Molecular replacement is a mathematical process that attempts to correlate the dataset obtained from a new protein crystal with the theoretical diffraction pattern calculated for a protein of known structure. If the correlation is sufficiently high some phase information can be extracted from the known protein structure and combined with the amplitudes obtained from the new protein dataset. This enables calculation of a preliminary electron density map for the protein of unknown structure.
If an electron density map has been calculated for a protein of unknown structure then the amino acids comprising the protein must be fitted into the electron density for the protein. This is normally done manually, although high resolution data may enable automatic model building. The process of model building and fitting the amino acids to the electron density can be both a time consuming and laborious process. Once the amino acids have been fitted to the electron density it is necessary to refine the structure. Refinement attempts to maximise the correlation between the experimentally calculated electron density and the electron density calculated from the protein model built. Refinement also attempts to optimise the geometry and disposition of the atoms and amino acids within the user-constructed model of the protein structure. Sometimes manual re-building of the structure will be required to release the structure from local energetic minima. There are now several software packages available that enable an experimentalist to carry out refinement of a protein structure. There are certain geometry and correlation diagnostics that are used to monitor the progress of a refinement. These diagnostic parameters are monitored and rebuilding/refinement continued until the experimenter is satisfied that the structure has been adequately refined.
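The text above does not name the correlation diagnostics it refers to. One widely used example is the crystallographic R-factor, which compares observed and model-derived structure factor amplitudes. The sketch below is illustrative only: the function name `r_factor` and the amplitude values are invented for the example and do not come from any real dataset.

```python
def r_factor(observed_amplitudes, calculated_amplitudes):
    """Return R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Both inputs are sequences of non-negative structure factor amplitudes,
    one value per reflection, in the same order.
    """
    if len(observed_amplitudes) != len(calculated_amplitudes):
        raise ValueError("amplitude lists must have the same length")
    total_observed = sum(observed_amplitudes)
    if total_observed == 0:
        raise ValueError("observed amplitudes must not all be zero")
    total_difference = sum(
        abs(obs - calc)
        for obs, calc in zip(observed_amplitudes, calculated_amplitudes)
    )
    return total_difference / total_observed


# Made-up amplitudes for four reflections (illustration only).
observed = [100.0, 80.0, 60.0, 40.0]
early_model = [80.0, 95.0, 45.0, 50.0]    # hypothetical rough initial model
refined_model = [97.0, 82.0, 58.0, 41.0]  # hypothetical model after refinement

r_early = r_factor(observed, early_model)
r_refined = r_factor(observed, refined_model)
print(f"R before refinement: {r_early:.3f}")
print(f"R after refinement:  {r_refined:.3f}")
```

A falling R-factor over successive rounds indicates that the model reproduces the measured amplitudes more closely. In practice it is monitored alongside geometry checks rather than on its own.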
Description of Anomalous Scattering Theory
If the energy of incident X-rays is close to the minimum energy that is required to eject a bound electron from an innermost shell of an atom, the scattering of the X-rays is described as “anomalous”. In the process of “normal” scattering, the electrons are forced to undergo vibrations at the same frequency as that of the incident X-ray photon, emitting elastically scattered photons (i.e. no change in frequency) in the process. However, because this frequency is far from the natural frequency of vibration of the electron there is no effect on the scattered photon from this natural vibration. In the process of “anomalous” scattering, the frequency of the incident photon is close to the natural frequency of the electron, resulting in a resonance effect, which is manifested as a dispersion (decrease in velocity, though still no change in frequency) of the photon, as well as a vibration damping effect, which is manifested as absorption (decrease in intensity) of a fraction of the incident photons.
The anomalously scattered photon will thus have a phase angle associated with it that is retarded when compared with one being scattered normally, all other conditions being equal. If the structure consists of a mixture of normal and anomalous scatterers, this phase lag results in the breakdown of Friedel's law, as pairs of reflections with indices (h,k,l) and (−h,−k,−l) that are diffracted from opposite sides of the same crystal plane no longer have the same amplitudes.
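The breakdown of Friedel's law can be reproduced in a small numerical sketch. The example below uses a hypothetical one-dimensional cell with two atoms. The positions, scattering factors and anomalous corrections are illustrative numbers chosen for the example, not measured values, and the function name `structure_factor` is an assumption. The anomalous scatterer is modelled in the usual way by giving its scattering factor an imaginary component.

```python
import cmath


def structure_factor(h, atoms):
    """Sum the contributions of all atoms to reflection h in a 1D cell.

    Each atom is (fractional_position, f0, f_prime, f_double_prime); the
    complex scattering factor is (f0 + f_prime) + i * f_double_prime.
    """
    total = 0j
    for position, f0, f_prime, f_double_prime in atoms:
        scattering_factor = complex(f0 + f_prime, f_double_prime)
        total += scattering_factor * cmath.exp(2j * cmath.pi * h * position)
    return total


# Two normal scatterers: no imaginary component anywhere.
normal_only = [(0.10, 6.0, 0.0, 0.0), (0.35, 26.0, 0.0, 0.0)]

# Same cell, but the second atom now scatters anomalously
# (illustrative corrections, not tabulated values).
with_anomalous = [(0.10, 6.0, 0.0, 0.0), (0.35, 26.0, -6.0, 4.0)]

h = 3
diff_normal = abs(structure_factor(h, normal_only)) - abs(
    structure_factor(-h, normal_only)
)
diff_anomalous = abs(structure_factor(h, with_anomalous)) - abs(
    structure_factor(-h, with_anomalous)
)

print("|F(h)| - |F(-h)|, normal scatterers only:", diff_normal)
print("|F(h)| - |F(-h)|, with anomalous scatterer:", diff_anomalous)
```

With normal scatterers only, the two amplitudes agree to within rounding error. With one anomalous scatterer in the mixture they differ, and this difference is the signal that is measured in practice.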
By careful measurement of the two reflection intensities, and by consideration of their relative amplitudes, it is possible to make an initial estimate of the phases of all reflections that have been observed.
In theory all atoms could give rise to an anomalous scattering effect if irradiated with X-ray radiation of the appropriate wavelength. However, as the size of the anomalous signal generally increases with the atomic number of the scatterer, heavier elements are normally chosen, e.g. sulphur or larger. The choice of element is also dependent on the ability to tune the energy of the X-rays to the required transition energy. As access to tuneable synchrotron X-radiation has become routine, the MAD technique has come of age. Incorporation of an anomalous scatterer may be via a number of routes, e.g. by soaking crystals in solutions containing heavy atoms which then bind to the protein, by expressing recombinant proteins in media in which an element has been replaced by a suitable heavier element (e.g. the replacement of methionine with selenomethionine) leading to the incorporation of the element in certain amino acids themselves, or by making use of naturally occurring co-factors which contain heavy elements.
As the contribution from the anomalous scatterer may be small, it is often important to obtain well-recorded, redundant data, and to facilitate detection of what may be a small signal, it is helpful to have a reference dataset to which the anomalous dataset can be compared. The routine collection of X-ray data at cryo-temperatures has prolonged crystal lifetime and has made collection of multiple datasets (at different wavelengths) from a single crystal now feasible for many crystal systems. Collection and analysis of multiple datasets from a single crystal has the advantage of eliminating all effects related to non-isomorphism (variations in structure between different crystals due to random variations in soaking and/or freezing conditions).
In the case of cytochrome P450, the haem group that forms the site of enzymatic activity naturally contains a single iron atom. The iron K absorption edge lies at a comparatively low energy (long wavelength, about 1.74 Å) that is obtainable at tunable synchrotron beamlines.
P450 Crystal Structures
As of 2002, eight cytochrome P450 structures had been solved by X-ray crystallography and were available in the public domain. All of the cytochrome P450s whose structures had been solved were expressed in E. coli. Six structures correspond to bacterial cytochrome P450s: P450cam (CYP101, Poulos et al., 1985, J. Biol. Chem., 260, 16122), the hemeprotein domain of P450BM3 (CYP102, Ravichandran et al., 1993, Science, 261, 731), P450terp (CYP108, Hasemann et al., 1994, J Mol. Biol. 236, 1169), P450eryF (CYP107A1, Cupp-Vickery and Poulos, 1995, Nature Struct. Biol. 2, 144), P450 14α-sterol demethylase (CYP51, Podust et al., 2001, Proc. Natl. Acad. Sci. USA, 98, 3068) and a thermophilic cytochrome P450 (CYP119) from the archaeon Sulfolobus solfataricus (Yano et al., 2000, J. Biol. Chem. 275, 31086). The structure of cytochrome P450nor was obtained from the denitrifying fungus Fusarium oxysporum (Shimizu et al. 2000, J. Inorg. Biochem. 81, 191). The eighth structure is that of the rabbit 2C5 isoform, the first structure of a mammalian cytochrome P450 (Williams et al. 2000, Mol. Cell. 5, 121).
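The relation between X-ray energy and wavelength that underlies this choice of beamline setting can be checked with a short calculation. The sketch below uses the standard relation λ = hc/E. The function name `wavelength_angstrom` is an assumption, and the K-edge energies for iron and selenium are commonly tabulated values supplied for illustration rather than figures taken from the text above.

```python
PLANCK_EV_S = 4.135667696e-15   # Planck constant in eV s
LIGHT_SPEED_M_S = 299792458.0   # speed of light in m/s

# Commonly tabulated K absorption edge energies in keV (assumed values).
K_EDGE_KEV = {"Fe": 7.112, "Se": 12.658}


def wavelength_angstrom(energy_kev):
    """Convert a photon energy in keV to a wavelength in angstroms."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    energy_ev = energy_kev * 1.0e3
    wavelength_m = PLANCK_EV_S * LIGHT_SPEED_M_S / energy_ev
    return wavelength_m * 1.0e10


for element, energy in K_EDGE_KEV.items():
    print(f"{element} K-edge: {energy} keV, "
          f"{wavelength_angstrom(energy):.4f} angstrom")
```

The iron edge falls at a lower energy, and therefore a longer wavelength, than the selenium edge commonly used for seleno-methionine MAD experiments.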
WO 03/035693 describes the crystallisation of a human 2C9 P450 protein molecule and provides an analysis of the protein crystal structure.
The reason why the mammalian cytochrome P450s have been particularly difficult to crystallize, compared to their bacterial counterparts, resides in the nature of these proteins. The bacterial cytochrome P450s are soluble whereas the mammalian P450s are membrane-associated proteins. Thus, structural studies on mammalian cytochrome P450s may use the combination of heterologous expression systems that allow expression of single cytochrome P450s at high concentration with modification of their sequences to improve the solubility and the behaviour of these proteins in solution.
Due to significant sequence differences from both the bacterial proteins and rabbit proteins, to fully understand the role of the human CYP450 enzymes in drug metabolism, the crystal structures of other human isoforms are still required.