This invention relates generally to the field of molecular biology. More particularly, the invention relates to methods and constructs useful for identifying important and/or essential regions of a protein, whether or not the function or activity of the protein is already known.
Current technology enables one to sequence vast amounts of nucleic acids at high speed. However, sequencing alone does not describe the activity of any of the genes sequenced. One can make predictions based on sequence homology that a given gene encodes a protein that exhibits immunoglobulin folds, or may have kinase activity, and the like, but one is limited to identifying features common to known proteins.
If a protein has a known or demonstrable activity, and is not too toxic to express, one can conduct mutagenesis experiments to determine which portion or portions of the protein are responsible for its activity. In general, one prepares a series of mutant versions of the protein in question, typically by a technique such as site-specific mutagenesis, and compares the activity of the mutants with that of the wild type protein. Mutants in which the active portion of the molecule is absent are expected to exhibit little or no activity, while mutants in which an irrelevant part of the molecule is altered are expected to exhibit little difference from the wild type. Due to the number of mutagenesis steps required, one generally selects a few likely spots in the sequence to experiment with, and rarely seeks to alter every residue in turn. Thus, the approach is both time-consuming and incomplete.
We have now invented a method for systematically and quickly examining substantially every position of a protein sequence, and determining whether or not it is essential to the activity of the protein. The method is effective even if the protein has no known activity, and/or is too toxic to express in its active form.
One aspect of the invention is a method for identifying a mutation-sensitive active region of a test protein, by providing a test nucleic acid construct comprising a regulatable promoter polynucleotide and a fusion polynucleotide comprising a test polynucleotide encoding the test protein fused to a reporter polynucleotide encoding a detectable label, wherein said fusion polynucleotide is operably associated with the promoter polynucleotide, wherein expression of the fusion polynucleotide in a selected host cell results in a specific phenotype and the presence of the detectable label; mutagenizing the test nucleic acid construct to provide a mutagenized construct; transforming a selected host cell with the mutagenized construct to provide a transformed host cell; selecting a transformed host cell that exhibits the detectable label, but which does not exhibit the specific phenotype; and sequencing a portion of the mutagenized construct from the selected transformed host cell to determine the alteration of the polynucleotide(s).
Another aspect of the invention is a population of host cells, comprising a plurality of host cells, each host cell having a test nucleic acid construct which comprises a regulatable promoter polynucleotide and a fusion polynucleotide comprising a mutagenized test polynucleotide encoding a mutagenized test protein fused to a reporter gene encoding a detectable label, wherein the fusion polynucleotide is operably associated with the promoter polynucleotide, and expression of the fusion polynucleotide in the host cell results in expression of said detectable label, wherein the plurality of host cells comprises a plurality of different mutagenized test polynucleotides.
The term “reporter gene” refers to a polynucleotide that encodes a molecule that can be detected readily, either directly or by its effect on host cell characteristics. Exemplary reporter genes encode enzymes, for example β-galactosidase and URA3, luminescent or fluorescent proteins, such as Green Fluorescent Protein (GFP) and variants thereof, antigenic epitopes (for example Histidine-tag or influenza hemagglutinin tag), mRNA of distinct sequences, and the like. The term “detectable label” refers to a reporter gene or protein that can be detected directly by visual, optical, or spectroscopic methods, such as, for example, GFP, GFP variants, pigments, chromogenic enzymes such as horseradish peroxidase and β-galactosidase, and the like. The terms “selectable label” and “selectable marker” refer to an enzyme reporter gene or protein that facilitates separation of cells that express the label from cells that do not express the label, or separation of cells that express the label to different degrees. Such separation can be by any convenient means, such as, for example, survival of one group or the other, dependence upon a selected nutrient or lack thereof, sensitivity to a given compound, adherence to a solid surface, and the like.
The term “regulatable promoter” refers to a portion of a polynucleotide that is capable of controlling the transcription of nearby DNA, and that responds to the presence or activity of one or more proteins by increasing or decreasing transcription of the affected DNA. A variety of suitable promoters are known, for example GAL, TET, hybrid promoters, and the like.
The term “specific phenotype” as used herein refers to an alteration in one or more characteristics of the host cell distinct from the label, as a result of the heterologous gene or protein presence, for example, death, survival (in the presence of normally lethal conditions or agents), adherence or lack of adherence, morphology, color and appearance, and the like. The specific phenotype excludes any characteristic conferred by the label, which is independent of the specific phenotype: the specific phenotype is preferably observable regardless of the presence or absence of the detectable label as a fusion partner.
The term “mutagenizing” refers to a process for altering the nucleotide sequence of a polynucleotide, for example using PCR, radiation, chemical agents, enzymes, and the like.
The term “fluorescent protein” refers to a protein capable of fluorescing when illuminated. Exemplary fluorescent proteins include, without limitation, the Aequorea victoria “Green Fluorescent Protein” (“GFP”: see for example D. C. Prasher et al., Gene (1992) 111:229-33; M. Chalfie et al., Science (1994) 263:802-05, both incorporated herein by reference), and fluorescent mutants thereof (“GFP variants”: see for example U.S. Pat. No. 5,625,048 and U.S. 5,777,079, both incorporated herein by reference).
The term “different host cells” refers to a group of host cells that differ genetically from each other. The host cells can be derived from different species (for example, different species of yeast, or different species of mammals), different strains (for example, yeast strains that differ from each other in their genotype but are otherwise derived from the same species, or yeast strains derived by mutagenizing one or more parent strains), different tissue types (for example, human liver cells, fibroblasts, kidney cells, lung cells, tumor cells of various types, and the like), different stages of differentiation, and the like.
Methods of the invention permit one to quickly identify regions of a protein, for example an enzyme, that are sensitive to mutation. Loss of activity following mutation of one or a few base pairs in a gene suggests that the codon affected encodes an amino acid critical for activity of the encoded protein. This loss of activity may result, for example, from mutation of an active site residue in an enzyme, or from distortion or blocking of a binding site. The resulting information suggests that the affected amino acid can be useful as the target of further drug discovery investigation.
In the practice of the subject method, a host cell is selected for the test nucleic acid such that expression of the test nucleic acid results in a heterologous protein that confers an observable phenotype in the host cell that is due to the heterologous protein activity. For example, expression of the test nucleic acid can be toxic, inhibit host cell growth, alter cell adhesion to a solid support, render the cell reliant on or free from reliance on particular nutrients in its culture media, and the like. The host cell can be any suitable eukaryotic cell, for example yeast, mammalian cells, insect cells, and the like, and can comprise a plurality of cells having different genotypes. For example, one can transform a population of different host cells, such as yeast strains that each have a different gene, signal pathway, or metabolic pathway deleted or disabled. The test nucleic acid can be expressed under the control of a regulatable promoter, permitting one to grow the host cell to sufficient density (that is, by first growing the cells with the regulated promoter turned “off”). If the selected host cell(s) does not display an observable phenotype in reaction to the test nucleic acid expression, one can select a different host cell, or alter (“sensitize” or potentiate) the selected host cell to render it more sensitive. The host cell can be sensitized by disabling metabolic or signal pathways, or otherwise altering its homeostasis until the cell is rendered dependent upon a pathway that is affected by the heterologous protein. This can be accomplished by standard mutagenesis techniques, generating a mutagenized population of cells and selecting for cells that meet the desired criteria.
The test nucleic acid is then transferred to a vector (such as a plasmid), placed under the control of a regulatable promoter, and fused with a reporter gene. The reporter gene is preferably positioned downstream of the test nucleic acid, such that reporter gene transcription occurs only after test nucleic acid transcription. The reporter gene is fused to the test nucleic acid in frame, preferably without an intervening stop codon, and is selected so that the resulting fusion protein (heterologous polypeptide fused to the reporter gene product) still exhibits the biological activity of both the heterologous polypeptide alone and the reporter alone. A presently preferred reporter protein is Green Fluorescent Protein (GFP), and its several variations (collectively “GFPs”: see for example, U.S. Pat. Nos. 5,998,204; U.S. 5,998,136; U.S. 5,994,077; U.S. 5,993,778; U.S. 5,985,577; U.S. 5,981,200; and U.S. 5,968,750, all incorporated herein by reference in full). For the rare case in which GFPs interfere with the heterologous protein activity, one can substitute another indicator, such as an epitope tag (an oligopeptide capable of recognition by a specific antibody, typically a unique monoclonal antibody developed specifically to bind to the selected epitope).
The vector is then recovered from the host cell and mutagenized, preferably in an alternate host (for example, E. coli), or in vitro. It is possible to mutagenize the vector while in the original host, but this is not preferred due to the introduction of background noise (mutations in other parts of the host genome). One can employ any desired method of mutagenesis: it is presently preferred to randomly mutagenize the vector, for example by chemical and/or radiation means. One can also employ enzymatic methods, for example using “low fidelity” replicases or mutagenic PCR. Additionally, one can employ combinations of methods, or two or more methods in succession, to obtain the desired degree of mutagenesis. The goal is to attain a level of mutagenesis such that most of the vectors in a population contain one or two point mutations in the target nucleic acid.
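As a rough illustration of the target mutation level described above, the number of point mutations per target sequence can be modeled as a Poisson variable. This model is an assumption introduced here for illustration and is not part of the disclosed method: it supposes that mutations arise independently and at a uniform rate along the target. The function names and the 900 bp target length are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k mutations when the expected number is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_with_one_or_two(mean: float) -> float:
    """Fraction of vectors expected to carry exactly one or two point mutations."""
    return poisson_pmf(1, mean) + poisson_pmf(2, mean)


def per_base_rate(mean: float, target_length_bp: int) -> float:
    """Per-base mutation rate that yields `mean` mutations per target (assumed uniform)."""
    return mean / target_length_bp


if __name__ == "__main__":
    target_length_bp = 900  # hypothetical length of the test polynucleotide
    for mean in (0.5, 1.0, math.sqrt(2), 2.0, 3.0):
        print(
            f"mean={mean:.2f}  "
            f"per-base rate={per_base_rate(mean, target_length_bp):.2e}  "
            f"fraction with 1-2 mutations={fraction_with_one_or_two(mean):.3f}"
        )
```

Under this simplified model, the fraction of vectors with exactly one or two mutations peaks at a mean of about 1.4 mutations per target (the square root of 2), where it is roughly 59%; the remainder carry either no mutation or three or more. Real mutagens are not uniform, so such figures are only a starting point for titrating the mutagenesis conditions.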
The mutagenized vectors are transformed into selected host cells, and the promoters induced to provide expression of the heterologous polypeptide/reporter fusion protein. The transformants are cultured, and are screened for colonies which lack the observable phenotype conferred by the heterologous protein (for example, colonies that survive where the phenotype is cell death) and exhibit the indicator. For example, where the observable phenotype is death, colonies that exhibit the reporter and survive promoter induction under conditions lethal to control host cells bearing the non-mutagenized vector must bear a vector having a point mutation in the test nucleic acid that results in a heterologous protein lacking the lethal activity. The vectors are recovered from a plurality of surviving colonies, and the regions of the test nucleic acid that were mutagenized are determined, for example by sequencing. The positions of point mutations indicate which regions of the sequence encode critical residues in the heterologous protein. If a sufficiently large number of vectors are mutagenized, essentially all critical sites (or sites that are sensitive to point mutations) will be indicated by sequence alterations in one or more isolates. Point mutations in regions that do not encode critical residues result in active heterologous protein, and are selected against. Thus, a histogram of the number of mutations for each amino acid residue in the heterologous protein will show one or more mutations at positions where mutation of the residue substantially alters activity, and will show few if any mutations at positions that are not sensitive to mutation. An experiment of sufficient size (sufficiently large number of mutants) will unequivocally indicate the “critical” portions of a protein, including its active sites and/or binding sites, thus pointing out relevant targets for the design of pharmaceutical agents.
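The per-residue histogram described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the invention: it assumes each isolate's test polynucleotide has been sequenced in full, contains only point substitutions (no insertions or deletions), and is translated with the standard genetic code. All function names and the example sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence; unrecognized codons are reported as 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def changed_residues(wild_type: str, isolate: str) -> set:
    """1-based residue positions whose amino acid differs from wild type."""
    if len(wild_type) != len(isolate):
        raise ValueError("sequences must have equal length (substitutions only)")
    wt_protein, mut_protein = translate(wild_type), translate(isolate)
    return {
        pos
        for pos, (a, b) in enumerate(zip(wt_protein, mut_protein), start=1)
        if a != b
    }


def mutation_histogram(wild_type: str, isolates: list) -> Counter:
    """Count, per residue position, how many isolates carry a change there."""
    counts = Counter()
    for isolate in isolates:
        counts.update(changed_residues(wild_type, isolate))
    return counts


if __name__ == "__main__":
    wild_type = "ATGGCTAAAGGT"  # hypothetical coding sequence: M A K G
    isolates = [
        "ATGGCTGAAGGT",  # codon 3 AAA -> GAA (K to E)
        "ATGGCTAAGGGT",  # codon 3 AAA -> AAG (silent, K to K)
        "ATGGATAAAGGT",  # codon 2 GCT -> GAT (A to D)
        "ATGGCTACAGGT",  # codon 3 AAA -> ACA (K to T)
    ]
    for position, count in sorted(mutation_histogram(wild_type, isolates).items()):
        print(position, count)
```

Silent substitutions are ignored here because they leave the encoded residue unchanged; positions that accumulate counts across many independent isolates are the candidates for mutation-sensitive residues.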
A variation of the above method involves first mutagenizing the test nucleic acid (for example, by mutagenic PCR), then placing it into the promoter/reporter vector (for example, by recombination) and into the recipient cells in a single step. This alternative method allows one to target the mutations more precisely to the test nucleic acid, because only the test nucleic acid is exposed to the mutagenic conditions.