This invention relates to protein selection methods.
The invention was made with government support under grants F32 GM17776-01 and F32 GM17776-02. The government has certain rights in the invention.
Methods currently exist for the isolation of RNA and DNA molecules based on their functions. For example, experiments of Ellington and Szostak (Nature 346:818 (1990); and Nature 355:850 (1992)) and Tuerk and Gold (Science 249:505 (1990); and J. Mol. Biol. 222:739 (1991)) have demonstrated that very rare (i.e., less than 1 in 10^13) nucleic acid molecules with desired properties may be isolated out of complex pools of molecules by repeated rounds of selection and amplification. These methods offer advantages over traditional genetic selections in that (i) very large candidate pools may be screened (greater than 10^15), (ii) host viability and in vivo conditions are not concerns, and (iii) selections may be carried out even if an in vivo genetic screen does not exist. The power of in vitro selection has been demonstrated in defining novel RNA and DNA sequences with very specific protein binding functions (see, for example, Tuerk and Gold, Science 249:505 (1990); Irvine et al., J. Mol. Biol. 222:739 (1991); Oliphant et al., Mol. Cell Biol. 9:2944 (1989); Blackwell et al., Science 250:1104 (1990); Pollock and Treisman, Nuc. Acids Res. 18:6197 (1990); Thiesen and Bach, Nuc. Acids Res. 18:3203 (1990); Bartel et al., Cell 57:529 (1991); Stormo and Yoshioka, Proc. Natl. Acad. Sci. USA 88:5699 (1991); and Bock et al., Nature 355:564 (1992)), small molecule binding functions (Ellington and Szostak, Nature 346:818 (1990); Ellington and Szostak, Nature 355:850 (1992)), and catalytic functions (Green et al., Nature 347:406 (1990); Robertson and Joyce, Nature 344:467 (1990); Beaudry and Joyce, Science 257:635 (1992); Bartel and Szostak, Science 261:1411 (1993); Lorsch and Szostak, Nature 371:31-36 (1994); Cuenoud and Szostak, Nature 375:611-614 (1995); Chapman and Szostak, Chemistry and Biology 2:325-333 (1995); and Lohse and Szostak, Nature 381:442-444 (1996)). A similar scheme for the selection and amplification of proteins has not been demonstrated.
The purpose of the present invention is to allow the principles of in vitro selection and in vitro evolution to be applied to proteins. The invention facilitates the isolation of proteins with desired properties from large pools of partially or completely random amino acid sequences. In addition, the invention solves the problem of recovering and amplifying the protein sequence information by covalently attaching the mRNA coding sequence to the protein molecule.
In general, the inventive method consists of an in vitro or in situ transcription/translation protocol that generates protein covalently linked to the 3′ end of its own mRNA, i.e., an RNA-protein fusion. This is accomplished by synthesis and in vitro or in situ translation of an mRNA molecule with a peptide acceptor attached to its 3′ end. One preferred peptide acceptor is puromycin, a nucleoside analog that adds to the C-terminus of a growing peptide chain and terminates translation. In one preferred design, a DNA sequence is included between the end of the message and the peptide acceptor which is designed to cause the ribosome to pause at the end of the open reading frame, providing additional time for the peptide acceptor (for example, puromycin) to accept the nascent peptide chain before hydrolysis of the peptidyl-tRNA linkage.
If desired, the resulting RNA-protein fusion allows repeated rounds of selection and amplification because the protein sequence information may be recovered by reverse transcription and amplification (for example, by PCR amplification as well as any other amplification technique, including RNA-based amplification techniques such as 3SR or TSA). The amplified nucleic acid may then be transcribed, modified, and in vitro or in situ translated to generate mRNA-protein fusions for the next round of selection. The ability to carry out multiple rounds of selection and amplification enables the enrichment and isolation of very rare molecules, e.g., one desired molecule out of a pool of 10^15 members. This in turn allows the isolation of new or improved proteins which specifically recognize virtually any target or which catalyze desired chemical reactions.
Accordingly, in a first aspect, the invention features a method for selection of a desired protein, involving the steps of: (a) providing a population of candidate RNA molecules, each of which includes a translation initiation sequence and a start codon operably linked to a candidate protein coding sequence and each of which is operably linked to a peptide acceptor at the 3′ end of the candidate protein coding sequence; (b) in vitro or in situ translating the candidate protein coding sequences to produce a population of candidate RNA-protein fusions; and (c) selecting a desired RNA-protein fusion, thereby selecting the desired protein.
In a related aspect, the invention features a method for selection of a DNA molecule which encodes a desired protein, involving the steps of: (a) providing a population of candidate RNA molecules, each of which includes a translation initiation sequence and a start codon operably linked to a candidate protein coding sequence and each of which is operably linked to a peptide acceptor at the 3′ end of the candidate protein coding sequence; (b) in vitro or in situ translating the candidate protein coding sequences to produce a population of candidate RNA-protein fusions; (c) selecting a desired RNA-protein fusion; and (d) generating from the RNA portion of the fusion a DNA molecule which encodes the desired protein.
In another related aspect, the invention features a method for selection of a protein having an altered function relative to a reference protein, involving the steps of: (a) producing a population of candidate RNA molecules from a population of candidate DNA templates, the candidate DNA templates each having a candidate protein coding sequence which differs from the reference protein coding sequence, the RNA molecules each comprising a translation initiation sequence and a start codon operably linked to the candidate protein coding sequence and each being operably linked to a peptide acceptor at the 3′ end; (b) in vitro or in situ translating the candidate protein coding sequences to produce a population of candidate RNA-protein fusions; and (c) selecting an RNA-protein fusion having an altered function, thereby selecting the protein having the altered function.
In yet another related aspect, the invention features a method for selection of a DNA molecule which encodes a protein having an altered function relative to a reference protein, involving the steps of: (a) producing a population of candidate RNA molecules from a population of candidate DNA templates, the candidate DNA templates each having a candidate protein coding sequence which differs from the reference protein coding sequence, the RNA molecules each comprising a translation initiation sequence and a start codon operably linked to the candidate protein coding sequence and each being operably linked to a peptide acceptor at the 3′ end; (b) in vitro or in situ translating the candidate protein coding sequences to produce a population of RNA-protein fusions; (c) selecting an RNA-protein fusion having an altered function; and (d) generating from the RNA portion of the fusion a DNA molecule which encodes the protein having the altered function.
In yet another related aspect, the invention features a method for selection of a desired RNA, involving the steps of: (a) providing a population of candidate RNA molecules, each of which includes a translation initiation sequence and a start codon operably linked to a candidate protein coding sequence and each of which is operably linked to a peptide acceptor at the 3′ end of the candidate protein coding sequence; (b) in vitro or in situ translating the candidate protein coding sequences to produce a population of candidate RNA-protein fusions; and (c) selecting a desired RNA-protein fusion, thereby selecting the desired RNA.
In preferred embodiments of the above methods, the peptide acceptor is puromycin; each of the candidate RNA molecules further includes a pause sequence or further includes a DNA or DNA analog sequence covalently bonded to the 3′ end of the RNA; the population of candidate RNA molecules includes at least 10^9, preferably, at least 10^10, more preferably, at least 10^11, 10^12, or 10^13, and, most preferably, at least 10^14 different RNA molecules; the in vitro translation reaction is carried out in a lysate prepared from a eukaryotic cell or portion thereof (and is, for example, carried out in a reticulocyte lysate or wheat germ lysate); the in vitro translation reaction is carried out in an extract prepared from a prokaryotic cell (for example, E. coli) or portion thereof; the selection step involves binding of the desired protein to an immobilized binding partner; the selection step involves assaying for a functional activity of the desired protein; the DNA molecule is amplified; the method further involves repeating the steps of the above selection methods; the method further involves transcribing an RNA molecule from the DNA molecule and repeating steps (a) through (d); following the in vitro translating step, the method further involves an incubation step carried out in the presence of 50-100 mM Mg2+; and the RNA-protein fusion further includes a nucleic acid or nucleic acid analog sequence positioned proximal to the peptide acceptor which increases flexibility.
In other related aspects, the invention features an RNA-protein fusion selected by any of the methods of the invention; a ribonucleic acid covalently bonded through an amide bond to an amino acid sequence, the amino acid sequence being encoded by the ribonucleic acid; and a ribonucleic acid which includes a translation initiation sequence and a start codon operably linked to a candidate protein coding sequence, the ribonucleic acid being operably linked to a peptide acceptor (for example, puromycin) at the 3′ end of the candidate protein coding sequence.
In a second aspect, the invention features a method for selection of a desired protein or desired RNA through enrichment of a sequence pool. This method involves the steps of: (a) providing a population of candidate RNA molecules, each of which includes a translation initiation sequence and a start codon operably linked to a candidate protein coding sequence and each of which is operably linked to a peptide acceptor at the 3′ end of the candidate protein coding sequence; (b) in vitro or in situ translating the candidate protein coding sequences to produce a population of candidate RNA-protein fusions; (c) contacting the population of RNA-protein fusions with a binding partner specific for either the RNA portion or the protein portion of the RNA-protein fusion under conditions which substantially separate the binding partner-RNA-protein fusion complexes from unbound members of the population; (d) releasing the bound RNA-protein fusions from the complexes; and (e) contacting the population of RNA-protein fusions from step (d) with a binding partner specific for the protein portion of the desired RNA-protein fusion under conditions which substantially separate the binding partner-RNA-protein fusion complex from unbound members of said population, thereby selecting the desired protein and the desired RNA.
In preferred embodiments, the method further involves repeating steps (a) through (e). In addition, for these repeated steps, the same or different binding partners may be used, in any order, for selective enrichment of the desired RNA-protein fusion. In another preferred embodiment, step (d) involves the use of a binding partner (for example, a monoclonal antibody) specific for the protein portion of the desired fusion. This step is preferably carried out following reverse transcription of the RNA portion of the fusion to generate a DNA which encodes the desired protein. If desired, this DNA may be isolated and/or PCR amplified. This enrichment technique may be used to select a desired protein or may be used to select a protein having an altered function relative to a reference protein.
In other preferred embodiments of the enrichment methods, the peptide acceptor is puromycin; each of the candidate RNA molecules further includes a pause sequence or further includes a DNA or DNA analog sequence covalently bonded to the 3′ end of the RNA; the population of candidate RNA molecules includes at least 10^9, preferably, at least 10^10, more preferably, at least 10^11, 10^12, or 10^13, and, most preferably, at least 10^14 different RNA molecules; the in vitro translation reaction is carried out in a lysate prepared from a eukaryotic cell or portion thereof (and is, for example, carried out in a reticulocyte lysate or wheat germ lysate); the in vitro translation reaction is carried out in an extract prepared from a prokaryotic cell or portion thereof (for example, E. coli); the DNA molecule is amplified; at least one of the binding partners is immobilized on a solid support; following the in vitro translating step, the method further involves an incubation step carried out in the presence of 50-100 mM Mg2+; and the RNA-protein fusion further includes a nucleic acid or nucleic acid analog sequence positioned proximal to the peptide acceptor which increases flexibility.
In a related aspect, the invention features kits for carrying out any of the selection methods described herein.
In a third and final aspect, the invention features a microchip that includes an array of immobilized single-stranded nucleic acids, the nucleic acids being hybridized to RNA-protein fusions. Preferably, the protein component of the RNA-protein fusion is encoded by the RNA.
As used herein, by a “population” is meant more than one molecule (for example, more than one RNA, DNA, or RNA-protein fusion molecule). Because the methods of the invention facilitate selections which begin, if desired, with large numbers of candidate molecules, a “population” according to the invention preferably means more than 10^9 molecules, more preferably, more than 10^11, 10^12, or 10^13 molecules, and, most preferably, more than 10^13 molecules.
By “selecting” is meant substantially partitioning a molecule from other molecules in a population. As used herein, a “selecting” step provides at least a 2-fold, preferably, a 30-fold, more preferably, a 100-fold, and, most preferably, a 1000-fold enrichment of a desired molecule relative to undesired molecules in a population following the selection step. As indicated herein, a selection step may be repeated any number of times, and different types of selection steps may be combined in a given approach.
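The enrichment figures in this definition can be related to the number of selection rounds needed to recover a very rare molecule. The following is a minimal sketch, not part of the disclosed method: it assumes an idealized model in which every round multiplies the ratio of desired to undesired molecules by the same fold enrichment, and the function name rounds_to_majority is hypothetical.

```python
from fractions import Fraction


def rounds_to_majority(initial_fraction: Fraction, fold_enrichment: int) -> int:
    """Count idealized selection rounds until the desired molecule is the majority.

    Assumption (not from the text): each round multiplies the ratio of
    desired to undesired molecules by exactly `fold_enrichment`.
    """
    # Ratio of desired to undesired molecules before any selection.
    ratio = initial_fraction / (1 - initial_fraction)
    rounds = 0
    while ratio <= 1:
        ratio *= fold_enrichment
        rounds += 1
    return rounds


if __name__ == "__main__":
    rare = Fraction(1, 10**15)  # one desired molecule in a pool of 10^15
    for fold in (2, 30, 100, 1000):
        print(fold, rounds_to_majority(rare, fold))
```

Exact rational arithmetic is used so that the round count does not depend on floating-point rounding near the threshold. Real selections will deviate from this model, since enrichment per round is rarely constant.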
By a “protein” is meant any two or more naturally occurring or modified amino acids joined by one or more peptide bonds. “Protein” and “peptide” are used interchangeably herein.
By “RNA” is meant a sequence of two or more covalently bonded, naturally occurring or modified ribonucleotides. One example of a modified RNA included within this term is phosphorothioate RNA.
By a “translation initiation sequence” is meant any sequence which is capable of providing a functional ribosome entry site. In bacterial systems, this region is sometimes referred to as a Shine-Dalgarno sequence.
By a “start codon” is meant three bases which signal the beginning of a protein coding sequence. Generally, these bases are AUG (or ATG); however, any other base triplet capable of being utilized in this manner may be substituted.
By “covalently bonded” to a peptide acceptor is meant that the peptide acceptor is joined to a “protein coding sequence” either directly through a covalent bond or indirectly through another covalently bonded sequence (for example, DNA corresponding to a pause site).
By a “peptide acceptor” is meant any molecule capable of being added to the C-terminus of a growing protein chain by the catalytic activity of the ribosomal peptidyl transferase function. Typically, such molecules contain (i) a nucleotide or nucleotide-like moiety (for example, adenosine or an adenosine analog (di-methylation at the N-6 amino position is acceptable)), (ii) an amino acid or amino acid-like moiety (for example, any of the 20 D- or L-amino acids or any amino acid analog thereof (for example, O-methyl tyrosine or any of the analogs described by Ellman et al., Meth. Enzymol. 202:301, 1991)), and (iii) a linkage between the two (for example, an ester, amide, or ketone linkage at the 3′ position or, less preferably, the 2′ position); preferably, this linkage does not significantly perturb the pucker of the ring from the natural ribonucleotide conformation. Peptide acceptors may also possess a nucleophile, which may be, without limitation, an amino group, a hydroxyl group, or a sulfhydryl group. In addition, peptide acceptors may be composed of nucleotide mimetics, amino acid mimetics, or mimetics of the combined nucleotide-amino acid structure.
By a peptide acceptor being positioned “at the 3′ end” of a protein coding sequence is meant that the peptide acceptor molecule is positioned after the final codon of that protein coding sequence. This term includes, without limitation, a peptide acceptor molecule that is positioned precisely at the 3′ end of the protein coding sequence as well as one which is separated from the final codon by intervening coding or non-coding sequence (for example, a sequence corresponding to a pause site). This term also includes constructs in which coding or non-coding sequences follow (that is, are 3′ to) the peptide acceptor molecule. In addition, this term encompasses, without limitation, a peptide acceptor molecule that is covalently bonded (either directly or indirectly through intervening nucleic acid sequence) to the protein coding sequence, as well as one that is joined to the protein coding sequence by some non-covalent means, for example, through hybridization using a second nucleic acid sequence that binds at or near the 3′ end of the protein coding sequence and that itself is bound to a peptide acceptor molecule.
By an “altered function” is meant any qualitative or quantitative change in the function of a molecule.
By a “pause sequence” is meant a nucleic acid sequence which causes a ribosome to slow or stop its rate of translation.
By “binding partner,” as used herein, is meant any molecule which has a specific, covalent or non-covalent affinity for a portion of a desired RNA-protein fusion. Examples of binding partners include, without limitation, members of antigen/antibody pairs, protein/inhibitor pairs, receptor/ligand pairs (for example, cell surface receptor/ligand pairs, such as hormone receptor/peptide hormone pairs), enzyme/substrate pairs (for example, kinase/substrate pairs), lectin/carbohydrate pairs, oligomeric or heterooligomeric protein aggregates, DNA binding protein/DNA binding site pairs, RNA/protein pairs, and nucleic acid duplexes, heteroduplexes, or ligated strands, as well as any molecule which is capable of forming one or more covalent or non-covalent bonds (for example, disulfide bonds) with any portion of an RNA-protein fusion. Binding partners include, without limitation, any of the “selection motifs” presented in FIG. 2.
By a “solid support” is meant, without limitation, any column (or column material), bead, test tube, microtiter dish, solid particle (for example, agarose or sepharose), microchip (for example, silicon, silicon-glass, or gold chip), or membrane (for example, the membrane of a liposome or vesicle) to which an affinity complex may be bound, either directly or indirectly (for example, through other binding partner intermediates such as other antibodies or Protein A), or in which an affinity complex may be embedded (for example, through a receptor or channel).
The presently claimed invention provides a number of significant advantages. To begin with, it is the first example of this type of scheme for the selection and amplification of proteins. This technique overcomes the impasse created by the need to recover nucleotide sequences corresponding to desired, isolated proteins (since only nucleic acids can be replicated). In particular, many prior methods that allowed the isolation of proteins from partially or fully randomized pools did so through an in vivo step. Methods of this sort include monoclonal antibody technology (Milstein, Sci. Amer. 243:66 (1980); and Schultz et al., J. Chem. Engng. News 68:26 (1990)), phage display (Smith, Science 228:1315 (1985); Parmley and Smith, Gene 73:305 (1988); and McCafferty et al., Nature 348:552 (1990)), peptide-lac repressor fusions (Cull et al., Proc. Natl. Acad. Sci. USA 89:1865 (1992)), and classical genetic selections. Unlike the present technique, each of these methods relies on a topological link between the protein and the nucleic acid so that the information of the protein is retained and can be recovered in readable, nucleic acid form.
In addition, the present invention provides advantages over the stalled translation method (Tuerk and Gold, Science 249:505 (1990); Irvine et al., J. Mol. Biol. 222:739 (1991); Korman et al., Proc. Natl. Acad. Sci. USA 79:1844-1848 (1982); Mattheakis et al., Proc. Natl. Acad. Sci. USA 91:9022-9026 (1994); Mattheakis et al., Meth. Enzymol. 267:195 (1996); and Hanes and Pluckthun, Proc. Natl. Acad. Sci. USA 94:4937 (1997)), a technique in which selection is for some property of a nascent protein chain that is still complexed with the ribosome and its mRNA. Unlike the stalled translation technique, the present method does not rely on maintaining the integrity of an mRNA:ribosome:nascent chain ternary complex, a complex that is very fragile and is therefore limiting with respect to the types of selections which are technically feasible.
The present method also provides advantages over the branched synthesis approach proposed by Brenner and Lerner (Proc. Natl. Acad. Sci. USA 89:5381-5383 (1992)), in which DNA-peptide fusions are generated, and genetic information is theoretically recovered following one round of selection. Unlike the branched synthesis approach, the present method does not require the regeneration of a peptide from the DNA portion of a fusion (which, in the branched synthesis approach, is generally accomplished by individual rounds of chemical synthesis). Accordingly, the present method allows for repeated rounds of selection using populations of candidate molecules. In addition, unlike the branched synthesis technique, which is generally limited to the selection of fairly short sequences, the present method is applicable to the selection of protein molecules of considerable length.
In yet another advantage, the present selection and directed evolution technique can make use of very large and complex libraries of candidate sequences. In contrast, existing protein selection methods which rely on an in vivo step are typically limited to relatively small libraries of somewhat limited complexity. This advantage is particularly important when selecting functional protein sequences considering, for example, that 10^13 possible sequences exist for a peptide of only 10 amino acids in length. In classical genetic techniques, lac repressor fusion approaches, and phage display methods, maximum complexities generally fall orders of magnitude below 10^13 members. Large library size also provides an advantage for directed evolution applications, in that sequence space can be explored to a greater depth around any given starting sequence.
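The figure of roughly 10^13 sequences for a 10-residue peptide follows from counting 20 possible amino acids at each position. The following is a minimal sketch of that arithmetic, offered only as an illustration: the function names are hypothetical, only the 20 standard amino acids are counted, and the coverage calculation assumes an idealized library in which every member is a distinct sequence.

```python
NUM_AMINO_ACIDS = 20  # assumption: only the 20 standard amino acids are counted


def sequence_space(peptide_length: int) -> int:
    """Number of distinct peptides of the given length."""
    return NUM_AMINO_ACIDS ** peptide_length


def longest_fully_covered_peptide(library_size: int) -> int:
    """Longest peptide whose whole sequence space fits in the library.

    Assumption: every library member is a different sequence, which real
    libraries only approximate.
    """
    length = 0
    while sequence_space(length + 1) <= library_size:
        length += 1
    return length


if __name__ == "__main__":
    print(sequence_space(10))  # 20**10, about 1.02 x 10^13
    for size in (10**13, 10**15):
        print(size, longest_fully_covered_peptide(size))
```

Under these assumptions, each additional residue multiplies the required library size by 20, which is why a gain of a few orders of magnitude in pool size extends exhaustive coverage by only one or two residues.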
The present technique also differs from prior approaches in that the selection step is context-independent. In many other selection schemes, the context in which, for example, an expressed protein is present can profoundly influence the nature of the library generated. For example, an expressed protein may not be properly expressed in a particular system or may not be properly displayed (for example, on the surface of a phage particle). Alternatively, the expression of a protein may actually interfere with one or more critical steps in a selection cycle, e.g., phage viability or infectivity, or lac repressor binding. These problems can result in the loss of functional molecules or in limitations on the nature of the selection procedures that may be applied.
Finally, the present method is advantageous because it provides control over the repertoire of proteins that may be tested. In certain techniques (for example, antibody selection), there exists little or no control over the nature of the starting pool. In yet other techniques (for example, lac fusions and phage display), the candidate pool must be expressed in the context of a fusion protein. In contrast, RNA-protein fusion constructs provide control over the nature of the candidate pools available for screening. In addition, the candidate pool size has the potential to be as high as RNA or DNA pools (~10^15 members), limited only by the size of the in vitro translation reaction performed. And the makeup of the candidate pool depends completely on experimental design; random regions may be screened in isolation or within the context of a desired fusion protein, and most if not all possible sequences may be expressed in candidate pools of RNA-protein fusions.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.
The drawings will first briefly be described.