In general, the invention features DNA-protein fusions and their uses, particularly for the selection of desired proteins and their corresponding nucleic acid sequences.
Recently, a combinatorial method was developed for the isolation of proteins with desired properties from large pools of proteins (Szostak et al., U.S. Ser. No. 09/007,005; Szostak et al., WO98/31700; Roberts and Szostak, Proc. Natl. Acad. Sci. USA (1997) vol. 94, p. 12297-12302). By this method, the protein portion is linked to its encoding RNA by a covalent chemical bond. Due to the covalent nature of this linkage, selection experiments are not limited to the extremely mild reaction conditions that must be used for approaches that involve non-covalent complex formation such as ribosome display (Hanes and Plückthun, Proc. Natl. Acad. Sci. USA (1997) vol. 94, p. 4937-4942; He and Taussig, Nucl. Acids Res. (1997) vol. 25, p. 5132-5143). However, precautions do need to be taken during the selection process to minimize RNA degradation, since the accidental cleavage of ribo-bonds can result in the irreversible loss of encoded information. For this reason, these selection procedures are typically carried out using reaction media and equipment that are free of ribonucleases or other deleterious contaminants.
The present invention provides methods for covalently tagging proteins with their encoding DNA sequences. These DNA-protein fusions, which may be used in molecular evolution and recognition techniques, are chemically more stable than RNA-protein fusions and therefore provide a number of advantages (as discussed in more detail below).
Accordingly, in general, the invention features methods for generating DNA-protein fusions. A first method involves: (a) linking a nucleic acid primer to an RNA molecule (preferably, at or near the RNA 3′ end), the primer being bound to a peptide acceptor (for example, puromycin); (b) translating the RNA to produce a protein product, the protein product being covalently bound to the primer; and (c) reverse transcribing the RNA to produce a DNA-protein fusion.
A second method involves: (a) generating an RNA-protein fusion; (b) hybridizing a nucleic acid primer to the fusion (preferably, at or near the RNA 3′ end); and (c) reverse transcribing the RNA to produce a DNA-protein fusion.
In a preferred embodiment of the above methods, the method may further involve treating the product of step (c) to remove the RNA (for example, by contacting the product of step (c) with RNase H under conditions sufficient to digest the RNA). In additional preferred embodiments, the nucleic acid primer is a DNA primer; the translating step is carried out in vitro; and the nucleic acid primer has a hairpin structure. In addition, the primer may further include a photocrosslinking agent, such as psoralen, and the primer may be crosslinked to an oligonucleotide which is bound to a peptide acceptor or, alternatively, may be hybridized to the RNA molecule, followed by a linking step that is carried out by photocrosslinking.
In related aspects, the invention also features a molecule including a DNA covalently bonded to a protein (preferably, of at least 10 amino acids) through a peptide acceptor (for example, puromycin), as well as a molecule including a DNA covalently bonded to a protein, in which the protein includes at least 10 amino acids.
In preferred embodiments of both of these aspects, the protein includes at least 30 amino acids, more preferably, at least 100 amino acids, and may even include at least 200 or 250 amino acids. In other preferred embodiments, the protein is encoded by the DNA and is preferably entirely encoded by the DNA; the molecule further includes a ribonucleic acid covalently bonded to the DNA; the protein is encoded by the ribonucleic acid; and the DNA is double stranded.
In another related aspect, the invention features a population of at least 10^5, and preferably, at least 10^14, DNA-protein fusions of the invention, each fusion including a DNA covalently bonded to a protein.
In addition, the invention features selection methods which utilize the DNA-protein fusions described herein. A first selection method involves the steps of: (a) providing a population of DNA-protein fusions, each including a DNA covalently bonded to a candidate protein; and (b) selecting a desired DNA-protein fusion, thereby selecting the desired protein or DNA.
A second selection method involves the steps of: (a) producing a population of candidate DNA-protein fusions, each including a DNA covalently bonded to a candidate protein and having a candidate protein coding sequence which differs from a reference protein coding sequence; and (b) selecting a DNA-protein fusion having an altered function, thereby selecting the protein having the altered function or its encoding DNA.
In preferred embodiments, the selection step involves either binding of the desired protein to an immobilized binding partner or assaying for a functional activity of the desired protein. In addition, the method may further involve repeating steps (a) and (b).
In a final aspect, the invention features a solid support including an array of immobilized molecules, each including a covalently-bonded DNA-protein fusion of the invention. In a preferred embodiment, the solid support is a microchip.
As used herein, by a “population” is meant 10^5 or more molecules (for example, DNA-protein fusion molecules). Because the methods of the invention facilitate selections which begin, if desired, with large numbers of candidate molecules, a “population” according to the invention preferably means more than 10^7 molecules, more preferably, more than 10^9, 10^13, or 10^14 molecules, and, most preferably, more than 10^15 molecules.
By “selecting” is meant substantially partitioning a molecule from other molecules in a population. As used herein, a “selecting” step provides at least a 2-fold, preferably, a 30-fold, more preferably, a 100-fold, and, most preferably, a 1000-fold enrichment of a desired molecule relative to undesired molecules in a population following the selection step. A selection step may be repeated any number of times, and different types of selection steps may be combined in a given approach.
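The effect of repeating a selection step can be illustrated with a short calculation. The following is a minimal sketch, not part of the described methods, and it rests on an assumption that is not stated above: that an N-fold enrichment multiplies the ratio of desired to undesired molecules by N in each round, and that the same fold enrichment is achieved in every round. The function names `enriched_fraction` and `rounds_needed` are hypothetical and used for illustration only.

```python
def enriched_fraction(initial_fraction: float, fold_per_round: float, rounds: int) -> float:
    """Fraction of desired molecules after a number of selection rounds.

    Assumption (illustrative only): each round multiplies the
    desired:undesired ratio by `fold_per_round`.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    if fold_per_round <= 0 or rounds < 0:
        raise ValueError("fold_per_round must be positive and rounds non-negative")
    # Convert the fraction to a desired:undesired ratio, enrich, convert back.
    ratio = initial_fraction / (1.0 - initial_fraction)
    ratio *= fold_per_round ** rounds
    return ratio / (1.0 + ratio)


def rounds_needed(initial_fraction: float, fold_per_round: float,
                  target_fraction: float, max_rounds: int = 100):
    """Smallest number of rounds reaching `target_fraction`, or None if not reached."""
    for n in range(max_rounds + 1):
        if enriched_fraction(initial_fraction, fold_per_round, n) >= target_fraction:
            return n
    return None


if __name__ == "__main__":
    # One desired molecule per 10^9, with an assumed 1000-fold enrichment per round.
    for n in range(5):
        print(n, enriched_fraction(1e-9, 1000.0, n))
    print("rounds to reach 90%:", rounds_needed(1e-9, 1000.0, 0.9))
```

Under these assumptions, a desired molecule that starts as a very small fraction of a large population becomes the dominant species after only a few rounds, which is why repeated or combined selection steps are useful.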
By a “protein” is meant any two or more naturally occurring or modified amino acids joined by one or more peptide bonds. “Protein” and “peptide” are used interchangeably herein.
By “RNA” is meant a sequence of two or more covalently bonded, naturally occurring or modified ribonucleotides. One example of a modified RNA included within this term is phosphorothioate RNA.
By “DNA” is meant a sequence of two or more covalently bonded, naturally occurring or modified deoxyribonucleotides.
By a “nucleic acid” is meant any two or more covalently bonded nucleotides or nucleotide analogs or derivatives. As used herein, this term includes, without limitation, DNA, RNA, and PNA.
By a “peptide acceptor” is meant any molecule capable of being added to the C-terminus of a growing protein chain by the catalytic activity of the ribosomal peptidyl transferase function. Typically, such molecules contain (i) a nucleotide or nucleotide-like moiety (for example, adenosine or an adenosine analog (di-methylation at the N-6 amino position is acceptable)), (ii) an amino acid or amino acid-like moiety (for example, any of the 20 D- or L-amino acids or any amino acid analog thereof (for example, O-methyl tyrosine or any of the analogs described by Ellman et al., Meth. Enzymol. 202:301, 1991)), and (iii) a linkage between the two (for example, an ester, amide, or ketone linkage at the 3′ position or, less preferably, the 2′ position); preferably, this linkage does not significantly perturb the pucker of the ring from the natural ribonucleotide conformation. Peptide acceptors may also possess a nucleophile, which may be, without limitation, an amino group, a hydroxyl group, or a sulfhydryl group. In addition, peptide acceptors may be composed of nucleotide mimetics, amino acid mimetics, or mimetics of the combined nucleotide-amino acid structure.
By an “altered function” is meant any qualitative or quantitative change in the function of a molecule.
By “binding partner,” as used herein, is meant any molecule which has a specific, covalent or non-covalent affinity for a portion of a desired DNA-protein fusion. Examples of binding partners include, without limitation, members of antigen/antibody pairs, protein/inhibitor pairs, receptor/ligand pairs (for example cell surface receptor/ligand pairs, such as hormone receptor/peptide hormone pairs), enzyme/substrate pairs (for example, kinase/substrate pairs), lectin/carbohydrate pairs, oligomeric or heterooligomeric protein aggregates, DNA binding protein/DNA binding site pairs, RNA/protein pairs, and nucleic acid duplexes, heteroduplexes, or ligated strands, as well as any molecule which is capable of forming one or more covalent or non-covalent bonds (for example, disulfide bonds) with any portion of a DNA-protein fusion.
By a “solid support” is meant, without limitation, any column (or column material), bead, test tube, microtiter dish, solid particle (for example, agarose or sepharose), microchip (for example, silicon, silicon-glass, or gold chip), or membrane (for example, the membrane of a liposome or vesicle) to which an affinity complex may be bound, either directly or indirectly (for example, through other binding partner intermediates such as other antibodies or Protein A), or in which an affinity complex may be embedded (for example, through a receptor or channel).
The present invention provides methods for the creation of fusions between proteins and their encoding cDNAs. These constructs possess greatly enhanced chemical stability, first, due to the DNA component of the fusion and, second, due to the covalent bond linking the DNA and protein moieties. These properties allow for easier handling of the fusion products and thereby allow selection and recognition experiments to be carried out under a range of reaction conditions. In addition, the present invention facilitates applications where a single-stranded nucleic acid portion is mandatory, for example, in hybridization assays in which the coding fusions are immobilized to a solid support. Incubations may also be performed under more rigorous conditions, involving high pH, elevated concentrations of multivalent metal ions, prolonged heat treatment, and exposure to various biological materials. Finally, single-stranded DNA is relatively resistant to secondary structure formation, providing a great advantage for techniques involving or requiring nucleic acid hybridization steps.
In addition, the methods of the present invention allow for the production of fusions involving DNA and protein components of any length, as well as fusion libraries of high complexity.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.