In general, the present invention features methods for the preparation of nucleic acid-protein conjugates.
Nucleic acid-protein conjugates, sometimes referred to as nucleic acid-protein fusions, nucleoproteins or nucleopeptides, are naturally-occurring bioconjugates which play a key role in important biological processes. In one particular example, such conjugates play a central role in the process of nucleoprotein-primed viral replication (Salas, Ann. Rev. Biochem. 60, 39-71 (1991)). Accordingly, nucleoproteins as well as nucleopeptides may serve as powerful tools for the study of biological phenomena, and may also provide a basis for the development of antiviral agents.
In addition, conjugates of peptides and nucleic acids have found use in several other applications, such as non-radioactive labels (Haralambidis et al., Nucleic Acids Res. 18, 501-505 (1990)) and PCR primers (Tong et al., J. Org. Chem. 58, 2223-2231 (1993)), as well as reagents in encoded combinatorial chemistry techniques (Nielsen et al., J.A.C.S. 115, 9812-9813 (1993)). In yet other applications, peptides predicted to have favorable interactions with cell membranes, such as polylysine (Leonetti et al., Bioconjugate Chem. 1, 149-153 (1990)), other highly basic peptides (Vives and Lebleu, Tetrahedron Lett. 328, 1183-1186 (1997)), hydrophobic peptides (Juby et al., Tetrahedron Lett. 32, 879-882 (1991)), viral fusion peptides (Soukchareun et al., Bioconjugate Chem. 6, 43-53 (1995)) and peptide signal sequences (Arar et al., Bioconjugate Chem. 6, 573-577 (1995)), have been coupled to oligonucleotides to enhance cellular uptake. Peptides able to chelate metals have also been appended to oligonucleotides to generate specific nucleic acid cleaving reagents (Truffert et al., Tetrahedron 52, 3005-3016 (1996)). And peptides linked to the 3′-end of oligonucleotides have been reported to provide important resistance to 3′-exonucleases (Juby et al., Tetrahedron Lett. 32, 879-882 (1991)).
One particular type of nucleic acid-protein conjugate, referred to as an RNA-protein fusion (Szostak and Roberts, U.S. Ser. No. 09/007,005; and Roberts and Szostak, Proc. Natl. Acad. Sci. USA 94, 12297-12302 (1997)), has been used in methods for isolating proteins with desired properties from pools of proteins. To create such fusions, an RNA and the peptide or protein that it encodes are joined during in vitro translation using synthetic RNA that carries a peptidyl acceptor, such as puromycin, at its 3′-end. In this process, the synthetic RNA, which is devoid of stop codons, is typically synthesized by in vitro transcription from a DNA template followed by 3′-ligation to a DNA linker carrying puromycin. The DNA template sequence causes the ribosome to pause at the 3′-end of the open reading frame, providing additional time for the puromycin to accept the nascent peptide chain and resulting in the production of the RNA-protein fusion molecule.
The present invention features chemical ligation methods for producing nucleic acid-protein conjugates in good yields. Two different approaches are described. In the first, fusions are formed by a reaction between an unprotected protein carrying an N-terminal cysteine and a nucleic acid carrying a 1,2-aminothiol reactive group. In the second approach, fusion formation occurs as the result of a bisarsenical-tetracysteine interaction.
Accordingly, in a first aspect, the invention features a method for generating a 5′-nucleic acid-protein conjugate, the method involving: (a) providing a nucleic acid which carries a reactive group at its 5′ end; (b) providing a non-derivatized protein; and (c) contacting the nucleic acid and the protein under conditions which allow the reactive group to react with the N-terminus of the protein, thereby forming a 5′-nucleic acid-protein conjugate.
In a related aspect, the invention features a 5′-nucleic acid-protein conjugate which includes a nucleic acid bound through its 5′-terminus or a 5′-terminal reactive group to the N-terminus of a non-derivatized protein.
In various preferred embodiments of these aspects, the nucleic acid is greater than about 20 nucleotides in length; the nucleic acid is greater than about 120 nucleotides in length; the nucleic acid is between about 2-1000 nucleotides in length; the protein is greater than about 20 amino acids in length; the protein is greater than about 40 amino acids in length; the protein is between about 2-300 amino acids in length; the contacting step is carried out in a physiological buffer; the contacting step is carried out using a nucleic acid and a protein, both of which are present at a concentration of less than about 1 mM; the nucleic acid is DNA or RNA (for example, mRNA); the nucleic acid includes the coding sequence for the protein; the N-terminus of the non-derivatized protein is a cysteine residue; the N-terminal cysteine is exposed by protein cleavage; the reactive group is an aminothiol reactive group; the protein includes an α-helical tetracysteine motif located proximal to its N-terminus; the α-helical tetracysteine motif includes the sequence cys-cys-X-X-cys-cys (SEQ ID NO: 6), wherein X is any amino acid; the reactive group is a bisarsenical derivative; the conjugate is immobilized on a solid support (for example, a bead or chip); and the conjugate is one of an array immobilized on a solid support.
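The two protein formats named above (a free N-terminal cysteine for the 1,2-aminothiol approach, and a cys-cys-X-X-cys-cys motif near the N-terminus for the bisarsenical approach) can be screened at the sequence level. The sketch below is an illustration under stated assumptions, not part of the described method: it assumes one-letter amino acid sequences, models "exposure by protein cleavage" simply as trimming a hypothetical number of N-terminal residues, and treats "proximal to the N-terminus" as an assumed 20-residue window. A sequence scan cannot confirm that the motif actually adopts an α-helical conformation.

```python
import re

# Cys-Cys-X-X-Cys-Cys (X = any residue); the lookahead also reports overlapping hits.
TETRACYSTEINE = re.compile(r"(?=CC[A-Z]{2}CC)")


def mature_sequence(precursor: str, residues_removed: int = 0) -> str:
    """Sequence left after cleaving off the first `residues_removed` residues
    (hypothetical cleavage model, e.g. loss of an initiator Met or a leader peptide)."""
    if not 0 <= residues_removed < len(precursor):
        raise ValueError("cleavage position lies outside the sequence")
    return precursor[residues_removed:].upper()


def has_n_terminal_cysteine(sequence: str) -> bool:
    """True if residue 1 is Cys, i.e. a free 1,2-aminothiol sits at the N-terminus."""
    return sequence[:1].upper() == "C"


def tetracysteine_sites(sequence: str, window: int = 20) -> list:
    """0-based start positions of CCXXCC motifs beginning within the first
    `window` residues (the window size is an assumed cutoff for 'proximal')."""
    seq = sequence.upper()
    return [m.start() for m in TETRACYSTEINE.finditer(seq) if m.start() < window]


# Hypothetical example sequences.
print(has_n_terminal_cysteine(mature_sequence("MCGSHHHHHH", 1)))  # True
print(tetracysteine_sites("MAEEAACCPGCCARA"))                      # [6]
```

The regular expression uses a zero-width lookahead so that motifs sharing residues (for example, in CCAACCAACC) are all reported rather than only the first.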
In another related aspect, the invention features a method for the selection of a desired nucleic acid or a desired protein, the method involving: (a) providing a population of 5′-nucleic acid-protein conjugates, each including a nucleic acid bound through its 5′-terminus or a 5′-terminal reactive group to the N-terminus of a non-derivatized protein; (b) contacting the population of 5′-nucleic acid-protein conjugates with a binding partner specific for either the nucleic acid or the protein portion of the desired nucleic acid or desired protein under conditions which allow for the formation of a binding partner-candidate conjugate complex; and (c) substantially separating the binding partner-candidate conjugate complex from unbound members of the population, thereby selecting the desired nucleic acid or the desired protein.
In yet another related aspect, the invention features a method for detecting an interaction between a protein and a compound, the method involving: (a) providing a solid support that includes an array of immobilized 5′-nucleic acid-protein conjugates, each conjugate including a nucleic acid bound through its 5′-terminus or a 5′-terminal reactive group to the N-terminus of a non-derivatized protein; (b) contacting the solid support with a candidate compound under conditions which allow an interaction between the protein portion of the conjugate and the compound; and (c) analyzing the solid support for the presence of the compound as an indication of an interaction between the protein and the compound.
In various preferred embodiments of these methods, the method further involves repeating steps (b) and (c); the compound is a protein; the compound is a therapeutic; the nucleic acid is greater than about 20 nucleotides in length; the nucleic acid is greater than about 120 nucleotides in length; the nucleic acid is between about 2-1000 nucleotides in length; the protein is greater than about 20 amino acids in length; the protein is greater than about 40 amino acids in length; the protein is between about 2-300 amino acids in length; the nucleic acid is DNA or RNA (for example, mRNA); the nucleic acid includes the coding sequence for the protein; the N-terminus of the non-derivatized protein is a cysteine residue; the reactive group is an aminothiol reactive group; the protein includes an α-helical tetracysteine motif located proximal to its N-terminus; the α-helical tetracysteine motif includes the sequence cys-cys-X-X-cys-cys (SEQ ID NO: 6), wherein X is any amino acid; the reactive group is a bisarsenical derivative; the conjugate is immobilized on a solid support (for example, a bead or chip); and the conjugate is one of an array immobilized on a solid support.
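The selection method, with steps (b) and (c) optionally repeated, amounts to iterative partitioning of a conjugate population. The toy model below is a sketch under stated assumptions: the binding partner is represented by a hypothetical predicate, and non-specific carry-over of unbound members is modeled by an assumed `background` probability. It illustrates why repeating steps (b) and (c) is useful: true binders are always retained, while each additional round further thins the unbound background.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Conjugate:
    nucleic_acid: str  # 5'-linked nucleic acid, e.g. the encoding mRNA
    protein: str       # protein joined through its N-terminus


def select(population: Iterable[Conjugate],
           binds: Callable[[Conjugate], bool],
           rounds: int = 1,
           background: float = 0.0,
           seed: int = 0) -> List[Conjugate]:
    """Repeat steps (b) and (c): keep conjugates that form a complex with the
    binding partner; unbound members survive a round only by non-specific
    carry-over with probability `background` (an assumed parameter)."""
    rng = random.Random(seed)
    pool = list(population)
    for _ in range(rounds):
        pool = [c for c in pool if binds(c) or rng.random() < background]
    return pool


# Hypothetical binding partner: an immobilized ligand recognizing a His6 tag.
def his_tag_binder(c: Conjugate) -> bool:
    return "HHHHHH" in c.protein


# Hypothetical library of 1,000 conjugates, 20 of which carry the tag.
library = [Conjugate("mRNA-%d" % i, "C" + ("HHHHHH" if i % 50 == 0 else "GS") + "K")
           for i in range(1000)]
survivors = select(library, his_tag_binder, rounds=3, background=0.05)
print(len(survivors), sum(his_tag_binder(c) for c in survivors))
```

Because the selected conjugates keep their nucleic acid portion, the identity of each surviving protein can be read from its attached nucleic acid; the model simply carries that field along.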
As used herein, by a “5′-nucleic acid-protein conjugate” is meant a nucleic acid which is covalently bound to a protein through the nucleic acid's 5′ terminus.
By a “nucleic acid” is meant any two or more covalently bonded nucleotides or nucleotide analogs or derivatives. As used herein, this term includes, without limitation, DNA, RNA, and PNA.
By a “protein” is meant any two or more amino acids, or amino acid analogs or derivatives, joined by peptide or peptoid bond(s), regardless of length or post-translational modification. As used herein, this term includes, without limitation, proteins, peptides, and polypeptides.
By “derivatize” is meant adding a non-naturally-occurring chemical functional group to a protein following the protein's translation or chemical synthesis. “Non-derivatized” proteins are not treated in this manner and do not carry such non-naturally-occurring chemical functional groups.
By a “physiological buffer” is meant a solution that mimics the conditions in a cell. Typically, such a buffer is at about pH 7 and may be at a temperature of about 37° C.
By a “solid support” is meant any solid surface including, without limitation, any chip (for example, silica-based, glass, or gold chip), glass slide, membrane, bead, solid particle (for example, agarose, sepharose, or magnetic bead), column (or column material), test tube, or microtiter dish.
By an “array” is meant a fixed pattern of immobilized objects on a solid surface or membrane. As used herein, the array is made up of nucleic acid-protein fusion molecules (for example, RNA-protein fusion molecules). The array preferably includes at least 10², more preferably at least 10³, and most preferably at least 10⁴ different fusions, and these fusions are preferably arrayed on a 125×80 mm, and more preferably on a 10×10 mm, surface.
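To give a sense of the feature densities these preferences imply, the following back-of-the-envelope sketch assumes an idealized square grid that fills the entire surface, ignoring margins and spot diameter (both simplifying assumptions). Under that assumption, 10⁴ fusions on a 10×10 mm surface correspond to a center-to-center spacing of about 100 µm, and 10² fusions on a 125×80 mm surface to about 10 mm.

```python
import math


def grid_pitch_mm(n_features: int, width_mm: float, height_mm: float) -> float:
    """Center-to-center spacing (mm) when n_features share the surface equally
    on an idealized square grid; margins and spot size are ignored."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return math.sqrt(width_mm * height_mm / n_features)


# Pitch for each preferred array size on the two preferred surfaces.
for n in (10**2, 10**3, 10**4):
    print(n, round(grid_pitch_mm(n, 125, 80), 3), round(grid_pitch_mm(n, 10, 10), 3))
```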
By a “population” is meant more than one molecule.
The present invention provides a number of advantages. For example, although conjugates of between 2-1000 nucleotides and 2-300 amino acids are preferred, nucleic acid-protein conjugates of any desired molecular weight may be generated using the methods of the invention, because the nucleic acid and the protein may each be produced independently using well-known synthetic and biological methods. These post-synthetic ligation methods are therefore advantageous over fully synthetic techniques, in which stepwise buildup of nucleic acid-peptide conjugates generally allows preparation of only conjugates of limited size, typically less than 20 nucleotides and less than 20 amino acids in length.
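The size contrast can be made concrete with a rough mass estimate. This sketch assumes rule-of-thumb average residue masses of about 330 Da per nucleotide and about 110 Da per amino acid, and ignores the linker; it is an order-of-magnitude illustration, not a value taken from the source.

```python
# Approximate average residue masses in daltons (rules of thumb, not exact values).
DA_PER_NUCLEOTIDE = 330.0
DA_PER_AMINO_ACID = 110.0


def approx_conjugate_kda(n_nucleotides: int, n_amino_acids: int) -> float:
    """Order-of-magnitude mass of a nucleic acid-protein conjugate in kDa,
    ignoring the linker and terminal groups."""
    return (n_nucleotides * DA_PER_NUCLEOTIDE + n_amino_acids * DA_PER_AMINO_ACID) / 1000.0


print(approx_conjugate_kda(20, 20))      # 8.8 (typical limit of stepwise synthesis)
print(approx_conjugate_kda(1000, 300))   # 363.0 (upper end of the preferred range)
```

On these rough figures, the preferred post-synthetic range reaches conjugates some forty times heavier than those typically accessible by stepwise synthesis.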
In addition, the reactions described herein (for example, the reaction between the N-terminal cysteine and the 1,2-aminothiol reactive group on the nucleic acid) are chemoselective over other nucleophilic groups on the protein, thus leading to regiospecific links between proteins and nucleic acids. This contrasts with known methods for the synthesis of protein-nucleic acid conjugates, which often rely on reactions between a nucleophilic group on the protein and an electrophile on the nucleic acid moiety (Bayard et al., Biochemistry 25, 3730-3736 (1986); Cremer et al., J. Prot. Chem. 11(5), 553-560 (1992)). In these reactions, multiple nucleophilic side chains on the protein compete for reaction with the electrophile, leading to non-specific links between protein and nucleic acid and thus generating a heterogeneous mixture of conjugate products.
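The difference in product heterogeneity can be illustrated by counting attachment sites. This simplified sketch assumes an amine-reactive electrophile on the nucleic acid and considers only primary amines (the N-terminal α-amine plus one ε-amine per lysine), ignoring other nucleophiles such as cysteine thiols; the example sequence is hypothetical. With k competing sites, a single attachment can occur at k positions, and any non-empty combination of sites gives 2^k - 1 possible modification patterns, whereas a ligation chemoselective for the N-terminal cysteine has exactly one site.

```python
def amine_sites(sequence: str) -> int:
    """Primary amines open to an amine-reactive electrophile: the N-terminal
    alpha-amine plus one epsilon-amine per lysine (other nucleophiles ignored)."""
    return 1 + sequence.upper().count("K")


def conjugate_variants(sequence: str, n_terminal_cys_ligation: bool = False) -> dict:
    """Count distinct attachment outcomes for a given conjugation strategy."""
    k = 1 if n_terminal_cys_ligation else amine_sites(sequence)
    return {"single_site_isomers": k, "possible_patterns": 2 ** k - 1}


protein = "CKTAKLEK"  # hypothetical sequence with three lysines
print(conjugate_variants(protein))                                # {'single_site_isomers': 4, 'possible_patterns': 15}
print(conjugate_variants(protein, n_terminal_cys_ligation=True))  # {'single_site_isomers': 1, 'possible_patterns': 1}
```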
As further advantages, the present ligation reactions work efficiently under mild conditions in physiological buffers. Consequently, protein structure is not disrupted under the ligation conditions used, and conjugates carrying functional proteins can be formed. In addition, the present ligation reactions work efficiently with reactant concentrations in the µM range. Consequently, dilute preparations of protein and nucleic acid can be used for conjugate preparation.
The present techniques also provide advantages with respect to the conjugates themselves. Most notably, the conjugate nucleic acid (for example, RNA) is linked to the amino-terminus of the conjugate protein. This type of fusion leaves the protein's carboxy-terminus unmodified and is particularly beneficial when the carboxy-terminal amino acids are involved with protein structure or function, or participate in interactions with other species.
In addition, with respect to RNA-protein fusions, efficient ligation in aqueous buffers at low concentrations of reactants allows the fusion of nascent proteins to their encoding RNAs while bound to the ribosome. Pretranslational 3′-modification of the mRNA as described for 3′-fusions (Szostak and Roberts, U.S. Ser. No. 09/007,005; and Roberts and Szostak, Proc. Natl. Acad. Sci. USA 94, 12297-12302 (1997)) is unnecessary, because the 3′-end of the mRNA is not involved in ligation. Moreover, because the 3′-end of the RNA plays no part in ligation, the present technique facilitates the production of RNA-protein fusions using RNAs from a variety of sources. In one particular example, RNA (for example, mRNA) libraries with heterogeneous 3′-termini may be readily used for the synthesis of 5′-mRNA-protein fusions. In another example, cellular RNA may be used for fusion formation.
Finally, the present invention provides a quantitative advantage for the production of RNA-protein fusions by simplifying ribosome turnover and thereby optimizing fusion synthesis. In particular, because conjugate proteins are linked through their N-termini to conjugate nucleic acids, the fusion products are released in unhindered fashion from the native ribosome following translation, allowing free ribosomes to undergo further rounds of translation. This multiple turnover allows for the synthesis of larger pools of RNA-protein fusions than is currently available with single turnover at the ribosome (Szostak and Roberts, U.S. Ser. No. 09/007,005; and Roberts and Szostak, Proc. Natl. Acad. Sci. USA 94, 12297-12302 (1997)).
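The benefit of multiple turnover can be expressed as a simple upper bound. In this toy model every input is hypothetical and the per-event fusion efficiency is an assumed constant: with single turnover each ribosome contributes at most one fusion, while with multiple turnover it can contribute one per round of translation.

```python
def fusion_yield_bound(ribosomes: int, rounds: int, fusion_efficiency: float,
                       multiple_turnover: bool) -> float:
    """Upper-bound number of fusion molecules in a toy model: single turnover
    caps each ribosome at one fusion; multiple turnover allows one per round."""
    effective_rounds = rounds if multiple_turnover else 1
    return ribosomes * effective_rounds * fusion_efficiency


# Hypothetical inputs: 1,000 ribosomes, 5 rounds, 50% of events yield a fusion.
print(fusion_yield_bound(1000, 5, 0.5, multiple_turnover=False))  # 500.0
print(fusion_yield_bound(1000, 5, 0.5, multiple_turnover=True))   # 2500.0
```

In this model the pool size scales linearly with the number of translation rounds, which is the quantitative sense in which releasing fusions from the ribosome enlarges the achievable library.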
The nucleic acid-protein fusions (for example, the mRNA-protein fusions) of the invention may be used in any selection or in vitro evolution technique. For example, these fusions may be used in methods for the improvement of existing proteins or the evolution of proteins with novel structures or functions, particularly in the areas of therapeutic, diagnostic, and research products. In addition, 5′-RNA-protein fusions find use in the functional genomics field; in particular, these fusions (for example, cellular mRNA-protein fusions) may be used to detect protein-protein interactions in a variety of formats, including presentation of fusion arrays on solid supports (for example, beads or microchips).
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.