The present invention relates to a type of protein synthesis utilizing combinatorial translation of a single gene sequence interspersed with programmed ribosomal frameshifting sequences. Specifically, this type of protein synthesis can be used to make numerous, varied proteins from a single gene. More particularly, libraries can be made using this combinatorial protein synthesis, and the libraries can then be used to screen for specific protein activities.
Unusual translational events, such as frameshifting, are known to play an important role in some diseases. For example, expression of the enzyme reverse transcriptase, utilized by many retroviruses including HIV, involves a naturally occurring translational frameshift. Frameshifted gene expression is also thought to play a role in some forms of colon cancer, Alzheimer's disease, and hemophilia A.
Rapid progress in sequencing genomes, as well as in genomics, proteomics and related disciplines, has created a great number of targets for drug discovery and other potential treatments for various diseases. Specifically, phage libraries have proved invaluable in identifying peptide ligands of therapeutic value. For example, Yanofsky et al. (1996) described the isolation of a monomer peptide antagonistic to interleukin 1 (IL-1) with nanomolar affinity for the IL-1 receptor. Similarly, Wrighton et al. (1996) and Livnah et al. (1997) reported peptides that bind to the erythropoietin (EPO) receptor. Likewise, Cwirla et al. (1997) described the identification of two families of peptides that bind to the human thrombopoietin (TPO) receptor. More particularly, there has been considerable progress in both construction and use of phage libraries over the last few years. For example, library diversity has been continually increasing, from 10^8 (Scott and Smith, 1991) to up to 10^11 more recently. The number and types of protein molecules displayed as well as the types of libraries have increased, while the selection methods have also continued to improve (reviewed by Lowman, 1997).
One such improved selection method, called biopanning, is the selection of peptides or proteins with specific desired binding properties. In contrast to traditional sequence-based discovery methods, biopanning enables screening rates that are 10,000 times faster. This technique involves the immobilization of a target protein on a solid phase, incubation of a phage library with the solid phase to allow phage binding to the target, followed by washes to remove unbound phage and elution of the bound phage. The eluted phage is grown in Escherichia coli. Typically, several rounds of biopanning (2 to 6) are performed to identify clones of interest. These clones of interest are capable of producing or expressing the protein, which may have significant utility in the production of commercial therapeutic products, industrial proteins, research reagents or protein-enriched consumer products.
Though several successes in drug discovery have arisen through use of a phage library, there are different types of libraries suitable for this type of search. For example, other than the phage library, researchers often rely on combinatorial libraries. The term “combinatorial library” typically refers to libraries of biomolecules. Each element of the combinatorial library is composed of a string of several building blocks. The number of building blocks can be anywhere from 2 to 100 or even much more. Varying the building blocks at different positions within the string generates the diversity of the combinatorial library. Thus, if the string length is 5, and the number of building blocks is 10, the number of possible biomolecules in the combinatorial library is 10^5. This is the potential diversity of the library. The observed diversity, which is the number of biomolecules actually constructed, can be less than or equal to the potential diversity.
An example of a combinatorial library is the library of random peptides on filamentous phage. The typical length of the peptide in such a library is 15 amino acid residues, and the number of building blocks (amino acids) is 20. Thus, the potential diversity of the library is 20^15, or approximately 3×10^19. In practice, the number of Escherichia coli cells used to construct such a library acts as a limit on the actual diversity of the library. Thus, the observed diversity of the library is rarely above 10^11.
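The diversity arithmetic described above can be sketched in a few lines of Python. This is an illustration only; the function names `potential_diversity` and `observed_diversity`, and the modeling of the host-cell limit as a simple cap, are assumptions introduced here and are not part of the original disclosure.

```python
def potential_diversity(building_blocks: int, length: int) -> int:
    """Number of distinct strings of the given length over the building blocks."""
    return building_blocks ** length


def observed_diversity(potential: int, host_cell_limit: int) -> int:
    """Assumed model: the observed diversity cannot exceed either the
    potential diversity or the number of host cells used for construction."""
    return min(potential, host_cell_limit)


# String length 5 with 10 building blocks: 10^5 possible biomolecules.
small_library = potential_diversity(10, 5)

# Random 15-residue peptides over 20 amino acids: 20^15, roughly 3 x 10^19.
peptide_library = potential_diversity(20, 15)

# With about 10^11 E. coli cells, only a tiny fraction can actually be built.
built = observed_diversity(peptide_library, 10**11)

print(small_library)
print(f"{peptide_library:.2e}")
print(f"fraction sampled: {built / peptide_library:.1e}")
```

The gap between the two numbers is the practical motivation for the approach described in this document: the host-cell count, not the sequence space, is the limiting factor.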
There are many other examples of combinatorial libraries, including libraries of small organic molecules, phage libraries of antibodies, or libraries of genes obtained by DNA shuffling. Yet in all cases of gene libraries, be the library a phage library, a combinatorial library or another type of library, expression of a single gene in the library results in only one type (sequence) of protein being translated or produced.
Despite many successes in using phage display via libraries and biopanning, researchers still strive for even larger libraries and faster protocols in their search for a peptide or protein ideally suited for an envisioned task. Often, as part of this search, there is a need to screen a variety of proteins and ultimately select only those with the most desirable properties. As stated earlier, one problem with conventional libraries is that each gene encodes only one type of protein, or corresponds to only one protein sequence. The following example illustrates this problem. Escherichia coli can make and store up to 10^7 protein molecules in a single cell. Researchers can engineer an Escherichia coli cell for expression of a single heterologous gene, and that cell can then produce up to 70% (or even more) of that number of proteins (10^7) as identical copies of the protein encoded by that gene. This is highly desirable when large quantities of a protein are needed. However, if the researcher is attempting to make a large number of varied proteins for the purpose of screening the proteins to find the ones with the most desired properties, then having huge amounts of a single type of protein is inefficient and wasteful. As a result, there is a critical need in the art for a way to synthesize larger libraries of more varied proteins.
The present invention overcomes this difficulty by inserting ribosomal frameshifting sequences into the gene sequence of interest to cause combinatorial translation of the gene sequence so that the single sequence yields a multitude of different peptides. By causing the reading frame switch, ribosomal frameshifting sequences affect the amino acid sequence of the protein made by the ribosome. Therefore, insertion of a ribosomal frameshifting sequence into a gene causes that gene to code for significantly more than the traditional one peptide or protein.
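The effect of a reading-frame switch on the resulting amino acid sequence can be illustrated with a minimal Python sketch. The toy DNA sequence, the shift position, and the `translate` helper are hypothetical and chosen only for illustration; real programmed frameshifting occurs at specific sequence motifs and with a certain probability, which this deterministic sketch does not model.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, shift_at=None, shift=0):
    """Translate dna from nucleotide 0 until a stop codon or the end.

    If shift_at is given, the reading position jumps by `shift`
    nucleotides (+1 or -1) once, when it first reaches that index.
    """
    peptide = []
    pos = 0
    shifted = False
    while pos + 3 <= len(dna):
        if shift_at is not None and not shifted and pos >= shift_at:
            pos += shift          # switch reading frame exactly once
            shifted = True
            continue
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":             # stop codon ends translation
            break
        peptide.append(aa)
        pos += 3
    return "".join(peptide)


toy_gene = "ATGGCTAAATTTGGGCCC"    # hypothetical sequence, not from the source

print(translate(toy_gene))          # in-frame product
print(translate(toy_gene, 6, +1))   # +1 frameshift after the second codon
print(translate(toy_gene, 6, -1))   # -1 frameshift after the second codon
```

In this toy case the three routes give three different products from the same sequence, and the −1 route terminates early because the shifted frame encounters a stop codon.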
The initial and most widely recognized presumption in considering DNA sequences for expression potential is the requirement for an open reading frame. Surprisingly, in a previous drug discovery protocol by the inventor, a large number of sequences expressed in a random peptide library were found to contain non-open reading frame (non-ORF) and frameshifted sequences (Carcamo et al., 1998). The study was designed to isolate peptides capable of binding to growth hormone binding protein (GHBP). Originally, in biopanning experiments for the specific protein target, namely GHBP, the inventor expected an open reading frame (ORF) corresponding to the full length of the peptide and an epitope tag that followed. However, the inventor was surprised to observe this class of sequence in only about 50% of all sequences identified in biopanning as binding to the target (Ravera et al., 1998). Even more surprisingly, the inventor observed two other types of sequences that were, qualitatively, very different from the sequence originally expected. These two types of sequences contained a frameshift in the +1 or −1 direction but were, also unexpectedly, capable of expression.
One non-ORF phage clone, known as H10, which is capable of binding to the rat growth hormone binding protein (GHBP), was studied further. More specifically, a secondary peptide library containing random mutations of this sequence was constructed and panned against GHBP in an attempt to optimize and correct the reading frame (Carcamo et al., 1998). While the study did not correct the reading frame of the H10 clone, it did yield clones capable of binding to the GHBP. The major focus of this study was the regulation of receptor function with surrogate peptide ligands in an attempt to aid in developing new therapies for diseases such as acromegaly and dwarfism. The discovery of expression of non-ORF clones was a surprising result of this study.
One advantage of the present invention is the ability to synthesize multiple, varied proteins from a single genetic sequence. Creating a library of phages carrying the programmed ribosomal frameshifting sequences, as in the present invention, allows the researcher access to a much larger, more varied library of proteins, capable of providing a more efficient search for a therapeutic protein, or any other protein of interest. Furthermore, a library of genes containing frameshifting sequences gives the researcher a tool for understanding the rules and mechanisms of gene expression, an understanding that is especially important in this era of genome sequencing. For the foregoing reasons, there is a critical need in the art for a method of synthesizing a variety of proteins from a single gene or genetic source.
Traditionally, a single gene codes for a single protein. In one aspect, the present invention provides a method for the combinatorial synthesis of proteins, allowing the synthesis of several proteins using the genetic information of just one gene. The present invention accomplishes combinatorial protein synthesis by utilizing nucleic acid sequences known as frameshifting sequences. Placing one or more of these frameshifting sequences within a recombinantly made gene allows the ribosome to switch reading frames during translation, thereby producing many types of proteins rather than a single type.
In another aspect, the present invention enables the construction of a new type of phage in which the phage-displayed protein is encoded by a gene having alternating frameshifting and random nucleic acid sequences. Thus, an advantage of the present invention is that a single E. coli cell is programmed to make significantly more than one type of recombinant polypeptide per cell. The design of such a library is based on the finding that the ribosome is able to partially switch reading frames when translating sequences in phage libraries, and can do so with a high frequency in many types of clones (Carcamo et al., 1998; Goldman et al., 2000). The frame switch occurs on short frameshifting sequences, often involving a stop codon. Thus, instead of having one peptide made from a single gene, the cell can make several peptides by translating mRNA originating from a single gene and switching reading frames in the process. The number of possible translation routes grows exponentially with the number of frameshifting sequences. The method of the present invention can be used to increase the diversity of protein libraries, to select for novel proteins and to improve biological properties of proteins.
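The exponential growth in translation routes can be made concrete with a short Python sketch. It assumes, purely for illustration, that at each frameshifting sequence the ribosome independently either stays in frame (0) or shifts by +1; the function names and this two-outcome model are assumptions, not part of the disclosure. The count is an upper bound on distinct peptides, since different routes may end at a stop codon or yield identical products.

```python
from itertools import product


def translation_routes(n_sites, outcomes=(0, +1)):
    """Enumerate every combination of per-site outcomes (assumed model)."""
    return list(product(outcomes, repeat=n_sites))


def route_count(n_sites, n_outcomes=2):
    """Number of routes: n_outcomes raised to the number of sites."""
    return n_outcomes ** n_sites


# Three frameshifting sequences, two outcomes each: 2^3 routes.
for route in translation_routes(3):
    print(route)

# Growth with the number of sites, for two and for three outcomes per site
# (three outcomes would correspond to -1, 0 and +1).
for n in (1, 2, 5, 10):
    print(n, route_count(n), route_count(n, 3))
```

Under this model, ten frameshifting sequences in one gene would already allow over a thousand routes, which is the sense in which a single gene can encode a library rather than a single protein.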
In another aspect, the present invention provides a method for detecting a protein from a library of combinatorial proteins by utilizing biopanning techniques to select or screen for the protein that binds to the target biomolecule.
I. Definitions
By “coat protein” is meant the protein which serves as an element of the outer surface of a phage. Coat proteins form a type of envelope around the phage particle and could be considered surface proteins.
By “complementary DNA (cDNA)” is meant any DNA molecule that is reverse transcribed from an RNA template.
By “detecting the presence of a peptide” is meant screening or selecting for a peptide having predefined properties. The skilled artisan may choose between various technologies in screening or selecting for a peptide. Some examples of techniques that might be used include biopanning against a specific antibody, screening by sequencing, and antibiotic resistance-based selection.
By “DNA” is meant a nucleic acid in which the sugar is deoxyribose as opposed to ribose in RNA. DNA is intended to include any nucleic acid which can be transcribed to yield RNA.
By “enzymatic activity” is meant the actions or results produced by an enzyme, which is understood as a complex protein capable of catalyzing specific biochemical reactions.
By “expressing” is meant that the sequence encoded by the polynucleotide is transcribed and translated into the corresponding peptide, polypeptide or protein. It is further understood that expressing means the translated product may appear as a fusion product to the coat protein of the phage carrying the sequence, it may appear in the cytoplasm of the bacterial host cell or it may appear on the outer cell membrane of the bacterial host cell. It is also understood that in certain preferred embodiments, “expressing” means only the transcription of the DNA into the complementary RNA sequence or only the translation of RNA into the complementary peptide.
By “filamentous phage vector” is meant a vector based on filamentous phage which is useful in phage display systems. By inserting the coding sequence of interest into the coat protein gene of the filamentous phage vector, the skilled artisan is able to construct a fusion protein that is expressed or displayed as part of the coat protein of the phage particle.
By “fusion protein” is meant a protein produced from the translation of a genetic sequence together with another genetic sequence that is inserted into it or placed in its immediate vicinity.
By “gene III” is meant the gene encoding the minor coat protein of the M13 bacteriophage. The M13 bacteriophage is composed of 10 genes and gene III codes for the minor coat protein, also known as the pIII protein.
By “genomic DNA” is meant any DNA sequence found within a genome.
By “heterologous promoter” is meant a foreign or synthetic DNA sequence capable of initiating transcription of the downstream DNA sequence into the complementary RNA sequence. Foreign or synthetic simply means that the sequence is derived from another species or engineered by one skilled in the art.
By “isolating” is meant separating the sequence, peptide, polypeptide or protein of interest from other cellular materials. Isolation of a sequence, peptide, polypeptide or protein of interest is a common and well understood technique in the art and there are many methods available to one skilled in the art to achieve isolation of a target molecule. For example, the skilled artisan may use selection techniques where only cells expressing the target molecule can survive, or a targeted protein may be isolated from a solution by taking advantage of its binding properties, molecular weight or some other unique feature of the target protein.
By “library” is meant a collection of biomolecules, such as peptides, some of which may exhibit biological activity or other activity.
By “messenger RNA” is meant a nucleic acid in which the sugar is ribose as opposed to deoxyribose in DNA. Messenger RNA is intended to include any nucleic acid which can be entrapped by ribosomes and translated into protein. Further, mRNA is understood to be the product of transcription and splicing.
By “microorganism” is meant any bacteria or bacteriophage capable of serving as host to the expression vector carrying the polynucleotide sequence of the present invention. The microorganism is also understood as being capable of amplifying the number of vectors carrying the polynucleotide sequence as well as expressing the peptides encoded by this sequence. In a preferred embodiment of the invention, Escherichia coli is used as the host microorganism. In the present invention, Escherichia coli is infected with phage carrying the polynucleotide sequence of the invention and the infection causes the Escherichia coli to produce and/or secrete the peptide fusion product of the present invention.
By “operably linked” is meant that a regulatory sequence, for example a promoter, is connected to a polynucleotide or genetic sequence in such a way as to permit expression of the gene product under the regulatory control of the attached promoter sequence.
By “origin of replication” is meant a polynucleotide sequence at which replication is initiated. Without an origin of replication, a vector or phage cannot replicate and a colony of transformed cells could never be achieved. Origins of replication are commercially available, and one skilled in the art can use common techniques to insert a chosen origin of replication into a specific vector. One such technique might involve the use of restriction nucleases to form a specific cleavage and ligation site.
By “outer surface of a microorganism” is meant the extra-cellular membrane.
By “peptide” is meant any chain of amino acids, linked by a peptide bond, regardless of length or post-translational modifications such as glycosylation or phosphorylation. Peptide could also be interpreted to mean polypeptide, which only requires that there be more than two amino acids linked by a peptide bond. Further, the term peptide includes protein, polypeptide and peptides that can be expressed as translational products from mRNA.
By “phage” is meant a naturally occurring or engineered bacterial virus. Phage is not limited to a single type; for example, the phage of the invention could be engineered from any known phage vector, including but not limited to M13, lambda, Mu and P1. There are also different classes of phages suitable for construction of the phage of the present invention. More specifically, the filamentous phage is a preferred phage to display the combinatorial protein library of the present invention.
By “phagemid” is meant a filamentous phage vector, the two essential elements of which are (a) gene III, which produces the minor coat protein, pIII, and (b) the phage origin of replication. One example of a phagemid that can be used to practice the present invention is the pCANTAB5E phagemid.
By “polynucleotide” is meant a molecule, or a sequence, composed of more than one nucleotide and preferably more than 7 nucleotides, of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) type. The term should be understood to include, as equivalents, derivatives, variants and analogs of either RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single (sense or antisense) and double-stranded polynucleotides. The polynucleotide of the present invention is understood to contain ribosomal frameshifting sequences interspersed between coding genetic sequences. Further it is understood that the polynucleotide sequence encodes sequences, which when in mRNA form are known as ribosomal binding sites. A ribosomal binding site is a nucleic acid sequence recognized by the small subunit of the ribosome as the site for initiation of translation of a nucleic acid sequence. The large subunit of the ribosome binds to the ribosomal binding site after the small subunit has bound itself. Still further it is understood that the unique attribute of the polynucleotide sequence is its ability to produce a plurality of peptides from a single gene or genetic sequence.
By “plurality of peptides” is meant more than one peptide, where the peptides are composed of varying amino acid sequences.
By “selectable marker” is meant a gene insertion with a specific characteristic, used to distinguish cells that have taken up the vector from those cells that have not. For example, it is a well known technique in the art to insert an antibiotic resistance gene as a selectable marker. The cells harboring the vector are then grown on a medium containing the specific antibiotic. Thus only transformed cells, or those carrying the inserted gene and the selectable marker gene (antibiotic resistance gene) will be able to grow in the medium.
By “transformed microorganism” is meant any microorganism expressing a peptide or protein from the polynucleotide insert of the expression vector that transformed the microorganism. There are several different techniques available to one skilled in the art for creating a transformed microorganism. Some of these techniques would include the use of plasmid, phage, or phagemid to insert the polynucleotide sequence of the invention into the genome of the transformed microorganism.
By “vector” is meant a replicable nucleic acid construct, namely a plasmid, phage, phagemid or viral nucleic acid. Vectors may be used to amplify and/or express nucleic acid encoding a fusion peptide. In an expression vector, the nucleic acid sequence encoding a peptide of interest is operably linked to suitable control sequences capable of effecting expression of the peptide in a cell. The need for such control sequences will vary depending upon the cell selected and the transformation method chosen. Generally, control sequences include a transcriptional promoter, suitable mRNA ribosomal binding sites, and sequences that control the termination of transcription and translation. Methods that are well known to those skilled in the art can be used to construct expression vectors containing appropriate transcriptional or translational control signals. See, for example, the techniques described in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual (2nd Edition), Cold Spring Harbor Press, N.Y., which are herein incorporated by reference.
Other features and advantages of the invention will be apparent from the following detailed description of the invention and from the claims.