1. Technical Field of the Invention
The present invention relates to protein detection and purification. More specifically, the invention relates to protein recognition sites. Even more specifically, the invention relates to a modified peptide sequence from a non-structural protein encoded by Semliki Forest Virus (SFV) that is suitable for use as a recognition site for the recombinant non-structural protease of SFV (hereafter “SFV protease site”), and to the nucleotide sequence, and variants thereof, that encode the recognition site. The invention also extends to the SFV protease recognition site fused to a polypeptide, inserted into a polypeptide sequence, or placed between any peptide or protein tag and a polypeptide sequence. The invention further extends to methods for using the SFV protease site and the corresponding enzyme.
2. Background of the Invention
Site-specific proteolytic processing of expressed proteins is a widely used technical approach. It is used to remove unwanted sequences from expressed and purified recombinant proteins. Such unwanted sequences are often expression and/or purification tags, which can be peptide tags or protein tags. Peptide tags usually contain 4 to 20 amino acids, while protein tags usually have a molecular weight of several kDa. The approach is also used to process multi-domain proteins into individual proteins, both in vitro and in vivo (in living cells). Although the approach is already in common use, the development of functional proteomics and of methods for analyzing protein-protein interactions and protein functions directly in cells has created an increasing need for more precise, highly specific and effective tools.
Tagging (with epitope tags, affinity tags, or tags that stabilize the expressed protein or facilitate its correct folding in cells) is a widely used technology. Most proteins currently used in the biotechnology industry and for research purposes are at some stage expressed as tagged fusion proteins, since this allows common and well-established technologies to be used for their detection, purification and concentration. However, tags are usually immunogenic, they can affect the structure of the protein and its ability to crystallize, and they can mask functional domains of the recombinant protein and/or block specific and significant interactions with other proteins or cofactors. Removal of the tags is therefore an essential step before functional characterization of these recombinant proteins is possible. Tag removal is usually achieved by site-specific processing with different proteases (thrombin, enterokinase, factor Xa, TEV (tobacco etch virus) protease and several others). Tag removal with proteases requires that a sequence encoding the protease recognition site be included in the expression vector. This places certain limitations on the protease recognition sites that can be used in this kind of vector design:
        sites should be relatively short;
        sites should be cleaved only by the specific protease;
        sites should be cleaved with an efficiency close to 100%.
Since all these properties cannot always be combined in one vector, one usually has to choose between different sets of vectors depending on the purpose. Vectors with a maximally efficient cleavage site provide rapid and highly efficient cleavage; with this kind of cleavage site in the vector, the amount of substrate processed by one unit of enzyme is as high as possible. Vectors with a maximally precise cleavage site allow cleavage to take place as close as possible to the N- or C-terminus of the recombinant protein. Thus, depending on the nature of the experimental and/or technological setup, different cleavage sites can give different results even with one and the same enzyme.
For cleavage under in vivo conditions, additional requirements apply. The protease used for these experiments must be highly specific and must not damage the cells, and the cleavage must be highly efficient. One application of in vivo cleavage is to alter the stability of the expressed protein, either by removing degradation signals from the protein or by cleaving the protein in such a way that the N-terminal amino acid residue of the cleaved protein is recognized by the protein degradation machinery and the cleaved protein is degraded according to the N-end rule. For this kind of approach, either inducible cell lines that conditionally express the protease or high-efficiency cell co-transfection systems would be beneficial.
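The N-end rule application described above can be illustrated with a minimal Python sketch. It splits a one-letter protein sequence at a chosen cleavage point and checks whether the newly exposed N-terminal residue falls in a set of destabilizing residues. The function names, the example sequences in the usage note, and the default residue set are assumptions made for illustration only; the default set reflects primary destabilizing residues commonly cited for the eukaryotic Arg/N-end rule pathway and is not taken from this document.

```python
# Assumption (not from the source text): primary destabilizing N-terminal
# residues commonly cited for the eukaryotic Arg/N-end rule pathway.
ASSUMED_DESTABILIZING = frozenset("RKHLFWYI")


def cleave(sequence, cut_after):
    """Split a one-letter protein sequence after residue number cut_after.

    cut_after is 1-based: cut_after=4 separates residues 1-4 from 5 onward.
    """
    if not 0 < cut_after < len(sequence):
        raise ValueError("cut_after must fall strictly inside the sequence")
    return sequence[:cut_after], sequence[cut_after:]


def exposes_destabilizing_n_terminus(sequence, cut_after,
                                     destabilizing=ASSUMED_DESTABILIZING):
    """Return True if cleavage exposes a residue listed in `destabilizing`.

    Only the first residue of the C-terminal fragment is inspected; no
    other degradation signals are modelled.
    """
    _, c_terminal_fragment = cleave(sequence, cut_after)
    return c_terminal_fragment[0] in destabilizing
```

For example, with the made-up sequence "AAAARBBB" and a cut after residue 4, the C-terminal fragment starts with R and the function returns True; with "AAAAGBBB" it returns False. A different organism or pathway would need a different residue set passed through the `destabilizing` parameter.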
The importance of these problems has led to the commercialization of a set of enzymes with site-specific protease activity, together with corresponding vector plasmids. The enzymes have different (cellular or viral) origins and include thrombin, enterokinase, factor Xa, TEV (tobacco etch virus) protease and several others. The list of enzymes used for these purposes is growing, and information on their enzymatic and structural properties is expanding. The ideal combination of a protease and its recognition sequence should fulfill the following criteria:
        high efficiency over a wide range of conditions (temperature, ionic conditions, pH);
        high specificity for the cleavage consensus, with no secondary cleavages or side effects;
        the possibility of cleaving precisely at the end (N- and/or C-terminus) of the recombinant protein;
        the possibility of performing the reaction both in vivo and in vitro;
        the existence of easy-to-use and reversible inhibitors of the protease activity.
In spite of the efforts to develop an ideal combination, so far none of the available protease/recognition site-combinations meets all the conditions of an ideal system as listed above. The present invention discloses a system that meets all these conditions and thereby introduces a novel, highly useful, precise and specific tool for site-specific proteolytic processing of proteins.
Semliki Forest virus (SFV) belongs to the genus Alphavirus (family Togaviridae), together with 27 other known viruses. Alphaviruses infect vertebrate hosts (mammals, birds and fish) and invertebrate transmission vectors (mosquitoes). In infected organisms, alphaviruses replicate to a high titer in different cells.
The alphavirus genome encodes two protease activities: one is associated with the virus coat protein, which is an autoproteinase, and the other with the non-structural protein nsP2, which cleaves three cleavage sites in the alphavirus non-structural polyprotein P1234 (Merits et al., 2001, J. Gen. Virol. 82:765–773). These cleavage sites have different consensus sequences, and they differ from each other in the mode of proteolytic cleavage (in cis or in trans), in the enzymatic activity required for the cleavage (intact nsP2 or the protease domain of nsP2) and in the cleavage efficiency (Vasiljeva et al., 2001, J. Biol. Chem. 276(33):30786–30793).
NsP2 consists of two enzymatically active domains: an N-terminal NTPase/helicase/RNA triphosphatase domain and a C-terminal cysteine protease domain. Both domains are needed for virus replication and for processing of the second cleavage site in the SFV polyprotein, while only the C-terminal protease domain is needed for processing the third cleavage site (between nsP3 and nsP4). Cysteine 481 and histidine 558 have been identified as essential residues for the protease activity of nsP2. It has been shown that the nsP2 protease domain (hereafter named Pro39) can be expressed as a recombinant protein in E. coli, purified by Ni-NTA chromatography and used for in vitro processing of recombinant substrates containing a 37 aa region of the protease recognition site (19 aa residues upstream and 18 aa residues downstream of the cleavage point; hereafter 19/18 recognition site) (Vasiljeva et al., 2001). The cleavage is highly specific and efficient: Pro39 is capable of processing 50% of a 400-fold molar excess of substrate in 5 minutes (Vasiljeva et al., 2001). FIG. 1 illustrates the structure and processing pattern of the SFV nonstructural polyprotein.
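The two quantitative statements above, the 19/18 recognition site and the processing of 50% of a 400-fold molar excess in 5 minutes, can be made concrete with a short Python sketch. The function names and the indexing convention are assumptions introduced for illustration. The rate computed here is only an average over the stated interval, not a measured kinetic constant.

```python
def recognition_window(polyprotein, cleavage_after, upstream=19, downstream=18):
    """Return the residues flanking a cleavage point as (upstream, downstream).

    cleavage_after is the 1-based number of the last residue before the
    cleaved bond. The defaults give the 19/18 window (37 residues in total).
    """
    start = cleavage_after - upstream
    end = cleavage_after + downstream
    if start < 0 or end > len(polyprotein):
        raise ValueError("window extends beyond the ends of the sequence")
    return polyprotein[start:cleavage_after], polyprotein[cleavage_after:end]


def substrate_per_enzyme(molar_excess, fraction_processed):
    """Substrate molecules processed per enzyme molecule."""
    return molar_excess * fraction_processed


def average_rate_per_minute(molar_excess, fraction_processed, minutes):
    """Average substrate molecules processed per enzyme molecule per minute."""
    return substrate_per_enzyme(molar_excess, fraction_processed) / minutes


if __name__ == "__main__":
    # Figures quoted in the text: 50% of a 400-fold excess in 5 minutes.
    print(substrate_per_enzyme(400, 0.5))        # 200.0
    print(average_rate_per_minute(400, 0.5, 5))  # 40.0
```

Under these figures, each Pro39 molecule processes on average 200 substrate molecules in 5 minutes, or about 40 per minute. Because the reaction slows as substrate is depleted, the initial rate would be expected to be at least this high.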
One of the biological functions of cleavage at the protease site between the nsP3 and nsP4 proteins is to release nsP4 from the P1234 precursor protein and from the alphavirus early replicase complex. SFV, in contrast to the majority of alphaviruses analyzed to date, produces atypically large amounts of P1234 polyprotein; in most other alphaviruses, P1234 production is down-regulated about 20-fold by the presence of a leaky termination codon at the end of the nsP3 region. This leads us to believe that, compared to most alphavirus proteases, the SFV nsP2 protease should have a higher cleavage activity for the last processing site, since it has to digest significantly larger amounts of substrate. It is also possible that proteases from other alphaviruses have similarly high activities.