1. Technical Field
The present invention is generally related to the field of methods for construction of random domain insertion libraries, and more specifically related to the field of methods for construction of a random insertion library with optimal control of composition and length of inter-domain linker residues and mediated by sticky-end ligation between host and guest DNA fragments.
2. Prior Art
The ability to create fusion proteins by connecting two or more protein domains is advantageous in protein engineering because it can introduce a wide range of novel and integrated functions. The mode of protein fusion that has been most extensively used and studied is end-to-end fusion [1-2]. In this method, the N-terminus of one protein is fused to the C-terminus of the other protein. While end-to-end fusion has successfully been used to create multi-functional, multi-domain proteins, insertional fusion has recently been recognized as a novel tool to produce integrated and coupled functionalities through inter-domain interactions [3-9].
In insertional fusion, a guest protein is inserted into the middle of a host protein through multiple tethers connecting the two protein domains [3-9]. Insertional fusion has resulted in creation of allosteric protein switches [3-5,7,9] and stabilization of a guest protein domain [6,8]. The close proximity of the N- and C-termini of a guest protein seems to increase the chance of successful insertion. Nearly 50% of single-domain proteins have their N- and C-termini proximal [10], indicating the potential application of insertional fusion to a wide range of proteins. Proteins whose two termini are more distal can also be functionally inserted with the introduction of appropriate linkers [11].
One of the most important factors to consider in insertional fusion is selection of insertion sites, which may determine inter-domain interactions. For insertion sites, loops of a host protein have been extensively used because these locations are usually tolerant of insertion of a large guest protein domain [6,8,12,13]. Insertion sites may also be rationally selected with the aid of computational structure modeling of protein insertion complexes [14,15]. However, there is no robust guideline for selecting insertion sites that ensures the desired functional outcome of insertional fusion. Moreover, construction of a protein insertion complex is usually time-consuming and, as such, testing multiple insertion sites one by one is inefficient. More comprehensive and systematic examination of insertion sites is possible through construction of random insertion libraries followed by high throughput evaluation of these libraries for functional outcomes [3,16,17]. In this combinatorial approach, a guest protein domain is randomly inserted into a host protein domain.
Characteristics of linker residues joining the fused protein domains can also play an important role in inter-domain interactions [3,16,18,19]. In protein fusion complexes, linkers are introduced to alleviate any steric conflicts between protein domains [20,21]. Relatively small and hydrophilic residues are usually preferred as inter-domain linker residues [3,22] as inclusion of these linker residues is likely to maintain structures and functions of the fused protein domains [19,22]. The amino acids preferentially found in naturally occurring inter-domain linkers include Arg, Asn, Asp, Gln, Glu, Gly, Lys, Pro, Ser and Thr [20,22]. As most inter-domain linkers are likely to be solvent-exposed, inclusion of bulky hydrophobic residues might be energetically destabilizing and/or cause undesired intramolecular or intermolecular hydrophobic interactions [22]. Flexible inter-domain linker residues such as Asp, Gly, Lys and Ser have been widely used for construction of engineered protein insertion complexes [3,6,7,16] and are effective in controlling the functional dynamics of domains [19]. It was found that the composition, rather than the amino acid sequence, of a linker determines linker flexibility [21]. Inclusion of multiple cysteine residues in linkers is undesirable under most circumstances due to possible formation of unwanted disulfide bonds.
Inter-domain linkers that are too short should be avoided in insertional fusion because of potential structural conflicts between the fused domains [3]. Inter-domain linkers that are too long are also not preferred because they may significantly reduce the chance of inter-domain interactions, which are critical for functional integration of protein domains in insertional fusion [3]. The inter-domain linker length may also determine stability of a fusion protein [18,21]. The average length of naturally occurring inter-domain linkers corresponds to ~5-6 residues [19,22,23]. Similarly, inclusion of 3-5 residue inter-domain linkers was found to be effective at conserving structures and functions of the fused domains in engineered protein insertion complexes [3,6,24]. Taken together, optimal control of amino acid composition and length of an inter-domain linker would be highly beneficial in constructing random insertion libraries with a high likelihood of functional integration of the fused domains.
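By way of illustration only, the composition and length considerations discussed above can be expressed as a simple screening check. The following Python sketch is not part of any cited method; the function name `assess_linker`, the issue labels, and the default length bounds of 3 to 6 residues are hypothetical choices that merely span the 3-5 and ~5-6 residue ranges mentioned above.

```python
# One-letter codes for the residues listed above as preferentially found in
# naturally occurring inter-domain linkers:
# Arg, Asn, Asp, Gln, Glu, Gly, Lys, Pro, Ser, Thr.
PREFERRED_RESIDUES = set("RNDQEGKPST")


def assess_linker(seq, min_len=3, max_len=6):
    """Return a list of issue labels for a candidate linker (empty if none).

    The length bounds are illustrative assumptions, not values prescribed
    by the text.
    """
    seq = seq.upper()
    issues = []
    # Linkers that are too short or too long are both disfavored.
    if not (min_len <= len(seq) <= max_len):
        issues.append("length")
    # Flag any residue outside the preferred small/hydrophilic set.
    if any(aa not in PREFERRED_RESIDUES for aa in seq):
        issues.append("composition")
    # Multiple cysteines may form unwanted disulfide bonds.
    if seq.count("C") > 1:
        issues.append("multiple_cysteines")
    return issues


if __name__ == "__main__":
    for candidate in ("GSG", "GS", "GCWC"):
        print(candidate, assess_linker(candidate))
```

Such a check only encodes the general preferences summarized above; it does not predict whether a given linker yields a functional insertion complex.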
In combinatorial engineering for protein insertional fusion, DNaseI is commonly used to randomly introduce single cuts in plasmids harboring a DNA sequence encoding a host protein [16,17,25]. After DNaseI digestion, the resultant overhang strands created within single-cut linear plasmids are repaired by DNA polymerase to prepare blunt ends. The repaired linear plasmids are then blunt end-ligated with DNA encoding a guest protein. Unfortunately, generation of single cuts in target DNA using DNaseI is not always straightforward and instead requires delicate control of DNaseI activity, which is often difficult to achieve [26]. Alternatively, one may use S1 nuclease to overcome this difficulty [26]. However, DNA digestion mediated by nonspecific nucleases (e.g., DNaseI and S1 nuclease) results in uncontrolled tandem duplication and/or deletion of a host DNA sequence at the ends of an inserted guest DNA sequence [17,26,27]. While such tandem duplication and/or deletion may provide additional diversity in a library in terms of the distance between the two fused domains [17,27], control of amino acid composition and length of linkers is difficult in this case. DNA digestion for random insertion can also be performed chemically [28], but this procedure is complicated and was found to generate unwanted mutations [28].
As an alternative, one may consider using a transposon, a DNA element that can be randomly inserted into a host DNA sequence with high efficiency and accuracy [29], for construction of random domain insertion libraries [30]. Transposons are able to translocate to a variety of sites on DNA of any host organism [31]. Among many transposons, a bacteriophage Mu transposon has been extensively studied [29] and, because of its low target site preference [29], used to construct random domain insertion libraries [3]. The Mu transposon has a 22 bp symmetrical consensus sequence, located near both ends, for recognition by MuA transposase [31-32]. Random transposition of a Mu transposon into a target gene occurs through (1) binding of MuA transposase monomers to the Mu transposon recognition sites to form transposome assemblies [31], (2) tetramerization of the bound MuA transposase monomers to bridge the ends of the Mu transposon and engage the Mu transposon cleavage sites (i.e., sequences containing (T or A)CA↓ located 5 bp beyond the terminal recognition sites) [33-34], (3) subsequent self-cleavage of the Mu transposon at the cleavage sites [35], and (4) accurate introduction of a 5 bp staggered cut in a host DNA sequence into which the Mu transposon is subsequently incorporated [29,34,36,37].
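The consequence of the 5 bp staggered cut in step (4) can be illustrated at the sequence level. The following Python sketch is a simplified, single-strand model offered only as an illustration: it represents the host and inserted element as top-strand strings and ignores strand chemistry, element orientation, and gap repair details. The function name `insert_with_duplication` and its parameters are hypothetical.

```python
def insert_with_duplication(host, element, cut, stagger=5):
    """Model insertion of `element` at a staggered cut in `host`.

    `cut` is the 0-based index where the staggered region begins. After
    insertion and gap repair, the `stagger` host bases at the cut site
    appear on both sides of the inserted element.
    """
    if not 0 <= cut <= len(host) - stagger:
        raise ValueError("cut position leaves no room for the staggered region")
    # Left arm ends with the staggered bases; right arm begins with them again.
    left_arm = host[: cut + stagger]
    right_arm = host[cut:]
    return left_arm + element + right_arm


if __name__ == "__main__":
    host = "AAACCGGTTTT"
    product = insert_with_duplication(host, "xx", cut=3)
    # The 5 host bases CCGGT flank the inserted element on both sides.
    print(product)  # AAACCGGTxxCCGGTTTT
```

In this model the product is always exactly `stagger` bases longer than the host plus the element, which reflects the fixed-length nature of the duplication as opposed to a variable duplication or deletion.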
High fidelity of the transposition mechanism results in precise 5 bp duplication of a host DNA sequence occurring upon transposition of the Mu transposon [29,31,36,37]. This is in contrast with the aforementioned uncontrolled occurrence of tandem nucleotide duplication and/or deletion found in random domain insertion mediated by nonspecific nucleases [17,26,27]. A gene of interest (i.e., a guest DNA sequence) may be included in a transposon along with other genetic components required for random transposition. However, direct application of a transposon in this manner is not optimal for control of inter-domain linker residues in random insertion libraries [38]. This is because necessary transposon components flanking a guest DNA sequence remain in the host DNA sequence after transposition, and encode suboptimal inter-domain linkers [9,38-40].
Control of inter-domain linker residues was previously found to be possible by removal of randomly inserted whole transposon elements from a host DNA sequence, which is then re-ligated with a guest DNA sequence encoding a guest protein flanked by desired inter-domain linker residues [3]. However, similar to nonspecific nuclease-based methods, construction of random domain insertion libraries in this manner relies on blunt-end ligation between host and guest DNA fragments, which is much less efficient than sticky-end ligation [41-43], thus lowering library construction efficiency. Blunt-end ligation causes not only recircularization of a host DNA fragment without a guest DNA fragment being inserted, but also inclusion of multiple guest DNA copies, decreasing library quality [3]. Moreover, the restriction enzyme site used in this previous study to remove randomly inserted transposons while causing minimal nucleotide deletion in a host DNA sequence is relatively abundant, limiting application of this method for other host DNA sequences [3].
Therefore, there is a need for an engineered transposon for facile construction of a random protein domain insertion library. It is to such a need and others that the present invention is directed.