The sequencing of the human genome has created the promise and opportunity for understanding the function of all genes and proteins relevant to human biology and disease, Peltonen and McKusick, Science, 291: 1224-1229 (2001). However, several important hurdles must be overcome before this promise can be fully realized. First, even with the human genome sequence available, it is still difficult to identify genes and the sequences that control their expression. Second, although monitoring gene expression at the transcript level has become more robust with the development of microarray technology, a great deal of variability and control of function originates in post-transcriptional events, such as alternative splicing and post-translational processing and modification. Finally, because of the scale of human molecular biology, potentially many tens of thousands of genes and their expression products will have to be isolated and tested in order to understand their roles in health and disease, Dawson and Kent, Annu. Rev. Biochem., 69: 923-960 (2000).
With regard to the issue of scale, conventional recombinant methodologies for cloning, expressing, recovering, and isolating proteins are still time-consuming and labor-intensive, so that their application in screening large numbers of different gene products to determine function has been limited. Recently, a synthesis approach has been developed which can address the need for facile access to highly purified research-scale amounts of protein for functional screening, Dawson and Kent (cited above) and Dawson et al., Science, 266: 776-779 (1994). In its most attractive implementation, an unprotected oligopeptide intermediate having a C-terminal thioester reacts with an N-terminal cysteine of another oligopeptide intermediate under mild aqueous conditions to form a thioester linkage which spontaneously rearranges to a natural peptide linkage, Kent et al., U.S. Pat. No. 6,184,344. The approach has been used to assemble oligopeptides into active proteins both in solution phase, e.g. Kent et al., U.S. Pat. No. 6,184,344, and on a solid phase support, e.g. Canne et al., J. Am. Chem. Soc., 121: 8720-8727 (1999). More recently, the technique has been extended to permit coupling of C-terminal thioester fragments to a wider range of N-terminal amino acids of co-reactant peptides by using a removable ethylthio moiety attached to the N-terminal nitrogen of the co-reactant, thereby mimicking the function of an N-terminal cysteine, Low et al., Proc. Natl. Acad. Sci., 98: 6554-6559 (2001).
Despite these advances, such peptide couplings have low yields when the C-terminal amino acid is acidic, as a result of undesired rearrangements between atoms in its side chain and those in its thioester moiety. This greatly limits the applicability of the generalized native ligation chemistries. Therefore, the field of protein synthesis would be advanced if the reasons for such low yields were understood and approaches were found to overcome current limitations in reaction yield. Surprisingly, the present invention provides such understanding and solutions.