Collagen is the main structural protein in the extracellular space of connective tissues of animal bodies and is the single most abundant protein in the animal kingdom. It is also one of the most useful biomaterials, with numerous known applications in the medical, dental, and pharmacological fields. For example, collagen can be prepared as cross-linked, compacted solids or as lattice-like gels, which have been used as wound dressings, drug delivery systems, and sponges, to name a few. Because of its versatility, naturally occurring collagen has been a source of inspiration for biomimetic designs that extend the range of its usefulness. Over the past several decades, intense research efforts have aimed at developing a molecular-level understanding of collagen's self-assembly properties to further the development of designed materials with superior properties.
Naturally occurring collagen is a supramolecular complex made up of three collagen polypeptides. Historically, a great deal of our understanding of molecular and supramolecular structures came from speculative model building by pioneers such as Pauling, Watson, and Crick. One guiding principle of this pioneering model building was the maximization or correct pairing of inter- and intra-chain hydrogen bonds in biopolymers. Biomolecular structures are governed by a delicate balance of non-covalent intra- and intermolecular interactions, and hydrogen bonding is one of the most ubiquitous non-covalent interactions in nature. Together, these interactions drive the macromolecular assembly and intermolecular recognition events that are critical to all life processes.
In the case of collagen, the triple helix is often composed of two identical polypeptide chains (α1) and an additional polypeptide chain (α2) that differs slightly in its chemical composition. Each of the three polypeptide chains adopts a left-handed helix conformation. When the three strands are mixed together, they can self-assemble into a right-handed triple helix, depending on the length and sequence of the polypeptide chain. In biological systems the production of collagen is more complex. It involves translation of a pre-pro-peptide, N-terminal processing of the pre-pro-peptide to pro-collagen in the endoplasmic reticulum, extensive posttranslational processing of the amino acid side chains followed by glycosylation with monosaccharides, transport to the Golgi apparatus for modification with oligosaccharides, and eventual packaging into secretory vesicles. These vesicles are transported to the extracellular environment, where further processing of procollagen leads to tropocollagen in certain forms. Further extracellular oxidation and various modifications eventually lead to the formation of collagen fibrils. In nearly all naturally occurring collagen peptides, every third residue is a glycine. Mutations in the strictly conserved glycine form the molecular basis for many debilitating human diseases such as osteogenesis imperfecta. The periodic spacing of the glycine residue at every third amino acid position, in conjunction with the one-residue stagger, allows for a tightly packed triple helix with a repeating cross-strand hydrogen bond network. In other words, collagen peptides most often have a repeating C-(XaaYaaGly)n-N motif, with exceptions often leading to a number of human diseases. The most common amino acids in the variable Xaa and Yaa positions are (2S)-proline (Pro, 28%) and (2S,4R)-4-hydroxyproline (Hyp, 38%), although the Xaa and Yaa positions vary widely and any amino acid can occupy either position.
Alternate amino acid sequences, in which Xaa and Yaa are not Pro or Hyp, can represent recognition domains for important protein-collagen interactions, such as those with integrins, matrix remodeling enzymes, and other collagen-binding proteins (matrix metalloproteinase 1, cathepsin K, and von Willebrand factor, to name a few) involved in normal homeostasis and human disease states (cancer biology, genetic disease, various musculoskeletal diseases, etc.). One of the most common triplet amino acid sequences in collagen is ProHypGly (10.5%).
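The repeating Xaa-Yaa-Gly pattern described above lends itself to a simple computational check. The following is a minimal illustrative sketch, not part of any described method: it splits a peptide sequence into consecutive triplets, flags triplets whose third residue is not glycine, and reports the fraction of each triplet. The function names, the example sequence, and the use of the one-letter code "O" for 4-hydroxyproline are assumptions made here for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: one-letter residue codes, with "O" standing in for
# (2S,4R)-4-hydroxyproline (Hyp), which has no standard one-letter code.
GLY = "G"


def split_triplets(sequence: str) -> List[str]:
    """Split a peptide sequence into consecutive three-residue triplets."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def non_gly_triplets(sequence: str) -> List[Tuple[int, str]]:
    """Return (triplet index, triplet) pairs whose third residue is not Gly."""
    return [
        (index, triplet)
        for index, triplet in enumerate(split_triplets(sequence))
        if triplet[2] != GLY
    ]


def triplet_fractions(sequence: str) -> Dict[str, float]:
    """Return the fraction of the sequence's triplets made up by each triplet."""
    triplets = split_triplets(sequence)
    counts = Counter(triplets)
    return {triplet: count / len(triplets) for triplet, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical model peptide: Pro-Hyp-Gly repeats with one Pro-Arg-Gly triplet.
    model_peptide = "POGPOGPRGPOG"
    print(non_gly_triplets(model_peptide))   # empty list: Gly is at every third position
    print(triplet_fractions(model_peptide))  # per-triplet fractions
```

A sequence in which a glycine has been replaced, for example by alanine, would be reported by non_gly_triplets together with the index of the affected triplet.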
Given that collagen is made up of three polypeptide chains, design of collagen mimetic material can theoretically be achieved by side-chain modification or backbone modification. Prior efforts to create biomimetic collagen have found side-chain modification to be a successful approach, but one that is limited in its ability to preserve the overall natural surface features and topology of collagen. Collagen peptides modified with unnatural amino acid side chains generally retain the ability to self-assemble into triple helices, although sometimes with decreased stability depending on the modification. Because the stability of the triple helix depends on a delicate balance of noncovalent interactions, side-chain modifications modulate the stability of the triple-helical structure. However, such modifications often require structural changes that result in dramatically different surface features, which can limit recognition interactions with biologically relevant environments and with the biomacromolecules involved in protein-collagen interactions.
In contrast, efforts to modify the collagen backbone have been largely unsuccessful, with the limited exception of a peptoid residue developed by Goodman and co-workers, although this could be considered a form of side-chain addition.1 In particular, the strictly conserved glycine residue in collagen peptides has remained largely intolerant to substitution, barring a recent thioamide substitution by Raines and co-workers.2
Numerous backbone modifications have been attempted, including stereochemical inversion, heteroatom replacement, and homologation, all of which resulted in either severe destabilization or a complete lack of triple helix formation in collagen model peptide systems. Raines and Miller demonstrated that replacing the glycine amide with either an ester or a trans alkene greatly destabilized the triple helical structure.3 More recently, Etzkorn et al. demonstrated that substitution of any amide bond with an (E)-alkene, regardless of whether it is involved in interchain hydrogen bonding, prevents formation of the triple helix even though the trans alkene locks the pseudo amide bond in the trans conformation.4 The same outcomes have been reported for stereochemical inversion (L to D amino acids) and heteroatom replacement.5 Amide-to-ester substitutions are detrimental not only to collagen triple helix stability but also to many other protein secondary structures. To date, these efforts have demonstrated a general intolerance of the collagen peptide backbone for molecular editing.
Despite these hurdles, discovering stabilizing backbone substitutions would provide significant opportunities for extending the properties and functions of biomimetic collagen. For example, there may be times when side-chain modification is not desirable and backbone modification is the only route to the designed material. The ability to stabilize the collagen peptide triple helical structure at the core while preserving the surface features of the natural amino acids opens the possibility of materials that interact with natural proteins in a way that perfectly mimics natural protein-collagen interactions. Other potential applications include the self-assembly of shorter collagen peptides into stable triple helical assemblies that could be used as multivalent scaffolds, in applications ranging from high-payload drug carriers to organized multichromophore assemblies for light harvesting and photonic materials, as well as modulators of protein-protein interactions (PPIs) and collagen mini-proteins that could have therapeutic potential.
Protein-protein interactions (PPIs) are involved in nearly all biological processes, including cell proliferation, growth, differentiation, and apoptosis. Stringent regulation of these biomolecular interfaces is essential for cellular function, making them attractive targets for the development of new therapeutics and biological probes. Although a number of strategies, including miniature proteins and peptidomimetics, have been applied to modulate these interfacial interactions, doing so remains extremely challenging because of the lack of natural partners and the high level of adaptability of protein-protein binding sites. PPIs are also difficult to target because their interfacial surfaces are very large, shallow, and flat and, unlike those of many enzymes, often lack well-defined pockets. The secondary structure at the interface of PPIs (often characterized by α-helices or other common secondary structure motifs) has been the focus of rational design approaches. The most accurate way to mimic these α-helical interfaces is to use peptides consisting of α-amino acids (α-peptides). Hydrocarbon staples, hydrogen bond surrogates (HBS), β-peptides, miniature proteins, peptoids, and many other scaffolds have been successful in improving the stability and bioavailability of these peptides. Analogous to the protein-protein interactions that involve α-helix recognition, a multitude of interactions involve collagen triple-helix recognition (FIG. 1). Notably, wound repair is characterized by dynamic reciprocity, defined as ongoing, bidirectional interactions between cells and their surrounding microenvironment, particularly the extracellular matrix (ECM). Thus, identification of the key matrix components and mechanisms that direct dynamic reciprocity to promote a regenerative healing response will aid in the development of novel therapeutics for a range of maladies including cancer, fibrosis, diabetes, and neurodegenerative disease.
Moreover, there is no general way to modulate collagen-protein interactions, and many of the fundamental biomolecular recognition details are still unknown. Specifically, to the best of our knowledge, no interactions between heterotrimeric collagen triple helices and proteins have yet been characterized, although recent advances in synthetic peptide chemistry have helped to work toward this goal. This gap in knowledge exists because of a lack of chemical tools and the complex nature of collagen.
Many of the primary hurdles in collagen peptide design arise from difficulties in obtaining short, stable collagen mimetic peptides. Longer peptides are inherently more complex, expensive, and time-intensive to synthesize, while shorter collagen peptides are unable to self-assemble into the triple helical form at reasonable temperatures (25-37° C.). The complex purification and sterilization processes involved in deriving collagen peptides from animal sources can also generate low yields and diminish the mechanical and chemical functionality of the peptide, in addition to the problem of separating it from a complex heterogeneous mixture. Simple, precisely defined collagen peptides that retain the capacity to self-assemble into triple helical structures and higher-order materials would open the door to the design of new classes of chemical probes and potential therapeutics, such as next-generation wound healing agents. In addition to potential therapeutic applications, there are a vast number of fundamental applications for modulating the collagen-protein interactome.
Therefore, there exists a need for backbone-modified biomimetic collagens and for general methods of designing and making biomimetic materials and molecular mimics, as well as a need to develop new classes of protein-protein interaction (PPI) modulators that broadly target collagen-protein interfaces by mimicking the triple helix.