The term natural product (NP), or secondary metabolite, refers to a diverse group of low molecular weight compounds produced by a microorganism, whose production is not always essential for the organism but is certainly important for its adaptation and survival. This implies that these metabolites are not conserved and may even be specific to a single strain of a bacterial species. It also means that these compounds are produced only under certain conditions, which tend to differ from laboratory conditions. The most common NPs belong to the following categories: i) terpenoids and steroids (e.g., taxol); ii) alkaloids (e.g., morphine); iii) substances derived from fatty acids (e.g., prostaglandin E1, which is an eicosanoid) and polyketides (e.g., erythromycin); iv) non-ribosomal peptides (such as penicillin) or small peptide aldehydes (such as leupeptin); and v) enzyme cofactors (such as cobalamin).
It has been established that the genes responsible for the synthesis of natural products in bacteria are grouped in discrete regions of their chromosomes. This implies that their regulation is finely tuned so that suitable precursors and the enzymes involved are present together, allowing the concerted and rapid production of these compounds.
Genome mining of natural products is a strategy used in microbiology to analyze microbial genomes in order to predict their ability to produce new chemical compounds [1]. In addition, genome mining tools help establish the biosynthetic logic underlying the enzymatic transformations of a pathway, and help predict the substrates and products of uncharacterized microbial enzymes. This knowledge allows microorganisms to be genetically engineered to produce natural products efficiently and cost-effectively. It also helps to create strategies or methods to select metabolites and define their efficiency, evaluate their activity against pathogens, explore ways to modify them to improve their efficiency, and elucidate whether their bioactivities are relevant to medicine or industry [2].
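As a rough illustration of the idea that biosynthetic genes are grouped in discrete chromosomal regions, the following minimal Python sketch groups annotated genes by proximity. It is a toy under stated assumptions, not a description of any actual genome mining tool: the `Gene` record, the `candidate_clusters` function, the 5,000 bp gap threshold and the gene coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str           # hypothetical locus tag
    start: int          # start coordinate (bp)
    end: int            # end coordinate (bp)
    biosynthetic: bool  # True if annotated as a biosynthetic enzyme (assumed annotation)


def candidate_clusters(genes, max_gap=5000, min_genes=2):
    """Group biosynthetic genes separated by at most max_gap bp (assumed threshold)."""
    hits = sorted((g for g in genes if g.biosynthetic), key=lambda g: g.start)
    clusters, current = [], []
    for gene in hits:
        # Close the current group when the next biosynthetic gene is too far away.
        if current and gene.start - current[-1].end > max_gap:
            if len(current) >= min_genes:
                clusters.append(current)
            current = []
        current.append(gene)
    if len(current) >= min_genes:
        clusters.append(current)
    return clusters


# Hypothetical annotation of a small chromosome segment.
genes = [
    Gene("g1", 1_000, 4_000, True),
    Gene("g2", 4_500, 9_000, True),
    Gene("g3", 9_200, 10_000, False),
    Gene("g4", 11_000, 12_500, True),
    Gene("g5", 300_000, 302_000, True),
]
cluster_names = [[g.name for g in c] for c in candidate_clusters(genes)]
print(cluster_names)
```

In this toy data set, the three neighboring biosynthetic genes form one candidate region, while the isolated gene far downstream is not reported. Real genome mining relies on much richer evidence, such as the identity of the enzyme domains involved.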
The ultimate aim of these strategies is to find new types of drugs, such as antibiotics, to improve production and synthesis methods, and to find better ways of examining the effectiveness of antibiotics in humans, plants, and animals.
Once the genes that direct the synthesis of promising molecules have been identified by genome mining, molecular biology techniques are used to establish the relationship between the biosynthetic genes and the production of the corresponding molecule. In the producing organism, this is done by comparing the metabolome of the wild-type strain with the metabolomes of mutants lacking genes essential for the production of the molecule. The gene-metabolite relationship can also be established by cloning the biosynthetic genes and introducing them into a host for heterologous production. The objective of these strategies is to discover new types of molecules of pharmacological interest, such as antibiotics, as well as inhibitors and pigments that can be used in industry; to improve the production and synthesis methods of known products; and to improve the efficacy of drugs by searching for variants for use in humans, plants and animals.
The bacteria most widely used for the production of NPs are species of the genus Streptomyces, Gram-positive bacteria belonging to the actinobacteria, whose genomes have an average G+C content of 72%. Streptomyces have been isolated from different habitats, mainly from all types of soil and marine sediment. Most of them are free-living saprophytes that degrade the organic matter of their habitats while competing with a large number of species, so it is believed that they have evolved to produce a wide range of natural products for their survival [3].
Among the NPs produced by members of the genus Streptomyces are the small peptide aldehydes (SPAs); examples include leupeptin, antipain, and chymostatin, which are the protease inhibitors most widely used in industry and biotechnological research. The inhibition of proteases, the enzymes responsible for degrading peptides and exogenous or endogenous proteins, is vital for many biological functions. Therefore, proteases are considered promising targets for the development of therapies for diseases in which proteolysis is relevant; for example, diseases associated with defects in the functioning of the proteasome, a protein complex responsible for degrading endogenous proteins; with calpains, proteases that are hyperactive in conditions such as Alzheimer's disease and cataract formation; and with cathepsins, which have been linked to cancer and inflammatory diseases [4-6].
Proteases are also vitally important for various pathogenic agents during infectious processes, so the use of protease inhibitors has been explored to combat the human immunodeficiency virus and cytomegalovirus, among others [7-8]. In this context, protease inhibitors, including those belonging to the large family of small peptide aldehydes such as leupeptin (referred to here as SPAs), are being studied extensively for development as therapeutic agents [4-6]. Moreover, aside from their potential therapeutic use, SPAs are widely used in industry and research laboratories in protein purification processes, where proteolysis is counterproductive and needs to be inhibited. Industry and research and diagnostic laboratories are the most important market for these compounds, with leupeptin and antipain being the most used. Practically all processes for producing heterologous proteins, some of them with the highest added value, such as next-generation vaccines, involve the use of protease inhibitors such as leupeptin or one of its derivatives. Therefore, these molecules are marketed both in bulk and in pure form. They are obtained through fermentation of bacteria of the genus Streptomyces, and these fermentation products are more widely used and more valuable than the few synthetic variants that can be obtained.
The first natural products with anti-proteolytic activity belonging to this family were discovered in the late 1960s in fermentation extracts of bacteria of the genus Streptomyces. Their discovery was the result of traditional screening methods, that is, screening for activity followed by isolation, purification and chemical characterization. Compounds belonging to this family have also been detected in other actinobacteria, in other Gram-positive bacteria of the genera Bacillus and Staphylococcus, in cyanobacteria, and in fungi of the Ascomycota group.
Besides their peptide nature and low molecular weight, which ranges between 300 and 900 Daltons, SPAs share other chemical characteristics: (i) the absence of a free N-terminal group, because the N-terminus is “protected” either with acyl groups of one or more carbons or with ureido-amino acid groups, leading to an acylated or aminoacylated end, which has a terminal carboxyl group; and (ii) the presence of a terminal aldehyde group, derived from the modification of the carboxyl terminus of the peptide chain by a reductive process, which is responsible for the biological activity of the molecule. The aldehyde end interacts with the active sites of proteases, forming hemiacetals or hemithioacetals with the catalytic residues, often serines or cysteines, and thereby disrupting their function (FIG. 1) [5,9-11].
From this general structure, SPAs can be divided into two sub-classes according to the characteristics of their functional groups: (i) those whose N-terminal group is protected by an acyl group, e.g., flavopeptin, tyrostatin, tyropeptin, nerfilin, strepin, leupeptin, bacithrocin, thiolstatin and acetyl-leucine-argininal; and (ii) those in which the N-terminus is joined to a ureido motif, which in turn is attached to an amino acid via an amide bond, e.g., chymostatin (also written quimostatin), MAPI, GE20372, antipain and elastatinal. This arrangement alters the orientation of the peptide chain: the ureido group acts as an adapter that switches the order of the peptide from N-terminal to C-terminal to C-terminal to C-terminal, resulting in peptides with chemical and biological characteristics different from those of traditional ribosomal peptides.
The peptide chain may range from two to six residues in length, while the acyl groups may have from two to nine carbon atoms, as shown in FIG. 3. Among the SPAs whose structures have been determined, the amino acids arginine, phenylalanine, tyrosine, leucine, isoleucine, valine and glutamine have been found in the peptide chains. In most SPAs, the aldehyde group is derived from the carboxylic group of phenylalanine, tyrosine or arginine, while the adjacent amino acid is typically one of the branched-chain amino acids: isoleucine, leucine or valine.
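The two sub-classes and the reported size ranges can be captured in a small data structure. The following Python sketch is illustrative only: the `SpaDescriptor` record and both helper functions are hypothetical names, the residue count is assumed to include the C-terminal aldehyde residue, and the seven-residue entry is an invented example used to show an out-of-range case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpaDescriptor:
    name: str
    n_cap: str                          # "acyl" or "ureido" N-terminal protection
    n_residues: int                     # chain length, counting the aldehyde residue (assumption)
    acyl_carbons: Optional[int] = None  # size of the acyl group, if known
    mass_da: Optional[float] = None     # molecular weight in Daltons, if known


def sub_class(spa):
    """Return "i" for acyl-protected SPAs and "ii" for ureido-linked SPAs."""
    if spa.n_cap == "acyl":
        return "i"
    if spa.n_cap == "ureido":
        return "ii"
    raise ValueError(f"unknown N-terminal cap: {spa.n_cap!r}")


def outside_reported_ranges(spa):
    """List the properties that fall outside the ranges stated in the text."""
    issues = []
    if not 2 <= spa.n_residues <= 6:
        issues.append("chain length")
    if spa.acyl_carbons is not None and not 2 <= spa.acyl_carbons <= 9:
        issues.append("acyl size")
    if spa.mass_da is not None and not 300 <= spa.mass_da <= 900:
        issues.append("mass")
    return issues


# Acetyl-leucine-argininal: a two-carbon acyl group and two residues.
small_spa = SpaDescriptor("acetyl-leucine-argininal", "acyl", 2, acyl_carbons=2)
# Invented entry, not a real compound, to show a flagged case.
hypothetical = SpaDescriptor("hypothetical heptapeptide", "ureido", 7)

print(sub_class(small_spa), outside_reported_ranges(small_spa))
print(sub_class(hypothetical), outside_reported_ranges(hypothetical))
```

Unknown properties are left as `None` rather than guessed, so that only stated values are checked against the ranges.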
Outside this classification, some exceptions have been reported; for example, elastatin, which consists of isovaleryl-ureido-arginine-glutamine-alaninal, and two of the smallest SPAs known to date, the bacithrocins and thiolstatin, both inhibitors of cysteine/serine proteases produced by bacteria of the genus Bacillus. These peptides, which are smaller than those commonly found among SPAs, are formed by acyl-phenylalanine-argininal groups [12-13].
Regarding the biosynthesis of SPAs, since the discovery of leupeptin in 1969 [14-15], a number of studies have described the isolation of new SPAs, the taxonomic identification of the microorganisms that produce them, their fermentation and purification methods, and their chemical structures and biological activity (FIG. 2). However, despite the enormous importance of these compounds, little is known about their biosynthesis, even for compounds with a broad market such as antipain, chymostatin and leupeptin.
Early efforts to characterize the biosynthesis of leupeptin were based on the fractionation and purification of protein extracts of Streptomyces roseus, the organism that produces leupeptin, and on the use of these extracts for in vitro enzyme assays. These studies suggested that a non-ribosomal peptide synthetase (NRPS) and a reductase are involved in the biosynthetic pathway of this compound, and determined that L-leucine, D-leucine and acetyl-CoA are incorporated as precursors [16]. More recently, the biosynthetic pathway of flavopeptin, a peptide aldehyde with protease inhibitory activity, has been described [17]. Flavopeptin is synthesized by an NRPS whose domains are organized in the same order in which the precursors are incorporated into the final structure; i.e., it is colinear. The flavopeptin synthetase includes an acyl transfer domain, which is responsible for acylating the sixth amino acid of the peptide (the N-acyl terminus). The synthetase also includes adenylation domains, peptidyl carrier protein domains and condensation domains for the successive incorporation of the six precursor amino acids, Ile-Gln-Ile-Gln-Val/Ile-Phe (SEQ ID NO: 8 and SEQ ID NO: 9); an epimerization domain that acts on the fourth residue (Gln); and finally a reductase domain, which catalyzes the last step of the pathway: releasing the nascent peptide from the synthetase by reducing the terminal carboxylic group, which results in the formation of the characteristic aldehyde group (FIG. 4).
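Colinearity means that the order of the modules can be read off as the order of the residues in the product. The following Python sketch illustrates this reading for the flavopeptin example; it is a simplified model, not the published domain map. The `Module` record, the `predicted_product` function and the domain labels (`acyl_transfer`, `C`, `A`, `PCP`, `E`, `R`) are illustrative names; the placement of condensation domains and of the acyl transfer domain on the first module, and the marking of the epimerized residue as D-configured, are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Module:
    substrate: str            # amino acid selected by the adenylation (A) domain
    domains: Tuple[str, ...]  # domain labels present in this module (assumed layout)


def predicted_product(modules):
    """Read a colinear assembly line: one residue per module, in module order."""
    residues = []
    for module in modules:
        label = module.substrate
        if "E" in module.domains:
            # Assumption: the epimerization domain yields the D-configured residue.
            label = "D-" + label
        residues.append(label)
    n_cap = "acyl" if modules and "acyl_transfer" in modules[0].domains else "unmodified"
    # A terminal reductase (R) domain releases the chain as an aldehyde.
    c_end = "aldehyde" if modules and "R" in modules[-1].domains else "unreduced"
    return n_cap, residues, c_end


# Residue order, epimerization site and terminal reductase follow the text;
# the remaining domain layout is assumed.
flavopeptin_like = [
    Module("Ile", ("acyl_transfer", "A", "PCP")),
    Module("Gln", ("C", "A", "PCP")),
    Module("Ile", ("C", "A", "PCP")),
    Module("Gln", ("C", "A", "PCP", "E")),
    Module("Val/Ile", ("C", "A", "PCP")),
    Module("Phe", ("C", "A", "PCP", "R")),
]
n_cap, residues, c_end = predicted_product(flavopeptin_like)
print(n_cap, "-".join(residues), c_end)
```

The same reading applied in reverse is what makes genome mining predictive: an uncharacterized synthetase with a terminal reductase domain would be a candidate producer of a peptide aldehyde.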
As already mentioned, most of the protease inhibitors on the market are natural products of microbial origin. Generally, these compounds are used to prevent proteolysis by adding them during the protein extraction process, which implies that the compound must first be produced by fermentation and purified. Furthermore, it is known that the products obtained by fermentation are usually mixtures of related molecular species and show better biological efficiency than 100% pure synthetic products, which is reflected in the higher cost of fermentation products.
An economically favorable alternative to this process is to produce the protease inhibitor at the same time as the value-added protein, through heterologous expression of the inhibitor's biosynthetic pathway in the same organism that produces the protein of interest. However, this has not been reported to date, most likely because of the lack of knowledge of the genetic bases that direct the synthesis of most SPAs and the difficulties encountered in the heterologous expression of NRPSs encoded in large genetic regions (>20 kbp, judging by chemical structures of three or more amino acids). To develop a system with these characteristics, it is therefore necessary to know the genetic basis of the biosynthesis of the inhibitor to be expressed, and to construct genetic systems that allow the controlled heterologous expression of said genes in the cell line used for biotechnological purposes.
From the above, two main problems emerge that hinder the development of these expression systems: (i) the almost total lack of knowledge of the genetic basis of the biosynthesis of SPA-type protease inhibitors, including leupeptin, antipain and chymostatin; and (ii) based on the historical biochemical studies and the recent report on flavopeptin, the synthetases that could direct their synthesis are expected to involve complex biosynthetic systems encoded by large genetic regions (>20 kbp); such genetic systems could hardly be expressed heterologously with efficiency in the cell lines used by industry to produce high-value proteins.
Among the patents that relate to obtaining and using SPAs with market value, European patent EP1318198 describes a process for producing a recombinant peptide that involves adding a chymotrypsin inhibitor to the culture medium. U.S. Pat. No. 4,066,507 describes a process for producing L-leupeptins, while US20110183915 relates to treatments against cancer cells that use a small molecule (leupeptin) to cause necrosis in them without affecting normal cells. As can be seen, because of the high value and potential of SPAs, it is necessary to continue determining the genetic bases and biosynthetic mechanisms involved in their production.