Within the last decade there has been a dramatic increase in the need for bioactive compounds with novel activities. This demand has arisen largely from changes in worldwide demographics coupled with the clear and increasing trend in the number of pathogenic organisms that are resistant to currently available antibiotics. For example, while there has been a surge in demand for antibacterial drugs in emerging nations with young populations, countries with aging populations, such as the US, require a growing repertoire of drugs against cancer, diabetes, arthritis and other debilitating conditions. The death rate from infectious diseases increased 58% between 1980 and 1992, and it has been estimated that the emergence of antibiotic resistant microbes has added in excess of $30 billion annually to the cost of health care in the US alone. As a response to this trend, pharmaceutical companies have significantly increased their screening of microbial diversity for compounds with unique activities or specificities.
There are several common sources of lead compounds (drug candidates), including natural product collections, synthetic chemical collections, and synthetic combinatorial chemical libraries, such as nucleotides, peptides, or other polymeric molecules. Each of these sources has advantages and disadvantages. The success of programs to screen these candidates depends largely on the number of compounds entering the programs, and pharmaceutical companies have to date screened hundreds of thousands of synthetic and natural compounds in search of lead compounds. Unfortunately, the ratio of novel compounds to previously discovered compounds has diminished with time. The discovery rate of novel lead compounds has not kept pace with demand despite the best efforts of pharmaceutical companies. There exists a strong need for accessing new sources of potential drug candidates.
The majority of bioactive compounds currently in use are derived from soil microorganisms. Many microbes inhabiting soils and other complex ecological communities produce a variety of compounds that increase their ability to survive and proliferate. These compounds are generally thought to be nonessential for growth of the organism and are synthesized with the aid of genes involved in intermediary metabolism, hence their name, secondary metabolites. Secondary metabolites that influence the growth or survival of other organisms are known as bioactive compounds and serve as key components of the chemical defense arsenal of both micro- and macroorganisms. Humans have exploited these compounds for use as antibiotics, antiinfectives and other bioactive compounds with activity against a broad range of prokaryotic and eukaryotic pathogens. Approximately 6,000 bioactive compounds of microbial origin have been characterized, with more than 60% produced by the gram positive soil bacteria of the genus Streptomyces. Of these, at least 70 are currently used for biomedical and agricultural applications. The largest class of bioactive compounds, the polyketides, includes a broad range of antibiotics, immunosuppressants and anticancer agents which together account for sales of over $5 billion per year.
Despite the seemingly large number of available bioactive compounds, it is clear that one of the greatest challenges facing modern biomedical science is the proliferation of antibiotic resistant pathogens. Because of their short generation time and ability to readily exchange genetic information, pathogenic microbes have rapidly evolved and disseminated resistance mechanisms against virtually all classes of antibiotic compounds. For example, there are virulent strains of the human pathogens Staphylococcus and Streptococcus that can now be treated with but a single antibiotic, vancomycin, and resistance to this compound would require only the transfer of a single gene, vanA, from resistant Enterococcus species. When this crucial need for novel antibacterial compounds is superimposed on the growing demand for enzyme inhibitors, immunosuppressants and anti-cancer agents, it becomes readily apparent why pharmaceutical companies have stepped up their screening of microbial diversity for bioactive compounds with novel properties.
The approach currently used to screen microbes for new bioactive compounds has been largely unchanged since the inception of the field. New isolates of bacteria, particularly gram positive strains from soil environments, are collected and their metabolites tested for pharmacological activity. A more recent approach has been to use recombinant techniques to synthesize hybrid antibiotic pathways by combining gene subunits from previously characterized pathways. This approach, called combinatorial biosynthesis, has focused primarily on the polyketide antibiotics and has resulted in a number of structurally unique compounds which have displayed activity. However, compounds with novel antibiotic activities have not yet been reported, an observation that may be due to the fact that the pathway subunits are derived from those genes encoding previously characterized compounds. Dramatic success in using recombinant approaches to small molecule synthesis has been recently reported in the engineering of biosynthetic pathways to increase the production of desirable antibiotics.
There is still tremendous biodiversity that remains untapped as the source of lead compounds. However, the currently available methods for screening and producing lead compounds cannot be applied efficiently to these under-explored resources. For instance, it is estimated that at least 99% of marine bacteria species do not survive on laboratory media, and commercially available fermentation equipment is not optimal for use in the conditions under which these species will grow; hence these organisms are difficult or impossible to culture for screening or re-supply. Recollection, growth, strain improvement, media improvement and scale-up production of the drug-producing organisms often pose problems for synthesis and development of lead compounds. Furthermore, the need for the interaction of specific organisms to synthesize some compounds makes their use in discovery extremely difficult. New methods to harness the genetic resources and chemical diversity of these untapped sources of compounds for use in drug discovery are very valuable. The present invention provides a path to access this untapped biodiversity and to rapidly screen for activities of interest utilizing recombinant DNA technology. This invention combines the benefits associated with the ability to rapidly screen nature with the flexibility and reproducibility afforded by working with the genetic material of organisms.
The present invention allows one to identify genes encoding bioactivities of interest from complex environmental gene expression libraries, and to manipulate cloned pathways to evolve recombinant small molecules with unique activities. Bacteria and many eukaryotes have a coordinated mechanism for regulating genes whose products are involved in related processes. The genes are clustered, in structures referred to as “gene clusters,” on a single chromosome and are transcribed together under the control of a single regulatory sequence, including a single promoter which initiates transcription of the entire cluster. The gene cluster, the promoter, and additional sequences that function in regulation altogether are referred to as an “operon” and can include up to 20 or more genes, usually from 2 to 6 genes. Thus, a gene cluster is a group of adjacent genes that are either identical or related, usually as to their function. Gene clusters are of interest in drug discovery processes since product(s) of gene clusters include, for example, antibiotics, antivirals, antitumor agents and regulatory proteins.
Some gene families consist of one or more identical members. Clustering is a prerequisite for maintaining identity between genes, although clustered genes are not necessarily identical. Gene clusters range from extremes where a duplication is generated of adjacent related genes to cases where hundreds of identical genes lie in a tandem array. Sometimes no significance is discernable in a repetition of a particular gene. A principal example of this is the expressed duplicate insulin genes in some species, whereas a single insulin gene is adequate in other mammalian species.
Gene clusters undergo continual reorganization and thus, the ability to create heterogeneous libraries of gene clusters from, for example, bacterial or other prokaryotic sources is valuable in determining sources of novel bioactivities, including enzymes such as, for example, the polyketide synthases that are responsible for the synthesis of polyketides having a vast array of useful activities.
Polyketides are molecules which are an extremely rich source of bioactivities, including antibiotics (such as tetracyclines and erythromycin), anti-cancer agents (daunomycin), immunosuppressants (FK506 and rapamycin), and veterinary products (monensin). Many polyketides (produced by polyketide synthases) are valuable as therapeutic agents. Polyketide synthases (PKSs) are multifunctional enzymes that catalyze the biosynthesis of a wide variety of carbon chains differing in length and patterns of functionality and cyclization. Despite their apparent structural diversity, they are synthesized by a common pathway in which units derived from acetate or propionate are condensed onto the growing chain in a process resembling fatty acid biosynthesis. The intermediates remain bound to the polyketide synthase during multiple cycles of chain extension and (to a variable extent) reduction of the β-ketone group formed in each condensation. The structural variation between naturally occurring polyketides arises largely from the way in which each PKS controls the number and type of units added, and from the extent and stereochemistry of reduction at each cycle. Still greater diversity is produced by the action of regiospecific glycosylases, methyltransferases and oxidative enzymes on the product of the PKS.
Polyketide synthase genes fall into gene clusters. At least one type (designated type I) of polyketide synthase has large genes and encoded enzymes, complicating genetic manipulation and in vitro studies of these genes/proteins. Progress in understanding the enzymology of such type I systems has previously been frustrated by the lack of cell-free systems to study polyketide chain synthesis by any of these multienzymes, although several partial reactions of certain pathways have been successfully assayed in vitro. Cell-free enzymatic synthesis of complex polyketides has proved unsuccessful, despite more than 30 years of intense efforts, presumably because of the difficulties in isolating fully active forms of these large, poorly expressed multifunctional proteins from naturally occurring producer organisms, and because of the relative lability of intermediates formed during the course of polyketide biosynthesis. In an attempt to overcome some of these limitations, modular PKS subunits have been expressed in heterologous hosts such as Escherichia coli and Streptomyces coelicolor. Whereas the proteins expressed in E. coli are not fully active, heterologous expression of certain PKSs in S. coelicolor resulted in the production of active protein. Cell-free enzymatic synthesis of polyketides from PKSs with substantially fewer active sites, such as the 6-methylsalicylate synthase, chalcone synthase, tetracenomycin synthase, and the PKS responsible for the polyketide component of cyclosporin, has been reported.
Hence, studies have indicated that in vitro synthesis of polyketides is possible; however, synthesis was always performed with purified enzymes. Heterologous expression of genes encoding PKS modular subunits has allowed synthesis of functional polyketides in vivo; however, this approach presents several challenges that had to be overcome. The large size of modular PKS gene clusters (>30 kb) makes their manipulation on plasmids difficult. Modular PKSs also often utilize substrates which may be absent in a heterologous host. Finally, proper folding, assembly, and posttranslational modification of very large foreign polypeptides are not guaranteed.
The present invention further relates to a method for discovering molecules which affect the interaction of proteins or other molecules in in vivo or in vitro systems through the use of fused genes encoding hybrid proteins or fused molecules capable of generating or inhibiting, or causing the generation of or inhibition of, a detectable signal.
The analysis of interactions between proteins and/or other molecules is a fundamental area of inquiry in biology. For instance, ligand:receptor interactions and the receptor/effector coupling mediated by guanine nucleotide-binding proteins (G proteins) are of interest in the study of disease. A large number of G protein-linked receptors funnel extracellular signals as diverse as hormones, growth factors, neurotransmitters, primary sensory stimuli, and other signals through a set of G proteins to a small number of second-messenger systems. The G proteins act as molecular switches with an “on” and “off” state governed by a GTPase cycle. Mutations in G proteins may result in either constitutive activation or loss of expression. Given the variety of functions subserved by G protein-coupled signal transduction, it is not surprising that abnormalities in G protein-coupled pathways can lead to diseases with manifestations as dissimilar as blindness, hormone resistance, precocious puberty and neoplasia. G protein-coupled receptors are extremely important to drug research efforts. It is estimated that up to 60% of today's prescription drugs work by somehow interacting with G protein-coupled receptors. However, these drugs were developed using classical medicinal chemistry and without a knowledge of the molecular mechanism of action. A more efficient drug discovery program could be deployed by targeting individual receptors and making use of information on gene sequence and biological function to develop effective therapeutics. The present invention allows one to, for example, study molecules which affect the interaction of G proteins with receptors, or of ligands with receptors.
Proteins are complex macromolecules made up of covalently linked chains of amino acids. Each protein assumes a unique three dimensional shape determined principally by its sequence of amino acids. Many proteins consist of smaller units termed domains, which are continuous stretches of amino acids able to fold independently from the rest of the protein. Some of the important forms of proteins are enzymes, polypeptide hormones, nutrient transporters, structural components of the cell, hemoglobins, antibodies, nucleoproteins, and components of viruses.
Protein-protein interactions enable two or more proteins to associate. A large number of non-covalent bonds form between the proteins when two protein surfaces are precisely matched, and these bonds account for the specificity of recognition. Protein-protein interactions are involved, for example, in the assembly of enzyme subunits; in antigen-antibody reactions; in forming the supramolecular structures of ribosomes, filaments and viruses; in transport; and in the interaction of receptors on a cell with growth factors and hormones. Products of oncogenes can give rise to neoplastic transformation through protein-protein interactions. For example, some oncogenes encode protein kinases whose enzymatic activity on cellular target proteins leads to the cancerous state. Another example of a protein-protein interaction occurs when a virus infects a cell by recognizing a polypeptide receptor on the surface, and this interaction has been used to design antiviral agents.
Protein-protein interactions have been generally studied in the past using biochemical techniques such as cross-linking, co-immunoprecipitation and co-fractionation by chromatography. A disadvantage of these techniques is that interacting proteins often exist in very low abundance and are, therefore, difficult to detect. Another major disadvantage is that these biochemical techniques involve only the proteins, not the genes encoding them. When an interaction is detected using biochemical methods, the newly identified protein often must be painstakingly isolated and then sequenced to enable the gene encoding it to be obtained. Another disadvantage is that these methods do not immediately provide information about which domains of the interacting proteins are involved in the interaction. Another disadvantage is that small changes in the composition of the interacting proteins cannot be tested easily for their effect on the interaction.
To avoid the disadvantages inherent in the biochemical techniques for detecting protein-protein interactions, genetic systems have recently been designed. One such system is based on transcriptional activation. Transcription is the process by which RNA molecules are synthesized using a DNA template. Transcription is regulated by specific sequences in the DNA which indicate when and where RNA synthesis should begin. These sequences correspond to binding sites for proteins, designated transcription factors, which interact with the enzymatic machinery used for the RNA polymerization reaction. There is evidence that transcription can be activated through the use of two functional domains of a transcription factor: a domain that recognizes and binds to a specific site on the DNA and a domain that is necessary for activation, as reported by Keegan, et al., Science 231, 699-704 (1986) and Ma and Ptashne, Cell, 48, 847-853 (1987). The transcriptional activation domain is thought to function by contacting other proteins involved in transcription. The DNA-binding domain appears to function to position the transcriptional activation domain on the target gene which is to be transcribed. In a few cases now known, these two functions (DNA-binding and activation) reside on separate proteins. One protein binds to the DNA, and the other protein, which activates transcription, binds to the DNA-bound protein, as reported by McKnight et al., Proc. Nat'l Acad. Sci. USA, 89, 7061-7065 (1987); another example is reviewed by Curran et al., Cell, 55, 395-397 (1988).
Transcriptional activation has been studied using the GAL4 protein of the yeast Saccharomyces cerevisiae. The GAL4 protein is a transcriptional activator required for the expression of genes encoding enzymes of galactose utilization; see Johnston, Microbiol. Rev., 51, 458-476 (1987). It consists of an N-terminal domain which binds to specific DNA sequences designated UASG (UAS stands for upstream activation site, G indicates the galactose genes) and a C-terminal domain containing acidic regions, which is necessary to activate transcription; see Keegan et al. (1986), supra, and Ma and Ptashne (1987), supra. As discussed by Keegan et al., the N-terminal domain binds to DNA in a sequence-specific manner but fails to activate transcription. The C-terminal domain cannot activate transcription because it fails to localize to the UASG; see, for example, Brent and Ptashne, Cell, 43, 729-736 (1985). However, Ma and Ptashne have reported (Cell, 51, 113-119 (1987); Cell, 55, 443-446 (1988)) that when both the GAL4 N-terminal domain and the C-terminal domain are fused together in the same protein, transcriptional activity is induced. Other proteins also function as transcriptional activators via the same mechanism. For example, the GCN4 protein of Saccharomyces cerevisiae, as reported by Hope and Struhl, Cell, 46, 885-894 (1986), the ADR1 protein of Saccharomyces cerevisiae, as reported by Thukral et al., Molecular and Cellular Biology, 9, 2360-2369 (1989), and the human estrogen receptor, as discussed by Kumar et al., Cell, 51, 941-951 (1987), all contain separable domains for DNA binding and for maximal transcriptional activation.
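The domain logic described above can be illustrated with a minimal sketch in Python. It is a toy model only and is not drawn from any of the cited reports: the `Protein` class, the `fuse` helper, and the two boolean domain flags are hypothetical names introduced for illustration. The sketch assumes just what the text states, namely that a reporter is activated only when a DNA-binding function and an activation function are present on the same protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Toy protein described only by the domains it carries (hypothetical model)."""
    name: str
    binds_uas: bool = False       # carries a DNA-binding domain that recognizes the UAS
    has_activation: bool = False  # carries a transcriptional activation domain


def activates_transcription(protein: Protein) -> bool:
    """True only if the protein both localizes to the UAS and can activate."""
    return protein.binds_uas and protein.has_activation


def fuse(a: Protein, b: Protein) -> Protein:
    """Model a gene fusion: the hybrid protein carries the domains of both parts."""
    return Protein(
        name=f"{a.name}-{b.name}",
        binds_uas=a.binds_uas or b.binds_uas,
        has_activation=a.has_activation or b.has_activation,
    )


# Illustrative stand-ins for the two GAL4 domains discussed in the text.
gal4_n = Protein("GAL4_N", binds_uas=True)
gal4_c = Protein("GAL4_C", has_activation=True)

for candidate in (gal4_n, gal4_c, fuse(gal4_n, gal4_c)):
    print(candidate.name, activates_transcription(candidate))
```

In this sketch neither domain alone activates the reporter, while the fused protein does, mirroring the behavior described for the separated and rejoined GAL4 domains.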
Genetic systems that are capable of rapidly detecting which proteins interact with a known protein, determining which domains of the proteins interact, and providing the genes for the newly identified interacting proteins have recently been made available in Saccharomyces cerevisiae (Fields, S. and Song, O. (1989) Nature 340: 245-247, Mullinax, R. L., and Sorge, J. A. (1995) Strategies 8:3-5). These systems are useful for studying protein-protein interactions in vivo in a eukaryotic host. To date, this has been viewed as advantageous because of the conditions in eukaryotic hosts that may provide for folding, solubility and post-translational modifications (such as phosphorylation) that may not occur in prokaryotic systems. Many eukaryotic proteins synthesized in bacteria fold incorrectly or inefficiently and, consequently, exhibit low specific activities. Production of authentic, biologically active eukaryotic proteins from cloned DNA frequently requires post-translational modifications such as accurate disulfide bond formation, glycosylation, phosphorylation, oligomerization, or specific proteolytic cleavage, processes that are not performed by bacterial cells. This problem is particularly severe when expression of functional membrane or secretory proteins such as cell surface receptors and extracellular hormones or enzymes is required. Thus, the need to develop these systems in prokaryotic screening hosts was not apparent and the advantages of such a system were not evident until recently.
With the advent of the ability to access uncultivated organisms in samples and archive the genes of these samples in cloning vectors in the form of gene libraries for eventual screening for bioactive molecules, the need to utilize systems that allow for the screening of very large numbers of clones has rapidly surfaced. Effective screening of these gene libraries requires systems that provide high transformation efficiencies, so that one can access the millions of clones representing these samples. Eukaryotic systems such as those described are unfortunately plagued with lower transformation efficiencies. A major advantage of working with prokaryotic hosts, such as bacteria, therefore lies in the high transformation efficiencies these hosts afford for screening. Furthermore, in working with the eukaryotic hosts described above, it is critical that proteins are targeted to the nucleus, since the interaction has to take place in the nucleus.
Recently, a genetic system to detect protein-protein interactions in vivo using transcriptional repression as an assay in E. coli has been described. Genes encoding two interacting proteins are fused to a wild type and a mutant LexA DNA binding domain (the mutant is a truncated LexA protein devoid of its own oligomerization domain and is termed LexA408). LexA is an efficient transcriptional repressor in E. coli only if it acts as a dimer. This property is used to replace the LexA dimerization domain with heterologous interacting motifs to recover repression. The non-covalent interaction between the hybrid proteins is probed by their capacity to restore the repressor activity of truncated LexA proteins (LexA408).
The interaction or association of the fused proteins is specifically measured using a reporter gene controlled by a hybrid sulA operator containing a wild type half-site and a mutated half-site (op408/op+) in a reporter strain (SU202). The lacZ reporter gene is under control of the op408/op+ hybrid operator using the sulA promoter, the most tightly repressed naturally occurring SOS promoter. Upon co-expression of interacting fusion proteins, lacZ is repressed. A Lac+ phenotype yields red colonies with the system, and a Lac− phenotype yields white colonies.
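The readout of this repression-based assay can be summarized in a short Python sketch. It is an illustrative model only: the function names and the `INTERACTING_PAIRS` set with its placeholder partner names are hypothetical and do not correspond to any reported proteins. The sketch assumes only what is stated above, that interacting fusion proteins restore repression of lacZ, that Lac+ colonies are red, and that Lac− colonies are white.

```python
from typing import FrozenSet, Set

# Hypothetical fusion partners assumed to interact (placeholders, not real data).
INTERACTING_PAIRS: Set[FrozenSet[str]] = {frozenset({"partnerA", "partnerB"})}


def fusions_interact(bait: str, prey: str) -> bool:
    """Look up whether the two fused partners are assumed to associate."""
    return frozenset({bait, prey}) in INTERACTING_PAIRS


def lacz_expressed(bait: str, prey: str) -> bool:
    """lacZ is repressed only when the hybrid proteins associate and thereby
    restore repressor activity at the hybrid operator."""
    return not fusions_interact(bait, prey)


def colony_color(bait: str, prey: str) -> str:
    """Lac+ colonies are scored red and Lac- colonies white."""
    return "red" if lacz_expressed(bait, prey) else "white"


for bait, prey in [("partnerA", "partnerB"), ("partnerA", "partnerC")]:
    print(bait, prey, colony_color(bait, prey))
```

Note that the signal is inverted relative to an activation-based assay: in this model a positive interaction is scored as a white colony, because association of the fusion proteins turns the reporter off.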
Protein fusions have also been used to detect and characterize protein-protein interactions in E. coli using the phage λ repressor (Hu J. C. et al., Science 250, 1400-1403 (1990)). The NH2-terminal DNA-binding domain of bacteriophage λ repressor dimerizes inefficiently and requires a separate COOH-terminal dimerization domain to bind strongly to its operator. This property allows one to evaluate the interaction between hybrid proteins generated utilizing the binding domain and the dimerization domain by their capacity to restore repressor activity.
In addition to protein-protein interactions, the study of the interaction of other molecules, and the ability to affect this interaction, is of interest in research and discovery processes and in the discovery of new drugs, for instance, steroids and their receptors, or polysaccharides and their receptors.