The present invention relates generally to the identification of new bioactive molecules and particularly to methods for recovering such molecules by co-encapsulation and fluorescence activated cell sorting (FACS).
There is a critical need in the chemical industry for efficient catalysts for the practical synthesis of optically pure materials; enzymes can provide the optimal solution. All classes of molecules and compounds that are utilized in established and emerging chemical, pharmaceutical, textile, food and feed, and detergent markets must meet stringent economic and environmental standards. The synthesis of polymers, pharmaceuticals, natural products and agrochemicals is often hampered by expensive processes which produce harmful byproducts and which suffer from low enantioselectivity (Faber, 1995; Tonkovich and Gerber, U.S. Dept of Energy study, 1995). Enzymes have a number of remarkable advantages which can overcome these problems in catalysis: they act on single functional groups, they distinguish between similar functional groups on a single molecule, and they distinguish between enantiomers. Moreover, they are biodegradable and function at very low mole fractions in reaction mixtures. Because of their chemo-, regio- and stereospecificity, enzymes present a unique opportunity to optimally achieve desired selective transformations. These are often extremely difficult to duplicate chemically, especially in single-step reactions. The elimination of the need for protecting groups, the selectivity, the ability to carry out multi-step transformations in a single reaction vessel, and the concomitant reduction in environmental burden have led to increased demand for enzymes in the chemical and pharmaceutical industries (Faber, 1995). Enzyme-based processes have been gradually replacing many conventional chemical-based methods (Wrotnowski, 1997). A current limitation to more widespread industrial use is the relatively small number of commercially available enzymes. Only about 300 enzymes (excluding DNA modifying enzymes) are at present commercially available from the more than 3000 non-DNA-modifying enzyme activities thus far described.
The use of enzymes for technological applications also may require performance under demanding industrial conditions. This includes activities in environments or on substrates for which the currently known arsenal of enzymes was not evolutionarily selected. Enzymes have evolved by selective pressure to perform very specific biological functions within the milieu of a living organism, under conditions of mild temperature, pH and salt concentration. For the most part, the non-DNA modifying enzyme activities thus far described (Enzyme Nomenclature, 1992) have been isolated from mesophilic organisms, which represent a very small fraction of the available phylogenetic diversity (Amann et al., 1995). The dynamic field of biocatalysis takes on a new dimension with the help of enzymes isolated from microorganisms that thrive in extreme environments. Such enzymes must function at temperatures above 100° C. in terrestrial hot springs and deep sea thermal vents, at temperatures below 0° C. in arctic waters, in the saturated salt environment of the Dead Sea, at pH values around 0 in coal deposits and geothermal sulfur-rich springs, or at pH values greater than 11 in sewage sludge (Adams and Kelly, 1995). Enzymes obtained from these extremophilic organisms open a new field in biocatalysis.
For example, several esterases and lipases cloned and expressed from extremophilic organisms are remarkably robust, showing high activity throughout a wide range of temperatures and pHs. The fingerprints of five of these esterases show a diverse substrate spectrum, in addition to differences in the optimum reaction temperature. As seen in FIG. 1, esterase #5 recognizes only short chain substrates while esterase #2 acts only on long chain substrates, and the two differ greatly in optimal reaction temperature. These results suggest that more diverse enzymes fulfilling the need for new biocatalysts can be found by screening biodiversity. Substrates upon which enzymes act are herein defined as bioactive substrates.
Furthermore, virtually all of the enzymes known so far have come from cultured organisms, mostly bacteria and more recently archaea (Enzyme Nomenclature, 1992). Traditional enzyme discovery programs rely solely on cultured microorganisms for their screening programs and are thus only accessing a small fraction of natural diversity. Several recent studies have estimated that only a small percentage, conservatively less than 1%, of organisms present in the natural environment have been cultured (see Table 1; Amann et al., 1995; Barns et al., 1994; Torsvik, 1990). For example, Norman Pace's laboratory recently reported intensive untapped diversity in water and sediment samples from the “Obsidian Pool” in Yellowstone National Park, a spring which has been studied since the early 1960's by microbiologists (Barns, 1994). Amplification and cloning of 16S rRNA encoding sequences revealed mostly unique sequences with little or no representation of the organisms which had previously been cultured from this pool. This suggests substantial diversity of archaea with so far unknown morphological, physiological and biochemical features which may be useful in industrial processes. David Ward's laboratory in Bozeman, Mont. has performed similar studies on the cyanobacterial mat of Octopus Spring in Yellowstone Park and came to the same conclusion, namely, that tremendous uncultured diversity exists (Bateson et al., 1989). Giovannoni et al. (1990) reported similar results using bacterioplankton collected in the Sargasso Sea, while Torsvik et al. (1990) have shown by DNA reassociation kinetics that there is considerable diversity in soil samples. Hence, this vast majority of microorganisms represents an untapped resource for the discovery of novel biocatalysts. In order to access this potential catalytic diversity, recombinant screening approaches are required.
The discovery of novel bioactive molecules other than enzymes is also afforded by the present invention. For instance, antibiotics, antivirals, antitumor agents and regulatory proteins can be discovered utilizing the present invention.
Bacteria and many eukaryotes have a coordinated mechanism for regulating genes whose products are involved in related processes. The genes are clustered, in structures referred to as “gene clusters,” on a single chromosome and are transcribed together under the control of a single regulatory sequence, including a single promoter which initiates transcription of the entire cluster. The gene cluster, the promoter, and additional sequences that function in regulation altogether are referred to as an “operon” and can include up to 20 or more genes, usually from 2 to 6 genes. Thus, a gene cluster is a group of adjacent genes that are either identical or related, usually as to their function.
Some gene families consist of one or more identical members. Clustering is a prerequisite for maintaining identity between genes, although clustered genes are not necessarily identical. Gene clusters range from extremes where a duplication is generated of adjacent related genes to cases where hundreds of identical genes lie in a tandem array. Sometimes no significance is discernable in a repetition of a particular gene. A principal example of this is the expressed duplicate insulin genes in some species, whereas a single insulin gene is adequate in other mammalian species.
It is important to further research gene clusters and the extent to which the full length of the cluster is necessary for the expression of the proteins resulting therefrom. Gene clusters undergo continual reorganization and, thus, the ability to create heterogeneous libraries of gene clusters from, for example, bacterial or other prokaryote sources is valuable in determining sources of novel proteins, particularly including enzymes such as, for example, the polyketide synthases that are responsible for the synthesis of polyketides having a vast array of useful activities. As indicated, other types of proteins that are the product(s) of gene clusters are also contemplated, including, for example, antibiotics, antivirals, antitumor agents and regulatory proteins, such as insulin.
Polyketides are molecules which are an extremely rich source of bioactivities, including antibiotics (such as tetracyclines and erythromycin), anti-cancer agents (daunomycin), immunosuppressants (FK506 and rapamycin), and veterinary products (monensin). Many polyketides (produced by polyketide synthases) are valuable as therapeutic agents. Polyketide synthases are multifunctional enzymes that catalyze the biosynthesis of a huge variety of carbon chains differing in length and patterns of functionality and cyclization. Polyketide synthase genes fall into gene clusters, and at least one type (designated type I) of polyketide synthase has large genes and encoded enzymes, complicating genetic manipulation and in vitro studies of these genes/proteins. The method(s) of the present invention facilitate the rapid discovery of these gene clusters in gene expression libraries.
Of particular interest are cellular “switches” known as receptors which interact with a variety of biomolecules, such as hormones, growth factors, and neurotransmitters, to mediate the transduction of an “external” cellular signaling event into an “internal” cellular signal. External signaling events include the binding of a ligand to the receptor, and internal events include the modulation of a pathway in the cytoplasm or nucleus involved in the growth, metabolism or apoptosis of the cell. Internal events also include the inhibition or activation of transcription of certain nucleic acid sequences, resulting in the increase or decrease in the production or presence of certain molecules (such as nucleic acid, proteins, and/or other molecules affected by this increase or decrease in transcription). Drugs to cure disease or alleviate its symptoms can activate or block any of these events to achieve a desired pharmaceutical effect.
Transduction can be accomplished by a transducing protein in the cell membrane which is activated upon an allosteric change the receptor may undergo upon binding to a specific biomolecule. The “active” transducing protein activates production of so-called “second messenger” molecules within the cell, which then activate certain regulatory proteins within the cell that regulate gene expression or alter some metabolic process. Variations on the theme of this “cascade” of events occur. For example, a receptor may act as its own transducing protein, or a transducing protein may act directly on an intracellular target without mediation by a second messenger.
Signal transduction is a fundamental area of inquiry in biology. For instance, ligand/receptor interactions and the receptor/effector coupling mediated by guanine nucleotide-binding proteins (G proteins) are of interest in the study of disease. A large number of G protein-linked receptors funnel extracellular signals as diverse as hormones, growth factors, neurotransmitters, primary sensory stimuli, and other signals through a set of G proteins to a small number of second-messenger systems. The G proteins act as molecular switches with an “on” and “off” state governed by a GTPase cycle. Mutations in G proteins may result in either constitutive activation or loss of expression.
Many receptors convey messages through heterotrimeric G proteins, of which at least 17 distinct forms have been isolated. Additionally, there are several different G protein-dependent effectors. The signals transduced through the heterotrimeric G proteins in mammalian cells influence intracellular events through the action of effector molecules.
Given the variety of functions subserved by G protein-coupled signal transduction, it is not surprising that abnormalities in G protein-coupled pathways can lead to diseases with manifestations as dissimilar as blindness, hormone resistance, precocious puberty and neoplasia. G protein-coupled receptors are extremely important to drug research efforts. It is estimated that up to 60% of today's prescription drugs work by somehow interacting with G protein-coupled receptors. However, these drugs were developed using classical medicinal chemistry and without a knowledge of the molecular mechanism of action. A more efficient drug discovery program could be deployed by targeting individual receptors and making use of information on gene sequence and biological function to develop effective therapeutics. The present invention allows one to, for example, study molecules which affect the interaction of G proteins with receptors, or of ligands with receptors.
Several groups have reported cells which express mammalian G proteins or subunits thereof, along with mammalian receptors which interact with these molecules. For example, WO92/05244 (Apr. 2, 1992) describes a transformed yeast cell which is incapable of producing a yeast G protein α subunit, but which has been engineered to produce both a mammalian G protein α subunit and a mammalian receptor which interacts with the subunit. The authors found that a modified version of a specific mammalian receptor integrated into the membrane of the cell, as shown by studies of the ability of isolated membranes to interact properly with various known agonists and antagonists of the receptor. Ligand binding resulted in G protein-mediated signal transduction.
Another group has described the functional expression of a mammalian adenylyl cyclase in yeast, and the use of the engineered yeast cells in identifying potential inhibitors or activators of the mammalian adenylyl cyclase (WO 95/30012). Adenylyl cyclase is among the best studied of the effector molecules which function in mammalian cells in response to activated G proteins. “Activators” of adenylyl cyclase cause the enzyme to become more active, elevating the cAMP signal of the yeast cell to a detectable degree. “Inhibitors” cause the cyclase to become less active, reducing the cAMP signal to a detectable degree. The method describes the use of the engineered yeast cells to screen for drugs which activate or inhibit adenylyl cyclase by their action on G protein-coupled receptors.
When attempting to identify genes encoding bioactivities of interest from complex environmental expression libraries, the rate limiting steps in discovery occur at both the DNA cloning level and the screening level. Screening of complex environmental libraries which contain, for example, hundreds of different organisms requires the analysis of several million clones to cover this genomic diversity. An extremely high-throughput screening method has been developed to handle the enormous numbers of clones present in these libraries.
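The scale of this screening burden can be illustrated with the standard library-coverage estimate N = ln(1 - P) / ln(1 - f), where f is the fraction of the pooled DNA carried by a single clone and P is the desired probability that a given sequence is represented. This estimate is not taken from the method described here; it is offered only as a rough sketch, and the genome size, insert size and organism count used below are hypothetical values chosen for illustration.

```python
import math


def clones_needed(total_bp, insert_bp, probability):
    """Estimate clones needed so a given sequence is present with `probability`.

    Uses N = ln(1 - P) / ln(1 - f), with f = insert_bp / total_bp.
    Assumes every organism is equally represented in the DNA pool.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_bp < total_bp:
        raise ValueError("insert_bp must be positive and smaller than total_bp")
    f = insert_bp / total_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


# Hypothetical inputs (assumptions, not values from the text):
n_organisms = 100        # organisms in the environmental sample
genome_bp = 4_000_000    # assumed average genome size in base pairs
insert_bp = 3_000        # assumed average cloned insert size in base pairs

pool_bp = n_organisms * genome_bp
print(clones_needed(pool_bp, insert_bp, 0.99))
```

Because the estimate assumes that all organisms contribute equal amounts of DNA, uneven abundance in a real environmental sample would push the required number of clones higher.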
In traditional flow cytometry, it is common to analyze very large numbers of eukaryotic cells in a short period of time. Newly developed flow cytometers can analyze and sort up to 20,000 cells per second. In a typical flow cytometer, individual particles pass through an illumination zone and appropriate detectors, gated electronically, measure the magnitude of a pulse representing the extent of light scattered. The magnitudes of these pulses are sorted electronically into “bins” or “channels”, permitting the display of histograms of the number of cells possessing a certain quantitative property versus the channel number (Davey and Kell, 1996). It was recognized early on that the data accruing from flow cytometric measurements could be analyzed (electronically) rapidly enough that electronic cell-sorting procedures could be used to sort cells with desired properties into separate “buckets”, a procedure usually known as fluorescence-activated cell sorting (Davey and Kell, 1996).
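The binning and sorting logic described in this paragraph can be sketched in a few lines. The following is a minimal illustration only, not the electronics of any actual instrument: the function names, the 256-channel resolution and the full-scale pulse value of 1024 are all assumptions introduced for the example.

```python
from collections import Counter


def channel_of(pulse, max_pulse=1024.0, n_channels=256):
    """Map a pulse magnitude to a channel index in 0 .. n_channels - 1.

    max_pulse and n_channels are hypothetical instrument settings.
    Pulses at or above full scale fall into the top channel.
    """
    if pulse < 0:
        raise ValueError("pulse magnitude must be non-negative")
    index = int(pulse / max_pulse * n_channels)
    return min(index, n_channels - 1)


def histogram(pulses, max_pulse=1024.0, n_channels=256):
    """Count events per channel (the cell-number-versus-channel display)."""
    counts = Counter(channel_of(p, max_pulse, n_channels) for p in pulses)
    return [counts.get(ch, 0) for ch in range(n_channels)]


def sort_gate(pulses, low_channel, high_channel, max_pulse=1024.0, n_channels=256):
    """Return indices of events whose channel lies inside the sort gate."""
    return [
        i
        for i, p in enumerate(pulses)
        if low_channel <= channel_of(p, max_pulse, n_channels) <= high_channel
    ]


# Made-up pulse magnitudes for six events.
pulses = [0.0, 3.9, 4.0, 512.0, 1023.9, 2000.0]
counts = histogram(pulses)
selected = sort_gate(pulses, low_channel=128, high_channel=255)
print(counts[0], counts[1], counts[128], counts[255], selected)
```

Events that fall inside the gate correspond to the cells that would be deflected into a collection “bucket”; everything else passes to waste.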
Fluorescence-activated cell sorting has been primarily used in studies of human and animal cell lines and the control of cell culture processes. Fluorophore labeling of cells and measurement of the fluorescence can give quantitative data about specific target molecules or subcellular components and their distribution in the cell population. Flow cytometry can quantitate virtually any cell-associated property or cell organelle for which there is a fluorescent probe (or natural fluorescence). The parameters which can be measured have previously been of particular interest in animal cell culture.
Flow cytometry has also been used in cloning and selection of variants from existing cell clones. This selection, however, has required stains that diffuse through cells passively, rapidly and irreversibly, with no toxic effects or other influences on metabolic or physiological processes. Since, typically, flow sorting has been used to study animal cell culture performance, physiological state of cells, and the cell cycle, one goal of cell sorting has been to keep the cells viable during and after sorting.
There currently are no reports in the literature of screening and discovery of recombinant enzymes in E. coli expression libraries by fluorescence activated cell sorting of single cells. Furthermore there are no reports of recovering DNA encoding bioactivities screened by expression screening in E. coli using a FACS machine. The present invention provides these methods to allow the extremely rapid screening of viable or non-viable cells to recover desirable activities and the nucleic acid encoding those activities.
A limited number of papers describing various applications of flow cytometry in the field of microbiology and fluorescence-activated sorting of microorganisms have, however, been published (Davey and Kell, 1996). Fluorescence and other forms of staining have been employed for microbial discrimination and identification, and in the analysis of the interaction of drugs and antibiotics with microbial cells. Flow cytometry has been used in aquatic biology, where autofluorescence of photosynthetic pigments is used in the identification of algae or DNA stains are used to quantify and count marine populations (Davey and Kell, 1996). Thus, Diaper and Edwards used flow cytometry to detect viable bacteria after staining with a range of fluorogenic esters including fluorescein diacetate (FDA) derivatives and ChemChrome B, a proprietary stain sold commercially for the detection of viable bacteria in suspension (Diaper and Edwards, 1994). Labeled antibodies and oligonucleotide probes have also been used for these purposes.
Papers have also been published describing the application of flow cytometry to the detection of native and recombinant enzymatic activities in eukaryotes. Betz et al. studied native (non-recombinant) lipase production by the eukaryote, Rhizopus arrhizus with flow cytometry. They found that spore suspensions of the mold were heterogeneous as judged by light-scattering data obtained with excitation at 633 nm, and they sorted clones of the subpopulations into the wells of microtiter plates. After germination and growth, lipase production was automatically assayed (turbidimetrically) in the microtiter plates, and a representative set of the most active were reisolated, cultured, and assayed conventionally (Betz et al., 1984).
Srienc et al. have reported a flow cytometric method for detecting cloned β-galactosidase activity in the eukaryotic organism S. cerevisiae. The ability of flow cytometry to make measurements on single cells means that individual cells with high levels of expression (e.g., due to gene amplification or higher plasmid copy number) could be detected. In the method reported, a non-fluorescent compound (β-naphthol-β-galactopyranoside) is cleaved by β-galactosidase and the liberated naphthol is trapped to form an insoluble fluorescent product. The insolubility of the fluorescent product is of great importance here to prevent its diffusion from the cell. Such diffusion would not only lead to an underestimation of β-galactosidase activity in highly active cells but could also lead to an overestimation of enzyme activity in inactive cells or those with low activity, as they may take up the leaked fluorescent compound, thus reducing the apparent heterogeneity of the population.
One group has described the use of a FACS machine in an assay detecting fusion proteins expressed from a specialized transducing bacteriophage in the prokaryote Bacillus subtilis (Chung et al., J. of Bacteriology, April 1994, p. 1977-1984; Chung et al., Biotechnology and Bioengineering, Vol. 47, pp. 234-242 (1995)). This group monitored the expression of a lacZ gene (encoding β-galactosidase) fused to the sporulation loci (spo) in B. subtilis. The technique used to monitor β-galactosidase expression from spo-lacZ fusions in single cells involved taking samples from a sporulating culture, staining them with a commercially available fluorogenic substrate for β-galactosidase called C8-FDG, and quantitatively analyzing fluorescence in single cells by flow cytometry. In this study, the flow cytometer was used as a detector to screen for the presence of the spo gene during the development of the cells. The device was not used to screen and recover positive cells from a gene expression library or nucleic acid for the purpose of discovery.
Another group has utilized flow cytometry to distinguish between the developmental stages of the delta-proteobacterium Myxococcus xanthus (F. Russo-Marie et al., PNAS, Vol. 90, pp. 8194-8198, September 1993). As in the previously described study, this study employed the capabilities of the FACS machine to detect and distinguish genotypically identical cells in different developmental regulatory states. The screening of an enzymatic activity was used in this study as an indirect measure of developmental changes.
The lacZ gene from E. coli is often used as a reporter gene in studies of gene expression regulation, such as those to determine promoter efficiency, the effects of trans-acting factors, and the effects of other regulatory elements in bacterial, yeast, and animal cells. Using a chromogenic substrate, such as ONPG (o-nitrophenyl-β-D-galactopyranoside), one can measure expression of β-galactosidase in cell cultures, but it is not possible to monitor expression in individual cells or to analyze the heterogeneity of expression in cell populations. The use of fluorogenic substrates, however, makes it possible to determine β-galactosidase activity in a large number of individual cells by means of flow cytometry. This type of determination can be more informative with regard to the physiology of the cells, since gene expression can be correlated with the stage in the mitotic cycle or the viability under certain conditions. In 1994, Plovins et al. reported the use of fluorescein-di-β-D-galactopyranoside (FDG) and C12-FDG as substrates for β-galactosidase detection in animal, bacterial, and yeast cells. This study compared the two molecules as substrates for β-galactosidase, and concluded that FDG is a better substrate for β-galactosidase detection by flow cytometry in bacterial cells. The screening performed in this study was for the comparison of the two substrates. The detection capabilities of a FACS machine were employed to perform the study on viable bacterial cells.
Cells incubated with chromogenic or fluorogenic substrates yield colored and fluorescent products, respectively. Previously, it had been thought that the flow cytometry-fluorescence activated cell sorter approaches could be of benefit only for the analysis of cells that contain intracellularly, or are normally physically associated with, the enzymatic activity or small molecule of interest. On this basis, one could only use fluorogenic reagents which could penetrate the cell and which are thus potentially cytotoxic. To avoid clumping of heterogeneous cells, it is desirable in flow cytometry to analyze only individual cells, and this could limit the sensitivity and therefore the concentration of target molecules that can be sensed. Weaver and his colleagues at MIT and others have developed the use of gel microdroplets containing (physically) single cells which can take up nutrients, secrete products, and grow to form colonies. The diffusional properties of gel microdroplets may be made such that sufficient extracellular product remains associated with each individual gel microdroplet, so as to permit flow cytometric analysis and cell sorting on the basis of the concentration of secreted molecule within each microdroplet. Beads have also been used to isolate mutants growing at different rates, and to analyze antibody secretion by hybridoma cells and the nutrient sensitivity of hybridoma cells. The gel microdroplet method has also been applied to the rapid analysis of mycobacterial growth and its inhibition by antibiotics.
The gel microdroplet technology has had significance in amplifying the signals available in flow cytometric analysis, and in permitting the screening of microbial strains in strain improvement programs for biotechnology. Wittrup et al. (Biotechnol. Bioeng. (1993) 42:351-356) developed a microencapsulation selection method which allows the rapid and quantitative screening of greater than 10⁶ yeast cells for enhanced secretion of Aspergillus awamori glucoamylase. The method provides a 400-fold single-pass enrichment for high-secretion mutants.
Gel microdroplet or other related technologies can be used in the present invention to localize as well as amplify signals in the high throughput screening of recombinant libraries. Cell viability during the screening is not an issue or concern since nucleic acid can be recovered from the microdroplet.
Different types of encapsulation strategies and compounds or polymers can be used with the present invention. For instance, high temperature agaroses can be employed for making microdroplets stable at high temperatures, allowing stable encapsulation of cells subsequent to heat kill steps utilized to remove all background activities when screening for thermostable bioactivities.
There are several hurdles which must be overcome when attempting to detect and sort E. coli expressing recombinant enzymes and to recover the encoding nucleic acids. FACS systems have typically been based on eukaryotic separations and have not been refined to accurately sort single E. coli cells; the low forward and side scatter of small particles like E. coli reduces sorting accuracy; and enzyme substrates typically used in automated screening approaches, such as umbelliferyl-based substrates, diffuse out of E. coli at rates which interfere with quantitation. Further, recovery of very small amounts of DNA from sorted organisms can be problematic. The present invention addresses and overcomes these hurdles and offers a novel screening approach.
The present invention adapts traditional eukaryotic flow cytometry cell sorting systems to high throughput screening for expression clones in prokaryotes. In the present invention, expression libraries derived from DNA, primarily DNA directly isolated from the environment, are screened very rapidly for bioactivities of interest utilizing fluorescence activated cell sorting. These libraries can contain greater than 10⁸ members and can represent single organisms or can represent the genomes of over 100 different microorganisms, species or subspecies.
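To give a sense of what "very rapidly" means at this scale, the following back-of-the-envelope sketch combines a library of 10⁸ members with the sort rate of up to 20,000 cells per second cited earlier in this background. It assumes, purely for illustration, that each clone passes the detector exactly once and that the peak rate is sustained throughout; neither assumption is stated in the text, and the function name is hypothetical.

```python
def screening_time_seconds(library_size, events_per_second):
    """Time to pass every library member through the detector once.

    Assumes a constant event rate and no oversampling of the library.
    """
    if library_size < 0:
        raise ValueError("library_size must be non-negative")
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return library_size / events_per_second


library_size = 10**8   # library members, per the figure given in the text
sort_rate = 20_000     # cells per second, the upper rate cited for newer cytometers

seconds = screening_time_seconds(library_size, sort_rate)
print(f"{seconds:.0f} s, or about {seconds / 60:.1f} min")
```

In practice a library would usually be oversampled several-fold to reduce the chance of missing a clone, which scales the time proportionally.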
Accordingly, in one aspect, the present invention provides a process for identifying clones having a specified activity of interest, which process comprises (i) generating one or more expression libraries derived from nucleic acid directly isolated from the environment; and (ii) screening said libraries utilizing a high throughput cell analyzer, preferably a fluorescence activated cell sorter, to identify said clones.
More particularly, the invention provides a process for identifying clones having a specified activity of interest by (i) generating one or more expression libraries made to contain nucleic acid directly or indirectly isolated from the environment; (ii) exposing said libraries to a particular substrate or substrates of interest; and (iii) screening said exposed libraries utilizing a high throughput cell analyzer, preferably a fluorescence activated cell sorter, to identify clones which react with the substrate or substrates.
In another aspect, the invention also provides a process for identifying clones having a specified activity of interest by (i) generating one or more expression libraries derived from nucleic acid directly or indirectly isolated from the environment; and (ii) screening said exposed libraries utilizing an assay requiring a binding event or the covalent modification of a target, and a high throughput cell analyzer, preferably a fluorescence activated cell sorter, to identify positive clones.
The invention further provides a method of screening for an agent that modulates the activity of a target protein or other cell component (e.g., nucleic acid), wherein the target and a selectable marker are expressed by a recombinant cell, by co-encapsulating the agent in a micro-environment with the recombinant cell expressing the target and detectable marker and detecting the effect of the agent on the activity of the target cell component.
In another embodiment, the invention provides a method for enriching for target DNA sequences containing at least a partial coding region for at least one specified activity in a DNA sample by co-encapsulating a mixture of target DNA obtained from a mixture of organisms with a mixture of DNA probes including a detectable marker and at least a portion of a DNA sequence encoding at least one enzyme having a specified enzyme activity; incubating the co-encapsulated mixture under such conditions and for such time as to allow hybridization of complementary sequences and screening for the target DNA. Optionally the method further comprises transforming host cells with recovered target DNA to produce an expression library of a plurality of clones.
The invention further provides a method of screening for an agent that modulates the interaction of a first test protein linked to a DNA binding moiety and a second test protein linked to a transcriptional activation moiety by co-encapsulating the agent with the first test protein and second test protein in a suitable microenvironment and determining the ability of the agent to modulate the interaction of the first test protein linked to a DNA binding moiety with the second test protein covalently linked to a transcriptional activation moiety, wherein the agent enhances or inhibits the expression of a detectable protein. Preferably, screening is by FACS analysis.