This invention relates to the field of protein engineering. More particularly, the invention relates to the directed mutagenesis of DNA and screening of clones containing the mutagenized DNA for resultant specified protein, particularly enzyme, activity(ies) of interest.
In one aspect the invention provides a process for obtaining an enzyme having a specified enzyme activity derived from a heterogeneous DNA population, which process comprises: screening, for the specified enzyme activity, a library of clones containing DNA from the heterogeneous DNA population which have been exposed to directed mutagenesis towards production of the specified enzyme activity.
Another aspect of the invention provides a process for obtaining an enzyme having a specified enzyme activity, which process comprises: screening, for the specified enzyme activity, a library of clones containing DNA from a pool of DNA populations which have been exposed to directed mutagenesis in an attempt to produce in the library of clones DNA encoding an enzyme having one or more desired characteristics, which can be the same or different from the specified enzyme activity. In a preferred embodiment, the DNA pool which is subjected to directed mutagenesis is a pool of DNA which has been selected to encode enzymes having at least one enzyme characteristic, in particular at least one common enzyme activity.
Also provided is a process for obtaining a protein having a specified activity derived from a heterogeneous population of gene clusters by screening, for the specified protein activity, a library of clones containing gene clusters from the heterogeneous gene cluster population which have been exposed to directed mutagenesis towards production of specified protein activities of interest.
Also provided is a process of obtaining a gene cluster protein product having a specified activity, by screening, for the specified protein activity, a library of clones containing gene clusters from a pool of gene cluster populations which have been exposed to directed mutagenesis to produce in the library of clones gene clusters encoding proteins having one or more desired characteristics, which can be the same or different from the specified protein activity. Preferably, the pool of gene clusters which is subjected to directed mutagenesis is one which has been selected to encode proteins having enzymatic activity in the synthesis of at least one therapeutic, prophylactic or physiological regulatory activity.
The process of either of these aspects can further comprise, prior to the directed mutagenesis, selectively recovering from the heterogeneous population of gene clusters, gene clusters which comprise polycistronic sequences coding for proteins having at least one common physical, chemical or functional characteristic which can be the same or different from the activity observed prior to directed mutagenesis. Preferably, recovering the gene cluster preparation comprises contacting the gene cluster population with a specific binding partner, such as a solid phase-bound hybridization probe, for at least a portion of the gene cluster of interest. The common characteristic of the resultant protein(s) can be classes of the types of activity specified above, i.e., such as a series of enzymes related as parts of a common synthesis pathway or proteins capable of hormonal, signal transduction or inhibition of metabolic pathways or their functions in pathogens and the like. The gene cluster DNA is recovered from clones containing such gene cluster DNA from the heterogeneous gene cluster population which exhibit the activity of interest. Preferably, the directed mutagenesis is site-specific directed mutagenesis. This process can further include a step of pre-screening the library of clones for an activity, which can be the same or different from the specified activity of interest, prior to exposing them to directed mutagenesis. This activity can result, for example, from the expression of a protein or related family of proteins of interest.
The process of any of these aspects can further comprise, prior to said directed mutagenesis, selectively recovering from the heterogeneous DNA population DNA which comprises DNA sequences coding for enzymes having at least one common characteristic, which can be the same or different from the specified enzyme activity. Preferably, recovering the DNA preparation comprises contacting the DNA population with a specific binding partner, such as a solid phase bound hybridization probe, for at least a portion of the coding sequences. The common characteristic can be, for example, a class of enzyme activity, such as hydrolase activity. DNA is recovered from clones containing DNA from the heterogeneous DNA population which exhibit the class of enzyme activity. Preferably, the directed mutagenesis is site-specific directed mutagenesis. The process of this aspect can further include a step of prescreening the library of clones for an activity, which can be the same or different from the specified enzyme activity, prior to exposing them to directed mutagenesis. This activity can result, for example, from the expression of a protein of interest.
The heterogeneous DNA population from which the DNA library is derived is a complex mixture of DNA, such as is obtained, for example, from an environmental sample. Such samples can contain unculturable or uncultured multiple or single organisms. These environmental samples can be obtained from, for example, Arctic and Antarctic ice, water or permafrost sources, materials of volcanic origin, materials from soil or plant sources in tropical areas, etc. A variety of known techniques can be applied to enrich the environmental sample for organisms of interest, including differential culturing, sedimentation gradient, affinity matrices, capillary electrophoresis, optical tweezers and fluorescence activated cell sorting. The samples can also be cultures of a single organism.
The microorganisms from which the libraries may be prepared include prokaryotic microorganisms, such as Eubacteria and Archaebacteria, and lower eukaryotic microorganisms such as fungi, some algae and protozoa. The microorganisms are uncultured microorganisms obtained from environmental samples and such microorganisms may be extremophiles, such as thermophiles, hyperthermophiles, psychrophiles, psychrotrophs, etc.
Bacteria and many eukaryotes have a coordinated mechanism for regulating genes whose products are involved in related processes. The genes are clustered, in structures referred to as “gene clusters,” on a single chromosome and are transcribed together under the control of a single regulatory sequence, including a single promoter which initiates transcription of the entire cluster. The gene cluster, the promoter, and additional sequences that function in regulation altogether are referred to as an “operon” and can include up to 20 or more genes, usually from 2 to 6 genes. Thus, a gene cluster is a group of adjacent genes that are either identical or related, usually as to their function.
Some gene families consist of identical members. Clustering is a prerequisite for maintaining identity between genes, although clustered genes are not necessarily identical. Gene clusters range from extremes where a duplication is generated to adjacent related genes to cases where hundreds of identical genes lie in a tandem array. Sometimes no significance is discernable in a repetition of a particular gene. A principal example of this is the expressed duplicate insulin genes in some species, whereas a single insulin gene is adequate in other mammalian species.
It is important to further research gene clusters and the extent to which the full length of the cluster is necessary for the expression of the proteins resulting therefrom. Further, gene clusters undergo continual reorganization and, thus, the ability to create heterogeneous libraries of gene clusters from, for example, bacterial or other prokaryote sources is valuable in determining sources of novel proteins, particularly including enzymes such as, for example, the polyketide synthases that are responsible for the synthesis of polyketides having a vast array of useful activities. Other types of proteins that are the product(s) of gene clusters are also contemplated, including, for example, antibiotics, antivirals, antitumor agents and regulatory proteins, such as insulin.
Polyketides are molecules which are an extremely rich source of bioactivities, including antibiotics (such as tetracyclines and erythromycin), anti-cancer agents (daunomycin), immunosuppressants (FK506 and rapamycin), and veterinary products (monensin). Many polyketides (produced by polyketide synthases) are valuable as therapeutic agents. Polyketide synthases are multifunctional enzymes that catalyze the biosynthesis of a huge variety of carbon chains differing in length and patterns of functionality and cyclization. Polyketide synthase genes fall into gene clusters and at least one type (designated type I) of polyketide synthases has large size genes and enzymes, complicating genetic manipulation and in vitro studies of these genes/proteins.
The ability to select and combine desired components from a library of polyketides and postpolyketide biosynthesis genes for generation of novel polyketides for study is appealing. The method(s) of the present invention make possible and facilitate the cloning of novel polyketide synthases, since one can generate gene banks with clones containing large inserts (especially when using the f-factor based vectors), which facilitates cloning of gene clusters.
Preferably, the gene cluster DNA is ligated into a vector, particularly wherein a vector further comprises expression regulatory sequences which can control and regulate the production of a detectable protein or protein-related array activity from the ligated gene clusters. Vectors which have an exceptionally large capacity for exogenous DNA introduction are particularly appropriate for use with such gene clusters and are described by way of example herein to include the f-factor (or fertility factor) of E. coli. This f-factor of E. coli is a plasmid which effects high-frequency transfer of itself during conjugation and is ideal to achieve and stably propagate large DNA fragments, such as gene clusters from mixed microbial samples.
The DNA can then be isolated by available techniques that are described in the literature. The IsoQuick® nucleic acid extraction kit (MicroProbe Corporation) is suitable for this purpose.
The term “derived” or “isolated” means that material is removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated.
The DNA isolated or derived from these microorganisms can preferably be inserted into a vector prior to probing for selected DNA. Such vectors are preferably those containing expression regulatory sequences, including promoters, enhancers and the like. Such polynucleotides can be part of a vector and/or a composition and still be isolated, in that such vector or composition is not part of its natural environment. Particularly preferred phage or plasmid and methods for introduction and packaging into them are described in detail in the protocol set forth herein.
The following outlines a general procedure for producing gene libraries from both culturable and nonculturable organisms.
Obtain Biomass
DNA Isolation
Shear DNA (25 gauge needle)
Blunt DNA (Mung Bean Nuclease)
Methylate (EcoR I Methylase)
Ligate to EcoR I linkers (GGAATTCC)
Cut back linkers (EcoR I Restriction Endonuclease)
Size Fractionate (Sucrose Gradient)
Ligate to lambda vector (Lambda ZAP® II and gt11)
Package (in vitro lambda packaging extract)
Plate on E. coli host and amplify
Clones having an enzyme activity of interest are identified by screening. This screening can be done either by hybridization, to identify the presence of DNA coding for the enzyme of interest or by detection of the enzymatic activity of interest.
The probe DNA used for selectively recovering DNA of interest from the DNA derived from the at least one uncultured microorganism can be a full-length coding region sequence or a partial coding region sequence of DNA for an enzyme of known activity, a phylogenetic marker or other identified DNA sequence. The original DNA library can be preferably probed using mixtures of probes comprising at least a portion of the DNA sequence encoding the specified activity. These probes or probe libraries are preferably single-stranded and the microbial DNA which is probed has preferably been converted into single-stranded form. The probes that are particularly suitable are those derived from DNA encoding enzymes having an activity similar or identical to the specified enzyme activity which is to be screened.
The probe DNA should be at least about 10 bases and preferably at least 15 bases. In one embodiment, the entire coding region may be employed as a probe. Conditions for the hybridization in which DNA is selectively isolated by the use of at least one DNA probe will be designed to provide a hybridization stringency of at least about 50% sequence identity, more particularly a stringency providing for a sequence identity of at least about 75%.
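As a rough in-silico illustration of the probe criteria just described (minimum probe length and a minimum sequence identity), the following Python sketch scores a probe against every ungapped window of a target strand. The function names, the single-strand ungapped comparison, and the default thresholds of 15 bases and 75% identity are assumptions chosen for illustration. Actual hybridization stringency is set experimentally, so computed identity is only a proxy.

```python
def percent_identity(probe: str, window: str) -> float:
    """Percent of matching positions between two equal-length, ungapped sequences."""
    if not probe or len(probe) != len(window):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(probe.upper(), window.upper()))
    return 100.0 * matches / len(probe)


def best_window_identity(probe: str, target: str) -> float:
    """Highest ungapped identity of the probe against any window of the target."""
    n = len(probe)
    if n == 0 or len(target) < n:
        return 0.0
    return max(
        percent_identity(probe, target[i:i + n])
        for i in range(len(target) - n + 1)
    )


def passes_probe_criteria(probe: str, target: str,
                          min_len: int = 15,
                          min_identity: float = 75.0) -> bool:
    """True if the probe is long enough and some target window is similar enough."""
    return len(probe) >= min_len and best_window_identity(probe, target) >= min_identity
```

Setting `min_identity=50.0` corresponds to the lower stringency bound mentioned in the text.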
Hybridization techniques for probing a microbial DNA library to isolate DNA of potential interest are well known in the art and any of those which are described in the literature are suitable for use herein, particularly those which use a solid phase-bound, directly or indirectly bound, probe DNA for ease in separation from the remainder of the DNA derived from the microorganisms.
Preferably the probe DNA is “labeled” with one partner of a specific binding pair (i.e. a ligand) and the other partner of the pair is bound to a solid matrix to provide ease of separation of target from its source. The ligand and specific binding partner can be selected from, in either orientation, the following: (1) an antigen or hapten and an antibody or specific binding fragment thereof; (2) biotin or iminobiotin and avidin or streptavidin; (3) a sugar and a lectin specific therefor; (4) an enzyme and an inhibitor therefor; (5) an apoenzyme and cofactor; (6) complementary homopolymeric oligonucleotides; and (7) a hormone and a receptor therefor. The solid phase is preferably selected from: (1) a glass or polymeric surface; (2) a packed column of polymeric beads; and (3) magnetic or paramagnetic particles.
The library of clones prepared as described above can be screened directly for enzymatic activity without the need for culture expansion, amplification or other supplementary procedures. However, in one preferred embodiment, it is considered desirable to amplify the DNA recovered from the individual clones such as by PCR.
Further, it is optional but desirable to perform an amplification of the target DNA that has been isolated. In this embodiment the selectively isolated DNA is separated from the probe DNA after isolation. It is then amplified before being used to transform hosts. The double stranded DNA selected to include as at least a portion thereof a predetermined DNA sequence can be rendered single stranded, subjected to amplification and reannealed to provide amplified numbers of selected double stranded DNA. Numerous amplification methodologies are now well known in the art.
The selected DNA is then used for preparing a library for screening by transforming a suitable organism. Hosts, particularly those specifically identified herein as preferred, are transformed by artificial introduction of the vectors containing the target DNA by inoculation under conditions conducive for such transformation.
The resultant libraries of transformed clones are then screened for clones which display activity for the enzyme of interest in a phenotypic assay for enzyme activity.
Having prepared a multiplicity of clones from DNA selectively isolated from an organism, such clones are screened for a specific enzyme activity and to identify the clones having the specified enzyme characteristics.
The screening for enzyme activity may be effected on individual expression clones or may be initially effected on a mixture of expression clones to ascertain whether or not the mixture has one or more specified enzyme activities. If the mixture has a specified enzyme activity, then the individual clones may be rescreened for such enzyme activity or for a more specific activity. Thus, for example, if a clone mixture has hydrolase activity, then the individual clones may be recovered and screened to determine which of such clones has hydrolase activity.
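The two-stage logic described above (assay a mixture first, then rescreen individual clones only from mixtures that test positive) can be sketched in Python as follows. The function name `screen_pools`, the fixed pool size, and the `has_activity` callback standing in for a laboratory assay are all hypothetical, introduced only for illustration.

```python
from typing import Callable, List, Sequence


def screen_pools(clones: Sequence[str],
                 pool_size: int,
                 has_activity: Callable[[Sequence[str]], bool]) -> List[str]:
    """Assay clones in pools, then rescreen members of positive pools one by one."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    positives: List[str] = []
    for start in range(0, len(clones), pool_size):
        pool = clones[start:start + pool_size]
        # Stage 1: a single assay on the mixture of expression clones.
        if has_activity(pool):
            # Stage 2: individual rescreen restricted to the positive mixture.
            positives.extend(c for c in pool if has_activity([c]))
    return positives
```

When few clones are active, most pools are negative and are dismissed with one assay each, which is the practical motivation for screening mixtures first.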
The DNA derived from a microorganism(s) is preferably inserted into an appropriate vector (generally a vector containing suitable regulatory sequences for effecting expression) prior to subjecting such DNA to a selection procedure to select and isolate therefrom DNA which hybridizes to DNA derived from DNA encoding an enzyme(s) having the specified enzyme activity.
As representative examples of expression vectors which may be used there may be mentioned viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids, bacterial artificial chromosomes, viral DNA (e.g. vaccinia, adenovirus, fowl pox virus, pseudorabies and derivatives of SV40), P1-based artificial chromosomes, yeast plasmids, yeast artificial chromosomes, and any other vectors specific for specific hosts of interest (such as bacillus, aspergillus, yeast, etc.). Thus, for example, the DNA may be included in any one of a variety of expression vectors for expressing a polypeptide. Such vectors include chromosomal, nonchromosomal and synthetic DNA sequences. Large numbers of suitable vectors are known to those of skill in the art, and are commercially available. The following vectors are provided by way of example: Bacterial: pQE70, pQE60, pQE-9 (Qiagen), psiX174, pBluescript® SK, pBluescript® KS (Stratagene); pTRC99a, pKK223-3, pDR540, pRIT2T (Pharmacia); Eukaryotic: pWLNEO, pXT1, pSG5 (Stratagene); pSVK3, pBPV, pMSG, pSVLSV40 (Pharmacia). Any other plasmid or vector may be used as long as it is replicable and viable in the host.
Another type of vector for use in the present invention contains an f-factor origin of replication. The f-factor (or fertility factor) in E. coli is a plasmid which effects high frequency transfer of itself during conjugation and less frequent transfer of the bacterial chromosome itself. A particularly preferred embodiment is to use cloning vectors referred to as “fosmids,” or bacterial artificial chromosome (BAC) vectors. These are derived from the E. coli f-factor which is able to stably integrate large segments of genomic DNA. When integrated with DNA from a mixed uncultured environmental sample, this makes it possible to achieve large genomic fragments in the form of a stable “environmental DNA library.”
The DNA derived from a microorganism(s) may be inserted into the vector by a variety of procedures. In general, the DNA sequence is inserted into an appropriate restriction endonuclease site(s) by procedures known in the art. Such procedures and others are deemed to be within the scope of those skilled in the art.
The DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (promoter) to direct mRNA synthesis. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression. Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.
In addition, the expression vectors preferably contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli. 
Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium.
The DNA selected and isolated as hereinabove described is introduced into a suitable host to prepare a library which is screened for the desired enzyme activity. The selected DNA is preferably already in a vector which includes appropriate control sequences whereby selected DNA which encodes for an enzyme may be expressed, for detection of the desired activity. The host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host cell can be effected by transformation, calcium phosphate transfection, DEAE-Dextran mediated transfection, or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986)).
As representative examples of appropriate hosts, there may be mentioned: bacterial cells, such as E. coli, Streptomyces, Salmonella typhimurium; fungal cells, such as yeast; insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; adenoviruses; plant cells, etc. The selection of an appropriate host is deemed to be within the scope of those skilled in the art from the teachings herein.
With particular reference to various mammalian cell culture systems that can be employed to express recombinant protein, examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described by Gluzman, Cell, 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5′ flanking nontranscribed sequences. DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required nontranscribed genetic elements.
Host cells are genetically engineered (transduced or transformed or transfected) with the vectors. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying genes. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan.
The library may be screened for a specified enzyme activity by procedures known in the art. For example, the enzyme activity may be screened for one or more of the six IUB classes; oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. The recombinant enzymes which are determined to be positive for one or more of the IUB classes may then be rescreened for a more specific enzyme activity.
Alternatively, the library may be screened for a more specialized enzyme activity. For example, instead of generically screening for hydrolase activity, the library may be screened for a more specialized activity, i.e. the type of bond on which the hydrolase acts. Thus, for example, the library may be screened to ascertain those hydrolases which act on one or more specified chemical functionalities, such as: (a) amide (peptide bonds), i.e. proteases; (b) ester bonds, i.e. esterases and lipases; (c) acetals, i.e., glycosidases etc.
Clones found to have the enzymatic activity for which the screen was performed are sequenced and then subjected to directed mutagenesis to develop new enzymes with desired activities or to develop modified enzymes with particularly desired properties that are absent or less pronounced in the wild-type enzyme, such as stability to heat or organic solvents. Any of the known techniques for directed mutagenesis are applicable to the invention. For example, particularly preferred mutagenesis techniques for use in accordance with the invention include those discussed below.
The term “error-prone PCR” refers to a process for performing PCR under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product. Leung, D. W., et al., Technique, 1:11-15 (1989) and Caldwell, R. C. and Joyce G. F., PCR Methods Applic., 2:28-33 (1992).
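By way of a toy illustration only, low-fidelity copying can be simulated in Python by substituting each base with a fixed probability on every copy. The per-base `error_rate`, the doubling of the pool at each cycle, and the function names are simplifying assumptions and do not model actual polymerase behavior or reaction conditions.

```python
import random
from typing import List

BASES = "ACGT"


def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a sequence, substituting each base with probability error_rate."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Point mutation: choose any base other than the template base.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def error_prone_pcr(template: str, cycles: int,
                    error_rate: float, seed: int = 0) -> List[str]:
    """Idealized amplification: every molecule is copied once per cycle."""
    rng = random.Random(seed)
    pool = [template]
    for _ in range(cycles):
        # The new copies are built from the current pool before it is extended.
        pool += [error_prone_copy(molecule, error_rate, rng) for molecule in pool]
    return pool
```

Because mutated copies serve as templates in later cycles, substitutions accumulate along the whole length of the product as cycling proceeds.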
The term “oligonucleotide directed mutagenesis” refers to a process which allows for the generation of site-specific mutations in any cloned DNA segment of interest. Reidhaar-Olson, J. F. and Sauer, R. T., et al., Science, 241:53-57 (1988).
The term “assembly PCR” refers to a process which involves the assembly of a PCR product from a mixture of small DNA fragments. A large number of different PCR reactions occur in parallel in the same vial, with the products of one reaction priming the products of another reaction.
The term “sexual PCR mutagenesis” refers to forced homologous recombination between DNA molecules of different but highly related DNA sequence in vitro, caused by random fragmentation of the DNA molecule based on sequence homology, followed by fixation of the crossover by primer extension in a PCR reaction. Stemmer, W. P., PNAS, USA, 91:10747-10751 (1994).
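The recombination outcome can be pictured with a heavily simplified Python sketch in which pre-aligned parent sequences of equal length exchange segments at random crossover points. This is an assumption-laden abstraction: it skips fragmentation, homology-driven annealing and primer extension entirely, and the function name and parameters are hypothetical.

```python
import random
from typing import List


def shuffle_parents(parents: List[str], n_crossovers: int,
                    rng: random.Random) -> str:
    """Build one chimera by taking each segment from a randomly chosen parent."""
    if not parents:
        raise ValueError("at least one parent sequence is required")
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be pre-aligned and of equal length")
    if not 0 <= n_crossovers <= length - 1:
        raise ValueError("n_crossovers must be between 0 and length - 1")
    # Distinct interior cut positions stand in for crossover points.
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        donor = rng.choice(parents)
        segments.append(donor[start:end])
    return "".join(segments)
```

Every position of the chimera comes from one of the parents at that same position, so the sketch only recombines existing variation and introduces no new point mutations.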
The term “in vivo mutagenesis” refers to a process of generating random mutations in any cloned DNA of interest which involves the propagation of the DNA in a strain of E. coli that carries mutations in one or more of the DNA repair pathways. These “mutator” strains have a higher random mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains will eventually generate random mutations within the DNA.
The term “cassette mutagenesis” refers to any process for replacing a small region of a double stranded DNA molecule with a synthetic oligonucleotide “cassette” that differs from the native sequence. The oligonucleotide often contains completely and/or partially randomized native sequence.
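At the sequence level, the replacement step can be sketched in Python as below. The helper names, the zero-based half-open coordinates, and the per-base `randomize_prob` used to model a partially randomized cassette are illustrative assumptions, and only one strand is represented.

```python
import random


def make_cassette(native: str, randomize_prob: float, rng: random.Random) -> str:
    """Return the native region with each base randomized with the given probability."""
    return "".join(
        rng.choice("ACGT") if rng.random() < randomize_prob else base
        for base in native
    )


def cassette_mutagenesis(sequence: str, start: int, end: int, cassette: str) -> str:
    """Replace sequence[start:end] with the synthetic cassette."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("region must lie within the sequence")
    return sequence[:start] + cassette + sequence[end:]
```

A `randomize_prob` of 1.0 corresponds to a completely randomized cassette and intermediate values to a partially randomized one. Because the replacement base is drawn from all four bases, a randomized position can by chance retain the native base.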
The term “recursive ensemble mutagenesis” refers to an algorithm for protein engineering (protein mutagenesis) developed to produce diverse populations of phenotypically related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control successive rounds of combinatorial cassette mutagenesis. Arkin, A. P. and Youvan, D. C., PNAS, USA, 89:7811-7815 (1992).
The term “exponential ensemble mutagenesis” refers to a process for generating combinatorial libraries with a high percentage of unique and functional mutants, wherein small groups of residues are randomized in parallel to identify, at each altered position, amino acids which lead to functional proteins. Delegrave, S. and Youvan, D. C., Biotechnology Research, 11:1548-1552 (1993); and random and site-directed mutagenesis, Arnold, F. H., Current Opinion in Biotechnology, 4:450-455 (1993). All of the references mentioned above are hereby incorporated by reference in their entirety.