This invention relates to biological treatment of organic compounds, and particularly to the degradation of toluene and toluene analogues.
Industrial processes that use or generate toxic organic compounds (e.g., toluene, benzene, xylenes) have led to the contamination of nearby water and land. Such compounds are among the most water-soluble of all gasoline components and can also enter aquatic environments from many sources, such as underground gasoline storage tanks, leaks, and spills.
Most approaches to decontamination or “remediation” involve stopping the local dumping of such compounds and transport of the waste to another area for containment. This is costly and does not eliminate the hazard.
As a remediation technology, bioremediation is considerably more attractive. Rather than merely transporting wastes, it offers the possibility of degrading toxic compounds to harmless reaction products by the use of biologicals.
Bioremediation field trials have involved both in-situ and ex-situ treatment methods. Typically, ex-situ treatment involves the transfer of contaminated waste from the site into a treatment tank designed to support microbial growth, i.e., a “bioreactor”. The reactor provides for effective mixing of nutrients and control over temperature, pH and aeration to allow optimum microbial growth.
In-situ treatment involves adding biologicals directly to the waste. This avoids the problems associated with handling (e.g., pumping) toxic compounds. However, in-situ treatment has its own problems. Unlike bioreactors, where microbial growth can be monitored and adjusted, in-situ environmental conditions are difficult to measure and control.
Fries et al., “Isolation, characterization and distribution of denitrifying toluene degraders from a variety of habitats,” Appl. Environ. Microbiol. 60:2802 (1994) generally indicates that biodegradation of benzene, toluene, ethylbenzene and xylenes under aerobic conditions is well known, although the availability of oxygen, owing to its low solubility in water and low rate of transport in soils and sediments, is rate limiting. Fries et al. describes anaerobic respiration of toluene by microorganisms isolated from nature. The microorganisms could grow on 25 ppm toluene and could be fed 50 ppm toluene.
Rates have been determined at 28-30° C. with intact cells from a variety of strains. The rates vary between 8 and 80 nmoles toluene min⁻¹ mg⁻¹ protein. A. Frazer et al., “Toluene Metabolism Under Anaerobic Conditions: A Review,” Anaerobe 1:293 (1995).
There remains a need to develop a bioremediation procedure that can be operated economically on a commercial scale. Such a procedure must be able to degrade organic compounds with high efficiency.
This invention relates to biological treatment of organic compounds, and particularly to the degradation of toluene and toluene analogues. In one embodiment, the present invention contemplates a method of degrading compounds contained in a liquid or solid waste source, comprising the steps of: a) providing, i) a waste source comprising toluene (and/or a toluene analogue), ii) a reaction containing means, and iii) a compound selected from the group consisting of a functional, cell-free pyruvate formate lyase homologue of a toluene-degrading bacterium and a functional, cell-free pyruvate formate lyase activating homologue of a toluene-degrading bacterium; and b) reacting said homologue and said waste source in said containing means under conditions such that toluene (and/or the toluene analogue) is degraded.
It is not intended that the present invention be limited by the specific toluene-degrading bacterium. In one embodiment, said homologue is derived from an organism of the genus Thauera. In one embodiment, the organism is Thauera aromatica. 
In another embodiment, said homologue is derived from an organism of the genus Xanthomonas. In one embodiment, the organism is Xanthomonas maltophilia. 
In yet another embodiment, said homologue is derived from an organism of the genus Geobacter. In one embodiment, the organism is Geobacter metallireducens. 
In still another embodiment, said homologue is derived from members of the genus Azoarcus. In one embodiment, the organism is Azoarcus tolulyticus. 
The present invention contemplates nucleic acid sequences (and constructs comprising said sequences) and amino acid sequences of toluene degrading enzymes as compositions of matter (as well as antibodies to such amino acid sequences). In one embodiment, the present invention contemplates a purified nucleic acid comprising DNA having the sequence as set forth in FIGS. 12A-Y. In one embodiment, said DNA is in a vector. In another embodiment, said vector is a bacterial plasmid. In a particular embodiment, said bacterial plasmid is in a host cell. In one embodiment, said host cell expresses a toluene-degrading enzyme.
The present invention contemplates a functional, cell-free product of the tutD gene having the amino acid sequence as set forth in FIGS. 11A-D. In one embodiment, said product is contained within a reaction containing means. In a preferred embodiment, said reaction containing means is a bioreactor.
It is also not intended that the present invention be limited by the precise amino acid sequence of the homologue. In one embodiment, it is encoded by the tutD gene, a nucleic acid sequence for which is shown in FIGS. 5A-O, and has the amino acid sequence shown in FIGS. 7A-C. In another embodiment, the homologue is an expanded TutD protein having the amino acid sequence shown in FIGS. 11A-D and the corresponding nucleic acid sequence shown in FIGS. 12A-Y. In another embodiment, the homologue is encoded by the tutE gene having a nucleic acid sequence shown in FIGS. 12A-Y, and a corresponding amino acid sequence shown in FIGS. 13A-B.
Additionally, the present invention contemplates a reporter gene fusion product constructed by fusing the tutD gene in frame to a reporter such as lacZ, luxA, or green fluorescent protein. Such constructs can be used to demonstrate regulated expression in response to toluene.
In another embodiment, the present invention contemplates a reporter gene fusion product constructed by fusing the tutE gene in frame to a reporter such as lacZ, luxA, or green fluorescent protein. Such constructs can be used to demonstrate regulated expression in response to toluene.
The present invention contemplates a functional, cell-free product of the tutH gene having the nucleic acid sequence as set forth in FIG. 18 and the amino acid sequence shown in FIG. 19. In one embodiment, said product is contained within a reaction containing means. In a preferred embodiment, said reaction containing means is a bioreactor.
Additionally, the present invention contemplates a reporter gene fusion product constructed by fusing the tutH gene in frame to a reporter such as lacZ, luxA, or green fluorescent protein. Such constructs can be used to demonstrate regulated expression in response to toluene.
The present invention contemplates a functional, cell-free product of the tutI gene having the nucleic acid sequence as set forth in FIG. 21 and the amino acid sequence shown in FIG. 22. In one embodiment, said product is contained within a reaction containing means. In a preferred embodiment, said reaction containing means is a bioreactor.
Additionally, the present invention contemplates a reporter gene fusion product constructed by fusing the tutI gene in frame to a reporter such as lacZ, luxA, or green fluorescent protein. Such constructs can be used to demonstrate regulated expression in response to toluene.
The present invention contemplates a functional, cell-free product of the tutF gene having the nucleic acid sequence as set forth in FIG. 24 and the amino acid sequence shown in FIG. 25. In one embodiment, said product is contained within a reaction containing means. In a preferred embodiment, said reaction containing means is a bioreactor.
Additionally, the present invention contemplates a reporter gene fusion product constructed by fusing the tutF gene in frame to a reporter such as lacZ, luxA, or green fluorescent protein. Such constructs can be used to demonstrate regulated expression in response to toluene.
The present invention contemplates a functional, cell-free product of the tutG gene having the nucleic acid sequence as set forth in FIG. 26 and the amino acid sequence shown in FIG. 27. In one embodiment, said product is contained within a reaction containing means. In a preferred embodiment, said reaction containing means is a bioreactor.
Additionally, the present invention contemplates a reporter gene fusion product constructed by fusing the tutG gene in frame to a reporter such as lacZ, luxA, or green fluorescent protein. Such constructs can be used to demonstrate regulated expression in response to toluene.
Additionally, the present invention contemplates a composition comprising isolated and purified DNA having an oligonucleotide sequence selected from the group consisting of SEQ ID NO: 43, SEQ ID NO: 45, SEQ ID NO: 47, and SEQ ID NO: 49.
Additionally, the present invention contemplates a composition comprising isolated and purified polypeptide selected from the group consisting of SEQ ID NO: 44, SEQ ID NO: 46, SEQ ID NO: 48, and SEQ ID NO: 50.
Definitions
To facilitate understanding of the invention, a number of terms are defined below.
The term “reaction” or “chemical reaction” means reactions involving chemical reactants, such as organic compounds. A “reaction containing means” refers to anything that can contain a reaction, including but not limited to, tubes, microtiter plates, vessels, and bioreactors. It is not intended that the present invention be limited by a particular reaction containing means. U.S. Pat. Nos. 5,610,061, 5,585,272, 5,571,705, 5,560,737, 5,057,221 and 5,037,551 all describe various reaction containing means (including bioreactors) and are hereby incorporated by reference.
“Initiating a reaction” means causing a reaction to take place. Reactions can be initiated by any means (e.g., mixing, heat, wavelengths of light, addition of a catalyst, etc.).
A “solvent” is a liquid substance capable of dissolving or dispersing one or more other substances. It is not intended that the present invention be limited by the nature of the solvent used.
A “waste source” can be a solid or liquid waste source (e.g., paper pulp, pulp mill effluent, sludge, wastewater, petroleum spill, etc.).
“Toluene analogues” are structural analogues of toluene. While it is not intended that the present invention be limited to particular analogues, examples include the o-, m-, and p-isomers of chlorotoluene, fluorotoluene and xylene.
A “pyruvate formate lyase homologue” is defined as a gene product from a toluene-degrading organism, said gene product comprising i) regions of identity with the pyruvate formate lyase from E. coli (the PflD gene, GenBank G418519) and/or from Clostridium pasteurianum (GenBank G1072361) such that the gene product contains the motif RVSGY (SEQ ID NO:1), RVAGY (SEQ ID NO:2), or VRVSGYSA (SEQ ID NO:3) at the essential glycine (shown in bold and discussed below), and ii) regions of non-identity. The gene product may contain other regions of identity with pyruvate formate lyase from E. coli (the PflD gene, GenBank G418519) and from Clostridium pasteurianum (GenBank G1072361), including but not limited to, the motifs TPDGR (SEQ ID NO:4), TPDGRF (SEQ ID NO:5), GPTAVL (SEQ ID NO:6), and GNDDD (SEQ ID NO:7). As noted below, the present invention also identifies other conserved regions, including but not limited to those associated with an essential conserved cysteine.
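By way of illustration only, the motif criterion in the definition above can be sketched as a simple string search over a candidate amino acid sequence. The function names and the toy sequence are hypothetical and are not part of the disclosure; an exact-match search is assumed, and the sketch does not test for regions of non-identity or for function.

```python
# Motifs named in the definition above, keyed to their SEQ ID NOs.
GLYCINE_MOTIFS = {"RVSGY": 1, "RVAGY": 2, "VRVSGYSA": 3}
OTHER_MOTIFS = {"TPDGR": 4, "TPDGRF": 5, "GPTAVL": 6, "GNDDD": 7}


def find_motifs(protein, motifs):
    """Return {motif: [0-based start positions]} for each motif found."""
    seq = protein.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            # Continue one residue further on so overlapping hits are kept.
            start = seq.find(motif, start + 1)
        if positions:
            hits[motif] = positions
    return hits


def has_glycine_motif(protein):
    """True if any motif spanning the essential glycine is present."""
    return bool(find_motifs(protein, GLYCINE_MOTIFS))


# Hypothetical toy sequence, invented purely for this example.
toy = "MKTPDGRFAAVRVSGYSALL"
glycine_hits = find_motifs(toy, GLYCINE_MOTIFS)
other_hits = find_motifs(toy, OTHER_MOTIFS)
```

A candidate that passes such a screen would still need to be shown to be functional as defined below.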
A “functional” homologue is one where transfer of the gene or expression of the gene product confers the ability to degrade toluene. Functional homologues need not comprise the entire gene product, i.e., functional peptide fragments (portions that are less than the entire gene product) are specifically contemplated.
The term “purified” means separated from some components that are normally present in the native state. Thus, a spectrum of purity is contemplated. At the very basic level, a cell-free preparation is “purified.” Similarly, nucleic acid that is even substantially protein-free is “purified.” At a more extreme level, the present invention contemplates a particular toluene degrading protein that is substantially free of all other proteins (usually less than 10% and preferably less than 5% of other proteins are present).
The term “gene” refers to a DNA sequence that comprises control and coding sequences necessary for the production of a polypeptide or precursor thereof. The polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired enzymatic activity is retained.
The term “wild-type” refers to a gene or gene product which has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the term “modified” or “mutant” refers to a gene or gene product which displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally-occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.
The term “oligonucleotide” as used herein is defined as a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, usually more than three (3), and typically more than ten (10) and up to one hundred (100) or more (although preferably between twenty and thirty). The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, or a combination thereof.
Because mononucleotides are reacted to make oligonucleotides in a manner such that the 5′ phosphate of one mononucleotide pentose ring is attached to the 3′ oxygen of its neighbor in one direction via a phosphodiester linkage, an end of an oligonucleotide is referred to as the “5′ end” if its 5′ phosphate is not linked to the 3′ oxygen of a mononucleotide pentose ring and as the “3′ end” if its 3′ oxygen is not linked to a 5′ phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5′ and 3′ ends.
When two different, non-overlapping oligonucleotides anneal to different regions of the same linear complementary nucleic acid sequence, and the 3′ end of one oligonucleotide points towards the 5′ end of the other, the former may be called the “upstream” oligonucleotide and the latter the “downstream” oligonucleotide.
The term “primer” refers to an oligonucleotide which is capable of acting as a point of initiation of synthesis when placed under conditions in which primer extension is initiated. An oligonucleotide “primer” may occur naturally, as in a purified restriction digest, or may be produced synthetically.
A primer or oligonucleotide is selected to be “substantially” complementary to a strand of specific sequence of the template. A primer must be sufficiently complementary to hybridize with a template strand for primer elongation to occur. A primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5′ end of the primer, with the remainder of the primer sequence being substantially complementary to the strand. Non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridize and thereby form a template-primer complex for synthesis of the extension product of the primer.
“Hybridization” methods involve the annealing of a complementary sequence to the target nucleic acid (the sequence to be detected). The ability of two polymers of nucleic acid containing complementary sequences to find each other and anneal through base pairing interaction is a well-recognized phenomenon. The initial observations of the “hybridization” process by Marmur and Lane, Proc. Natl. Acad. Sci. USA 46:453 (1960) and Doty et al., Proc. Natl. Acad. Sci. USA 46:461 (1960) have been followed by the refinement of this process into an essential tool of modern biology.
Even where the sequence of a probe or oligonucleotide is completely complementary to the sequence of the target, i.e., the target's primary structure, the target sequence must be made accessible to the probe via rearrangements of higher-order structure. These higher-order structural rearrangements may concern either the secondary structure or tertiary structure of the molecule. Secondary structure is determined by intramolecular bonding. In the case of DNA or RNA targets this consists of hybridization within a single, continuous strand of bases (as opposed to hybridization between two different strands). Depending on the extent and position of intramolecular bonding, the probe can be displaced from the target sequence, preventing hybridization.
Solution hybridization of oligonucleotide probes to denatured double-stranded DNA is further complicated by the fact that the longer complementary target strands can renature or reanneal. Again, hybridized probe is displaced by this process. This results in a low yield of hybridization (low “coverage”) relative to the starting concentrations of probe and target.
Hybridization, regardless of the method used, requires some degree of complementarity between the sequence being assayed (the target sequence) and the fragment of DNA used to perform the test (the probe). (Of course, one can obtain binding without any complementarity, but this binding is nonspecific and to be avoided.)
The complement of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5′ end of one sequence is paired with the 3′ end of the other, is in “antiparallel association.” Certain bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, base composition and sequence of the oligonucleotide, ionic strength and incidence of mismatched base pairs.
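As a minimal sketch of the antiparallel association described above, the complement of a DNA sequence written 5′ to 3′ may be obtained by reversing the sequence and pairing each base, and imperfect complementarity may be expressed as a count of mismatched positions. The function names are hypothetical; only the four standard DNA bases are handled, so inosine, 7-deazaguanine and unmatched (bulged) bases fall outside this illustration.

```python
# Watson-Crick pairing for the four standard DNA bases only.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complement of seq, itself written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def count_mismatches(strand_a, strand_b):
    """Count mismatched pairs when two equal-length strands, both written
    5' to 3', are aligned in antiparallel association."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    partner = reverse_complement(strand_b)
    return sum(1 for x, y in zip(strand_a.upper(), partner) if x != y)


perfect = count_mismatches("ATGC", "GCAT")      # fully complementary pair
one_mismatch = count_mismatches("ATGC", "GCAA")  # single mismatched pair
```

A mismatch count of this kind is only one of the variables mentioned above; it does not by itself predict duplex stability.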
Stability of a nucleic acid duplex is measured by the melting temperature, or “Tm.” The Tm of a particular nucleic acid duplex under specified conditions is the temperature at which on average half of the base pairs have dissociated. The equation for calculating the Tm of nucleic acids is well known in the art. As indicated by standard references, an estimate of the Tm value may be calculated by the equation:
Tm = 81.5° C. + 16.6 log M + 0.41(%GC) − 0.61(%form) − 500/L
where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, %form is the percentage of formamide in the hybridization solution, and L=length of the hybrid in base pairs. [See e.g., Guide to Molecular Cloning Techniques, Ed. S. L. Berger and A. R. Kimmel, in Methods in Enzymology Vol. 152, 401 (1987)]. Other references include more sophisticated computations which take structural as well as sequence characteristics into account for the calculation of Tm.
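The Tm estimate given above can be evaluated directly, as in the following sketch. The function and parameter names are hypothetical, the logarithm is assumed to be base 10 (as is conventional for this equation), and the input values in the example are chosen only for illustration.

```python
import math


def estimate_tm(monovalent_molar, percent_gc, percent_formamide, length_bp):
    """Estimate Tm in degrees C from
    Tm = 81.5 + 16.6 log M + 0.41(%GC) - 0.61(%form) - 500/L.

    The logarithm is assumed to be base 10.
    """
    if monovalent_molar <= 0 or length_bp <= 0:
        raise ValueError("molarity and hybrid length must be positive")
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)


# Illustrative inputs: 1 M monovalent cation, 50% GC, no formamide, 500 bp.
tm_no_formamide = estimate_tm(1.0, 50.0, 0.0, 500)
# Same hybrid in 0.1 M salt with 50% formamide.
tm_with_formamide = estimate_tm(0.1, 50.0, 50.0, 500)
```

With these inputs the estimate falls from about 101° C. to about 54° C., showing how lower salt and added formamide reduce the calculated Tm.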
The present invention contemplates utilizing the nucleic acid sequence of the tutD gene to isolate other genes encoding pyruvate formate lyase homologues by hybridizing portions of the tutD gene to total DNA of various toluene-degrading organisms. Preferably, hybridization is carried out at high stringency (i.e., carried out at or near the Tm of the particular duplex). Hybridization can be used to capture other genes. Alternatively, hybridization can be followed by primer extension or PCR.
The present invention also contemplates utilizing the nucleic acid sequence of the tutE gene to isolate other genes encoding pyruvate formate lyase homologues by hybridizing portions of the tutE gene to total DNA of various toluene-degrading organisms. Preferably, hybridization is carried out at high stringency (i.e., carried out at or near the Tm of the particular duplex). Hybridization can be used to capture other genes. Alternatively, hybridization can be followed by primer extension or PCR.
Mullis, et al., U.S. Pat. Nos. 4,683,195 and 4,683,202 (both of which are hereby incorporated by reference), describe a method for increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a molar excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence. The two primers are complementary to their respective strands of the double-stranded sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization, and polymerase extension can be repeated as often as needed to obtain a relatively high concentration of a segment of the desired target sequence. The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to by the inventors as the “Polymerase Chain Reaction” (hereinafter PCR). Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be “PCR-amplified.”
It is not intended that the present invention be limited to a particular toluene-degrading organism. The present invention contemplates identifying homologues in both known and yet undiscovered toluene-degrading organisms. Known organisms are set forth in Table 1.
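The effect of repeating the denaturation, hybridization and extension steps can be illustrated with an idealized model in which each cycle multiplies the number of target copies by (1 + efficiency). This model, the function names and the efficiency parameter are assumptions made for illustration and are not taken from the cited patents; real reactions plateau and rarely reach perfect doubling.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 corresponds to perfect doubling).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies, target_copies, efficiency=1.0):
    """Smallest number of cycles that reaches target_copies in this model."""
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    copies = float(initial_copies)
    cycles = 0
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


after_30 = pcr_copies(1, 30)               # single template, perfect doubling
to_thousand = cycles_needed(1, 1000)       # cycles to reach 1000 copies
```

Under perfect doubling a single template yields roughly 10⁹ copies after thirty cycles, which is why the amplified segment comes to dominate the mixture.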
The term “probe” as used herein refers to a labeled oligonucleotide which forms a duplex structure with a sequence in another nucleic acid, due to complementarity of at least one sequence in the probe with a sequence in the other nucleic acid.
The term “label” as used herein refers to any atom or molecule which can be used to provide a detectable (preferably quantifiable) signal, and which can be attached to a nucleic acid or protein. Labels may provide signals detectable by fluorescence, radioactivity, colorimetry, gravimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, and the like.
The terms “nucleic acid substrate” and “nucleic acid template” are used herein interchangeably and refer to a nucleic acid molecule which may comprise single- or double-stranded DNA or RNA.
The term “substantially single-stranded” when used in reference to a nucleic acid substrate means that the substrate molecule exists primarily as a single strand of nucleic acid in contrast to a double-stranded substrate which exists as two strands of nucleic acid which are held together by inter-strand base pairing interactions.
The term “sequence variation” as used herein refers to differences in nucleic acid sequence between two nucleic acid templates. For example, a wild-type structural gene and a mutant form of this wild-type structural gene may vary in sequence by the presence of single base substitutions and/or deletions or insertions of one or more nucleotides. These two forms of the structural gene are said to vary in sequence from one another. A second mutant form of the structural gene may exist. This second mutant form is said to vary in sequence from both the wild-type gene and the first mutant form of the gene. It is noted, however, that the invention does not require that a comparison be made between one or more forms of a gene to detect sequence variations.
The term “Km” as used herein refers to the Michaelis-Menten constant for an enzyme and is defined as the concentration of the specific substrate at which a given enzyme yields one-half its maximum velocity in an enzyme catalyzed reaction.
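The definition of Km above corresponds to the standard Michaelis-Menten rate expression v = Vmax·[S]/(Km + [S]), which the following sketch evaluates. The function name and the numerical values are hypothetical and are used only to show that the velocity equals one-half of Vmax when the substrate concentration equals Km.

```python
def michaelis_menten_rate(vmax, km, substrate):
    """Reaction velocity v = Vmax * [S] / (Km + [S]).

    vmax and the returned velocity share one unit; km and substrate
    share one concentration unit.
    """
    if km <= 0 or substrate < 0:
        raise ValueError("km must be positive and substrate non-negative")
    return vmax * substrate / (km + substrate)


# Illustrative values only: Vmax = 80, Km = 5, in arbitrary units.
half_max = michaelis_menten_rate(vmax=80.0, km=5.0, substrate=5.0)
near_max = michaelis_menten_rate(vmax=80.0, km=5.0, substrate=5000.0)
```

At a substrate concentration far above Km the velocity approaches, but does not reach, Vmax.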
The term “nucleotide analog” as used herein refers to modified or non-naturally occurring nucleotides such as 7-deaza purines (i.e., 7-deaza-dATP and 7-deaza-dGTP). Nucleotide analogs include base analogs and comprise modified forms of deoxyribonucleotides as well as ribonucleotides. As used herein the term “nucleotide analog” when used in reference to substrates present in a PCR mixture refers to the use of nucleotides other than dATP, dGTP, dCTP and dTTP; thus, the use of dUTP (a naturally occurring dNTP) in a PCR would comprise the use of a nucleotide analog in the PCR. A PCR product generated using dUTP, 7-deaza-dATP, 7-deaza-dGTP or any other nucleotide analog in the reaction mixture is said to contain nucleotide analogs.
“Oligonucleotide primers matching or complementary to a gene sequence” refers to oligonucleotide primers capable of facilitating the template-dependent synthesis of single or double-stranded nucleic acids. Oligonucleotide primers matching or complementary to a gene sequence may be used in PCRs, RT-PCRs and the like.
A “consensus gene sequence” refers to a gene sequence which is derived by comparison of two or more gene sequences and which describes the nucleotides most often present in a given segment of the genes; the consensus sequence is the canonical sequence. A “motif” refers to the corresponding amino acid sequence defining a region of identity following a comparison of two or more amino acid sequences.
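One simple way to derive a consensus of the kind defined above, assuming the sequences have already been aligned to equal length, is to take the most frequent residue in each column. The function name and the example sequences are hypothetical; ties are resolved here in favor of the residue seen first, which is an arbitrary choice of this sketch.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Return the most frequent character at each aligned position."""
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to equal length")
    result = []
    for column in zip(*aligned_sequences):
        # most_common(1) returns the first-seen residue on a tie.
        residue, _count = Counter(column).most_common(1)[0]
        result.append(residue)
    return "".join(result)


# Invented toy alignment, for illustration only.
toy_consensus = consensus(["ATGC", "ATGA", "TTGC"])
```

The same column-wise comparison applies to aligned amino acid sequences when identifying a motif.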
The term “polymorphic locus” refers to a locus present in a population which shows variation between members of the population (i.e., the most common allele has a frequency of less than 0.95). In contrast, a “monomorphic locus” is a genetic locus at which little or no variation is seen between members of the population (generally taken to be a locus at which the most common allele exceeds a frequency of 0.95 in the gene pool of the population).
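The 0.95 frequency criterion above can be applied to observed allele counts as in the following sketch. The function name and the counts are hypothetical. The definition does not say how a frequency of exactly 0.95 is to be treated; this sketch assumes that case is monomorphic.

```python
def classify_locus(allele_counts, threshold=0.95):
    """Classify a locus from a mapping of allele -> observed count.

    Assumption: a most-common-allele frequency exactly equal to the
    threshold is reported as monomorphic.
    """
    total = sum(allele_counts.values())
    if total <= 0:
        raise ValueError("at least one observation is required")
    top_frequency = max(allele_counts.values()) / total
    return "polymorphic" if top_frequency < threshold else "monomorphic"


# Illustrative counts only.
variable_locus = classify_locus({"A": 90, "a": 10})   # top frequency 0.90
fixed_locus = classify_locus({"A": 99, "a": 1})       # top frequency 0.99
```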
The term “microorganism” as used herein means an organism too small to be observed with the unaided eye and includes, but is not limited to, bacteria, viruses, protozoans, fungi, and ciliates.
The term “microbial gene sequences” refers to gene sequences derived from a microorganism.
The term “bacteria” refers to any bacterial species including eubacterial and archaebacterial species.
The term “recombinant DNA molecule” as used herein refers to a DNA molecule which is comprised of segments of DNA joined together by means of molecular biological techniques.
The terms “in operable combination” or “operably linked” as used herein refer to the linkage of nucleic acid sequences in such a manner that a nucleic acid molecule capable of directing the synthesis of a desired protein molecule is produced. When a promoter sequence is operably linked to sequences encoding a protein, the promoter directs the expression of mRNA which can be translated to produce a functional form of the encoded protein. The term also refers to the linkage of amino acid sequences in such a manner that a functional protein is produced.
The term “an oligonucleotide having a nucleotide sequence encoding a gene” means a DNA sequence comprising the coding region of a gene or, in other words, the DNA sequence which encodes a gene product. The coding region may be present in either a cDNA or genomic DNA form. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the coding region utilized in the expression vectors of the present invention may contain endogenous enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc. or a combination of both endogenous and exogenous control elements.
The term “recombinant oligonucleotide” refers to an oligonucleotide created using molecular biological manipulations, including but not limited to, the ligation of two or more oligonucleotide sequences generated by restriction enzyme digestion of a polynucleotide sequence, the synthesis of oligonucleotides (e.g., the synthesis of primers or oligonucleotides) and the like.
The term “recombinant oligonucleotide having a sequence encoding a protein operably linked to a heterologous promoter” or grammatical equivalents indicates that the coding region encoding the protein (e.g., an enzyme) has been joined to a promoter which is not the promoter naturally associated with the coding region in the genome of an organism (i.e., it is linked to an exogenous promoter). The promoter which is naturally associated or linked to a coding region in the genome is referred to as the “endogenous promoter” for that coding region.
The term “transcription unit” as used herein refers to the segment of DNA between the sites of initiation and termination of transcription and the regulatory elements necessary for the efficient initiation and termination. For example, a segment of DNA comprising an enhancer/promoter, a coding region, and a termination and polyadenylation sequence comprises a transcription unit.
The term “regulatory element” as used herein refers to a genetic element which controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element which facilitates the initiation of transcription of an operably linked coding region.
The term “expression vector” or “vector” as used herein refers to a recombinant DNA molecule containing a desired coding sequence and appropriate nucleic acid sequences necessary for the expression of the operably linked coding sequence in a particular host organism. Nucleic acid sequences necessary for expression in prokaryotes include a promoter, optionally an operator sequence, a ribosome binding site and possibly other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and termination and polyadenylation signals.
Transcriptional control signals in eukaryotes comprise “promoter” and “enhancer” elements. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription [Maniatis et al., Science 236:1237 (1987)]. Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect and mammalian cells and viruses (analogous control elements, i.e., promoters, are also found in prokaryotes). The selection of a particular promoter and enhancer depends on what cell type is to be used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types [for review see Voss et al., Trends Biochem. Sci. 11:287 (1986) and Maniatis et al., supra (1987)]. For example, the SV40 early gene enhancer is very active in a wide variety of cell types from many mammalian species and has been widely used for the expression of proteins in mammalian cells [Dijkema et al., EMBO J. 4:761 (1985)]. Two other examples of promoter/enhancer elements active in a broad range of mammalian cell types are those from the human elongation factor 1α gene [Uetsuki et al., J. Biol. Chem., 264:5791 (1989); Kim et al., Gene 91:217 (1990); and Mizushima and Nagata, Nuc. Acids. Res., 18:5322 (1990)] and the long terminal repeats of the Rous sarcoma virus [Gorman et al., Proc. Natl. Acad. Sci. USA 79:6777 (1982)] and the human cytomegalovirus [Boshart et al., Cell 41:521 (1985)].
The term “promoter/enhancer” denotes a segment of DNA which contains sequences capable of providing both promoter and enhancer functions (for example, the long terminal repeats of retroviruses contain both promoter and enhancer functions). The enhancer/promoter may be “endogenous” or “exogenous” or “heterologous.” An endogenous enhancer/promoter is one which is naturally linked with a given gene in the genome. An exogenous (heterologous) enhancer/promoter is one which is placed in juxtaposition to a gene by means of genetic manipulation (i.e., molecular biological techniques).
As used herein “tutF” denotes a segment of DNA (presented in FIG. 24) substantially similar to the open reading frame designated as “open reading frame 2” which consists of a 60 amino acid sequence which would code for a protein with a calculated molecular mass of 6,900 Da and a predicted pI of 5.2. The translational start begins at the NcoI restriction site and hence no upstream transcriptional regulatory sites or ribosome binding sites for this open reading frame are included on this fragment.
As used herein “tutG” denotes a segment of DNA (presented in FIG. 26) substantially similar to the open reading frame designated as “open reading frame 4” identified in the SacII/EcoRI fragment consisting essentially of an 81 amino acid sequence with a calculated molecular mass of 9,300 Da and a predicted pI of 7.8.