This invention relates to a symbiotic plasmid of the broad host-range Rhizobium sp. NGR234 and its use. In particular, this invention relates to the isolation and analysis of the complete sequence of the NGR234 symbiotic plasmid pNGR234a, and the open reading frames (ORFs) identifiable therein as well as the proteins expressible from said ORFs.
Together with carbon, hydrogen and oxygen, nitrogen is one of the essential components in organic chemistry. Although it is present in vast quantities in the atmosphere, nitrogen in its diatomic form N2 remains unassimilable by living organisms. The nitrogen cycle begins by the fixation of nitrogen into ammonia which is chemically more reactive and can be assimilated into the food chain. A large fraction of the total nitrogen fixed every year is produced by microorganisms. Among these, the soil bacteria of the genera Azorhizobium, Bradyrhizobium, Sinorhizobium and Rhizobium, generally referred to as rhizobia, fix nitrogen in symbiotic associations with many plants from the Leguminosae family. This highly specific interaction leads to the formation of specialized root-, and in the case of Azorhizobium, stem-structures called nodules. It is within these nodules that rhizobia differentiate into bacteroids capable of fixing atmospheric nitrogen into ammonia. In turn, ammonia diffuses into the vegetal cells and sustains plant growth even under limiting nitrogen conditions.
The Rhizobium-legume interaction presents many interesting features. Obviously, the possibility of using this symbiosis as an "environmentally friendly" way to provide some of the most important world crops (such as soybean, bean and many other legumes) with fixed nitrogen without using nitrate-rich fertilizers has important economic consequences. It is also an ideal model to study a non-pathogenic interaction between bacteria and a highly developed, multicellular organism such as the host plant. Furthermore, the various steps involved in the establishment of a functional nitrogen symbiosis, which include some dramatic morphological changes as well as processes of cellular differentiation, require a complex exchange of molecular signals. Despite many decades of studies, it is only recently that the Rhizobium-legume interaction has been partially understood at the molecular level. The establishment of a functional symbiosis can be divided into two major steps as follows.
(A) Rhizosphere Ecology and Nodulation
Rhizobia are soil bacteria that proliferate in the rhizosphere of compatible plants, taking advantage of the many compounds released by plant roots. In return, it has been shown that the presence of rhizobia in the rhizosphere reduces susceptibility of plants to many root diseases. In the case of low nitrogen levels in the soil, compatible rhizobia can interact with host plants and start the nodulation process (Long, 1989; Fellay et al., 1995; van Rhijn and Vanderleyden, 1995). Molecular signalling between the two partners begins with the release by the plant of phenolic compounds (mostly flavonoids) that induce the expression of nodulation genes (referred to as nod, nol and noe genes). The NodD1 gene product appears to be the central mediator between the plant signal and nodulation gene induction (Bender et al., 1988). It is modified by the binding of flavonoids and acts as a positive regulator on the expression of the remaining nodulation genes. Among them, the nodABC loci encode products responsible for the synthesis of the core structure of lipooligosaccharides called Nod factors (Relić et al., 1994). More nodulation genes are involved in strain-specific modifications of the Nod factors as well as in their secretion. It seems established now that variability in the structure of Nod factors may play a significant role in the determination of the host-range of a given Rhizobium strain, that is, in its ability to efficiently nodulate different legumes. For example, the strain Rhizobium meliloti can only nodulate Medicago, Melilotus and Trigonella spp., whereas Rhizobium sp. NGR234 can symbiotically interact with more than 105 different genera of plants, including the non-legume Parasponia andersonii.
The structure of many Nod factors, their isolation from Rhizobium strains and their commercial application in agriculture have been described (NodNGR factors: Relić et al., 1994; WO 94/00466; NodRm factors: WO 91/15496). Secreted Nod factors act in turn as signal molecules that allow rhizobia to enter young root hairs of a host plant, and induce root-cortical cell division that will produce the future nodule. Invaginated rhizobia progress towards the forming nodule within infection threads that are synthesized by the plant cells. Bacteria are then released into the cytoplasm of dividing nodule cells where they differentiate into bacteroids capable of fixing atmospheric nitrogen.
With respect to regulation of the nodulation genes, other regulatory genes with similarities to nodD1 (genes that belong to the lysR family) have been identified in various strains (Davis and Johnston, 1990). The function of these genes, called nodD2, nodD3 or syrM, is only partially understood. Some nodD genes have been described (WO 94/00466; CA 1314249; WO 87/07910; U.S. Pat. No. 5,023,180). Also, recombinant DNA molecules including the consensus sequence of the promoters of nodD1-regulated genes, called nod-boxes (Fisher and Long, 1993), have been disclosed (U.S. Pat. Nos. 5,484,718; 5,085,588). Finally, recombinant plasmids with the nodABC genes or, in one case (Bradyrhizobium japonicum), a sequence influencing host specificity have been disclosed (U.S. Pat. Nos. 5,045,461; 4,966,847).
(B) Symbiotic Nitrogen Fixation
Inside the nodules, rhizobia differentiate into bacteroids that express the enzymatic complex (nitrogenase) required for the reduction of atmospheric nitrogen into ammonia. The nitrogenase is encoded by three genes nifH, nifD and nifK which are well conserved in nitrogen fixing organisms (Badenoch-Jones et al., 1989). Many additional loci are necessary for functional nitrogenase activity. Those originally identified in Klebsiella pneumoniae are known as nif genes, whereas those found only in Rhizobium strains are described as fix genes (Fischer, 1994). Some of these gene products are required for the biosynthesis of cofactors, the assembly of the enzymatic complex or play regulatory and different accessory roles (oxygen-limited respiration, etc.). Many of these genes are less conserved among the various rhizobial strains and in some cases their function is still not fully understood. The high sensitivity of the nitrogenase complex to free oxygen requires a very strict control of most nif and fix gene expression. In this respect, the FixL, FixJ, FixK, NifA and RpoN proteins have been identified in representative Rhizobium species as the major regulatory elements that, in microanaerobic conditions, activate the synthesis of the nitrogenase complex (Fischer, 1994). Recombinant DNA molecules containing nif genes/promoters have been disclosed: nifH promoters of B. japonicum (U.S. Pat. No. 5,008,194), nifH and nifD promoter of R. japonicum (EP 164245), nifA of B. japonicum and R. meliloti (EP 339830), nifHDK and hydrogen-uptake (hup) genes of R. japonicum (EP 205071).
Many more genetic determinants play a significant role in the Rhizobium-legume symbiosis. Genes (exo, lps and ndv genes) involved in the production of extracellular polysaccharides (EPS), lipopolysaccharides (LPS) and cyclic glucans of rhizobia play an essential role in the symbiotic interaction (Long et al., 1988; Stanfield et al., 1988). Mutation in these genes negatively influences the development of functional nodules. In this respect, some exopolysaccharides of the NGR234 derivative strain ANU280 have been disclosed (WO 87/06796). Although Nod factors seem to play a key role in the nodulation process, experimental data indicate that other signal molecules produced by the bacterial symbionts are required for functional symbiosis and may play a role in coordinating various steps such as the controlled invasion process, the release of rhizobia from the infection thread into the plant cell cytoplasm, the bacteroid differentiation process, etc. Moreover, the need for rhizobia to survive in the rhizosphere and to compete adequately with other microorganisms requires many more unidentified genes that, although they may not be characterised as proper symbiotic loci, do affect the efficiency of the various strains to induce functional nitrogen fixing symbiosis in field conditions. Finally, in our view genetic engineering of improved rhizobial strains cannot be pursued without a more extended knowledge of the structure and complexity of the Rhizobium symbiotic genome.
In this respect we decided to determine the complete DNA sequence of a symbiotic plasmid of Rhizobium sp. NGR234. In contrast to Bradyrhizobium and Azorhizobium that carry symbiotic genes on large chromosomes (ca. 8 Mbp) and to R. meliloti that harbours two very large symbiotic plasmids of 1.4 and 1.6 Mbp, NGR234 carries a single plasmid of ca. 500 kbp, pNGR234a. Moreover, it has been shown by transfer of pNGR234a into heterologous rhizobia, and even into non-nodulating Agrobacterium tumefaciens, that most nodulation functions are encoded by this plasmid (Broughton et al., 1984). The fact that NGR234 is able to interact symbiotically with more plants than any other known strain, and that a complete ordered cosmid library of pNGR234a was available, reinforced NGR234 as the best choice for a large-scale sequencing effort on a symbiotic plasmid (Perret et al., 1991; Freiberg et al., 1997).
Automated fluorescent methods have been used to sequence cosmids from eukaryotic organisms, including Saccharomyces cerevisiae (Levy, 1994), Caenorhabditis elegans (Sulston et al., 1992), Drosophila melanogaster (Hartl and Palazzolo, 1993), and Homo sapiens (Bodmer, 1994), as well as chromosomes from the prokaryotes Haemophilus influenzae (Fleischmann et al., 1995) and Mycoplasma genitalium (Fraser et al., 1995). In most large-scale sequencing centres this technology is based mainly on the shotgun approach. After random fragmentation of DNA (e.g. cosmids, bacterial artificial chromosomes (BACs), entire chromosomes) using sonication or mechanical forces, size-selected fragments are subcloned into M13 phages, phagemids or plasmids and sequenced by cycle sequencing using dye primers (Craxton, 1993). A disadvantage of this method is that DNA regions with elevated GC contents produce large numbers of compressions (unresolvable foci in sequence gels) in the dye primer sequences, leading to several hundred compressions per assembled cosmid sequence. It is known that the use of dye terminators (fluorescently labelled dideoxynucleoside triphosphates) instead of dye primers reduces the number of compressions (Rosenthal and Charnock-Jones, 1993). Therefore, dye terminators are frequently being used for gap closure and proofreading after assembly of the shotgun data.
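As an illustration of the GC-content issue discussed above, the following is a minimal sketch of how GC-rich stretches of a sequence might be flagged with a sliding window. It is not part of the disclosed method; the function names, the window size, the step and the 0.65 threshold are all hypothetical choices made for this example only.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(seq: str, window: int = 100, step: int = 50,
                    threshold: float = 0.65):
    """Return (start, end, gc) for each window whose GC fraction meets the threshold.

    Coordinates are 0-based and half-open. The default window, step and
    threshold are illustrative assumptions, not values from the source text.
    """
    hits = []
    # max(..., 1) ensures a sequence shorter than the window is still scanned once.
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        frac = gc_fraction(chunk)
        if frac >= threshold:
            hits.append((start, start + len(chunk), frac))
    return hits
```

Regions reported by such a scan would be the ones where, according to the text, dye primer reads are most likely to show compressions.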
To sequence GC-rich cosmids with the highest accuracy, the effectiveness of shotgun sequencing with dye terminators in comparison to dye primer sequencing was investigated. To improve the incorporation of dye terminators into DNA, a modified Taq DNA polymerase carrying a single mutation was used (Tabor and Richardson, 1995). This enzyme has properties similar to a thermostable "sequenase" and is commercially available as Thermo Sequenase (Amersham, Buckinghamshire, UK) or AmpliTaq FS (Perkin-Elmer, Foster City, Calif., USA). Concentrations of dye terminators needed in the cycle sequencing reactions can be reduced 20- to 250-fold. It was found that dye terminator shotgun sequencing leads to compression-free raw data that can be assembled much faster than shotgun data mainly obtained by dye primer sequencing. This strategy thus allows a several-fold increase in the speed of sequencing individual cosmids. This was demonstrated by comparing assembly of the sequence data of two cosmids from pNGR234a generated by different chemistries: cosmid pXB296 was sequenced with dye terminators, whereas data for pXB110 were obtained using the common dye primer method. Also disclosed is the analysis of the entire pXB296 sequence.
Moreover, the dye terminator shotgun sequencing strategy used to generate the sequence data for pXB296 was also used to sequence all the other remaining overlapping cosmids of the plasmid pNGR234a. In summary, 20 cosmids have been sequenced together with two PCR products and a subcloned DNA fragment derived from a cosmid identified as pXB564 in order to generate the plasmid's complete nucleotide sequence.
After its assembly, the analysis of the entire nucleotide sequence of pNGR234a, especially the determination of putative coding regions and the prediction of their expressible proteins and putative functions, was performed. Initially, analysis of the region covered by cosmid pXB296 was extended to cosmids pXB368 and pXB110. Thus, in approximately 100 kb of the plasmid (position 417,796-517,279) most ORFs and their deduced proteins with different putative functions were predicted. Subsequently, the rest of pNGR234a was analyzed.
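The determination of putative coding regions described above can be illustrated with a minimal six-frame ORF scan. This is only a sketch under stated assumptions and not the analysis software used for pNGR234a: it recognises ATG as the sole start codon (bacterial genes may also use alternative starts), uses the standard stop codons, and the names `find_orfs`, `reverse_complement` and the `min_codons` cut-off are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def _scan_strand(seq: str, min_codons: int):
    """Find ATG...stop ORFs in the three frames of one strand (0-based, half-open)."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Length is counted in codons, excluding the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def find_orfs(seq: str, min_codons: int = 30):
    """Return (strand, start, end) tuples in forward-strand coordinates.

    min_codons is an assumed, illustrative cut-off for reporting an ORF.
    """
    seq = seq.upper()
    n = len(seq)
    result = [("+", s, e) for s, e in _scan_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    # Map reverse-strand hits back onto forward-strand coordinates.
    result += [("-", n - e, n - s) for s, e in _scan_strand(rc, min_codons)]
    return sorted(result, key=lambda t: (t[1], t[2]))
```

In practice, predicted ORFs would then be translated and compared against protein databases to assign the putative functions mentioned in the text; that step is outside the scope of this sketch.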
The present invention provides the complete nucleotide sequence of symbiotic plasmid pNGR234a of Rhizobium sp. NGR234, or degenerate variants thereof.
The present invention also contemplates sequence variants of the plasmid pNGR234a altered by mutation, deletion or insertion.
Also encompassed by the present invention are each of the ORFs derivable from the nucleotide sequence of pNGR234a or variants thereof.
In a preferred embodiment, the ORFs derived from the nucleotide sequence of pNGR234a encode the functions of nitrogen fixation, nodulation, transportation, permeation, synthesis and modification of surface poly- or oligosaccharides, lipo-oligosaccharides or secreted oligosaccharide derivatives, secretion (of proteins or other biomolecules), transcriptional regulation or DNA-binding, peptidolysis or proteolysis, transposition or integration, plasmid stability, plasmid replication or conjugal plasmid transfer, stress response (such as heat shock, cold shock or osmotic shock), chemotaxis, electron transfer, synthesis of isoprenoid compounds, synthesis of cell wall components, rhizopine metabolism, synthesis and utilization of amino acids, rhizopines, amino acid derivatives or other biomolecules, degradation of xenobiotic compounds, or encode proteins exhibiting similarities to proteins of amino acid metabolism or related ORFs, or enzymes (such as oxidoreductase, transferase, hydrolase, lyase, isomerase or ligase).
In another preferred embodiment, the ORFs are under the control of their natural regulatory elements or under the control of analogues to such natural regulatory elements.
The present invention also provides the sequences of the intergenic regions of pNGR234a which, in a preferred embodiment, are regulatory DNA sequences or repeated elements. In a further preferred embodiment, the intergenic sequences are ORF-fragments.
Also provided by the present invention are mobile elements (insertion elements or mosaic elements) derivable from the nucleotide sequences of the present invention.
The present invention also contemplates the use of the disclosed nucleotide sequences or ORFs in the analysis of genome structure, organisation or dynamics.
Also provided by the present invention is the use of the nucleotide sequences or ORFs in the subcloning of new nucleotide sequences. In a preferred embodiment, the new nucleotide sequences are coding sequences or non-coding sequences.
In yet a further preferred embodiment, the nucleotide sequences or ORFs are used in genome analysis and subcloning methods as oligonucleotide primers or hybridization probes.
The present invention further provides proteins expressible from the disclosed nucleotide sequences or ORFs.
Also contemplated by the present invention is the use of the disclosed nucleotide sequences, individual ORFs or groups of ORFs or the proteins expressible therefrom in the identification and classification of organisms and their genetic information, the identification and characterisation of nucleotide sequences, the identification and characterisation of amino acid sequences or proteins, the transportation of compounds to and from an organism which is host to said nucleotide sequences, ORFs or proteins, the degradation and/or metabolism of organic, inorganic, natural or xenobiotic substances in a host organism, or the modification of the host-range, nitrogen fixation abilities, fitness or competitiveness of organisms.
The present invention also provides plasmid pNGR234a of Rhizobium sp. NGR234 comprising the disclosed nucleotide sequence or any degenerate variant thereof.
The present invention also provides a plasmid harbouring at least one of the disclosed ORFs or any degenerate variant thereof.
The plasmids of the invention may be produced recombinantly and/or by mutation, deletion, insertion or inactivation of an ORF, ORFs or groups of ORFs.
The present invention also provides the use of the disclosed plasmids or variants thereof in obtaining a synthetic minimal set of ORFs required for functional Rhizobium-legume symbiosis, the modification of the host-range of rhizobia, the augmentation of the fitness or competitiveness of Rhizobium sp. NGR234 in the soil and its nodulation efficiency on host plants, the introduction of desired phenotypes into host plants using the disclosed plasmids as stable shuttle systems for foreign DNA encoding said desired phenotypes, or the direct transfer of the disclosed plasmids into rhizobia or other microorganisms without using other vectors for mobilization.
The nucleotide sequences of the present invention were advantageously obtained using known cycle sequencing methods. The preferred dye terminator/thermostable sequenase shotgun sequencing method used to generate the nucleotide sequences of the present invention, when applied to cosmids and when compared to other sequencing methods, was shown to yield sequence reads of the highest fidelity. Consequently, the speed of assembly of particular cosmids was increased, and the resultant high-quality sequences required little editing or proofreading. Thus, the preferred sequencing method described herein was successfully used to generate the complete nucleotide sequence of all the overlapping cosmids of plasmid pNGR234a, thereby resulting in the assembly of the complete sequence of the plasmid.
The complete sequence of pNGR234a is disclosed for the first time in this application, as are the majority of the ORFs predicted within the sequence. Putative functions have been ascribed to the novel and inventive ORFs disclosed herein and the proteins for which they code.