The present invention relates to methods and compositions for nucleic acid production, analysis and cloning. The present invention discloses tools and methods for the production and analysis of libraries of polynucleotides, particularly metagenomic libraries, which can be used to identify novel pathways, novel enzymes and novel metabolites of interest in various areas, including the pharmaceutical, cosmetic, agrochemical and/or food industries.
The drug discovery process is based on two main fields, namely combinatorial chemistry and natural products. Combinatorial chemistry has shown its ability to generate huge numbers of molecules, but with limited chemical diversity. In contrast, natural products have been the predominant source of structural and molecular diversity. However, the exploitation of this diversity is strongly hampered by limited access to natural products, by complex identification and purification processes, and by the difficulty of producing them.
Microorganisms are known to synthesize a large diversity of natural compounds which are already widely used in the therapeutic, agricultural, food and industrial areas. However, this promising approach to the identification of new natural compounds has always been considerably limited by the principal technological bottlenecks of isolating and propagating in vitro the huge diversity of bacteria. Most microorganisms living in a natural, complex environment (soil, digestive tract, sea, etc.) have not been cultivated because their optimal living conditions are either unknown or difficult to reproduce. Numerous scientific publications report this fact, and it is now assumed that less than about 1% of the total bacterial diversity (when all environments are considered together) has been isolated and cultivated (Amann et al, 1995).
New approaches have been developed to try to bypass the critical step of isolation, and to directly access the huge genetic potential established by microbial adaptation processes over their long evolution. These approaches are called “metagenomic” because they address the plurality of genomes of a whole bacterial community, without any distinction (the metagenome).
Metagenomics involves the direct extraction of DNA from environmental samples and its propagation and expression in a cultivated host cell, typically a bacterium. Metagenomics was first developed for the identification of new bacterial phyla (Pace, 1997). This use is based on the specific cloning of genes recognized for their interest as phylogenetic markers, such as 16S rDNA genes. Further developments of metagenomics relate to the detection and cloning of genes coding for proteins of environmental or industrial interest. These first two applications of metagenomics involve a first step of gene selection (generally using PCR) before cloning. In the case of protein production, the cloning vectors used are preferentially also expression vectors, i.e., they contain regulatory sequences upstream of the cloning site causing expression of the cloned gene in a given bacterial host strain.
More recent developments of metagenomics consider the total metagenome, cloned without any selection and/or identification, to establish random “metagenomic DNA libraries”. This provides access to the whole genetic potential of bacterial diversity without any “a priori” selection. Metagenomic DNA libraries are composed of hundreds of thousands of clones which differ from each other by the environmental DNA fragments which have been cloned. In this respect, large DNA fragments have been cloned (more than 30 Kb), so as to (i) limit the number of clones which have to be analysed and (ii) be able to recover whole biosynthetic pathways for the identification of new metabolites resulting from multi-enzymatic synthesis. This last point is of particular interest for bacterial metagenomic libraries since most biosynthetic pathways have been found to be naturally organised in a single cluster of DNA, and even in the same operon, in bacteria. Nevertheless, the heterologous expression of a whole biosynthetic pathway (large DNA fragment) requires a much more advanced system than a simple expression vector to achieve full and stable expression.
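The relationship between insert size and the number of clones to be analysed, mentioned under point (i) above, can be illustrated with the classical Clarke and Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the ratio of insert size to the total size of the DNA being sampled and P is the desired probability of recovering a given sequence. The following is only a minimal sketch: the function name and the total DNA size of 1 Gb are hypothetical values chosen for illustration, and the formula assumes uniformly random fragments, which a real environmental sample with uneven species abundance does not satisfy.

```python
import math


def clones_needed(insert_size_bp, target_size_bp, probability=0.99):
    """Clarke-Carbon estimate of the library size needed to contain a
    given sequence with the requested probability.

    Assumes fragments are sampled uniformly at random from the target DNA.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_size_bp < target_size_bp:
        raise ValueError("insert size must be positive and smaller than the target")
    fraction = insert_size_bp / target_size_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


# Hypothetical total amount of distinct DNA in the sample (illustration only).
TOTAL_DNA_BP = 1_000_000_000

small_inserts = clones_needed(3_000, TOTAL_DNA_BP)    # 3 Kb inserts
large_inserts = clones_needed(30_000, TOTAL_DNA_BP)   # 30 Kb inserts

print(f"3 Kb inserts:  {small_inserts} clones")
print(f"30 Kb inserts: {large_inserts} clones")
```

Under these assumptions, a tenfold increase in insert size reduces the estimated number of clones roughly tenfold, which is consistent with the preference for large inserts described above.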
Except for the identification and characterisation of bacterial communities at the phylogenetic or diversity levels, metagenomic libraries produced in the prior art are gene expression libraries, i.e., the environmental DNA fragments are cloned downstream of a functional promoter, to allow their expression and analysis. In this regard, WO99/45154 and WO96/34112 relate to combinatorial gene expression libraries which comprise a pool of expression constructs, where each expression construct contains DNA which is operably associated with one or more regulatory regions that drive expression of genes in an appropriate organism. Furthermore, the expression constructs used in these methods have a very limited and invariable host range. Similarly, WO 01/40497 relates to the construction and use of expression vectors which can be transferred into one chosen bacterial expression host of the Streptomyces genus. All these approaches are, however, very limited since they require the presence of expression signals and confer invariable or very limited host range capabilities. Furthermore, most (if not all) metagenomic DNA libraries have been established in E. coli, which is the most efficient cloning system. However, most environmental DNA is not expressed or functionally active in E. coli. In particular, functional analysis in E. coli of genes cloned from G+C rich organisms, such as Actinomyces, could be limited by the lack of an adequate transcription and translation system. Also, the post-translational modification system of E. coli is not operative on heterologous proteins from Actinomycetes, and some specific substrates required for protein activity are not present in E. coli.
The stable maintenance of large foreign DNA fragments (>10 Kb) in a selected host cell is one of the key points for academic research or applied industrial purposes. Usually, the vector carrying the foreign DNA is maintained by cultivating the host cells in a medium with a vector-specific selective pressure (resistance to an antibiotic, for example). However, when large foreign DNA fragments are cloned and/or expressed, their propagation and/or expression requires energy, which is no longer available for cell growth. As a consequence of this new allocation of resources (nutrients/energy), it is not unusual for the recombinant cell to react by genetically rearranging the foreign DNA (deletion, modification, etc.). This results in the modification of the foreign genetic information and in the loss of DNA functionality. Such rearrangement can occur even though the selective pressure carried by the vector is maintained. As a result, the recombinant clone can no longer be exploited for genetic or functional analysis.
Thus, the huge potential of metagenomics for the discovery of new natural compounds, pathways or genes cannot be fully exploited with currently existing methods. Alternative technologies and processes must be developed to allow stable maintenance and propagation of large foreign DNAs in host cells, for the production of efficient libraries and for functional screening in a large variety of host cell species, including Bacillus or Streptomyces, so as to take full account of the huge diversity of environmental DNAs.