Many methods in molecular biology require short DNA sequences (oligonucleotides) that satisfy given physicochemical and biological requirements in order to assess the presence of a certain organism or group of organisms. These methods include fluorescent in situ hybridization (FISH), denaturing gradient gel electrophoresis (DGGE), conjugation with specific markers, such as detection or quantification probes for certain microorganisms, genes or sequences, and polymerase chain reaction (PCR), where two oligonucleotides are used as primers for the reaction. This invention could be applied in said cases or in other cases wherein specific oligonucleotides are required.
Usually, oligonucleotides are artificially synthesized according to a specification of their constituent bases. The determination of the specific sequences that are suitable for each particular procedure is called “oligonucleotide design”. Depending on the procedure involved, certain thermodynamic restrictions could limit the set of valid oligonucleotides. Oligonucleotides resulting from this design procedure are completely determined by the nucleotide sequences used in their synthesis, which can be characterized as words of finite length over the alphabet {A, C, T, G}.
Traditional oligonucleotide design methods, among which Primer3 (Rozen S., Skaletsky, H. (2000). Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, N.J., pp 365-386) can be mentioned, allow the design of oligonucleotide pairs or primers for PCR amplification while validating a series of thermodynamic requirements. However, these methods only allow the design of oligonucleotides for a particular sequence and do not consider the case where many sequences from different organisms are to be recognized. The method traditionally used in this case requires performing a multiple alignment of all the sequences that are to be recognized, by means of a computer program such as CLUSTALW (Higgins D., Thompson J., Gibson T. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680). This alignment allows the determination of regions conserved among all the sequences to be recognized, and therefore the design of oligonucleotides within these regions. However, computing these alignments is expensive and could be prohibitive when the number of sequences is large. Moreover, multiple alignments require the determination of penalty parameters derived from some evolutionary model of the sequences. The result depends on the values chosen for these penalties and may not be robust to small changes in these values.
Among other oligonucleotide design methods developed in recent years, document US2003097223 (Nakae & Ihara, 22/05/2003), which claims a new primer design method, could be mentioned. This method automatically designs primer pairs, which are then selected according to certain requirements, namely oligonucleotide length, GC content percentage and Tm (melting temperature). Besides the basic aspects of primer design, well known to someone skilled in the art, the method of the present invention includes a thermodynamic analysis of the designed primers. This is an advantage over the method described in US2003097223, as the stability of the designed primers is guaranteed, which improves the probability of success when said primers are used. Another difference between the former document and the invention herein disclosed is that said document aims at finding primers useful for many exons of a genome, whereas in one aspect of the invention all the microorganisms belonging to a certain taxon are to be amplified. This constitutes a difference by itself, but the strategy used to find primers or oligonucleotides that could recognize more than one template is also different: in document US2003097223 a plurality of primers is designed (indicated as step 701) using bioinformatics means from a database comprising different exons (step 700), then PCR-amplified DNA fragments are analyzed together with the designed primers, and the primers amplifying target exons are empirically determined. Conversely, in the present invention the primers present in the maximum number of target sequences are identified from the design database (which includes the target sequences), and the primers to be synthesized and used are chosen based on this information.
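As an illustration of the idea of identifying candidates present in the maximum number of target sequences, the following minimal Python sketch counts, for every word of length k, how many target sequences contain it, and keeps the most widely shared words within a GC-content window. The function names, the fixed length k, the GC window and the toy sequences are assumptions made for this example only; the sketch uses exact matching and is not the method as claimed.

```python
from collections import Counter

def gc_fraction(oligo):
    """Fraction of G and C bases in the oligonucleotide."""
    return sum(base in "GC" for base in oligo) / len(oligo)

def candidate_coverage(targets, k):
    """Count, for every k-mer, how many target sequences contain it."""
    coverage = Counter()
    for seq in targets:
        # A set ensures each target is counted at most once per k-mer.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        coverage.update(kmers)
    return coverage

def best_candidates(targets, k, gc_min=0.4, gc_max=0.6):
    """Return the k-mers inside the GC window found in the most targets.

    gc_min and gc_max are hypothetical defaults chosen for illustration.
    """
    coverage = candidate_coverage(targets, k)
    allowed = {kmer: n for kmer, n in coverage.items()
               if gc_min <= gc_fraction(kmer) <= gc_max}
    if not allowed:
        return [], 0
    top = max(allowed.values())
    return sorted(kmer for kmer, n in allowed.items() if n == top), top

# Toy target sequences (invented for the example).
targets = ["ACGTACGTTG", "TTACGTACGA", "GGACGTACCC"]
candidates, n_targets = best_candidates(targets, k=6)
print(candidates, n_targets)
```

In this toy case the only 6-mer shared by all three targets is reported together with the number of targets it covers. A realistic design would also apply length, Tm and thermodynamic filters to each candidate.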
Another document belonging to a related field in the art is the paper by Wang and Seed: “A PCR primer bank for quantitative gene expression analysis”, Nucleic Acids Research, 2003, vol. 31, No. 24 e154, where an algorithm is validated for the identification of transcript-specific primers for PCR; the authors have created an online database with primers that fulfill said requirements for human and mouse genes. The algorithm described by Wang and Seed differs significantly from the method proposed in the present invention. Firstly, it does not contemplate the possibility of choosing an oligonucleotide or a primer pair common to a given taxon; instead, specific primers are chosen for only one target sequence. Secondly, in its oligonucleotide selection procedure ΔG is evaluated only for the last 5 residues at the 3′ end of the molecule, and the candidate is rejected when such value is less than −9 kcal/mole. In the present invention, ΔG is evaluated for all the candidate oligonucleotides and the selection criterion is much more stringent, as preferentially only oligonucleotides having ΔGhmin equal to −1.5 kcal/mole (ΔG for hairpin formation) are selected. In order to predict the formation of hairpins in the referred paper, sequence self-complementarity is evaluated and only 5 non-contiguous matches are allowed. In the same way, to avoid the formation of primer dimers, the presence of complementary sequences in 4 residues at the 3′ end of the molecule is evaluated in the same primer (to avoid dimers) and in the other primer (to avoid cross-reactivity). In the present invention, secondary structure formation is addressed in a different and more efficient way than simple sequence complementarity comparison: differences in Gibbs free energy are evaluated for all possible conformations, and the probability that each selected oligonucleotide forms secondary structures is determined based on the most stable conformation.
As can be appreciated, the method of the invention shows indisputable technical advantages over other existing methods in the state of the art.
In summary, to date no oligonucleotide design method has been disclosed that is fast and economical and that allows the design of specific oligonucleotides for a target sequence when said sequence is part of a metagenomic sample, or the design of oligonucleotides that simultaneously recognize various sequences belonging to different organisms.
In this disclosure, said problems of the existing technique have been solved by creating a method for the design of oligonucleotides specific for a given sequence or group of sequences, which considers not only the information of the genetic material to be identified but also the information of all the genetic material that could be present in a metagenomic sample to which the method will be applied.
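The idea of taking into account both the genetic material to be identified and the other material that could be present in the sample can be sketched as a simple filter: keep the words shared by every target sequence and drop any word that also occurs in a background sequence. The Python sketch below is a simplified illustration under stated assumptions: the function names and toy sequences are hypothetical, matching is exact (no mismatch tolerance), and reverse-complement strands are not considered.

```python
def kmers(seq, k):
    """Set of all words of length k occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def specific_candidates(targets, background, k):
    """k-mers present in every target and absent from every background sequence.

    'background' stands in for the other genetic material that could be
    present in the metagenomic sample (an assumption of this example).
    """
    if not targets:
        return []
    shared = set.intersection(*(kmers(t, k) for t in targets))
    seen_in_background = set().union(*(kmers(b, k) for b in background))
    return sorted(shared - seen_in_background)

# Toy data (invented): two targets and one non-target sequence.
targets = ["ACGTACGTTG", "TTACGTACGA"]
background = ["GGCGTACGAA"]
print(specific_candidates(targets, background, k=6))
```

Here two 6-mers are shared by both targets, but one of them also occurs in the background sequence, so only the other remains as a specific candidate.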
Another common problem in the field of oligonucleotide design is that, even when an oligonucleotide meeting the required specificity is available, said oligonucleotide proves inefficient in the practice of molecular biology procedures. Explanations for this inefficiency are the formation of secondary structures within the oligonucleotide sequence (hairpins) or auto-hybridization, both of which decrease the active concentration of the oligonucleotide in the reaction mix. In the case of the PCR technique, where an oligonucleotide pair is used simultaneously, cross-hybridization between both oligonucleotides is possible in addition to auto-hybridization and hairpin formation; this also sequesters oligonucleotides in the reaction mix and makes said reaction inefficient.
In order to overcome this technical problem, the method of the invention includes a step wherein the designed oligonucleotides are thermodynamically evaluated to rule out the formation of hairpins, auto-hybridization or cross-hybridization between two primers. For each of these situations, Gibbs free energy differences are calculated for all the possible conformations and the most stable conformation is selected; if said most stable conformation has a ΔG value less than a certain threshold, said oligonucleotide is discarded, thus guaranteeing the availability of the designed oligonucleotides.
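The selection logic described above (enumerate the conformations, take the most stable one, discard the oligonucleotide if its ΔG falls below a threshold) can be sketched as follows for auto-hybridization and cross-hybridization. This is only a toy illustration: the per-pair energies are placeholder numbers invented for the example, not measured thermodynamic parameters, only ungapped antiparallel alignments are enumerated, every matching position is scored independently, and hairpins are not modeled. All names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Placeholder pair energies in kcal/mol: illustrative assumptions only,
# NOT measured values and NOT a nearest-neighbor model.
TOY_PAIR_ENERGY = {"AT": -1.0, "TA": -1.0, "CG": -1.5, "GC": -1.5}

def duplex_conformations(a, b):
    """Yield a toy dG for every ungapped antiparallel alignment of a and b."""
    b_rev = b[::-1]  # read b 3'->5' so it pairs position by position with a
    for offset in range(-(len(b_rev) - 1), len(a)):
        energy = 0.0
        for i, base in enumerate(b_rev):
            j = i + offset
            if 0 <= j < len(a) and COMPLEMENT[a[j]] == base:
                energy += TOY_PAIR_ENERGY[a[j] + base]
        yield energy

def most_stable_dg(a, b):
    """dG of the most stable (lowest-energy) enumerated conformation."""
    return min(duplex_conformations(a, b))

def passes(a, b, threshold):
    """Keep the pair only if its most stable dG is not below the threshold."""
    return most_stable_dg(a, b) >= threshold

# Auto-hybridization is checked by pairing an oligonucleotide with itself,
# cross-hybridization by pairing the two primers of a PCR pair.
print(most_stable_dg("GGGGCCCC", "GGGGCCCC"))   # self-complementary oligo
print(passes("AAAA", "AAAA", threshold=-3.0))   # no pairing possible
print(passes("AAAA", "TTTT", threshold=-3.0))   # fully complementary pair
```

A realistic evaluation would replace the placeholder energies with a validated thermodynamic model and extend the enumeration to hairpin conformations.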
Thus, the method of the present invention allows solving all the problems existing in the field of oligonucleotide design for Molecular Biology techniques.