The determination of sequences adjacent to a known region of the chromosome is a technically complicated task, and different methodologies have been developed for this purpose. The techniques described include, among others, ligation-mediated PCR (LM-PCR or genome walking), inverse PCR (i-PCR), thermal asymmetric interlaced PCR (TAIL-PCR), anchored PCR (a-PCR) and randomly primed PCR (rm-PCR). All these methods suffer from low detection sensitivity or low specificity and are, furthermore, only effective when the mutation lies at most a few hundred base pairs away from the known sequence. More recently, other methodologies have been developed, such as linear amplification-mediated PCR (LAM-PCR), which requires generating a double-stranded DNA fragment and digesting it. Subsequent modifications of this technique eliminate the need for this digestion, replacing it with an initial digestion of the genomic DNA and the ligation of a double-stranded adapter. All these methods are based on exponential DNA amplification.
There are different well-established techniques for identifying regions of the genome with a loss or gain of genetic material. They include, among others, techniques based on PCR, such as multiplex ligation-dependent probe amplification (MLPA), and techniques based on hybridization, such as the comparative genomic hybridization array (CGH array) and the single nucleotide polymorphism array (SNP array). Depending on the oligonucleotide or probe design, these techniques identify the regions of the genome in which the loss or gain of genetic material occurs, but in no case do they identify the exact start or end points of these regions, nor do they offer, in the case of insertions, any information about the region of the genome in which the insertion occurs. MLPA, in particular, offers very low throughput, given that only a limited number of genetic regions (normally exons) can be analyzed, and it is furthermore prone to false negatives.
SNP arrays and CGH arrays allow the whole genome of the samples to be analyzed without prior knowledge of the sequence. However, they suffer from relatively low resolution, since they are rarely capable of detecting deletions or insertions smaller than 50 kb.
Techniques such as MLPA (Stuppia et al., Use of the MLPA assay in the molecular diagnosis of gene copy number alterations in human genetic diseases. Int J Mol Sci. 2012; 13: 3245-76), CGH array (Lai et al., Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 2005; 21: 3763-70), SNP arrays (Zhang et al., Evaluation of copy number variation detection for a SNP array platform. BMC Bioinformatics 2014; 15: 50) or NGS targeted sequencing (Zhao et al., Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinformatics 2013; 14 (Suppl 11): S1) allow detecting the presence of mutations due to the loss (deletions) or gain (insertions) of genetic material. The result obtained with any of these techniques delimits the minimal chromosomal region that comprises the structural variant with 100% certainty, but none of them identifies the exact limits of the structural variant in question. The nucleic acid region that comprises a given mutation with 100% certainty is referred to hereinafter as the “ascertained nucleic acid region” or “ANA region.”
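By way of illustration only, the relationship between an ANA region and the still-unknown limits of a variant can be sketched as follows. The sketch assumes one possible reading of the definition above, namely that the ANA region spans the probes that report an altered copy number, so that each true breakpoint lies somewhere between the outermost altered probe and the nearest unaltered probe. The `Probe` type, the function name, the coordinates and the assumption of a single contiguous altered run are all hypothetical and are not taken from any of the cited techniques.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]


@dataclass(frozen=True)
class Probe:
    position: int   # hypothetical genomic coordinate of the probe
    altered: bool   # True if the probe reports a loss or gain


def ana_and_breakpoint_windows(
    probes: List[Probe],
) -> Tuple[Interval, Optional[Interval], Optional[Interval]]:
    """Return (ana, left_window, right_window).

    Assumption: the altered probes form one contiguous run. The ANA
    region runs from the first to the last altered probe; each window
    is the gap to the nearest unaltered probe, inside which the exact
    breakpoint remains undetermined. A window is None when no
    unaltered probe exists on that side.
    """
    ordered = sorted(probes, key=lambda p: p.position)
    altered_idx = [i for i, p in enumerate(ordered) if p.altered]
    if not altered_idx:
        raise ValueError("no altered probe: no ANA region can be defined")

    first, last = altered_idx[0], altered_idx[-1]
    ana = (ordered[first].position, ordered[last].position)
    left = (ordered[first - 1].position, ana[0]) if first > 0 else None
    right = (
        (ana[1], ordered[last + 1].position)
        if last + 1 < len(ordered)
        else None
    )
    return ana, left, right


# Hypothetical example: two altered probes flanked by unaltered ones.
example = [
    Probe(100, False),
    Probe(500, True),
    Probe(900, True),
    Probe(1500, False),
]
ana, left_window, right_window = ana_and_breakpoint_windows(example)
print("ANA region:", ana)
print("left breakpoint lies within:", left_window)
print("right breakpoint lies within:", right_window)
```

The width of each window depends only on probe spacing, which is why denser probe designs narrow the uncertainty without ever locating the breakpoint itself.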
Furthermore, there are other genetic modifications, such as translocations, that do not involve any change in the amount of genetic material and are therefore not detected by the aforementioned methods. To detect the ANA region of these structural variants, specific techniques are used, such as Southern blot, karyotyping or fluorescence in situ hybridization (FISH) for translocations of large chromosome segments, or real-time PCR (RT-PCR) or Δ-PCR for identifying other translocations that can give rise to gene fusions.
Gene fusions, caused by chromosomal translocations, inversions, deletions, etc., are currently considered to be of great importance in common epithelial cancers, such as prostate or lung carcinomas. For example, most prostate cancers carry an androgen-regulated fusion of one of the ETS gene family transcription factors. Clinically, some neoplasms are classified or managed according to the presence of a specific gene fusion: for example, promyelocytic leukemias carrying a PML-RARα fusion, which involves the retinoic acid receptor α, are treated with retinoic acid, whereas chronic myeloid leukemias carrying a BCR-ABL fusion are treated with the drug imatinib. However, assays carried out using RT-PCR require knowing both fused elements, which together give rise to a previously characterized variant, whereas sometimes only one of the genes that may be involved in the gene fusion is known. In that case, the single known gene would correspond to the ANA region.
The recent emergence of massively parallel sequencing technologies (next-generation sequencing, NGS) has allowed tremendous progress in knowledge about the nucleic acid sequences of a wide range of organisms. This knowledge has in turn revealed sources of genetic variability that often went unnoticed in the past, including, among others, genomic structural variants and copy-number variation (CNV).
There are different methods for preparing samples for NGS. In one of them, the methodology includes a “target enrichment” step, which consists of selecting, through different methods (for example, by capture with a solution containing specific probes), the region of the genome to be studied (for example, for sequencing a panel of specific genes, for sequencing only the exome, etc.), while disregarding the rest of the genome. These methods obtain extremely reliable reads from the region of interest but almost no information from the rest of the genome.
Another option for characterizing mutations of this type consists of performing whole genome sequencing (WGS) on samples suspected of carrying such modifications. Although this methodology offers results covering the entire genome, there are recalcitrant areas that yield few reads or none at all. Furthermore, WGS is expensive and presents problems when the region of interest is located in repetitive regions of the genome, so it does not assure the correct characterization of genetic variants of this type.
The authors of the present invention have developed a method that overcomes the problems of the aforementioned techniques and simplifies both the identification of mutations and the screening of said mutations.