The vector method is a conventional gene analysis method. In the vector method, a target gene to be analyzed is incorporated into a vector, the vector is propagated, and the full-length sequence of the gene thus obtained is then determined using a sequencer. However, the vector method is problematic in that it requires a culture operation, and also in that the sequencer must be used to analyze the full length of the gene.
In recent years, high-speed sequencers for use in gene analysis have been developed, and with their development, the mate-pair method has attracted attention as a means of gene analysis.
FIG. 1 schematically shows the outline of gene analysis using the mate-pair method. In the mate-pair method, a nucleotide sequence for ligation (a restriction enzyme recognition site) is attached to both ends of the target gene to be analyzed, and the target gene is then circularized. Then, a part including 15 nucleotides or more, and preferably from 25 nucleotides to several tens of nucleotides, on each side flanking the restriction enzyme recognition site is cleaved from the circularized gene, generally using a type II restriction enzyme. The part is amplified by PCR, and the nucleotide sequence of the cleaved partial gene is then determined. The sequences of both ends of the target gene are thereby determined, and the target gene can then be identified using known sequence data. A mate pair means the sequence data of a pair of nucleotide sequences obtained by reading both ends of a single DNA fragment.
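The cleavage step described above can be illustrated with a minimal in-silico sketch. It is not part of the described method: the function name extract_mate_pair, the toy target sequence, the placeholder adapter string, and the tag length of 5 are all assumptions chosen for illustration, and real enzymatic cleavage, strand orientation, and PCR are not modeled.

```python
def extract_mate_pair(circle: str, adapter: str, tag_len: int = 25) -> tuple[str, str]:
    """Return (head_tag, tail_tag) flanking the adapter in a circular sequence.

    The circle is represented as a linear string whose last character is
    joined to its first. The adapter is assumed to occur exactly once.
    """
    n = len(circle)
    # Search a doubled copy so an adapter that spans the wrap point is found.
    i = (circle + circle).find(adapter)
    if i < 0:
        raise ValueError("adapter not found in circle")
    j = i + len(adapter)
    # Nucleotides just before the adapter come from the tail of the target.
    tail_tag = "".join(circle[(i - tag_len + k) % n] for k in range(tag_len))
    # Nucleotides just after the adapter come from the head of the target.
    head_tag = "".join(circle[(j + k) % n] for k in range(tag_len))
    return head_tag, tail_tag


# Hypothetical toy data: a 20-nt target and a placeholder ligation sequence.
TARGET = "AAAAACCCCCGGGGGTTTTT"
ADAPTER = "GAATTC"
circle = TARGET + ADAPTER  # circularization: target end joined to target start via the adapter

head, tail = extract_mate_pair(circle, ADAPTER, tag_len=5)
print(head, tail)
```

In this toy case the two tags are the first five and the last five nucleotides of the target, which is the pair of end sequences that would then be compared with known sequence data.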
Methods used in practice for cleaving a given number of nucleotides from a gene include: a method of cutting at sites located away from the recognition site using a type II restriction enzyme, thereby cleaving out a given number of nucleotides; and a method comprising physically shearing the circular DNA by sonication or the like, recovering the cleaved fragment by means of biotin attached to the linker, amplifying the fragment by PCR, and then determining the sequence of the amplified PCR product.
The mate-pair method can, therefore, identify a known gene by reading a certain length of nucleotide sequence on both sides flanking the ligation site in a gene that has been circularized by ligating both ends of the DNA. Basically, if partial nucleotide sequences of the head and tail portions of a gene are read, these sequences allow individual genes to be reliably discriminated from one another. Accordingly, the mate-pair method has been adopted as a reliable and simple gene analysis method (Non Patent Literatures 1 and 2). Moreover, the mate-pair method has been applied to next-generation sequence analysis, and it has thus become increasingly important with the emergence of high-speed sequencers.
However, when DNA is circularized for gene analysis according to the mate-pair method, besides the self-circularization of a single gene or a single DNA (a single molecule), the circularization of a plurality of DNAs (a plurality of molecules) and the linear binding of a plurality of molecules (two or more molecules) also take place. A linear molecule consisting of a plurality of molecules can be separated from circular molecules and eliminated by the subsequent operations. On the other hand, a circular molecule consisting of a plurality of molecules cannot be separated from a circular molecule consisting of a single molecule, and it becomes a contaminant. A circular product consisting of a plurality of molecules hinders the analysis of individual genes and significantly decreases analytical specificity for the following reasons. Specifically, as shown in FIG. 2, when three types of cDNAs are to be self-circularized, if only single-molecule DNA is circularized as shown in (B), a gene can be specified from precise sequences according to the mate-pair method. However, in addition to the circularization of a single molecule as shown in (B), an uncircularized linear product may be generated as shown in (C), or two or more cDNAs may be circularized together as shown in (D). In the case of (C), the linear product can be eliminated with a DNA exonuclease. In the case of (D), however, the circularized product consisting of a plurality of cDNAs is recognized as a circularized molecule; it therefore cannot be eliminated, and it becomes a contaminant in the gene analysis according to the mate-pair method.
The gene analysis according to the mate-pair method is intended to identify a target gene based on the nucleotide sequences of both ends of the target gene. Specifically, a ligation adapter for circularization is attached to both ends of each individual gene, and the two adapter sites are then ligated to each other to circularize the gene. Then, a part including a certain number of nucleotides on both sides, with the adapter site at the center, is cleaved from the gene. Consequently, the gene can be identified by analyzing the nucleotide sequence of a portion from each end of the original gene. By contrast, a circularized product of a plurality of molecules has a plurality of adapter sites, and the two ends joined at each adapter site are ends of different genes. As described above, gene analysis is carried out by cleaving, according to either one of the above described two methods, a part including a given number of nucleotides at each of the two ends joined to the adapter, with the adapter at the center. Accordingly, the gene fragment for analysis obtained from a circularized product of a plurality of molecules comprises ends of different genes. Thus, the analysis of a single gene cannot be carried out.
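The reasoning above can be made concrete with a small sketch that lists, for each adapter site in a circle, which molecule supplies the tail end and which supplies the head end. The function names junction_pairs and all_junctions_single_gene and the labels GENE_A and GENE_B are hypothetical; the sketch only models the order in which molecules are joined, not their sequences.

```python
def junction_pairs(molecules: list[str]) -> list[tuple[str, str]]:
    """List (tail_source, head_source) for every adapter site in a circle.

    `molecules` gives the molecules in the order they are ligated head to
    tail; the last molecule is ligated back to the first to close the circle.
    """
    n = len(molecules)
    return [(molecules[i], molecules[(i + 1) % n]) for i in range(n)]


def all_junctions_single_gene(molecules: list[str]) -> bool:
    """True only if every adapter site joins two ends of the same gene."""
    return all(tail == head for tail, head in junction_pairs(molecules))


# Self-circularized single molecule: one adapter site, both ends from GENE_A.
print(junction_pairs(["GENE_A"]))

# Circle formed from two molecules: two adapter sites, each joining
# an end of GENE_A to an end of GENE_B.
print(junction_pairs(["GENE_A", "GENE_B"]))
```

Under this model a single-molecule circle yields one adapter site flanked by both ends of the same gene, whereas a two-molecule circle yields two adapter sites, each flanked by ends of different genes.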
As such, in the gene analysis according to the mate-pair method, the presence of a circularized product of a plurality of molecules hinders the analysis of each individual gene.
The probability that a plurality of DNA molecules are circularized together generally ranges from a few percent to a dozen or so percent, depending on the method applied. In the analysis of known genes, the resulting sequences are recognized as abnormal nucleotide sequences, and it is therefore usually possible to eliminate them from the sequences to be analyzed. Hence, such circularization causes only a slight decrease in accuracy, although the operation becomes complicated. However, in a case in which the mate-pair method is used to detect the presence of an abnormal gene, such as a fusion gene, in a group of normal genes, if a plurality of normal genes are circularized together, it is determined that an abnormal gene is present. As a result, it becomes impossible to accurately confirm the presence of an abnormal gene such as a fusion gene.
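The effect of such a rate on fusion gene detection can be illustrated with a rough calculation. The following is a sketch under simplifying assumptions that are not taken from this description: each circle contributes one mate pair, every multi-molecule circle joins different genes, and single-molecule circles derive from a fusion transcript with a hypothetical frequency fusion_fraction. The function name fusion_call_precision and all numerical values are illustrative only.

```python
def fusion_call_precision(multi_molecule_rate: float, fusion_fraction: float) -> float:
    """Fraction of apparent fusion calls that come from real fusion transcripts.

    multi_molecule_rate: fraction of circles formed from two or more molecules.
    fusion_fraction: fraction of single-molecule circles derived from a fusion
        transcript (hypothetical value supplied by the caller).
    """
    true_calls = (1.0 - multi_molecule_rate) * fusion_fraction
    false_calls = multi_molecule_rate  # normal genes joined in one circle
    total = true_calls + false_calls
    if total == 0:
        raise ValueError("no apparent fusion calls under these parameters")
    return true_calls / total


# Illustrative rates spanning "a few percent to a dozen or so percent",
# with an assumed fusion transcript frequency of 0.1 %.
for rate in (0.02, 0.05, 0.12):
    print(f"rate={rate:.2f} precision={fusion_call_precision(rate, 0.001):.4f}")
```

Under these assumed values, the great majority of apparent fusion calls would originate from multi-molecule circles rather than from fusion genes, which is why even a rate of a few percent is a serious obstacle in this application.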
A fusion gene is a gene with a novel function that is formed by the joining of a plurality of (two) genes to each other. For example, abnormalities in chromosome structure, such as deletion, duplication, recombination and translocation, are found in cancer cells. When cleavage and ligation of genes occur at the DNA level and a structural gene is present at each cleavage point, a fusion gene is formed.
In general, a fusion gene is either lethal to cells or has no functional significance, and in many cases it does not cause clinical problems. However, when a fusion protein generated from such a fusion gene interferes with the control of cell growth so that cell growth is abnormally promoted, it causes clinical problems such as tumor formation.
It had been thought that fusion genes are expressed mainly in hematopoietic tumors. In recent years, however, fusion genes have come to be expected to be associated with epithelial solid tumors as well (Non Patent Literature 3). Among such solid cancers, causative fusion genes have been discovered in prostate cancer and lung cancer (Non Patent Literatures 4 and 5).
In view of these findings, the analysis of a fusion gene, namely, confirmation of the presence of a fusion gene, has attracted attention as a novel method for diagnosing tumors (cancers) and the like. Specifically, by detecting a fusion gene already known to correspond to a pathologic condition, it becomes possible to diagnose that pathologic condition rapidly. Furthermore, the discovery of a novel fusion gene leads to the identification of drug discovery targets.
On the other hand, conventional chromosome analyses performed on solid tumors have had certain limits, and it has been extremely difficult to analyze and/or confirm a fusion gene. Recently, novel methods, such as the cDNA functional expression analysis method according to Mano et al., have been developed. However, these techniques have still been insufficient because of complicated operations, problems regarding accuracy, etc. (Patent Literature 1). In addition, various types of next-generation high-speed gene sequencers have recently been developed. Thus, high-speed sequence analysis of genes has progressed significantly, and gene analysis in a short time is becoming a reality. Hence, searches for fusion genes by high-speed and/or large-scale nucleotide sequence analysis of tumor genomes and/or genes have begun (Non Patent Literature 6).
In order to identify a fusion gene by sequence analysis using the mate-pair method, it is essential to reliably produce a single circular DNA from a single cDNA molecule. A schematic view of the analysis of a fusion gene according to the mate-pair method is shown in FIG. 3. There are cases, however, in which a single circular DNA is formed from a plurality of cDNA molecules, as shown in FIG. 4. In such cases, when sequence analysis is carried out according to the mate-pair method, a result may be obtained in which normal genes appear to be a fusion gene. If such a result is eliminated on the ground that the sequence is not present among known gene sequences, a true fusion gene is eliminated as well. As a result, it becomes substantially impossible to confirm the presence of a fusion gene.
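The dilemma described above can be shown with a short sketch. The record type MatePair, the function split_by_concordance, the gene labels, and the origin annotations are hypothetical; in particular, the origin field stands for ground truth that would not be available to an analyst and is included only to show what the filter discards.

```python
from typing import NamedTuple


class MatePair(NamedTuple):
    head_gene: str  # known gene matched by the head-side sequence
    tail_gene: str  # known gene matched by the tail-side sequence
    origin: str     # ground truth for illustration; not observable in practice


def split_by_concordance(pairs: list[MatePair]) -> tuple[list[MatePair], list[MatePair]]:
    """Separate pairs whose two ends match the same known gene from the rest."""
    kept = [p for p in pairs if p.head_gene == p.tail_gene]
    discarded = [p for p in pairs if p.head_gene != p.tail_gene]
    return kept, discarded


# Hypothetical observations.
OBSERVED = [
    MatePair("GENE_A", "GENE_A", "single normal cDNA"),
    MatePair("GENE_B", "GENE_B", "single normal cDNA"),
    MatePair("GENE_A", "GENE_B", "single fusion cDNA"),
    MatePair("GENE_C", "GENE_D", "circle of two normal cDNAs"),
    MatePair("GENE_D", "GENE_C", "circle of two normal cDNAs"),
]

kept, discarded = split_by_concordance(OBSERVED)
for pair in discarded:
    print(pair.head_gene, pair.tail_gene, "<-", pair.origin)
```

Because the filter sees only the two gene assignments, the pair from the fusion cDNA and the pairs from the two-molecule circle fall into the same discarded group and cannot be told apart from sequence data alone.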
Therefore, when a fusion gene is to be detected by sequence analysis according to the mate-pair method, it is essential to eliminate circularized cDNA consisting of a plurality of genes.