The present invention relates to apparatuses, methods, and systems for creating phylogenetic trees, especially to an apparatus, a method, and a system suitable for creating a phylogenetic tree for cancer clones.
DNA sequencing with a next generation sequencer (Next Generation Sequencing: NGS) makes it possible to sequence a genome at a drastically lower cost than with the conventional Sanger method. The cost of sequencing a genome, which was about one hundred million U.S. dollars in 2001, was lowered to 4,905 U.S. dollars in 2014 thanks to next generation sequencers. Not only has the cost of genome sequencing been lowered, but vast amounts of sequence data can also be obtained in a short period, and an apparatus that can generate more than one trillion bases of sequence data at a time has also been produced. Such technology makes it possible to sequence a genome on the basis of samples obtained from a cancer patient.
A problem in executing genome sequence analysis on samples obtained from cancer cells of a patient is that such samples form a mixed aggregation of normal cells and plural kinds of cancer cells that have mutations at different positions of their genomes. Such nonuniformity is referred to as heterogeneity. Hereinafter, an aggregation of cells whose genomes are almost the same will be referred to as a clone. A sample of a cancer is thus a mixture of normal cells and plural clones generated by mutations of genomes.
In a mixed aggregation, it cannot be directly judged whether mutations detected at different positions are derived from the same cell, except when the mutations are detected at positions very near to each other on the genome. It is therefore difficult to investigate how mutations affect the functions of a cell. In recent years, however, a technology has been proposed in which mutations detected in samples of cancers are grouped on the basis of the number of NGS sequences having the mutations, so that mutations contained in almost equal ratios of the cells in the samples fall into the same group, and each group is identified in association with its frequency (hereinafter referred to as a mutation group frequency) (refer to FIG. 2 of “Zare et al., PLoS Comput Biol 2014, 10(7):e1003703”).
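The grouping idea described above can be illustrated with a minimal sketch. It is not the method of the cited reference; it is a naive threshold-based grouping that assumes a single sample, a nonzero read depth at every position, and that the fraction of reads carrying a mutation is an adequate proxy for the ratio of cells containing it. The function name `group_by_frequency`, the tolerance `tol`, and the read counts are hypothetical.

```python
from typing import Dict, List, Tuple


def group_by_frequency(
    counts: Dict[str, Tuple[int, int]], tol: float = 0.05
) -> List[Tuple[float, List[str]]]:
    """Group mutations whose mutated-read fractions are almost equal.

    counts maps a mutation id to (reads with the mutation, reads without it).
    Returns (mean fraction of the group, mutation ids), in ascending order.
    """
    # Fraction of reads carrying each mutation, sorted in ascending order.
    fracs = sorted((var / (var + ref), name) for name, (var, ref) in counts.items())

    groups: List[Tuple[List[float], List[str]]] = []
    for frac, name in fracs:
        # Join the current group if within tol of that group's smallest fraction.
        if groups and frac - groups[-1][0][0] <= tol:
            groups[-1][0].append(frac)
            groups[-1][1].append(name)
        else:
            groups.append(([frac], [name]))

    return [(sum(fs) / len(fs), names) for fs, names in groups]


# Hypothetical read counts for four mutations in one sample.
counts = {"m1": (30, 70), "m2": (32, 68), "m3": (10, 90), "m4": (12, 88)}
groups = group_by_frequency(counts)
```

With these counts, m3 and m4 fall into one group and m1 and m2 into another. A practical method would use a statistical model over all samples rather than a fixed tolerance.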
If it is assumed that a mutation group and the corresponding mutation group frequency are accurately predicted, and that no other mutation group has exactly the same mutation group frequency through all the samples, the mutation group can be associated one-to-one with a clone. In other words, each mutation group is associated with the clone in which the mutations belonging to the group are first generated, and its mutation group frequency is equal to the sum of the mixture ratio of that clone and the mixture ratios of the clones that are derived and evolve from it. If not only the mutation groups and their frequencies but also the evolutionary relations among clones can be presumed, that is to say, the changing processes that show in which clones mutations are generated and into which clones those clones change, it can be expected that an important clue is obtained for identifying a mutation that plays an important role in the progression of a cancer.
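The relation between a mutation group frequency and the mixture ratios of clones can be sketched as follows. This is only an illustration under stated assumptions: the tree is given as a child-to-parent mapping, normal cells are placed at the root, and the clone names and mixture ratios are hypothetical. The function `mutation_group_frequencies` is likewise a name introduced here, not one taken from any cited technology.

```python
from typing import Dict, List, Optional


def mutation_group_frequencies(
    parent: Dict[str, Optional[str]], mixture: Dict[str, float]
) -> Dict[str, float]:
    """Frequency of each clone's mutation group.

    The frequency is the clone's own mixture ratio plus the mixture ratios
    of all clones derived from it (its descendants in the tree).
    """
    # Build a parent-to-children mapping from the child-to-parent mapping.
    children: Dict[str, List[str]] = {clone: [] for clone in parent}
    for clone, p in parent.items():
        if p is not None:
            children[p].append(clone)

    freq: Dict[str, float] = {}

    def subtree_sum(clone: str) -> float:
        total = mixture[clone] + sum(subtree_sum(ch) for ch in children[clone])
        freq[clone] = total
        return total

    # Start from every root (a clone with no parent).
    for clone, p in parent.items():
        if p is None:
            subtree_sum(clone)
    return freq


# Hypothetical tree: normal cells -> clone A -> clones B and C.
parent = {"normal": None, "A": "normal", "B": "A", "C": "A"}
mixture = {"normal": 0.4, "A": 0.3, "B": 0.2, "C": 0.1}
freqs = mutation_group_frequencies(parent, mixture)
```

In this example the mutation group first generated in clone A has a frequency of about 0.6, because its mutations are inherited by clones B and C, whereas the groups first generated in B and C have frequencies of about 0.2 and 0.1.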
“Jiao et al. 2014, BMC Bioinformatics 15:35” (nonpatent literature 1) discloses a technology for creating a phylogenetic tree on the basis of samples of cancers. In the technology disclosed in nonpatent literature 1, the inputs are, for each mutation position on the genome, the number of NGS sequences obtained from samples of cancers that have the mutation and the number of sequences that do not. Mixture ratios and phylogenetic trees of clones are presumed and evaluated on the basis of these inputs, and the mixture ratios and the phylogenetic tree of clones that best match the given inputs are calculated.
In addition, as a related technology for creating a tree structure using correlation coefficients, the technology disclosed in Japanese Patent Application Laid-Open No. 2000-298495 is known.