In the field of bioinformatics, arithmetic operations on sequences of DNA, RNA, protein, and the like are performed based on reference data. Such operations include target alignment and the like, which checks for regions of similarity reflecting a functional, structural, or developmental relationship between sequences. Depending on the application, several hundred MB to several TB of target data must be processed for a single subject.
To this end, in the conventional art, when a typical memory management method is applied to a general-purpose computer having a von Neumann architecture, the characteristic of such applications, namely that proximity data is classified into reference data and target data, is not recognized. As a result, an additional data input/output (I/O) bandwidth of several tens of MB/s to several hundreds of MB/s is required. Here, the reference data refers to data of about 3 GB to about 50 GB that is commonly used by a plurality of applications, and the target data refers to data of about 70 GB to about 200 GB that is to be subjected to information processing.
Furthermore, when a plurality of such applications are executed on one computer node, it is not recognized that the reference data is the same read-only data, and separate memory is allocated for each application, which results in a high memory cost. In addition, due to inefficient data management, the processor as well as the memory is used inefficiently for the arithmetic operations.
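The cost of not treating reference data as shared read-only data can be illustrated with a minimal sketch. It is not the method of the conventional art or of any particular application; it only assumes a general operating-system facility (a read-only memory mapping of a file, here through Python's standard `mmap` module) as one way in which several processes could read the same reference data without each allocating a private copy. The function names `open_reference_readonly`, `count_matches`, and `demo`, and the toy sequences, are hypothetical.

```python
import mmap
import os
import tempfile


def open_reference_readonly(path):
    """Map a reference file read-only.

    The mapping is backed by the file, so processes that map the same
    file read-only can be served from the same cached pages instead of
    each holding a separately allocated copy.
    """
    with open(path, "rb") as f:
        # mmap keeps its own duplicate of the descriptor, so the file
        # object can be closed once the mapping exists.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def count_matches(reference, target):
    """Toy comparison: count positions where target equals reference."""
    n = min(len(reference), len(target))
    return sum(1 for i in range(n) if reference[i] == target[i])


def demo():
    """Write a tiny hypothetical reference, map it, compare a target."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"ACGTACGT")  # hypothetical reference sequence
        ref = open_reference_readonly(path)
        try:
            return count_matches(ref, b"ACGTTCGT")  # hypothetical target
        finally:
            ref.close()
    finally:
        os.remove(path)
```

In this sketch the reference is only ever read, which is the property that the conventional memory management described above fails to exploit.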
For example, applications for detecting a single-nucleotide polymorphism (SNP) of a target biological sequence, on the basis of reference data in the form of a character string indicating a nucleotide sequence or peptide sequence, perform a plurality of steps arranged as a pipeline. Over the execution of the entire pipeline, about 780 GB of data is read and about 800 GB of data is generated. When a plurality of applications are executed, the time required to analyze the entire pipeline can be reduced by using process-level data parallelism to the extent that pipeline dependencies are not violated.
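The constraint that parallelism may be used only to the extent that pipeline dependency is not violated can be sketched as follows. This is an illustrative sketch, not the scheduling used by any specific SNP application: it assumes the pipeline is described as a mapping from each step to the steps it depends on, and it uses the standard `graphlib.TopologicalSorter` (Python 3.9 or later) to group steps into waves whose members could run as separate processes. The function name `parallel_waves` and the step names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies):
    """Group pipeline steps into waves that respect dependencies.

    `dependencies` maps a step name to the set of steps that must
    finish before it may start. Steps within one returned wave do not
    depend on each other and could run concurrently.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on a cyclic pipeline
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps with all inputs done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking successors
    return waves


# Hypothetical two-partition pipeline: each partition is aligned and
# analyzed independently, then the results are merged.
example_pipeline = {
    "call_snp_A": {"align_A"},
    "call_snp_B": {"align_B"},
    "merge": {"call_snp_A", "call_snp_B"},
}
```

For the hypothetical pipeline above, the two alignment steps form the first wave, the two SNP-calling steps the second, and the merge step runs alone in the third, so the elapsed time is bounded by three waves rather than five sequential steps.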