1. Technical Field
The present invention relates generally to a method and apparatus for the distributed processing of a file and, more particularly, to a method and apparatus for the distributed processing of a file that are capable of efficiently performing segmentation, merging, and front addition in connection with a large file.
2. Description of the Related Art
A conventional file system provides only file open, read, write, end-add (append), end-truncate, and close operations. In computer systems used for genome and protein analyses, tasks cannot be performed efficiently using only these operations.
The size of an input data file for a genome analysis application is very large (e.g., 218 GB), and the time it takes to analyze the content of the data file is very long. In order to reduce analysis time, a data file is segmented into a plurality of small files, the small files are processed in parallel, the processed files are merged into a single large file, and the single large file is used as input in a subsequent stage.
However, the conventional file system takes a long time to segment a large file and to merge small files, and this overhead offsets the benefit of data parallelism.
Furthermore, in the conventional file system, segmenting a file consumes a large amount of the input/output bandwidth of a data storage device because the original file must be read in full and written back out as multiple files, thereby degrading system efficiency and performance.
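The cost described above can be illustrated with a minimal Python sketch of segmentation and merging built only from open, read, write, and close operations. The function names (split_by_copy, merge_by_copy, round_trip_demo), the buffer size, and the part naming scheme are hypothetical and chosen for illustration; they are not taken from any particular file system or from the cited publication. The point of the sketch is that every byte of the original file is read and rewritten once to segment it and once more to merge it.

```python
import os
import shutil
import tempfile

CHUNK = 1 << 20  # copy buffer size in bytes (arbitrary, illustrative value)


def split_by_copy(src_path, part_size, out_dir):
    """Segment src_path into files of at most part_size bytes by copying.

    Each byte of the source is read once and written once, which is the
    I/O cost a copy-based approach cannot avoid.
    """
    parts = []
    index = 0
    with open(src_path, "rb") as src:
        while True:
            part_path = os.path.join(out_dir, f"part_{index:05d}")
            written = 0
            with open(part_path, "wb") as dst:
                while written < part_size:
                    buf = src.read(min(CHUNK, part_size - written))
                    if not buf:  # end of the source file
                        break
                    dst.write(buf)
                    written += len(buf)
            if written == 0:
                # Nothing left to copy: discard the empty part and stop.
                os.remove(part_path)
                break
            parts.append(part_path)
            index += 1
    return parts


def merge_by_copy(part_paths, dst_path):
    """Merge the parts, in order, into dst_path by copying every byte again."""
    with open(dst_path, "wb") as dst:
        for path in part_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst, CHUNK)


def round_trip_demo(total_size=2500, part_size=1000):
    """Split and re-merge a small synthetic file.

    Returns the list of part sizes and whether the merged file is
    byte-identical to the original.
    """
    with tempfile.TemporaryDirectory() as work:
        original = os.path.join(work, "input.dat")
        data = bytes(i % 251 for i in range(total_size))
        with open(original, "wb") as f:
            f.write(data)

        parts = split_by_copy(original, part_size, work)
        sizes = [os.path.getsize(p) for p in parts]

        merged = os.path.join(work, "merged.dat")
        merge_by_copy(parts, merged)
        with open(merged, "rb") as f:
            identical = f.read() == data
    return sizes, identical
```

In this sketch the total I/O for one segment-and-merge cycle is roughly four times the file size (one read and one write per byte in each direction), so for an input on the order of hundreds of gigabytes the copying alone can take a significant part of the time that parallel processing was meant to save.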
In connection with this, Korean Patent Application Publication No. 10-2002-0092550 discloses a mass file storage system and a method of deleting and adding the data blocks of dynamic multi-level inodes using the system.