1. Field of the Invention
The present invention relates to a method for creating and storing a file that enables easier searching and a method for searching for data using the same.
2. Description of the Related Art
FIG. 1 is a conceptual diagram illustrating data stored in a general hard disk. The hard disk comprises cylinders, each composed of a plurality of tracks on the disk platters, and performs input/output (I/O) operations through read/write heads, each mounted on an arm that positions it over a track. In FIG. 1, it is assumed that the smallest data unit (i.e., a record) is stored in each of the 1st, 2nd, 3rd, 4th, . . . , (i−1)-th, i-th, . . . , and N-th sectors. The term ‘cluster’ means a set of neighboring sectors. A file manager may map each cluster to its physical position using a File Allocation Table (FAT).
In the FAT system, records are arranged sequentially in a plurality of clusters. In order to search for the record information of an i-th sector located partway through the data, the FAT system must sequentially process the sectors from the first sector up to the i-th sector. Only after arriving at the i-th sector can it search the records contained in the first to i-th sectors.
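The cost of this sequential traversal can be illustrated with a minimal sketch. It is a simplified model and not an implementation of FAT: the "disk" is an in-memory buffer, records are assumed to have a fixed size, and the names `RECORD_SIZE`, `build_store`, `read_sequential`, and `read_direct` are hypothetical and chosen only for illustration. The sketch contrasts reading every record up to the i-th one with seeking directly to it.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record size in bytes (assumption)


def build_store(n_records):
    """Return an in-memory 'disk' holding n_records fixed-size records."""
    buf = io.BytesIO()
    for k in range(n_records):
        # Pad each record with NUL bytes up to RECORD_SIZE.
        buf.write(f"record-{k}".encode().ljust(RECORD_SIZE, b"\0"))
    buf.seek(0)
    return buf


def read_sequential(store, i):
    """Read records 0..i in order; return (record i, number of reads)."""
    store.seek(0)
    reads = 0
    record = b""
    for _ in range(i + 1):
        record = store.read(RECORD_SIZE)
        reads += 1
    return record.rstrip(b"\0"), reads


def read_direct(store, i):
    """Seek straight to record i; return (record i, number of reads)."""
    store.seek(i * RECORD_SIZE)
    return store.read(RECORD_SIZE).rstrip(b"\0"), 1


store = build_store(1000)
seq_record, seq_reads = read_sequential(store, 500)
dir_record, dir_reads = read_direct(store, 500)
```

In this model the sequential path performs i + 1 reads to reach the i-th record, whereas the direct path performs one read regardless of i. The direct path works here only because the record size is fixed and known in advance.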
On the other hand, when a Random Access Memory (RAM) is used, quickly extracting necessary information from files that include variables or variable names requires all variables to be loaded into a Dynamic Random Access Memory (DRAM) during programming, so that the position at which the corresponding variable name is stored can be located immediately. As a result, the necessary information can be found quickly in RAM.
However, as capacity increases, the price of DRAM, which is a semiconductor device, rises far more rapidly than that of a hard disk, so that DRAM becomes cost-inefficient for large amounts of data requiring more than 128 gigabytes. Therefore, hard disks have been used more widely than DRAM for storing large amounts of data.
However, the disk formats of the conventional art have the following disadvantages.
First, when a sequential access method is used on a disk to search through large amounts of stored data, the access time grows geometrically with the size of the data, in contrast to the time required for random access to a data record.
In addition, if the conventional art pre-calculates the random access addresses (highly integrated indexes) of all data records but does not store the calculated addresses in external storage, the access time still grows geometrically with the data size.
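To illustrate what pre-calculating record addresses and keeping them in external storage can mean in practice, the following is a minimal sketch under stated assumptions. It is not the method of the present invention: records are assumed to be newline-delimited and of variable length, the index is assumed to be a plain list of byte offsets saved as JSON, and the function names (`build_offset_index`, `save_index`, `load_index`, `read_record`, `demo`) and file names are hypothetical.

```python
import json
import os
import tempfile


def build_offset_index(data_path):
    """Scan the data file once and return the byte offset of every record."""
    offsets = []
    with open(data_path, "rb") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:
                break
            offsets.append(pos)
    return offsets


def save_index(offsets, index_path):
    """Persist the offsets to external storage so they survive restarts."""
    with open(index_path, "w") as f:
        json.dump(offsets, f)


def load_index(index_path):
    """Load previously stored offsets without rescanning the data file."""
    with open(index_path) as f:
        return json.load(f)


def read_record(data_path, offsets, i):
    """Seek directly to record i using its stored offset."""
    with open(data_path, "rb") as f:
        f.seek(offsets[i])
        return f.readline().rstrip(b"\n")


def demo():
    """Write 100 variable-length records, index them, and fetch record 43."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "records.txt")
        index_path = os.path.join(tmp, "records.idx.json")
        with open(data_path, "wb") as f:
            for k in range(100):
                f.write(b"x" * (k % 7) + f"|rec{k}\n".encode())
        save_index(build_offset_index(data_path), index_path)
        offsets = load_index(index_path)
        return read_record(data_path, offsets, 43)
```

The full scan in `build_offset_index` is paid once. If the resulting offsets are not saved, every new session has to repeat that scan before any record can be reached directly, which is the situation the paragraph above describes.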
Specifically, with the recent rapid development of biotechnology, large amounts of genomic, clinical, and gene-function-related data, such as genomics or other omics data (large-capacity biological information), have been accumulated, and researchers can extract useful information by performing calculations on the resulting data. Each such unstructured data set is about several to tens of terabytes in size, and is expected to reach petabytes in larger projects. In this case, the difference in data access time between the sequential access method and a random access method based on highly integrated index technology may range from several days to several years, such that the conventional art will be incapable of implementing data access or data search.