A distributed storage system, or parallel storage system, is a storage system that virtualizes a plurality of storage devices as one storage device. Rather than storing a file in a single storage device, such a distributed storage system duplicates the file and stores and uses it across the plurality of virtualized storage devices in a distributed manner.
Just as an existing Redundant Array of Inexpensive Disks (RAID) storage device integrates a plurality of hard disks into one storage device to construct a larger, faster and more stable storage device, the distributed storage system may provide the functions of a larger, faster and more stable storage system by configuring a plurality of storage devices into one storage device.
Such a distributed storage system technique is used as a core technique in cloud computing and the like. As the number of storage devices configuring the distributed storage system increases, the capacity and performance of the distributed storage system are enhanced proportionally, and cost-effectiveness in terms of Total Cost of Ownership is maximized. Therefore, the distributed storage system may provide a level of performance and expandability that cannot be provided by existing storage systems.
In relation to this, FIG. 1 is a view showing the configuration of a distributed storage system according to a conventional technique.
Referring to FIG. 1, a distributed storage system generally includes a plurality of storage servers 110 (which together correspond to one virtual storage server) for duplicating and storing a file in a distributed manner, and a metadata server 120 for creating and managing metadata of the file. If at least one client 130 requests input or output of a certain file through a network or the like, the metadata server 120 provides information on the storage servers 110 in which the corresponding file is, or will be, stored in a distributed manner. The client 130 then connects to the storage servers 110 and inputs or outputs the corresponding file, and the service is thereby provided. (For reference, in the present invention, the term ‘file’ means contents inquired or requested by the client, including a file, data, contents, a chunk or the like.)
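The request flow described for FIG. 1 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the class names `StorageServer`, `MetadataServer` and `Client`, the round-robin placement, and the replica count of two are hypothetical and are not taken from any particular conventional system.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical stand-in for one of the storage servers 110."""
    name: str
    files: dict = field(default_factory=dict)  # file name -> contents


@dataclass
class MetadataServer:
    """Hypothetical stand-in for the metadata server 120."""
    servers: list
    replicas: int = 2  # assumed number of copies per file
    locations: dict = field(default_factory=dict)  # file name -> server names

    def locate(self, file_name):
        # Return where the file is stored, or decide where it will be stored.
        if file_name not in self.locations:
            start = len(self.locations) % len(self.servers)
            chosen = [self.servers[(start + i) % len(self.servers)].name
                      for i in range(self.replicas)]
            self.locations[file_name] = chosen
        return self.locations[file_name]


class Client:
    """Hypothetical stand-in for the client 130."""

    def __init__(self, metadata_server):
        self.metadata_server = metadata_server
        self.by_name = {s.name: s for s in metadata_server.servers}

    def write(self, file_name, data):
        # Ask the metadata server for the target storage servers,
        # then store one copy of the file on each of them.
        for name in self.metadata_server.locate(file_name):
            self.by_name[name].files[file_name] = data

    def read(self, file_name):
        # Ask the metadata server where the file is stored,
        # then read it from the first listed storage server.
        first = self.metadata_server.locate(file_name)[0]
        return self.by_name[first].files[file_name]


servers = [StorageServer(f"storage-{i}") for i in range(3)]
client = Client(MetadataServer(servers))
client.write("a.txt", b"hello")
print(client.read("a.txt"))
```

In this sketch the metadata server never handles file contents; it only answers the question of which storage servers hold, or will hold, a given file.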
However, since the distributed storage system according to the conventional technique manages the files stored in the storage servers based on a uniform rule (i.e., applies the same standard to every file), if clients frequently request a specific file, a bottleneck forms at that file, and thus the performance and efficiency of the system are degraded.
That is, the conventional techniques do not separately manage frequently requested files. When a specific file is requested frequently, bottlenecks are produced because the number of copies of that file distributed across the storage devices is relatively small. Therefore, the overall performance and efficiency of the system are degraded.
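The bottleneck can be made concrete with a small worked calculation. The sketch below assumes that requests for a file are spread evenly over the storage servers holding a copy of it; the function names, the request rates and the per-server capacity are illustrative assumptions, not measured values.

```python
def requests_per_replica(request_rate, replica_count):
    # Assumption: requests are spread evenly over all copies of the file.
    return request_rate / replica_count


def is_bottleneck(request_rate, replica_count, capacity_per_server):
    # A bottleneck arises when each copy receives more requests
    # than a single storage server can serve.
    return requests_per_replica(request_rate, replica_count) > capacity_per_server


# Illustrative numbers only: every file has 3 copies under the uniform rule,
# and each storage server is assumed to serve at most 1000 requests/second.
hot = is_bottleneck(request_rate=9000, replica_count=3, capacity_per_server=1000)
cold = is_bottleneck(request_rate=30, replica_count=3, capacity_per_server=1000)
print(hot, cold)
```

Because the uniform rule gives a frequently requested file the same number of copies as a rarely requested one, only the frequently requested file exceeds the assumed per-server capacity.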
Meanwhile, a plurality of interrelated distributed storage systems can be interconnected through a network or the like. FIG. 2 is a view describing such interrelated distributed storage systems, showing the structure of a plurality of distributed storage systems interconnected through a network or the like.
Referring to FIG. 2, a first distributed storage system 100 includes a plurality of storage servers 110 for storing a file in a distributed manner and a metadata server 120 for managing metadata of the file. In the same manner, second and third distributed storage systems 100′ and 100″ respectively include a plurality of storage servers 110′ and 110″ and metadata servers 120′ and 120″. The distributed storage systems are closely interconnected through a network or the like and share the files (data or contents).
As described above, when several distributed storage systems are closely interconnected and share files, conventional techniques generally operate the interconnected distributed storage systems by one of two methods. In the first method, all the files are synchronized with one another, so that each of the distributed storage systems possesses identical files. In the second method, each of the distributed storage systems possesses its own original files and, when another storage system requests a file, responds by transmitting the corresponding file.
However, in the case of the first method, since all the files must be synchronized across the interconnected distributed storage systems, files are moved frequently and a large storage space is required. In particular, since each distributed storage system must store even files that are rarely requested, storage space and communication bandwidth are greatly wasted.
In the case of the second method, waste of storage space can be reduced, since a file is actually moved only when it is requested. However, a hot file frequently requested by clients may be requested by a plurality of distributed storage systems at the same time, and thus bottlenecks are produced. Therefore, response speed is slowed down, and the performance and efficiency of the system are degraded.
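The trade-off between the two conventional methods can be sketched as follows. This is an illustrative model only: the function names, the file sizes, the system labels "A", "B" and "C", and the request pattern are all assumptions introduced for the example.

```python
from collections import Counter


def full_sync_storage(file_sizes, system_count):
    # First method: every system keeps a synchronized copy of every file.
    return sum(file_sizes.values()) * system_count


def on_demand_storage(file_sizes):
    # Second method: each original file is kept only by the system that owns it.
    return sum(file_sizes.values())


def on_demand_owner_load(requests, owner_of):
    # Second method: every request from another system is served by
    # the system that owns the original file.
    load = Counter()
    for requester, file_name in requests:
        owner = owner_of[file_name]
        if requester != owner:
            load[owner] += 1
    return load


# Illustrative data: two files of size 10, both owned by system "A".
file_sizes = {"hot": 10, "cold": 10}
owner_of = {"hot": "A", "cold": "A"}
# Systems "B" and "C" each request the hot file five times.
requests = [("B", "hot")] * 5 + [("C", "hot")] * 5

print(full_sync_storage(file_sizes, system_count=3))
print(on_demand_storage(file_sizes))
print(on_demand_owner_load(requests, owner_of))
```

Under these assumptions the first method triples the storage used, including for the rarely requested file, while the second method keeps storage minimal but concentrates every remote request for the hot file on the single owning system.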