Cloud computing may be referred to as a service that provides various information technology (IT) resources distributed over the Internet. The most common cloud computing service models may include Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). The IaaS may provide hardware infrastructure as a service. The PaaS may provide an application development and execution platform as a service. The SaaS may provide applications as a service.
The IaaS may further include many sub-service categories. Mainly, the IaaS may include a storage service and a computing service, the latter providing computing resources in the form of virtual machines. Such a storage service may be provided by a distributed storage system. The distributed storage system may virtually create a storage pool using low-profile hardware distributed over a network. Such a distributed storage system may dynamically and flexibly provide a shared storage space to users according to rapidly and/or abruptly varying service demands. The distributed storage system may commonly employ an object-based storage scheme, which may be, for example, a typical cloud storage service scheme. The object-based storage scheme may allow each physical storage device to manage its own storage space. As a result, the object-based storage scheme may improve overall performance of the distributed storage system and allow the distributed storage system to easily expand its storage capacity. Furthermore, data may be safely shared independently of the related platforms.
A typical distributed storage system may include a plurality of data nodes, which are object-based storage devices. The typical distributed storage system may replicate data and store the replicated data in at least one data node for data safety and high data availability. The replicated data may be referred to as a “replica.” The distributed storage system may generally have two or three replicas of an object, but may have more than three replicas, depending on the importance of the respective object. The distributed storage system may be required to synchronize the replicas of a respective object. Such synchronization may be processed by an independent replicator server.
After replicas are created, at least one data node may be selected to store the created replicas. Typically, a distributed storage system may randomly select data nodes without considering various factors such as the physical location and the status of each data node.
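The random selection described above may be sketched as follows. This is only an illustrative sketch under stated assumptions, not the selection routine of any particular system: the `DataNode` class, its `zone` field, the node identifiers, and the `select_nodes_randomly` function are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    node_id: str
    zone: str  # hypothetical label for the node's physical location


def select_nodes_randomly(nodes, replica_count, rng=None):
    """Pick replica_count distinct data nodes uniformly at random.

    Neither the location nor the status of a node influences the choice,
    which mirrors the typical scheme described in the text.
    """
    if replica_count > len(nodes):
        raise ValueError("not enough data nodes for the requested replicas")
    rng = rng or random.Random()
    return rng.sample(nodes, replica_count)


# Hypothetical pool of data nodes spread over three zones.
nodes = [
    DataNode("dn-1", "zone-a"),
    DataNode("dn-2", "zone-a"),
    DataNode("dn-3", "zone-b"),
    DataNode("dn-4", "zone-b"),
    DataNode("dn-5", "zone-c"),
]

replica_targets = select_nodes_randomly(nodes, replica_count=3)
print([node.node_id for node in replica_targets])
```

Because the `zone` field is never consulted, nothing prevents all selected nodes from falling into the same zone or far from the client.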
Since the physical location is not considered for data node selection, data nodes located far from a client may be selected. Such a selection might cause a long delay in processing the respective objects. In addition, data nodes gathered in one specific area may be selected. In this case, when the network of that specific area fails, many, if not all, of the data nodes in the area may be affected by the failure and consequently become unavailable.
Since the status of each data node is generally not considered, a distributed storage system may select data nodes having a high processing load, a slow response speed, or little available space remaining. Accordingly, such a data node selection scheme may degrade overall performance of a distributed storage system.
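The drawbacks listed above may be made concrete with a small check that inspects an already chosen set of data nodes. This is a hedged sketch only: the `NodeStatus` fields, the `selection_risks` function, and all threshold values are assumptions chosen for illustration and do not come from any specific system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    node_id: str
    zone: str           # hypothetical label for the physical area
    distance_km: float  # assumed distance between the node and the client
    load: float         # assumed processing load ratio, 0.0 to 1.0
    response_ms: float  # assumed average response time
    free_gb: float      # assumed remaining storage space


def selection_risks(selected, max_distance_km=1000.0, max_load=0.8,
                    max_response_ms=200.0, min_free_gb=50.0):
    """List the problems named in the text that apply to a selection.

    All thresholds are illustrative defaults, not recommended values.
    """
    risks = []
    zones = {node.zone for node in selected}
    # All replicas in one area: a single network failure affects every copy.
    if len(selected) > 1 and len(zones) == 1:
        risks.append(f"all replicas in zone {next(iter(zones))}")
    for node in selected:
        if node.distance_km > max_distance_km:
            risks.append(f"{node.node_id}: far from client")
        if node.load > max_load:
            risks.append(f"{node.node_id}: high processing load")
        if node.response_ms > max_response_ms:
            risks.append(f"{node.node_id}: slow response")
        if node.free_gb < min_free_gb:
            risks.append(f"{node.node_id}: little available space")
    return risks


# Hypothetical outcome of a random selection.
selection = [
    NodeStatus("dn-1", "zone-a", 50.0, 0.95, 20.0, 500.0),
    NodeStatus("dn-2", "zone-a", 3000.0, 0.10, 20.0, 10.0),
]
for risk in selection_risks(selection):
    print(risk)
```

A selection scheme that ignores location and status has no means of avoiding any of the conditions this check reports.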