A social network is a set of people (or organizations or other social entities) connected by a set of social relationships, such as friendship, co-working, or information exchange. There has been a recent, unprecedented increase in the use of Online Social Networks (OSNs) to expand social life, for example by finding others with a common interest, discussing and sharing information in forums, and exchanging photos and personal news. OSNs have become large-scale distributed systems that provide services to hundreds of millions of users and deliver messages at a very high rate.
Besides handling traditional client-to-server requests, OSNs also need to handle highly interconnected data due to the strong community structure and human relationships among their end users, which often results in complex data sharing among users. Given the tremendous user population and frequent data access by these users, effective resource planning and provisioning strategies are of extreme importance to the performance and revenue of an OSN. In particular, selecting the most suitable locations to deploy server farms is one of the key steps in such resource management.
The placement strategy for resources, including servers and bandwidth, affects the performance of any online web service. An appropriate allocation of resources benefits content providers by reducing latency for their clients and balancing bandwidth consumption. The goal is to distribute content to clients with good Quality of Service (QoS) while retaining efficient and balanced resource consumption in the underlying network infrastructure. Thus, given the client/server communication patterns of traditional web services, existing server placement proposals mainly focus on minimizing the average latency between the servers and the users.
Many proposals on the server placement problem rely on extracting clients' requests from history traces collected on the web servers, and then searching for the best placement given the particular client and load distribution. While these proposals may improve the performance of existing OSN services, they are less helpful to new OSNs that are starting afresh and have no such traces. Thus, it is difficult for a newly launched Internet application service to decide where to deploy its servers.
Moreover, much existing work on server placement casts the problem as an integer linear program in which a binary decision variable b_ij denotes whether user i is assigned to server j, and the total number of selected servers is a predetermined input M. One of the best known approximation algorithms for this problem (presented by M. Charikar and S. Guha in “Improved combinatorial algorithms for the facility location and k-median problems” in Proceedings of the 40th Annual Symposium on Foundations of Computer Science, 1999) has a very high time complexity of O((N+P)^3), where N is the number of servers and P is the number of users. To make the problem more manageable in practice, a number of approximations and heuristics have been proposed, such as the use of a greedy algorithm (by L. Qiu, V. N. Padmanabhan, and G. M. Voelker, in “On the Placement of Web Server Replicas,” in Proc. of IEEE INFOCOM 2001, 1587-1596). However, these approaches are mainly based on theoretical analysis and have been validated only by simulation on very small graphs. Further, such methods have fundamental difficulty scaling to large values of N. Accordingly, there is a need for methods to efficiently and flexibly determine, for existing or new OSNs and any value of N, where to place servers.
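For illustration only, the following is a minimal Python sketch of a generic greedy heuristic of the kind referred to above. It is not a reproduction of any cited algorithm. The function names greedy_placement and assign_users, the latency matrix, and the objective (the sum of each user's latency to its nearest selected server) are assumptions made for this example. Each of the M rounds adds the candidate site that most reduces the total latency, and each user is then assigned to its nearest selected site, which corresponds to setting the binary assignment variable to 1 for that user and site.

```python
from typing import List, Sequence


def greedy_placement(latency: Sequence[Sequence[float]], m: int) -> List[int]:
    """Pick m server sites greedily (illustrative sketch, hypothetical API).

    latency[i][j] is an assumed latency between user i and candidate site j.
    Returns the indices of the chosen sites in the order they were picked.
    """
    num_users = len(latency)
    num_sites = len(latency[0]) if num_users else 0
    if not 0 < m <= num_sites:
        raise ValueError("m must be between 1 and the number of candidate sites")

    chosen: List[int] = []
    # best[i] is user i's latency to the nearest site chosen so far.
    best = [float("inf")] * num_users

    for _ in range(m):
        best_site, best_total = -1, float("inf")
        for j in range(num_sites):
            if j in chosen:
                continue
            # Total latency if site j were added to the current selection.
            total = sum(min(best[i], latency[i][j]) for i in range(num_users))
            if total < best_total:  # ties keep the lowest site index
                best_site, best_total = j, total
        chosen.append(best_site)
        best = [min(best[i], latency[i][best_site]) for i in range(num_users)]
    return chosen


def assign_users(latency: Sequence[Sequence[float]], chosen: Sequence[int]) -> List[int]:
    """Assign each user to its nearest chosen site (b_ij = 1 for that pair)."""
    return [min(chosen, key=lambda j: row[j]) for row in latency]


if __name__ == "__main__":
    # Four users, three candidate sites; the values are made up for the example.
    example = [
        [1, 5, 9],
        [2, 4, 8],
        [9, 5, 1],
        [8, 4, 2],
    ]
    sites = greedy_placement(example, 2)
    print(sites, assign_users(example, sites))
```

In this made-up example the greedy choice for M = 2 is sites 1 and 0, with a total latency of 12, whereas selecting sites 0 and 2 would give a total of 6. The sketch therefore also shows that such a heuristic trades optimality for speed. Each round scans every remaining site against every user, so even this simple variant costs on the order of M x N x P operations.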