As shown in FIG. 1, large-scale distributed systems provide networked online storage and allow multiple computing devices to store, access, and share files in the online storage. Distributed systems may use a client/server architecture in which one or more central servers store data and provide data access to network clients. Data may be stored in multiple datacenters such as the example datacenter illustrated in FIG. 7.
A large-scale distributed system may be an email system, a messaging system, a videoconferencing system, or another communication system. An individual user of such a system may communicate with multiple other users, communicating with some more frequently than with others. The individual user may also use different means of communication to interact with others. For example, User A may text User B constantly and email Users C and D daily.
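By way of illustration only, the communication pattern in the example above may be represented as per-pair message counts broken down by means of communication. The following is a minimal sketch under stated assumptions: the message log, its values, and the function names `communication_counts` and `most_frequent_contacts` are hypothetical and are not part of the described system.

```python
from collections import defaultdict

# Hypothetical message log: (sender, recipient, channel).
# The values are illustrative only and mirror the User A example.
messages = [
    ("A", "B", "text"), ("A", "B", "text"), ("A", "B", "text"),
    ("A", "C", "email"),
    ("A", "D", "email"),
]

def communication_counts(log):
    """Count messages per unordered user pair, broken down by channel."""
    counts = defaultdict(lambda: defaultdict(int))
    for sender, recipient, channel in log:
        pair = tuple(sorted((sender, recipient)))
        counts[pair][channel] += 1
    return counts

def most_frequent_contacts(log, user):
    """Return the user's contacts ordered from most to least total messages.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    totals = defaultdict(int)
    for (u, v), by_channel in communication_counts(log).items():
        if user in (u, v):
            other = v if u == user else u
            totals[other] += sum(by_channel.values())
    return sorted(totals, key=lambda contact: (-totals[contact], contact))

print(most_frequent_contacts(messages, "A"))
```

In this sketch, User B ranks ahead of Users C and D for User A because more messages are exchanged with User B. Counts of this kind are one possible way to quantify how frequently pairs of users communicate.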
These large-scale distributed systems may rely heavily on user data for functionality and performance. Such systems may need to store user data, including messages, posts, pictures, videos, audio, and email, among other data. The systems may also create multiple copies of user data, referred to as user data replicas, to help provide quick and efficient access to user data.
Determining exactly how to allocate resources within a large-scale distributed system can be a difficult global load-balancing problem. Several factors need to be considered in order to find a solution to this problem. System resources such as processors, servers, storage devices, and user data replicas should be allocated so that user data can be quickly and efficiently processed, accessed, and stored. Resources may be allocated across datacenters and should be allocated in a way that ensures that the large-scale distributed system is stable and reliable. Performance of backend data storage and compliance with distributed transaction constraints should also be considered when deciding how to allocate system resources.
Conventional models provide solutions to the global load-balancing problem and reduce the computational complexity of determining user data replica placement in large-scale distributed systems. Some conventional models consider distributed transaction needs, system stability and reliability requirements, and performance optimization. However, as recognized by the inventors, a distributed system may also need to take into account the social communication patterns among users in order to improve system performance and reduce resource usage.