Users access social networking services to communicate with each other, share their interests, upload images and videos, create new relationships, etc. Social networking services typically operate in a distributed computing environment, with data distributed among one or more server clusters that are each located in one of multiple data centers. A server cluster is a grouping of server computing devices (“servers”). When a user of a social networking service sends a query to request data of his or her friends, a load balancing server can be the first server to receive the request. To balance the load across the server clusters, the load balancing server usually routes the request to any server that has a low load. A server in the server cluster receiving the query first attempts to fetch the requested data from a cache. If the requested data is not cached, it is fetched from one or more databases. Because the load balancing server routes queries based solely on load constraints, queries from (or pertaining to) the same user can end up being routed to different server clusters. Routing queries without regard to the user in this way is inefficient because each time a query is routed to a different server cluster, there is a high likelihood that the requested data will not be cached, leading to a “cache miss.” Consequently, the requested data will need to be fetched using the more operationally expensive database queries. Frequent cache misses can lead to increased latency because each miss incurs one of these more expensive database queries. Moreover, as the queries of the same user are routed to different server clusters, the same data can eventually be cached in multiple server clusters. This duplication of cached data is also inefficient.
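The behavior described above can be illustrated with a minimal sketch. It is not taken from any particular system: the `Cluster` class, the `route_by_load` function, the use of a request counter as the "load," and the sample user and data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical server cluster with its own cache and a simple load counter."""
    name: str
    load: int = 0
    cache: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def handle(self, user_id, database):
        """Serve a query: try the cache first, then fall back to the database."""
        self.load += 1  # assumption: load is modeled as the number of requests handled
        if user_id in self.cache:
            self.hits += 1
            return self.cache[user_id]
        self.misses += 1                # cache miss: use the more expensive database
        data = database[user_id]
        self.cache[user_id] = data      # the same data is now cached in this cluster too
        return data


def route_by_load(clusters):
    """Pick the least-loaded cluster, ignoring which user sent the query."""
    return min(clusters, key=lambda c: c.load)


database = {"alice": ["bob", "carol"]}   # illustrative friend data
clusters = [Cluster("A"), Cluster("B"), Cluster("C")]

# Three queries from the same user are routed purely by load.
for _ in range(3):
    route_by_load(clusters).handle("alice", database)

total_hits = sum(c.hits for c in clusters)
total_misses = sum(c.misses for c in clusters)
cached_copies = sum("alice" in c.cache for c in clusters)
```

In this simplified model, each of the three queries lands on a different cluster, so every query is a cache miss and the same data ends up cached three times.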
One way to increase “cache hits” (i.e., fetching data from a cache instead of through database queries) is to route network traffic (“traffic”) from the same user to the same server cluster every time. However, because of load constraints, this approach can also be inefficient. For example, if the server cluster assigned to a user is overloaded, then even if the cached data requested by the user is available, there would be increased latency in fetching that cached data.
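The trade-off of such user-based routing can be sketched as follows. This is only an illustration under stated assumptions: the `sticky_cluster` function and the choice of a CRC32 hash of the user identifier to pick a cluster are hypothetical, and load is again modeled as a simple request count.

```python
import zlib


def sticky_cluster(user_id, num_clusters):
    """Map a user to a fixed cluster index (assumed scheme: CRC32 hash modulo cluster count)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_clusters


num_clusters = 3
caches = [set() for _ in range(num_clusters)]  # per-cluster set of cached user ids
loads = [0] * num_clusters                     # requests handled by each cluster
hits = 0
misses = 0

# Five queries from the same user always go to the same cluster.
for _ in range(5):
    idx = sticky_cluster("alice", num_clusters)
    loads[idx] += 1
    if "alice" in caches[idx]:
        hits += 1
    else:
        misses += 1              # only the first query misses the cache
        caches[idx].add("alice")
```

Only the first query misses the cache, but all five queries land on a single cluster while the other two stay idle. If that one cluster is already overloaded, the cache hits do not prevent increased latency.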