A distributed platform deploys servers in different physical locations. A large distributed platform can have hundreds if not thousands of servers operating from the different physical locations. The distributed platform operates these servers to provide content or services from the different physical locations. The servers can provide the content or services for or on behalf of different customers. Content delivery networks (CDNs), cloud service providers, and hosting platforms are examples of some such distributed platforms.
Message flooding is a term for contacting each and every server of the distributed platform. In particular, message flooding is for contacting the servers in a substantially simultaneous manner. There are certain situations in which message flooding is desired. Some such situations include contacting the servers in order to configure the servers or send commands to the servers. Such contacts change control logic, resource allocation, or operational behavior of the servers. Some other situations can include contacting the servers to query the servers and to receive information in response to the queries. For instance, the distributed platform contacts each server in order to collect statistics that help isolate an attack to a specific set of servers of the distributed platform. Subsequent contacts could then be used to reconfigure the servers to combat the identified attack. In any case, message flooding is a critical tool at the disposal of the distributed platform administrator for communicating rapidly with each and every server of the platform in situations that require sending information to, or collecting information from, all servers of the distributed platform.
Traditional means of message flooding are inefficient and slow to complete. This is due in part to the overhead and delays associated with establishing individual connections with each of the distributed platform servers before contacting them. Establishing each such connection involves performing a separate first handshake with each server. A second handshake may also be performed in order to secure or encrypt the connection. The handshakes consume server resources and also introduce delay that depends on the number of network hops that the handshake messaging traverses. Moreover, the message flooding cannot be initiated until each and every connection is established.
Even once the connections are established, the message flood is not complete until each server is contacted and a response is received from the contacted servers. Traditional message flooding typically involves establishing all such connections from a central server and propagating the contacts from the central server. The central server is closer to some of the distributed platform servers while more distant to others. The propagation delay for any connection establishment or exchanged messaging will therefore vary greatly. The delay is exacerbated because of slow start mechanisms built into the underlying protocols that control the rate at which messages are sent over the newly established connections.
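The per-connection delays described above can be illustrated with a rough estimate. The following is a minimal sketch, not a model of any particular protocol: the round-trip counts (one for the first handshake, two for the securing handshake, one for the request and response), the initial window of 10 segments, the idealized doubling per round trip, and the example round-trip times are all assumptions chosen for illustration.

```python
def time_to_first_response(rtt_ms, connect_rtts=1, secure_rtts=2, request_rtts=1):
    """Estimate the delay (ms) before a new connection yields a first response.

    Round-trip counts are illustrative assumptions, not measured values.
    """
    return rtt_ms * (connect_rtts + secure_rtts + request_rtts)


def slow_start_rounds(segments, initial_window=10):
    """Count round trips needed to send `segments` under idealized slow start.

    Assumes the window doubles every round trip, with no loss and no threshold.
    """
    rounds = 0
    window = initial_window
    sent = 0
    while sent < segments:
        sent += window
        window *= 2
        rounds += 1
    return rounds


# Hypothetical round-trip times (ms) from a central server to three servers.
rtts_ms = {"near": 10, "mid": 80, "far": 250}
setup_ms = {name: time_to_first_response(rtt) for name, rtt in rtts_ms.items()}
```

Under these assumptions, the most distant server needs about one second before its first response arrives, while the nearest needs about 40 ms, which shows how widely the delay can vary across servers contacted from a single central location.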
The resources required to establish the connections and contact the servers also contribute to the overall delay for completing the message flooding. In particular, the central server has insufficient resources to establish all such connections and issue all such messaging in parallel when the message flood includes hundreds or thousands of distributed platform servers. Moreover, frequent flooding of the distributed platform introduces so much traffic and overhead that bottlenecks may form on the distributed platform servers or along the network pathways to the servers.
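The effect of limited parallelism at the central server can be approximated with simple arithmetic. The sketch below assumes the central server works through the servers in fixed-size batches, each taking the same amount of time; the function name, the batch size, and the per-batch duration are hypothetical values for illustration only.

```python
import math


def batched_flood_time(num_servers, max_parallel, per_batch_seconds):
    """Approximate total flood time when only `max_parallel` servers
    can be contacted at once and each batch takes `per_batch_seconds`."""
    batches = math.ceil(num_servers / max_parallel)
    return batches * per_batch_seconds


# 5000 servers, 200 contacted at a time, 1.5 s per batch (all assumed values).
total_seconds = batched_flood_time(5000, 200, 1.5)
```

With these assumed values, the flood takes 25 batches and 37.5 seconds in total, even before accounting for slow or failed servers.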
Broadcast or multicast messaging may be used instead of establishing individual connections. This, however, is an insecure manner in which to perform the message flooding and could expose the distributed platform to significant attacks. Moreover, while broadcast or multicast messaging may be a good vehicle with which to contact each of the distributed platform servers, it is a poor vehicle with which to collect responses from each of the servers.
A further issue with the above implementations is the indeterminate amount of time needed to complete traditional message flooding. The message flooding is complete once all distributed platform servers acknowledge and respond to each contact. The completion time is therefore determined by the server that is slowest to respond. The various delays discussed above, as well as network, server, and other failures, can cause a particular message flood to take several minutes to complete, or to take an indeterminate amount of time if even one server cannot be contacted because of a network link, server, or other failure.
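The dependence of completion time on the slowest server can be expressed in a few lines. This is a minimal sketch under stated assumptions: the per-server response times are invented, an unreachable server is modeled with an infinite response time, and the deadline-based helper is a hypothetical contrast rather than a description of any existing mechanism.

```python
import math


def flood_completion_time(response_times):
    """A flood that waits for every server finishes with the slowest one."""
    return max(response_times.values())


def responses_by_deadline(response_times, deadline):
    """Hypothetical contrast: split servers into those that answered
    within `deadline` seconds and those that did not."""
    responded = {s: t for s, t in response_times.items() if t <= deadline}
    missing = sorted(set(response_times) - set(responded))
    return responded, missing


# Invented response times in seconds; math.inf models an unreachable server.
times = {"a": 0.2, "b": 1.4, "c": 95.0, "d": math.inf}
completion = flood_completion_time(times)
responded, missing = responses_by_deadline(times, deadline=2.0)
```

Here a single unreachable server makes the completion time unbounded, whereas bounding the wait returns the two prompt responses and reports the other two servers as missing.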
Because of the issues identified above, traditional message flooding cannot be used for time-sensitive messaging or when responses are immediately required. There is therefore a need for efficient and fast contacting of a large number of servers in a distributed platform. More specifically, there is a need to eliminate or minimize much of the overhead associated with a message flood initiating server establishing separate connections, messaging each server, and receiving responses from each of the distributed platform servers. There is also a need to provide responses within a determinate amount of time so that the contacting mechanism can be used for time-sensitive or real-time applications.