Field
Embodiments presented herein generally relate to distributed computing. More specifically, embodiments presented herein provide techniques for automatically discovering, configuring and adding new computing nodes to a secondary storage appliance.
Description of the Related Art
A distributed computing cluster uses multiple computer systems or “nodes” to achieve a common goal or provide a common service. Each node includes its own memory, storage, processing power, and applications used to provide the services (or parts of the services) of the cluster. One example of a distributed computing system is a secondary storage cluster used to provide a variety of services for a primary storage system. For instance, a secondary storage system can provide data backup services for the primary storage system, as well as provide views of backup data to other clients.
Secondary storage systems are frequently used in large data centers, where hundreds or even thousands of computing servers host enterprise applications. In such a case, a primary storage system can regularly store terabytes of data, which are backed up to the secondary storage system. In large data centers, the secondary storage system can include hundreds of computing nodes used to form the cluster. One advantage to using a cluster is that nodes can be added to (or removed from) the cluster as needed. That is, the cluster can scale as needed to provide backup services to the primary storage system.
After placing a new node (or nodes) in a data center, an administrator typically connects a display device and keyboard to each node and manually configures its network (and other) settings. For example, an administrator may assign an IP address, configure IPMI settings, and configure other settings that allow the node to participate in the cluster. In many cases, each node of a cluster needs to be assigned a static IP address (as changes due to the expiration of an IP address lease can be disruptive to the cluster). However, manually configuring each node can be both burdensome and time consuming for the administrator, particularly where many new nodes are added to a secondary storage cluster. Further, if an administrator is configuring a large number of nodes, the possibility that mistakes will be made increases. For example, a node could be overlooked entirely (e.g., an administrator configures only 31 of 32 new nodes). After configuring the nodes, the administrator then has to manually enter the IP address assigned to each node to form a cluster or to add new nodes to an existing cluster. This manual process also involves some risk, as even simple mistakes on the part of the administrator can potentially affect unrelated nodes (e.g., listing an incorrect IP address for a node in the cluster can cause network conflicts). In addition, the administrator has to maintain an inventory of nodes that are available to form a cluster or to join an existing cluster of a secondary storage system.
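As an illustration only, the failure modes noted above (an overlooked node, a mistyped address, or an address listed twice) can be made concrete with a minimal sketch that checks a hand-entered list of node IP addresses. The function name `check_inventory`, its parameters, and the sample addresses are hypothetical and are not drawn from any embodiment; the sketch uses only the Python standard library.

```python
import ipaddress
from collections import Counter


def check_inventory(entered_ips, expected_count):
    """Return a list of problems found in a hand-entered list of node IPs.

    Hypothetical helper: it only inspects the list itself and does not
    contact any node or network.
    """
    problems = []
    valid = []

    # Flag entries that are not well-formed IPv4/IPv6 addresses.
    for raw in entered_ips:
        try:
            valid.append(ipaddress.ip_address(raw.strip()))
        except ValueError:
            problems.append(f"malformed address: {raw!r}")

    # Flag addresses that were entered more than once.
    for addr, count in Counter(valid).items():
        if count > 1:
            problems.append(f"duplicate address: {addr} listed {count} times")

    # Flag a mismatch with the number of nodes the administrator
    # believes were installed (e.g., 31 entries for 32 nodes).
    if len(entered_ips) != expected_count:
        problems.append(
            f"expected {expected_count} nodes, got {len(entered_ips)} entries"
        )

    return problems


if __name__ == "__main__":
    for line in check_inventory(["10.0.0.1", "10.0.0.1", "10.0.0.300"], 4):
        print(line)
```

Such a check can catch only errors that are visible in the list itself. It cannot detect an address that is well formed but belongs to a different device, which is one reason the manual process remains error prone.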