An increasing number of data-intensive distributed applications are being developed to serve various needs, such as processing very large data sets that generally cannot be handled by a single computer. Instead, clusters of computers are employed to distribute various tasks, such as organizing and accessing the data and performing related operations with respect to the data. Various applications and frameworks have been developed to interact with such large data sets, including Hive, HBase, Hadoop, Spark, Amazon S3, and CloudStore, among others.
At the same time, virtualization techniques have gained popularity and are now commonplace in data centers and other computing environments in which it is useful to increase the efficiency with which computing resources are used. In a virtualized environment, one or more virtual nodes are instantiated on an underlying host computer and share the resources of the underlying computer. Accordingly, rather than implementing a single node per host computing system, multiple nodes may be deployed on a host to more efficiently use the processing resources of the computing system. These virtual nodes may include full operating system virtual machines, Linux containers, such as Docker containers, jails, or other similar types of virtual containment nodes. However, although virtual nodes may more efficiently use the resources of the underlying host computing systems, difficulties often arise in scaling the virtual nodes to meet the requirements of multiple tenants that may share the resources of the host computing systems.
Overview
The technology disclosed herein enhances the scalability of a large-scale processing environment for multiple tenants. In one implementation, a method of operating a control node includes receiving a request to configure a virtual cluster with one or more data processing nodes, and identifying a tenant associated with the request. The method further includes identifying a namespace for the tenant, and identifying internet protocol (IP) addresses for the one or more data processing nodes. The method also includes generating namespace-to-IP-address pairs for the one or more data processing nodes based on the namespace and the IP addresses, and configuring a domain name system (DNS) for the virtual cluster with the namespace-to-IP-address pairs.
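By way of illustration only, the operations of the control node described above might be sketched as follows. This is a minimal, non-limiting sketch and not the disclosed implementation. The names `ClusterRequest`, `ControlNode`, and `configure_cluster`, the `node-<index>.<cluster>.<namespace>` host name format, the free-address pool, and the in-memory dictionary standing in for the DNS are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, ip_address
from typing import Dict, List


@dataclass
class ClusterRequest:
    """Hypothetical request to configure a virtual cluster."""
    tenant_id: str
    cluster_name: str
    node_count: int


@dataclass
class ControlNode:
    """Hypothetical control node; the DNS is modeled as a plain dictionary."""
    tenant_namespaces: Dict[str, str]   # tenant identifier -> tenant namespace
    free_addresses: List[IPv4Address]   # pool of unallocated IP addresses
    dns_records: Dict[str, Dict[str, IPv4Address]] = field(default_factory=dict)

    def configure_cluster(self, request: ClusterRequest) -> Dict[str, IPv4Address]:
        # Identify the tenant associated with the request and its namespace.
        namespace = self.tenant_namespaces.get(request.tenant_id)
        if namespace is None:
            raise LookupError(f"unknown tenant: {request.tenant_id}")

        # Identify one IP address per requested data processing node.
        if request.node_count > len(self.free_addresses):
            raise RuntimeError("not enough free IP addresses")
        addresses = [self.free_addresses.pop(0) for _ in range(request.node_count)]

        # Generate namespace-to-IP-address pairs (host name format is assumed).
        pairs = {
            f"node-{index}.{request.cluster_name}.{namespace}": address
            for index, address in enumerate(addresses)
        }

        # Configure the DNS for this virtual cluster with the generated pairs.
        self.dns_records[request.cluster_name] = pairs
        return pairs


if __name__ == "__main__":
    control = ControlNode(
        tenant_namespaces={"tenant-a": "marketing.example"},
        free_addresses=[ip_address(f"10.0.0.{n}") for n in range(1, 4)],
    )
    for name, address in control.configure_cluster(
        ClusterRequest("tenant-a", "spark1", 2)
    ).items():
        print(name, address)
```

In this sketch each virtual cluster keeps its own record set, so nodes of different tenants can be resolved by name within their own namespace without the host names of one tenant colliding with those of another.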
This Overview is provided to introduce a selection of concepts in a simplified form that are further described below in the Technical Disclosure. It should be understood that this Overview is not intended to identify key features or essential features of the claimed subject matter, nor should it be used to limit the scope of the claimed subject matter.