1. Technical Field
The present teaching relates to methods, systems, and programming for distributed computing. Particularly, the present teaching is directed to methods, systems, and programming for distributed application stack deployment.
2. Discussion of Technical Background
Distributed computing is a field of computer science that studies distributed systems, which include multiple autonomous computers or parallel virtual machines that communicate through a computer network, such as a computer cluster having multiple nodes. The machines in a distributed system interact with each other in order to achieve a common goal. A computer program that runs in the distributed system is called a distributed application. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers, such as the nodes of a computer cluster. Distributed systems and applications may be applied in various paradigms, including grid computing, utility computing, edge computing, and cloud computing, by which users may access server resources using a computer, netbook, tablet, smart phone, or other device through the Internet.
For instance, APACHE HADOOP is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Rather than relying on hardware to deliver high availability, HADOOP is designed to detect and handle failures at the application layer, thereby delivering a highly available service. HADOOP is deployed on a computer cluster in the form of a HADOOP stack, which includes a set of software artifacts (HADOOP components), such as HADOOP software, configuration files, libraries, links, source code, documentation, and miscellaneous items. The deployment of HADOOP on a cluster of machines usually involves hardware installation; operating system installation, update, and configuration; JAVA installation and configuration; and HADOOP stack installation, configuration, and diagnostics.
One of the most challenging tasks in the deployment of HADOOP or any other distributed application is ensuring that all the artifacts in the application stack are deployed in the correct versions on each machine, based on the specific role/type of the machine in the cluster. However, known solutions for HADOOP deployment usually involve manual interventions, which are inefficient and ineffective. For example, a user has to fetch artifact versions from developers' emails or from deployment decision meeting notes, enter the versions into an XML or text file, run a command to download the specified artifact versions onto each machine, and download additional required artifacts, especially those that are not properly versioned or not packaged. In addition, known solutions cannot keep track of all the deployment records, such as the role/type of each machine in the cluster and the specific version of each artifact in the HADOOP stack that has been installed on a particular machine. Therefore, there is a need to provide a solution for automated assembly, deployment, and startup of the specific package versions of distributed application stacks, such as the HADOOP stack, to a set of machines identified in configuration storage, such that the resulting deployment is fully configured and recorded, and the deployed distributed application is ready for use.
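By way of illustration only, the bookkeeping problem described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any known or claimed solution: the role names, artifact names, version strings, and the helpers `plan_deployment` and `find_mismatches` are all hypothetical. It shows how a per-role version table may yield one deployment record per machine and artifact, and how such records may be compared against what is actually installed.

```python
from dataclasses import dataclass

# Hypothetical table: machine role -> {artifact name: required version}.
# Names and version strings are illustrative assumptions only.
STACK_BY_ROLE = {
    "namenode": {"hadoop-software": "1.0.2", "namenode-config": "1.0.2"},
    "datanode": {"hadoop-software": "1.0.2", "datanode-config": "1.0.1"},
}


@dataclass(frozen=True)
class DeploymentRecord:
    """One artifact version that a given machine is expected to carry."""
    machine: str
    role: str
    artifact: str
    version: str


def plan_deployment(machines):
    """Build records from a {hostname: role} mapping.

    Returns one DeploymentRecord per (machine, artifact) pair, ordered by
    hostname and then by artifact name.
    """
    records = []
    for host, role in sorted(machines.items()):
        for artifact, version in sorted(STACK_BY_ROLE[role].items()):
            records.append(DeploymentRecord(host, role, artifact, version))
    return records


def find_mismatches(records, installed):
    """Return the records whose required version is not what is installed.

    `installed` maps (hostname, artifact) -> installed version; a missing
    entry counts as a mismatch.
    """
    return [
        r for r in records
        if installed.get((r.machine, r.artifact)) != r.version
    ]
```

For example, calling `plan_deployment({"node1": "namenode", "node2": "datanode"})` yields four records, two per machine, and `find_mismatches` reports any machine on which an artifact is absent or installed in a different version. Keeping the role-to-version table in one place is the design choice that removes the manual step of copying versions from emails or meeting notes.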