The advent of cloud-based computing architectures has opened new possibilities for the rapid and scalable deployment of virtual web stores, media outlets, and other online sites or services. Generally speaking, cloud computing involves the delivery of computing as a service rather than a product, whereby shared resources (software, storage resources, etc.) are provided to computing devices as a service. The resources are shared over a network, which is typically the internet. A cloud computing system comprises a plurality of physical computing machines, generally known as nodes. These nodes are connected with each other, either via a high-speed local area network or via a high-speed bus connection, to form a cloud computing infrastructure. The operator of the cloud computing infrastructure provides services to many users, for example through user computing devices connected to the cloud computing infrastructure via the internet. A user or customer can request, from a central server or management system, the instantiation of a node or set of nodes from those resources to perform intended services or applications. Usually, each service includes several processes running on different nodes, and each node may have multi-core processors for running multiple processes simultaneously.
In a cloud computing infrastructure, a parent process is conventionally initiated and configured in each node to initiate child processes on the same node. The parent process is also configured to monitor, maintain, update, restart or delete the child processes, such as a user application binary or user binary. In fact, such a parent process may be created in any node to perform the aforementioned functions of monitoring, managing, updating, initiating, restarting or deleting child processes in that node. Furthermore, the parent process in each node may initiate and restart the child processes according to commands or instructions from the centralized management software or the management entity in the cloud computing infrastructure.
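By way of illustration only, the role of such a parent process can be sketched as follows. This is a minimal sketch and not a description of any particular cloud platform: the class name `NodeSupervisor`, its method names, and the helper `sleeper_argv` (a stand-in for a user binary) are hypothetical and chosen for this example.

```python
import subprocess
import sys


class NodeSupervisor:
    """Hypothetical per-node parent process that supervises child processes."""

    def __init__(self):
        # Child name -> (command line, process handle); held only in memory.
        self.children = {}

    def start(self, name, argv):
        """Initiate a child process on the same node."""
        self.children[name] = (argv, subprocess.Popen(argv))

    def is_running(self, name):
        """Monitor a child: poll() returns None while it is still alive."""
        _, proc = self.children[name]
        return proc.poll() is None

    def delete(self, name):
        """Stop a child process and forget it."""
        _, proc = self.children.pop(name)
        proc.terminate()
        proc.wait()

    def restart(self, name):
        """Stop a child, then start it again with the same command line."""
        argv, _ = self.children[name]
        self.delete(name)
        self.start(name, argv)


def sleeper_argv(seconds=30):
    # Stand-in for a "user binary": a child process that simply sleeps.
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]
```

In such a sketch, a management entity would issue commands that map onto calls such as `start("svc", sleeper_argv())` or `restart("svc")` on the parent process of the relevant node.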
In the conventional cloud system discussed above, it is usually the parent process in each node that directly initiates the child process and stores the process ID assigned by the operating system of the node. In such a case, the parent process has to maintain an inline table containing the “parent-child” relationship between each process name and its corresponding process ID for the node on which the parent process is operating. The process ID of a child process is assigned by the operating system kernel of the node when the child process is first initiated. However, such monitoring and management of child processes by the parent process is vulnerable when the parent process goes down unexpectedly. If the parent process goes down accidentally, it is difficult for that parent process to recover the process IDs of its child processes. In other words, the “parent-child” relationship is lost when the parent process fails. At present, this problem is addressed by using an offline database that stores the mapping between each process name and its corresponding process ID, but this approach increases both the capital expense and the operational expense of the whole cloud computing system.
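The loss of the “parent-child” relationship described above can be illustrated with a minimal sketch. It assumes that the inline table is an in-memory dictionary, and the class name `ParentWithPidTable` and the process name `"worker"` are hypothetical. A parent failure is modeled simply by constructing a fresh instance, as a restarted parent process would.

```python
import subprocess
import sys


class ParentWithPidTable:
    """Hypothetical parent whose name -> PID table lives only in its memory."""

    def __init__(self):
        # Process name -> process ID assigned by the operating system kernel.
        self.pid_table = {}

    def start(self, name, argv):
        """Initiate a child and record the PID the kernel assigned to it."""
        proc = subprocess.Popen(argv)
        self.pid_table[name] = proc.pid
        return proc
```

For example, after `child = parent.start("worker", [sys.executable, "-c", "import time; time.sleep(30)"])`, the table holds `{"worker": child.pid}`. A replacement `ParentWithPidTable()` created after a failure starts with an empty table, even though the child may still be running, which is why conventional systems fall back on an external store for the mapping.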
In this context, there is a need for a method or a system to manage the processes in each node of the cloud computing infrastructure. The solution should at least enable the first process, after it goes down accidentally, to resume its monitoring activity and then determine the operational status of each child process that it created.