Field of the Invention
The present invention generally relates to executing application software that processes big data. More specifically, the present invention relates to implementing big data software in a single multi-instance node.
Description of the Related Art
Application software that processes big data tends to place a heavy load on system resources. To handle this load, big data applications are often run on multiple machines. Multiple copies of big data software running on multiple machines can process large amounts of data more quickly than a single copy running on one machine.
Processing big data on multiple machines has disadvantages. When multiple machines execute multiple copies of big data software, the software copies often need to communicate with each other. These machines each have an IP address and communicate over a network such as the Internet. Communication between machines over a network, wherein each machine has its own IP address, inherently introduces delays due to network latency. The additional steps needed to aggregate data across the network and deliver the final result incur further delays.
FIG. 1 is a block diagram of a cluster of nodes of the prior art. The cluster of nodes includes nodes 1-6. Each node may include memory, a processor with one or more cores, and other computing components. When processing large amounts of data, a copy of a big data software application may execute on each node. A separate copy of the same software may be loaded onto and executed by each node in order to achieve faster processing of large amounts of data. The nodes each have their own IP address and communicate with each other over one or more networks, which introduces latency in the processing of the data.
The delay that big data software incurs in processing data and aggregating the result set degrades the performance of the software and may even cause errors due to latency. What is needed is an improved method for processing data by big data software.