The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.
The technology disclosed relates to automatic and instantaneous error recovery in streaming processing frameworks. In particular, it relates to detecting a first batch failing due to either a timed-out condition or a crashed worker node and, responsive to the detecting, rerunning all stages of the first batch according to the conditions of the respective worker nodes. For a crashed worker node, it includes restarting the crashed worker node with a fresh state, reloading a previous node-state, and rerunning an assigned part of the first batch at the crashed worker node. For a successfully completed worker node, it includes swapping a current node-state to the previous node-state using externalized state data and rerunning a dependent batch at the committed worker node. For a timed-out worker node, it includes maintaining a current node-state of the uncontactable worker node and rerunning another dependent batch at the uncontactable worker node from a point immediately preceding the detecting.
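The three per-node recovery paths described above can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions and not as the disclosed implementation: the names `NodeCondition`, `WorkerNode`, and `recover` are hypothetical, node-state is modeled as a plain dictionary, and the rerun itself is recorded as a log entry rather than executed.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class NodeCondition(Enum):
    """Hypothetical labels for the three worker-node conditions."""
    CRASHED = auto()
    COMPLETED = auto()
    TIMED_OUT = auto()


@dataclass
class WorkerNode:
    """Hypothetical stand-in for a worker node; state is a plain dict."""
    name: str
    condition: NodeCondition
    current_state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def recover(node: WorkerNode, previous_state: dict) -> WorkerNode:
    """Apply the recovery step for the node's condition, then record a rerun.

    previous_state stands in for the externalized previous node-state.
    """
    if node.condition is NodeCondition.CRASHED:
        # Restart with a fresh state, then reload the previous node-state.
        node.current_state = {}
        node.log.append("restart")
        node.current_state = dict(previous_state)
        node.log.append("reload_previous_state")
    elif node.condition is NodeCondition.COMPLETED:
        # Swap the committed current state back to the previous node-state.
        node.current_state = dict(previous_state)
        node.log.append("swap_to_previous_state")
    elif node.condition is NodeCondition.TIMED_OUT:
        # Leave the current state untouched; only the rerun is scheduled.
        node.log.append("keep_current_state")
    # In every case the node's part of the batch is rerun.
    node.log.append("rerun")
    return node


if __name__ == "__main__":
    previous = {"count": 10}
    for condition in NodeCondition:
        node = recover(WorkerNode("worker", condition, {"count": 13}), previous)
        print(condition.name, node.current_state, node.log)
```

In this sketch only the timed-out node keeps the state it held when the failure was detected; the crashed and completed nodes both end up on the previous node-state, reached by different steps.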
For many analytic solutions, batch processing systems are not sufficient for providing real-time results because of their loading and processing requirements: it can take hours to run batch jobs. As a result, analytics on events can only be generated long after the events have occurred. In contrast, the shortcoming of streaming processing analytics systems is that they do not always provide the level of accuracy and completeness that batch processing systems provide. The technology disclosed uses a combination of batch and streaming processing modes to deliver contextual responses to complex analytics queries with low latency on a real-time basis.
In today's world, we are dealing with huge data volumes, popularly referred to as “Big Data”. Web applications that serve and manage millions of Internet users, such as Facebook™, Instagram™, Twitter™, banking websites, or even online retail shops such as Amazon.com™ or eBay™, are faced with the challenge of ingesting high volumes of data as fast as possible so that the end users can be provided with a real-time experience.
Another major contributor to Big Data is a concept and paradigm called “Internet of Things” (IoT). IoT is about a pervasive presence in the environment of a variety of things/objects that, through wireless and wired connections, are able to interact with each other and cooperate with other things/objects to create new applications/services. These applications/services are in areas like smart cities (regions), smart car and mobility, smart home and assisted living, smart industries, public safety, energy and environmental protection, agriculture, and tourism.
Currently, there is a need to make such IoT applications/services more accessible to non-experts. Until now, non-experts who have highly valuable non-technical domain knowledge have cheered from the sidelines of the IoT ecosystem because of the ecosystem's reliance on tech-heavy products that require substantial programming experience. Thus, it has become imperative to increase the non-experts' ability to independently combine and harness big data computing and analytics without reliance on expensive technical consultants.
Stream processing is quickly becoming a crucial component of Big Data processing solutions for enterprises, with many popular open-source stream processing systems available today, including Apache Storm™, Apache Spark™, Apache Samza™, Apache Flink™, and others. Many of these stream processing solutions offer default schedulers that evenly distribute processing tasks between the available computation resources using a round-robin strategy. However, such a strategy is not cost effective because substantial computation time and resources are lost during assignment and re-assignment of tasks to the correct sequence of computation resources in the stream processing system, thereby introducing significant latency in the system.
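As a generic illustration of the round-robin strategy mentioned above, the following minimal sketch hands tasks to workers in strict rotation. It is not the scheduler of any of the named frameworks; the function name `round_robin_assign` and the task and worker labels are hypothetical.

```python
from itertools import cycle


def round_robin_assign(tasks, workers):
    """Assign tasks to workers in strict rotation, ignoring task order
    dependencies and worker locality."""
    assignment = {worker: [] for worker in workers}
    # cycle() repeats the worker list; zip() stops when tasks run out.
    for task, worker in zip(tasks, cycle(workers)):
        assignment[worker].append(task)
    return assignment


if __name__ == "__main__":
    print(round_robin_assign(["t1", "t2", "t3", "t4", "t5"], ["w1", "w2"]))
```

Because the rotation considers only the count of workers, consecutive stages of one computation can land on different workers, which is the kind of assignment and re-assignment overhead the paragraph above describes.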
Also, an opportunity arises to provide systems and methods that use simple, easily codable, declarative-language-based solutions to execute big data computing and analytics tasks.
Further, an opportunity arises to provide systems and methods that use a combination of concurrent and multiplexed processing schemes to adapt to the varying computational requirements and availability in a stream processing system with little performance loss or added complexity. Increased revenue, higher user retention, improved user engagement and experience may result.