The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.
The technology disclosed relates to managing resource allocation to task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes machine resources, with heterogeneous containers defined over whole machines and some containers including multiple machines. It also includes initially allocating multiple machines to a first container, initially allocating a first set of stateful task sequences to the first container, and running the first set of stateful task sequences as multiplexed units of work under control of a container-scheduler. Each unit of work for a first task sequence runs to completion on first machine resources in the first container, unless it overruns a time-out, before a next unit of work for a second task sequence runs multiplexed on the first machine resources. It further includes automatically modifying a number of machine resources and/or a number of task sequences assigned to a container.
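The multiplexing behavior described above can be illustrated with a minimal sketch. It is not the disclosed implementation. The names `TaskSequence` and `run_container`, the use of simulated "ticks" in place of wall-clock time, and the choice to drop a unit of work that overruns the time-out are all assumptions made for illustration only.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TaskSequence:
    """A stateful task sequence: queued units of work plus accumulated state."""
    name: str
    units: deque    # each unit is a simulated cost in ticks (hypothetical)
    state: int = 0  # total ticks of work completed so far


def run_container(sequences, timeout_ticks):
    """Multiplex units of work from several task sequences on one resource.

    One unit runs at a time. It either completes or is cut off at the
    time-out before a unit from the next task sequence is started.
    Returns a log of (sequence name, ticks used, outcome) tuples.
    """
    log = []
    pending = deque(s for s in sequences if s.units)
    while pending:
        seq = pending.popleft()
        cost = seq.units.popleft()
        if cost <= timeout_ticks:
            seq.state += cost  # the unit ran to completion
            log.append((seq.name, cost, "completed"))
        else:
            # Assumption: an overrunning unit is stopped at the time-out
            # and discarded; the source does not specify this behavior.
            log.append((seq.name, timeout_ticks, "timed_out"))
        if seq.units:
            pending.append(seq)  # go to the back of the line
    return log
```

With two sequences holding units of cost `[2, 5]` and `[1]` and a time-out of 3 ticks, the sketch completes the first unit of each sequence in turn and then cuts off the 5-tick unit at the time-out.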
For many analytic solutions, batch processing systems are not sufficient for providing real-time results because of their loading and processing requirements: it can take hours to run batch jobs. As a result, analytics on events can only be generated long after the events have occurred. In contrast, the shortcoming of stream processing analytics systems is that they do not always provide the level of accuracy and completeness that batch processing systems provide. The technology disclosed uses a combination of batch and stream processing modes to deliver contextual responses to complex analytics queries with low latency on a real-time basis.
In today's world, we are dealing with huge data volumes, popularly referred to as “Big Data”. Web applications that serve and manage millions of Internet users, such as Facebook™, Instagram™, Twitter™, banking websites, or even online retail shops such as Amazon.com™ or eBay™, are faced with the challenge of ingesting high volumes of data as fast as possible so that the end users can be provided with a real-time experience.
Another major contributor to Big Data is a concept and paradigm called the “Internet of Things” (IoT). IoT is about a pervasive presence in the environment of a variety of things/objects that, through wireless and wired connections, are able to interact with each other and cooperate with other things/objects to create new applications/services. These applications/services are in areas such as smart cities (regions), smart cars and mobility, smart homes and assisted living, smart industries, public safety, energy and environmental protection, agriculture, and tourism.
Currently, there is a need to make such IoT applications/services more accessible to non-experts. Until now, non-experts who have highly valuable non-technical domain knowledge have remained on the sidelines of the IoT ecosystem because of its reliance on tech-heavy products that require substantial programming experience. Thus, it has become imperative to increase the non-experts' ability to independently combine and harness big data computing and analytics without reliance on expensive technical consultants.
Stream processing is quickly becoming a crucial component of Big Data processing solutions for enterprises, with many popular open-source stream processing systems available today, including Apache Storm™, Apache Spark™, Apache Samza™, Apache Flink™, and others. Many of these stream processing solutions offer default schedulers that evenly distribute processing tasks between the available computation resources using a round-robin strategy. However, such a strategy is not cost-effective because substantial computation time and resources are lost during assignment and re-assignment of tasks to the correct sequence of computation resources in the stream processing system, thereby introducing significant latency in the system.
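As a rough illustration of the round-robin strategy mentioned above, the following sketch assigns the stages of a single pipeline to workers in turn and counts how often consecutive stages land on different workers. It does not reproduce the scheduler of any named framework. The functions `round_robin_assign` and `count_cross_worker_hops`, the stage names, the worker names, and the hop count used as a stand-in for re-assignment overhead are all hypothetical.

```python
from itertools import cycle


def round_robin_assign(tasks, workers):
    """Assign each task to the next worker in turn, ignoring task affinity."""
    worker_cycle = cycle(workers)
    return {task: next(worker_cycle) for task in tasks}


def count_cross_worker_hops(pipeline, assignment):
    """Count consecutive pipeline stages that land on different workers."""
    return sum(
        1
        for current, following in zip(pipeline, pipeline[1:])
        if assignment[current] != assignment[following]
    )


# Hypothetical four-stage task sequence spread over two workers.
pipeline = ["parse", "enrich", "aggregate", "emit"]
assignment = round_robin_assign(pipeline, ["worker-1", "worker-2"])
hops = count_cross_worker_hops(pipeline, assignment)
```

In this toy case every consecutive pair of stages is placed on a different worker, so all three hand-offs cross a worker boundary, whereas placing the whole sequence on one worker would cross none.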
Therefore, an opportunity arises to provide systems and methods that use simple and easily codable declarative-language-based solutions to execute big data computing and analytics tasks.
Further, an opportunity arises to provide systems and methods that use a combination of concurrent and multiplexed processing schemes to adapt to the varying computational requirements and availability in a stream processing system, with little performance loss or added complexity. Increased revenue, higher user retention, and improved user engagement and experience may result.