The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.
The technology disclosed relates to managing processing of long tail task sequences in a stream processing framework. In particular, it relates to operating a computing grid that includes a plurality of physical threads that process data from one or more near real-time (NRT) data streams for multiple task sequences, and queuing data from the NRT data streams as batches in multiple pipelines using a grid-coordinator that controls dispatch of the batches to the physical threads. It also relates to assigning a priority-level to each of the pipelines using a grid-scheduler, wherein the grid-scheduler initiates execution of a first number of batches from a first pipeline before execution of a second number of batches from a second pipeline, responsive to respective priority levels of the first and second pipelines.
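The priority-responsive dispatch described above can be illustrated with a minimal sketch. It is not the disclosed implementation. The names `Pipeline` and `dispatch_order` are hypothetical, and two rules are assumptions made only for illustration: a larger priority value means a higher priority, and the number of batches a pipeline may start in each scheduling round equals its priority level.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Hypothetical pipeline holding queued batches from an NRT data stream."""
    name: str
    priority: int  # assumption: larger value means higher priority
    batches: deque = field(default_factory=deque)


def dispatch_order(pipelines):
    """Return (pipeline name, batch) pairs in the order execution would start.

    Assumed rule: in each round, pipelines are visited from highest to lowest
    priority, and each one starts at most `priority` batches (at least one).
    """
    ordered = sorted(pipelines, key=lambda p: p.priority, reverse=True)
    order = []
    while any(p.batches for p in ordered):
        for p in ordered:
            quota = max(1, p.priority)  # guard against non-positive priorities
            for _ in range(min(quota, len(p.batches))):
                order.append((p.name, p.batches.popleft()))
    return order
```

Under these assumptions, a pipeline with priority 2 starts two batches before a pipeline with priority 1 starts one, and the lower-priority pipeline still makes progress in every round.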
For many analytic solutions, batch processing systems are not sufficient for providing real-time results because their loading and processing requirements mean that batch jobs can take hours to run. As a result, analytics on events can only be generated long after the events have occurred. In contrast, the shortcoming of stream processing analytics systems is that they do not always provide the level of accuracy and completeness that batch processing systems provide. The technology disclosed uses a combination of batch and stream processing modes to deliver contextual responses to complex analytics queries with low latency on a real-time basis.
In today's world, we are dealing with huge data volumes, popularly referred to as “Big Data”. Web applications that serve and manage millions of Internet users, such as Facebook™, Instagram™, Twitter™, banking websites, or even online retail shops, such as Amazon.com™ or eBay™, are faced with the challenge of ingesting high volumes of data as fast as possible so that the end users can be provided with a real-time experience.
Another major contributor to Big Data is a concept and paradigm called “Internet of Things” (IoT). IoT is about a pervasive presence in the environment of a variety of things/objects that, through wireless and wired connections, are able to interact with each other and cooperate with other things/objects to create new applications/services. These applications/services are in areas such as smart cities (regions), smart car and mobility, smart home and assisted living, smart industries, public safety, energy and environmental protection, agriculture and tourism.
Currently, there is a need to make such IoT applications/services more accessible to non-experts. Until now, non-experts who have highly valuable non-technical domain knowledge have cheered from the sidelines of the IoT ecosystem because of its reliance on tech-heavy products that require substantial programming experience. Thus, it has become imperative to increase the non-experts' ability to independently combine and harness big data computing and analytics without reliance on expensive technical consultants.
Stream processing is quickly becoming a crucial component of Big Data processing solutions for enterprises, with many popular open-source stream processing systems available today, including Apache Storm™, Apache Spark™, Apache Samza™, Apache Flink™, and others. Many of these stream processing solutions offer default schedulers that evenly distribute processing tasks between the available computation resources using a round-robin strategy. However, such a strategy is not cost-effective because substantial computation time and resources are lost during assignment and re-assignment of tasks to the correct sequence of computation resources in the stream processing system, thereby introducing significant latency in the system.
Also, an opportunity arises to provide systems and methods that use simple and easily codable declarative-language-based solutions to execute big data computing and analytics tasks.
Further, an opportunity arises to provide systems and methods that use a combination of concurrent and multiplexed processing schemes to adapt to the varying computational requirements and availability in a stream processing system with little performance loss or added complexity. Increased revenue, higher user retention, and improved user engagement and experience may result.