Real-time stream computing refers to processing data the moment it occurs, that is, performing one processing operation as each event arrives, rather than buffering events and processing them in batches. A commercial search engine is one example of such an application.
A stream computing application model consists of operators and streams. An operator is a data processing unit that carries service logic; it is the smallest unit that a stream computing platform can schedule and execute in a distributed manner. A stream is the data exchanged between operators.
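The operator/stream model described above can be sketched as a small graph of processing units connected by edges. The class and field names below are illustrative, not drawn from any particular stream computing platform.

```python
class Operator:
    """Smallest schedulable unit; carries one piece of service logic."""
    def __init__(self, name, logic):
        self.name = name
        self.logic = logic  # callable applied to each incoming event

    def process(self, event):
        return self.logic(event)


class Stream:
    """Data exchanged between two operators: an edge in the topology graph."""
    def __init__(self, source, target):
        self.source = source
        self.target = target


# Build a tiny two-operator topology: parse -> count
parse = Operator("parse", lambda e: e.strip().lower())
count = Operator("count", lambda e: (e, 1))
topology = {
    "operators": [parse, count],
    "streams": [Stream(parse, count)],
}

# Drive a single event through the pipeline, one operator at a time,
# illustrating event-at-a-time processing rather than batching.
event = "  Hello "
for op in topology["operators"]:
    event = op.process(event)
print(event)  # ('hello', 1)
```

Because each operator is an independent unit, a platform can place the operators of one topology on different physical nodes and run them in parallel.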
A service application written against this stream data processing model is scheduled by the master control node of the stream computing system and then runs in a distributed manner on the platform. The stream computing platform therefore comprises a master control node and the physical nodes that run the service.
Scheduling in a stream processing system is driven mainly by three factors:
1. The service model: the service application that runs on the stream computing platform, expressed as a topology graph.
2. Physical resources: computing resources such as the memory and central processing unit (CPU) of each physical node in the platform cluster.
3. Network transmission resources: the network throughput and bandwidth available in the platform cluster.
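As a minimal sketch, the three scheduling inputs can be represented as plain data structures that a scheduler would consult when mapping operators to nodes. All names and numbers here are illustrative assumptions.

```python
# 1. Service model: the topology graph of the service application
service_model = {
    "operators": ["source", "filter", "sink"],
    "streams": [("source", "filter"), ("filter", "sink")],
}

# 2. Physical resources: compute capacity of each node in the cluster
physical_resources = {
    "node-a": {"cpu_cores": 8, "memory_gb": 32},
    "node-b": {"cpu_cores": 4, "memory_gb": 16},
}

# 3. Network transmission resources: bandwidth between node pairs (Gbit/s)
network_resources = {
    ("node-a", "node-b"): 10,
}

# A resource-aware scheduler would use all three inputs together to
# decide which operator runs on which node.
print(len(service_model["operators"]), "operators,",
      len(physical_resources), "nodes")
```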
These three factors affect and constrain one another: the distribution of the service model determines how the physical resources are consumed; the consumption of physical resources affects the network transmission delay; and the network transmission delay in turn affects task execution efficiency.
Existing stream computing systems in the industry generally assign tasks to physical resources at random, scheduling stream applications without regard to the three factors above. As a result, resource usage is unbalanced once the service is deployed to the platform.
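The random placement described above can be sketched as follows, to illustrate why it tends to leave resource usage unbalanced. The node names and task costs are invented for the example.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

# Current CPU load of each physical node in the cluster
nodes = {"node-a": 0, "node-b": 0, "node-c": 0}

# CPU cost of each operator task to be placed
tasks = [4, 1, 3, 2, 5, 1]

# Random scheduling: each task is assigned to a node without looking
# at current load, the service model, or network transmission cost.
for cost in tasks:
    node = random.choice(list(nodes))
    nodes[node] += cost

print(nodes)
```

With 16 total units of work, a load-aware scheduler could keep every node near 16/3 units, while random placement may concentrate most of the load on one node.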