Major cities such as London, New York, and Beijing are deploying tens of thousands of cameras, and analyzing live video streams is of considerable importance to many organizations. Traffic departments analyze video feeds from intersection cameras for traffic control, and police departments analyze feeds from city-wide cameras for surveillance. Organizations typically deploy a hierarchy of clusters to analyze their video streams. An organization, such as a city's traffic department, runs a private cluster that pulls in the video feeds from its cameras over dedicated bandwidth. The private cluster includes computing capacity for analytics and also taps into public cloud services for overflow computing needs. The uplink bandwidth between the private cluster and the public cloud services, however, is usually not sufficient to stream all the camera feeds to the cloud for analytics. In addition, some video cameras have onboard, albeit limited, computing capacity for video analytics.
As known in the art of video analytics, a video analytics query defines a pipeline of computer vision components. For example, an object tracking query typically includes a “decoder” component that converts video to frames, followed by a “detector” component that identifies the objects in each frame, and an “associator” component that matches objects across frames, thereby tracking them over time. The various components may be included in software or hardware, such as a dedicated circuit (e.g., an application specific integrated circuit (ASIC)).
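The decoder, detector, and associator pipeline described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in chosen for brevity, not a component of the disclosure: a frame is a small grid of pixel intensities, the detector is a brightness threshold, and the associator is a greedy nearest-neighbor matcher. The names `decode`, `detect`, `associate`, and `run_tracking_query` are illustrative only.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical frame format: a 2D grid of pixel intensities (0-255).
Frame = List[List[int]]


@dataclass
class Detection:
    frame_idx: int
    x: int
    y: int


def decode(video: List[Frame]) -> List[Frame]:
    """Decoder stand-in: the toy 'video' is already a sequence of frames."""
    return list(video)


def detect(frame_idx: int, frame: Frame, threshold: int = 128) -> List[Detection]:
    """Detector stand-in: every pixel at or above the threshold is an object."""
    return [Detection(frame_idx, x, y)
            for y, row in enumerate(frame)
            for x, value in enumerate(row)
            if value >= threshold]


def associate(per_frame: List[List[Detection]], max_dist: int = 1) -> List[List[Detection]]:
    """Associator stand-in: greedily extend each track with the nearest
    detection (Manhattan distance) in the next frame, if one is close enough."""
    tracks: List[List[Detection]] = []
    for detections in per_frame:
        unmatched = list(detections)
        for track in tracks:
            last = track[-1]
            near = [d for d in unmatched
                    if abs(d.x - last.x) + abs(d.y - last.y) <= max_dist]
            if near:
                best = min(near, key=lambda d: abs(d.x - last.x) + abs(d.y - last.y))
                track.append(best)
                unmatched.remove(best)
        # Detections that matched no existing track start new tracks.
        tracks.extend([d] for d in unmatched)
    return tracks


def run_tracking_query(video: List[Frame]) -> List[List[Detection]]:
    """Chain the three components: decoder -> detector -> associator."""
    frames = decode(video)
    per_frame = [detect(i, frame) for i, frame in enumerate(frames)]
    return associate(per_frame)


# One bright pixel moving right across three 1x4 frames.
video = [[[255, 0, 0, 0]], [[0, 255, 0, 0]], [[0, 0, 255, 0]]]
tracks = run_tracking_query(video)
print([[(d.frame_idx, d.x, d.y) for d in track] for track in tracks])
```

In this toy input the single bright pixel is linked into one track spanning all three frames, which is the behavior the associator component is meant to provide.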
Video query components may have many implementation choices that provide the same abstraction. For example, object detectors take a frame and output a list of detected objects. Detectors can use background subtraction to identify moving objects against a static background or a deep neural network (DNN) to detect objects based on visual features. Background subtraction requires fewer resources than a DNN but is also less accurate because it misses stationary objects. Components can also have many “knobs” (e.g., adjustable attributes or settings) that further impact query accuracy and resource demands. Frame resolution is one such knob; higher resolution improves detection but requires more resources. Video queries may have thousands of different combinations of implementations and knob values. As used in this disclosure, “query planning” is defined as selecting the best combination of implementations and knob values for a query.
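As a rough illustration of query planning, the sketch below enumerates every combination of one implementation choice (the detector) and one knob (frame resolution) and picks a plan. It assumes that "best" means the highest accuracy whose resource demand fits a given budget, which is one plausible reading rather than a definition taken from this disclosure. All cost and accuracy figures are made-up placeholders, and the names `DETECTORS`, `RESOLUTIONS`, and `plan_query` are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical profiles: (relative compute cost, base accuracy).
DETECTORS = {
    "background_subtraction": (1.0, 0.60),
    "dnn": (8.0, 0.90),
}
# Hypothetical knob values: (cost multiplier, accuracy multiplier).
RESOLUTIONS = {
    "480p": (1.0, 0.80),
    "720p": (2.25, 0.90),
    "1080p": (5.0, 1.00),
}

Plan = Tuple[str, str]


def enumerate_plans() -> List[Tuple[Plan, float, float]]:
    """Return (plan, cost, accuracy) for every implementation/knob combination."""
    plans = []
    for det, res in product(DETECTORS, RESOLUTIONS):
        det_cost, det_acc = DETECTORS[det]
        res_cost, res_acc = RESOLUTIONS[res]
        plans.append(((det, res), det_cost * res_cost, det_acc * res_acc))
    return plans


def plan_query(budget: float) -> Optional[Plan]:
    """Pick the most accurate plan whose cost fits the budget, or None."""
    feasible = [p for p in enumerate_plans() if p[1] <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p[2])[0]


print(plan_query(10.0))
print(plan_query(3.0))
print(plan_query(0.5))
```

With these placeholder numbers, a budget of 10 admits the DNN only at the lowest resolution, a budget of 3 forces background subtraction, and a budget of 0.5 admits no plan at all. Real queries have far more knobs, so exhaustive enumeration as done here would not scale to thousands of combinations.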
In addition to planning, the components of a query have to be placed across the hierarchy of clusters. Placement dictates the resource demands (network bandwidth, computing resources, etc.) at each cluster. For example, assigning the object tracking query's detector component to the camera and the associator component to the private cluster uses computing and network resources of the camera and the private cluster, but not the uplink network bandwidth out of the private cluster or any resources in the public cloud. While a query plan has a single accuracy value, it can have multiple placement options, each with its own resource demands.
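The effect of placement on resource demands can be sketched as follows. The model is an assumption made for illustration: three tiers (camera, private cluster, public cloud), data that only flows up the hierarchy, and each component consuming the output stream of the previous one. The compute units and data rates are invented placeholders, and `placement_demands` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

TIERS = ["camera", "private", "cloud"]
LINKS = ["camera->private", "private->cloud"]  # the second link is the uplink


def placement_demands(
    components: List[Tuple[str, float, float]],
    placement: Dict[str, str],
    source_rate: float,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Compute per-tier compute demand and per-link bandwidth demand.

    components: (name, compute units, output data rate) in pipeline order.
    placement:  component name -> tier name.
    source_rate: data rate of the raw video, which originates at the camera.
    """
    compute = {tier: 0.0 for tier in TIERS}
    bandwidth = {link: 0.0 for link in LINKS}
    prev_tier, prev_rate = 0, source_rate
    for name, cpu, out_rate in components:
        tier = TIERS.index(placement[name])
        if tier < prev_tier:
            raise ValueError("this sketch assumes data only flows up the hierarchy")
        compute[TIERS[tier]] += cpu
        # The component's input crosses every link between the two tiers.
        for link in range(prev_tier, tier):
            bandwidth[LINKS[link]] += prev_rate
        prev_tier, prev_rate = tier, out_rate
    return compute, bandwidth


# Hypothetical profile: the detector is heavy but emits a small object list.
tracker = [("detector", 3.0, 0.5), ("associator", 1.0, 0.01)]

edge = placement_demands(
    tracker, {"detector": "camera", "associator": "private"}, source_rate=4.0)
cloud = placement_demands(
    tracker, {"detector": "cloud", "associator": "cloud"}, source_rate=4.0)
print(edge)
print(cloud)
```

Both placements run the same plan and therefore have the same accuracy, but the first uses no uplink bandwidth and no cloud resources, while the second pushes the full raw stream across both links.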
Finally, multiple queries analyzing video from the same camera often have common components. For example, a video query directed to a car counter and a video query directed to a pedestrian monitor both need an object detector component and an associator component. The common components are typically the core vision building blocks. Merging common components saves significant resources, but some restrictions may apply (e.g., components can only be merged if they have the same plan and are placed in the same cluster).
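The merging restriction can be made concrete with a small sketch in which two components are treated as mergeable only when their component name, plan, and cluster all match. The query definitions, plan labels, and cost figures are hypothetical, `total_costs` is an illustrative helper, and costs are keyed by component name alone as a simplification.

```python
from typing import Dict, List, Tuple

# A component instance is (component name, plan label, cluster).
Component = Tuple[str, str, str]

# Hypothetical per-component compute cost.
COST = {"detector": 8, "associator": 2, "car_counter": 1, "pedestrian_monitor": 1}


def total_costs(queries: Dict[str, List[Component]]) -> Tuple[int, int]:
    """Return (cost without merging, cost with merging) for queries that
    share one camera. Identical (name, plan, cluster) triples run once."""
    unmerged = sum(COST[name] for comps in queries.values() for name, _, _ in comps)
    unique = {comp for comps in queries.values() for comp in comps}
    merged = sum(COST[name] for name, _, _ in unique)
    return unmerged, merged


same_plan = {
    "car_counter_query": [
        ("detector", "dnn-720p", "private"),
        ("associator", "default", "private"),
        ("car_counter", "default", "private"),
    ],
    "pedestrian_query": [
        ("detector", "dnn-720p", "private"),
        ("associator", "default", "private"),
        ("pedestrian_monitor", "default", "private"),
    ],
}

# Same queries, but the pedestrian query's detector uses a different plan,
# so the two detectors can no longer be merged.
different_plan = {
    "car_counter_query": same_plan["car_counter_query"],
    "pedestrian_query": [
        ("detector", "dnn-1080p", "private"),
        ("associator", "default", "private"),
        ("pedestrian_monitor", "default", "private"),
    ],
}

print(total_costs(same_plan))
print(total_costs(different_plan))
```

In the first case the shared detector and associator run once; in the second only the associator is shared, so most of the saving disappears.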
Current video analytics solutions make static decisions on query plans and placements. These decisions are often conservative on resource demands and result in low accuracies while leaving resources underutilized. At the same time, running all the queries at the highest accuracy is often infeasible because the private cluster has neither enough computing capacity to run them locally nor enough bandwidth to push all the streams to the cloud. Production stream processing systems commonly employ fair sharing among queries. However, fair sharing is a poor choice because its decisions are agnostic to the resource-accuracy relationships of the queries.
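A toy comparison shows why an allocation that ignores resource-accuracy relationships can underperform. The two accuracy profiles below are invented for illustration: `query_a` reaches good accuracy with very little compute, while `query_b` needs more before its accuracy improves. The exhaustive search in `accuracy_aware` is only a stand-in for any allocator that consults these profiles; it is not presented as the method of this disclosure.

```python
from typing import Dict

# Hypothetical profiles: accuracy achieved with 0..4 resource units.
PROFILES = {
    "query_a": [0.0, 0.80, 0.90, 0.92, 0.93],
    "query_b": [0.0, 0.20, 0.40, 0.70, 0.75],
}


def fair_share(capacity: int) -> Dict[str, int]:
    """Split capacity evenly, ignoring the accuracy profiles."""
    share = capacity // len(PROFILES)
    return {query: share for query in PROFILES}


def accuracy_aware(capacity: int) -> Dict[str, int]:
    """Try every split between the two queries and keep the one with the
    highest summed accuracy. Assumes capacity is at most 4 units."""
    best_a = max(
        range(capacity + 1),
        key=lambda a: PROFILES["query_a"][a] + PROFILES["query_b"][capacity - a],
    )
    return {"query_a": best_a, "query_b": capacity - best_a}


def total_accuracy(allocation: Dict[str, int]) -> float:
    """Sum the accuracy each query achieves under the given allocation."""
    return sum(PROFILES[query][units] for query, units in allocation.items())


fair = fair_share(4)
aware = accuracy_aware(4)
print(fair, total_accuracy(fair))
print(aware, total_accuracy(aware))
```

Under these placeholder profiles, the even split gives `query_a` a second unit that barely helps it, whereas shifting that unit to `query_b` raises the combined accuracy.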