In the execution of a data query, optimization of the query is often an important component. Query optimization is well documented for traditional relational database management systems (RDBMS). Query processing in such systems generally consists of parsing and optimization.
Parsing may verify that a structured query language (SQL) query is syntactically correct, that the referenced tables and query attributes exist, and that the user issuing the query has appropriate permissions. Parsing also translates the SQL query into a set of simpler query trees, wherein operators can be based on, for instance, relational algebra.
Optimization of a query may, via an optimizer, generate equivalent query trees, which are built bottom up. For each query tree generated, the optimizer may produce a query plan by selecting an algorithm for each operator, estimate a cost of the plan, and choose the plan having the lowest cost among the plans considered.
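The plan selection described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the operator names, the algorithm names, the per-row cost figures, and the representation of a query tree as a flat list of (operator, input rows) pairs. It is not the cost model of any particular RDBMS.

```python
from itertools import product

# Hypothetical cost per input row for each algorithm of each operator.
# The numbers are illustrative only.
COST_PER_ROW = {
    "scan": {"seq_scan": 10, "index_scan": 2},
    "join": {"nested_loop": 50, "hash_join": 12},
    "filter": {"predicate_eval": 1},
}


def enumerate_plans(tree):
    """Yield (algorithms, cost) for every algorithm choice in one query tree.

    A tree is simplified here to a list of (operator, input_rows) pairs.
    """
    choices = [
        [(name, per_row * rows) for name, per_row in COST_PER_ROW[op].items()]
        for op, rows in tree
    ]
    for combo in product(*choices):
        algorithms = [name for name, _ in combo]
        cost = sum(c for _, c in combo)
        yield algorithms, cost


def choose_plan(equivalent_trees):
    """Return (tree_index, algorithms, cost) of the cheapest plan considered."""
    best = None
    for index, tree in enumerate(equivalent_trees):
        for algorithms, cost in enumerate_plans(tree):
            if best is None or cost < best[2]:
                best = (index, algorithms, cost)
    return best


# Two equivalent trees: in the second, the filter is applied before the join,
# so the join sees fewer rows (assumed selectivity, for illustration).
filter_after_join = [("scan", 1000), ("scan", 1000), ("join", 1000), ("filter", 1000)]
filter_before_join = [("scan", 1000), ("filter", 1000), ("scan", 1000), ("join", 100)]

best_plan = choose_plan([filter_after_join, filter_before_join])
print(best_plan)
```

Under these assumed figures the second tree is selected, since filtering before the join reduces the rows the join must process. Exhaustive enumeration is used only to keep the sketch short; practical optimizers typically prune the search space.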
Query optimization is equally important in real-time data stream management systems (DSMS). Ideally, actual worst-case latencies would be used in query optimization. However, such metrics are not easily measured within a time range short enough to be useful for a typical real-time application.
Analytics applications may be composed of multiple parts that together fulfil a use case. When an application depends on multiple data streams from different sources in a network, it is relevant to determine which data streams are to be considered, and to deploy, run and compute selected parts of the analytics application at physical locations, for instance data centers, so as to achieve data co-locality and to instantiate a DSMS when and where required.
In a distributed cloud, this may imply launching virtual machines (VMs) running these analytics applications at certain locations having adequate resources and proper connectivity to other VMs of the same data center (DC), as well as to VMs in other DCs, without disrupting existing applications in any of the DCs.
It is pointed out that the term DC, wherever used herein, represents any aggregate of computing nodes that are connected through a communication network and physically located in a single location.
FIG. 1 presents DCs providing virtualized central processing units (CPUs), random access memory (RAM), network connectivity, and storage memory. Two of the DCs (DC1, DC2) are situated at an edge of the network, whereas another (DC3) is situated at facilities of a DC provider.
DC1, DC2, DC3, and DCx may comprise actual radio systems or network functions producing data such as network key performance indicators (KPIs) or weather data.
Some real-time operations like real-time continuous queries may need to be executed over a distributed cloud of data producers. Each link between the different DCs may be associated with a specific latency cost.
Main computing capabilities may be located in a main DC, where it can be assumed that computational resources are unlimited. DCs located at an edge of the network, for example DC1 and DC2 in FIG. 1, may generally be constrained in terms of computational resources.
A challenge may be to determine where to instantiate a particular part of an analytics application to fulfil a real-time analytics use case. This may comprise evaluating an impact on on-going simultaneous applications. This may also comprise verifying that the conditions for instantiating the VMs, connectivity and resource allocation are met, and that potential latency costs are within boundaries, in order to fulfil any continuous query (CQ) requirements.
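The placement challenge described above can be illustrated with a minimal sketch. All of it is assumed for illustration: the free-resource figures per DC, the link latencies in milliseconds, the hypothetical `place_part` helper, and the rule of choosing the feasible DC with the lowest worst-case latency to the data sources. It is one possible heuristic, not a solution proposed by any cited document.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    free_cpus: int    # CPUs not used by on-going applications (assumed figure)
    free_ram_gb: int  # RAM not used by on-going applications (assumed figure)


# Hypothetical symmetric latency cost, in milliseconds, of each inter-DC link.
LINK_LATENCY_MS = {
    frozenset({"DC1", "DC2"}): 15,
    frozenset({"DC1", "DC3"}): 40,
    frozenset({"DC2", "DC3"}): 35,
}


def link_latency(a, b):
    """Latency cost between two DCs; zero when data is co-located."""
    return 0 if a == b else LINK_LATENCY_MS[frozenset({a, b})]


def place_part(dcs, source_dcs, cpus, ram_gb, max_latency_ms):
    """Pick a DC for one part of an analytics application.

    A DC is feasible if its free resources cover the request (so existing
    applications are not disrupted) and its worst-case latency to every
    data source stays within max_latency_ms. Returns (name, latency) for
    the feasible DC with the lowest such latency, or None if none exists.
    """
    best = None
    for dc in dcs:
        if dc.free_cpus < cpus or dc.free_ram_gb < ram_gb:
            continue  # not enough spare resources at this DC
        worst = max(link_latency(dc.name, src) for src in source_dcs)
        if worst > max_latency_ms:
            continue  # latency cost outside the allowed boundary
        if best is None or worst < best[1]:
            best = (dc.name, worst)
    return best


# Two constrained edge DCs and one large main DC, loosely following FIG. 1.
dcs = [
    DataCenter("DC1", free_cpus=2, free_ram_gb=4),
    DataCenter("DC2", free_cpus=4, free_ram_gb=8),
    DataCenter("DC3", free_cpus=64, free_ram_gb=256),
]

placement = place_part(dcs, source_dcs=["DC1", "DC2"], cpus=4, ram_gb=8, max_latency_ms=30)
print(placement)
```

With these assumed figures the part is placed at DC2: DC1 lacks spare CPUs, and DC3 has ample resources but exceeds the latency boundary. A request for 8 CPUs under the same boundary would yield no feasible placement, which illustrates the tension between constrained edge DCs and a distant main DC.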
Patent document US20100030896 A1 relates to estimating worst case latencies for query optimization in distributed stream processing, when all nodes belong to one DSMS. This document is limited in that it focuses on latency only, estimates latencies for a worst case only, and applies only when all nodes belong to a single DSMS.
There is a need for an alternative solution that is more broadly applicable for determining how to distribute parts of a query for optimal or best performance, and that addresses the issues discussed above.