Scalability is an important feature for many systems. Scalability generally refers to the capability to add processing resources (e.g., computers, processors, virtual machines, memory, processes, threads, etc.) to a system such that, for example, the workload (e.g., processing demand) presented to the system can be handled in accordance with certain performance criteria. Performance criteria may include processing delay levels, throughput levels, system and/or processor capacity used, and the like. An example scalable system may be able to add processing elements, such as, for example, web servers, when the incoming workload to a current pool of web servers that serve incoming requests exceeds a first threshold, and to also remove web servers from the pool when the workload drops below a second threshold.
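By way of illustration only, the two-threshold behavior described above might be sketched as follows. The function name, the one-element step size, and the minimum pool size are assumptions made for this sketch and are not taken from any particular system.

```python
def adjust_pool(pool_size, workload, scale_out_threshold, scale_in_threshold, min_size=1):
    """Return the pool size after one threshold check (hypothetical helper).

    A server is added when the workload exceeds the first threshold and
    removed when it drops below the second threshold; otherwise the pool
    is left unchanged.
    """
    if workload > scale_out_threshold:
        return pool_size + 1
    if workload < scale_in_threshold and pool_size > min_size:
        return pool_size - 1
    return pool_size


# Example: a pool of 3 web servers with thresholds of 800 and 200 requests/s.
grown = adjust_pool(3, workload=900, scale_out_threshold=800, scale_in_threshold=200)
shrunk = adjust_pool(3, workload=100, scale_out_threshold=800, scale_in_threshold=200)
```

Keeping the second threshold well below the first leaves a band in which the pool size is stable, which helps avoid repeatedly adding and removing servers when the workload hovers near a single value.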
In example scalable systems, alerts may be generated when either the processing resources or the incoming workload reaches certain configured thresholds. For example, an alert may be generated when all processors in the system are at 80% capacity. Alerts are typically intended to warn an operator regarding a condition of the processing system.
In cloud environments, scalability may be even more important than in non-cloud environments. Many techniques are known for achieving scalability and alertness (e.g., the ability to generate alerts when system conditions are detected) in cloud management software. Such capabilities are often shipped as a built-in feature (e.g., an "in the box" feature) of the hardware and/or software that is sold. Many applications allow users to configure scale/alert conditions or related parameters using an administrative user interface. For example, Amazon Web Services (AWS) uses the CloudWatch console for defining and generating alarms/alerts. However, in AWS, configuring policies for dynamic scaling may involve numerous steps to be performed by the user.
A problem with much of the currently available cloud management software is that the scaling and/or alerting decision logic/module is shipped as a feature within the box (e.g., incorporated into the software and/or hardware), and users (e.g., operators) have limited control, if any, over the logic/module. Such software typically provides either a simple UI-based scaling solution or a complicated command/API-based scaling solution, neither of which may provide for evaluating complex scaling conditions.
Further, much of the currently available cloud management software provides little or no support for implementing and configuring metric sources for custom metrics.
Moreover, much of the currently available cloud management software may not allow for the eviction policy (e.g., the policy for evicting processing elements) to be customized and, as a result, may evict processing elements that are not the best suited for eviction. In many of the currently available systems, there is no clear indication of how the final scaling conclusion is drawn.
Thus, it will be appreciated by those skilled in the art that there is a need for improved techniques for providing scalability and/or alerts in processing systems, e.g., in ways that overcome the above-described and/or other problems.
Certain example embodiments provide a scaling engine that may be incorporated into hardware and/or software in order to provide scaling and/or alerting decision services to cloud management platforms. Certain example embodiments may also provide extension points to the scaling engine that may be used to customize various aspects of scalability and alerts. For example, the implementation of a connection layer between scalability elements, the summation policy, the eviction policy, and the metric source may each be pluggable. The scaling engine may also be powerful in that it provides a structured query language.
In certain example embodiments, a scalable system having a plurality of communicatively-coupled processing elements is provided. The system includes at least a scaling master, a plurality of scaling agents, and a connection layer providing one or more interfaces for communication between the scaling master and the plurality of scaling agents over a network. Each scaling agent may be located at a respective one of the plurality of communicatively-coupled processing elements. The scaling master is configured to transmit, using the connection layer, at least one user-specified scaling and/or alerting query to each scaling agent. Each scaling agent is configured to execute the at least one scaling and/or alerting query received from the scaling master, and to report, using the connection layer, a result of the execution to the scaling master. The scaling master is further configured to receive each result and to form a scaling decision based on the results.
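The interaction among the scaling master, the scaling agents, and the connection layer might be sketched as follows. This is a minimal, in-process illustration only: the class names, the use of a Python callable in place of a user-specified query, and the majority rule used to form the decision are all assumptions made for the sketch, and a real connection layer would carry the query over a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Query = Callable[[Dict[str, float]], bool]  # stand-in for a user-specified query


@dataclass
class Result:
    agent_id: str
    query_id: str
    value: bool


class ScalingAgent:
    """Runs at one processing element and evaluates queries against local metrics."""

    def __init__(self, agent_id: str, metrics: Dict[str, float]):
        self.agent_id = agent_id
        self.metrics = metrics

    def execute(self, query_id: str, query: Query) -> Result:
        return Result(self.agent_id, query_id, bool(query(self.metrics)))


class ConnectionLayer:
    """In-process stand-in for the network interface between master and agents."""

    def __init__(self, agents: List[ScalingAgent]):
        self.agents = agents

    def broadcast(self, query_id: str, query: Query) -> List[Result]:
        return [agent.execute(query_id, query) for agent in self.agents]


class ScalingMaster:
    def __init__(self, connection: ConnectionLayer):
        self.connection = connection

    def decide(self, query_id: str, query: Query) -> str:
        results = self.connection.broadcast(query_id, query)
        positive = sum(1 for r in results if r.value)
        # Assumed rule: scale up when a strict majority of agents report True.
        return "scale_up" if positive * 2 > len(results) else "no_action"


agents = [
    ScalingAgent("node-1", {"cpu": 90.0}),
    ScalingAgent("node-2", {"cpu": 85.0}),
    ScalingAgent("node-3", {"cpu": 30.0}),
]
master = ScalingMaster(ConnectionLayer(agents))
decision = master.decide("q1", lambda m: m["cpu"] > 80.0)
```

In this example two of the three agents report that the condition holds, so the assumed majority rule yields a decision to scale up.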
According to certain example embodiments, the scaling master may be further configured to receive the at least one scaling and/or alerting query from a user input device or from a configuration file.
According to certain example embodiments, the at least one scaling and/or alerting query may combine plural scaling and/or alerting expressions using one or more conditional operators.
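One possible way to represent a query that combines plural expressions is as a small expression tree. The sketch below is illustrative only; the node classes, the metric names, and the choice of AND/OR as the combining operators are assumptions and do not reflect the syntax of the structured query language itself.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Union

COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Expr:
    """A single scaling/alerting expression, e.g. cpu > 80."""
    metric: str
    op: str
    threshold: float

    def evaluate(self, metrics: Dict[str, float]) -> bool:
        return COMPARATORS[self.op](metrics[self.metric], self.threshold)


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"

    def evaluate(self, metrics: Dict[str, float]) -> bool:
        return self.left.evaluate(metrics) and self.right.evaluate(metrics)


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"

    def evaluate(self, metrics: Dict[str, float]) -> bool:
        return self.left.evaluate(metrics) or self.right.evaluate(metrics)


Node = Union[Expr, And, Or]

# (cpu > 80 AND memory > 70) OR queue_length > 1000
query = Or(
    And(Expr("cpu", ">", 80), Expr("memory", ">", 70)),
    Expr("queue_length", ">", 1000),
)
fires = query.evaluate({"cpu": 90, "memory": 75, "queue_length": 10})
```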
According to certain example embodiments, the scaling master may be further configured to transmit one or more scaling commands based on the formed scaling decision to respective ones of the plurality of scaling agents.
According to certain example embodiments, the scaling master may be configured to form the scaling decision by performing operations including aggregating the received results, and making the scaling decision based on the aggregated result.
According to certain example embodiments, the scaling master may form the scaling decision by performing operations including selecting the received results for the aggregating based on a flow control operation.
The flow control may be a closed loop flow control, and the scaling decision may be based on reducing a difference between a configured desired metric and a measured metric determined from the selected results, according to certain example embodiments.
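A closed loop step of this kind might be sketched as follows. The proportional rule used here (resizing the pool by the ratio of the measured metric to the desired metric) is a common target-tracking heuristic chosen for illustration; it, along with the function name and the tolerance band, is an assumption and is not asserted to be the rule used by any embodiment.

```python
import math


def closed_loop_step(current_elements, measured, desired, tolerance=0.05):
    """Return a new element count that moves the measured metric toward the desired one.

    `measured` and `desired` are in the same units (e.g., percent CPU load per
    element). No change is made while the error stays within the tolerance band.
    """
    error = measured - desired
    if abs(error) <= tolerance * desired:
        return current_elements
    # Assumed proportional rule: total load is roughly conserved, so the
    # per-element load scales inversely with the number of elements.
    return max(1, math.ceil(current_elements * measured / desired))


# Four elements at 90% load with a desired load of 60% suggests six elements.
suggested = closed_loop_step(4, measured=90, desired=60)
```

Repeating the step on each new measurement forms the loop: as elements are added or removed, the measured metric changes, and the difference from the desired metric shrinks.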
The flow control may be an open loop flow control, and the scaling decision may be further based on a configured number of reported results selected during a first predetermined time interval, with respective first predetermined time intervals being separated by a second predetermined time interval, according to certain example embodiments.
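The selection of results under such an open loop scheme might be sketched as follows. The sketch assumes that each reported result carries a timestamp, that sampling windows of length `window` repeat with a pause of length `gap` between them, and that at most `max_per_window` results are selected per window; all of these names, and the rule in `window_decision`, are hypothetical.

```python
def select_open_loop(results, window, gap, max_per_window):
    """Group timestamped results by sampling window.

    `results` is an iterable of (timestamp, value) pairs in arrival order.
    Results that arrive during the gap between windows are ignored, and at
    most `max_per_window` results are kept for each window.
    """
    period = window + gap
    selected = {}
    for timestamp, value in results:
        index, offset = divmod(timestamp, period)
        if offset >= window:
            continue  # arrived during the pause between windows
        bucket = selected.setdefault(int(index), [])
        if len(bucket) < max_per_window:
            bucket.append(value)
    return selected


def window_decision(values, required):
    """Assumed rule: scale up only if the full sample was collected and all are True."""
    if len(values) == required and all(values):
        return "scale_up"
    return "no_action"


reports = [(0, True), (1, True), (2, True), (3, False), (7, True), (15, True), (16, True)]
windows = select_open_loop(reports, window=5, gap=10, max_per_window=3)
```

Unlike the closed loop case, nothing here feeds the effect of an earlier decision back into the next one; the decision depends only on the configured sample size and timing.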
According to certain example embodiments, the aggregating may be performed in accordance with a configurable policy.
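A configurable aggregation policy might be sketched as a lookup from a policy name to an aggregation function. The policy names, the numeric results, and the threshold comparison below are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical policy registry; further policies could be registered here.
POLICIES = {
    "average": mean,
    "max": max,
    "min": min,
}


def form_decision(results, policy, threshold):
    """Aggregate per-agent results under the named policy and compare to a threshold."""
    aggregated = POLICIES[policy](results)
    return "scale_up" if aggregated > threshold else "no_action"


loads = [50, 90, 70]  # e.g., percent load reported by three agents
by_average = form_decision(loads, "average", threshold=75)
by_max = form_decision(loads, "max", threshold=75)
```

The same reported results lead to different decisions under different policies: the average load of 70 does not exceed the threshold, while the maximum load of 90 does.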
According to certain example embodiments, the connection layer may include a structured query language, and the at least one scaling and/or alerting query may be specified in the structured query language.
According to certain example embodiments, each scaling agent may be configured to compile the received at least one scaling and/or alerting query, and to execute the compiled query.
According to certain example embodiments, the scaling master may be further configured to calculate a number of processing elements to be evicted to maintain a desired load. The calculating may be based on an average load in the processing elements. The processing elements may be ranked in accordance with a weighted average load, and the processing elements to be evicted are selected in accordance with the ranking.
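An eviction calculation along these lines might be sketched as follows. The sketch assumes that the total load is redistributed evenly over the remaining elements, that the weighted average is taken over recent load samples with larger weights on newer samples, and that the least-loaded elements are evicted first; these choices and all names are illustrative assumptions.

```python
import math


def eviction_count(loads, desired_load):
    """Number of elements that can be removed while keeping load at or below desired."""
    needed = max(1, math.ceil(sum(loads) / desired_load))
    return max(0, len(loads) - needed)


def weighted_average(samples, weights):
    return sum(s * w for s, w in zip(samples, weights)) / sum(weights)


def select_evictions(history, weights, desired_load):
    """Rank elements by weighted average load and pick the least loaded for eviction.

    `history` maps an element name to its load samples, oldest first.
    """
    current = [samples[-1] for samples in history.values()]
    count = eviction_count(current, desired_load)
    ranked = sorted(history, key=lambda name: weighted_average(history[name], weights))
    return ranked[:count]


history = {
    "a": [20, 30, 40],
    "b": [10, 10, 10],
    "c": [50, 40, 30],
    "d": [20, 20, 20],
}
# Current average load is 25%; with a desired load of 50%, two elements suffice.
evicted = select_evictions(history, weights=[1, 2, 3], desired_load=50)
```

Here the current loads total 100, so two elements at the desired load of 50 can carry the work, and the two elements with the lowest weighted average load ("b" and "d") are selected.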
According to certain example embodiments, the scaling master may be configured to transmit a command to stop the executing of the at least one scaling and/or alerting query. The command may include an identifier for the at least one scaling and/or alerting query.
According to certain example embodiments, the scaling agent may be further configured to schedule the at least one scaling and/or alerting query for subsequent execution at a specified time. The executing of the scheduled at least one scaling and/or alerting query may occur subsequently at the specified time.
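Scheduling a query for later execution at an agent, and stopping a query by its identifier, might be sketched as follows. The class, its methods, and the use of an explicit `now` argument in place of a real clock are assumptions made so that the sketch is deterministic; each scheduled query runs once.

```python
import heapq


class AgentScheduler:
    """Hypothetical per-agent scheduler keyed by query identifier."""

    def __init__(self):
        self._queue = []    # heap of (run_at, query_id)
        self._queries = {}  # query_id -> callable returning a result

    def schedule(self, query_id, run_at, query):
        self._queries[query_id] = query
        heapq.heappush(self._queue, (run_at, query_id))

    def stop(self, query_id):
        """Cancel a scheduled query; returns True if it was pending."""
        return self._queries.pop(query_id, None) is not None

    def run_due(self, now):
        """Execute every query whose specified time has arrived."""
        results = {}
        while self._queue and self._queue[0][0] <= now:
            _, query_id = heapq.heappop(self._queue)
            query = self._queries.pop(query_id, None)
            if query is not None:  # skipped if the query was stopped
                results[query_id] = query()
        return results


scheduler = AgentScheduler()
scheduler.schedule("q1", run_at=10, query=lambda: True)
scheduler.schedule("q2", run_at=20, query=lambda: False)
early = scheduler.run_due(now=5)      # nothing is due yet
stopped = scheduler.stop("q2")        # stop command carrying the query identifier
late = scheduler.run_due(now=25)      # only q1 runs
```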
In certain example embodiments, a method for scaling a processing system having a plurality of communicatively coupled processing elements is provided. The scaling master receives a user-specified scaling and/or alerting query, and transmits the query to scaling agents at respective ones of the plurality of processing elements using the connection layer. Each scaling agent executes the at least one scaling and/or alerting query, and reports a result of the executing to the scaling master using the connection layer. The scaling master forms a scaling decision based on the result reported by each of the scaling agents.
According to certain example embodiments, the scaling master may transmit one or more scaling commands based on the formed scaling decision to respective ones of the plurality of scaling agents.
According to certain example embodiments, the forming of a scaling decision may include aggregating the reported results, and making the scaling decision based on the aggregated result.
According to certain example embodiments, the connection layer may include a structured query language, and the at least one scaling and/or alerting query may be specified in the structured query language.
In certain example embodiments, there is provided a non-transitory computer readable storage medium tangibly storing instructions that, when executed by at least one processor of a system, perform a method for scaling a processing system as described herein. The method may include, for instance: the scaling master receiving a user-specified scaling and/or alerting query, and transmitting the query to scaling agents at respective ones of the plurality of processing elements using the connection layer. Each scaling agent executes the at least one scaling and/or alerting query, and reports a result of the executing to the scaling master using the connection layer. The scaling master forms a scaling decision based on the result reported by each of the scaling agents.
These aspects, features, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.