Scalability and quality of service can be difficult to predict, preplan, and technically build into a relatively large system, such as an enterprise platform. Design-to-cost, where a system may be pre-designed according to a user's specifications, is not generally effective for scalability, as scalability is generally a technical guess at future requirements and is cost prohibitive to design into a system in advance.
For example, all resources in a system are limited, technical control over resource usage is even more limited, and work days may impose an inhomogeneous payload onto the system that is difficult to model and to solve technically in advance. Thus, there is generally an inherent contradiction between a preplanned technical scalability solution (and thus the product's predicted scalability) and the dynamic and unpredictable behavior of the system in real time, which cannot be adequately anticipated with conventional solutions. This can result in a conflict between the scalability and quality of service demands that arise at run-time on one hand, and the scalability demands that were already built into the system on the other hand. Such conflict may result in resource cannibalism within the system, management difficulties for a human administrator, delays for the user, etc.
For instance, a system implemented in a medical center suffering from the above issues could result in potentially dangerous situations that would risk a patient's life.
When the system is in use, multiple components may be active that are not aware of each other's actions. For example, the resources of a computer system are limited, and uptime qualities may degrade whenever resources run too low or become unavailable. Resources on such a computer system are generally related to hardware and software. Reasons for such shortages may be the limits of hardware scalability, the number of processes and users, the amount of data to be processed, etc.
Conventional distributed, scalable systems may change the available run-time structures and thereby, for example, maximize the uptime qualities of the system, such as availability and performance. However, the number of possible parameter variations of bottleneck situations in such conventional systems is generally too high for complete analysis, preplanning, testing, and human handling in the field.
Conventional systems have scalability solutions built in during the test-phase of a system, where a number of scalability requirements for users, processes, hardware, and use cases are tried out. However, this trial and error approach is by nature an incomplete and limited analysis of the run-time situations the system will encounter when deployed. Further, during the test-phase, it is assumed that the product will only ever meet payloads equivalent to those applied during the test. However, during run-time, the payload may exceed the product's capability.
Conventional scalability solutions are only based on select static parameters, such as by statically matching users and graphic cards, processes and CPUs, 3D image volume sizes and main memory.
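As a minimal sketch of such static matching, the following Python fragment derives capacity limits from fixed ratios alone. The ratio values and the names static_capacity and admit_user are hypothetical and chosen only for illustration; they are not taken from any particular conventional system.

```python
# Hypothetical fixed ratios, assumed to have been chosen once during testing.
USERS_PER_GRAPHICS_CARD = 4
PROCESSES_PER_CPU = 8


def static_capacity(graphics_cards: int, cpus: int) -> dict:
    """Derive capacity limits from hardware counts only."""
    return {
        "max_users": graphics_cards * USERS_PER_GRAPHICS_CARD,
        "max_processes": cpus * PROCESSES_PER_CPU,
    }


def admit_user(current_users: int, graphics_cards: int, cpus: int) -> bool:
    """Admit a user while the static user limit is not yet reached.

    The actual run-time load on the graphics cards is never consulted.
    """
    limits = static_capacity(graphics_cards, cpus)
    return current_users < limits["max_users"]
```

Under these assumptions, a machine with two graphics cards admits an eighth user regardless of how heavily the first seven are loading the hardware, which is the kind of blind spot a purely static parameter set creates.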
Conventional scalability solutions are non-invasive in terms of a target system. For example, such solutions often assume a homogeneous payload, and treat the target system as a black box. Thus, conventional scalability solutions do not cooperate with the target system.
For example, the state of multiple tasks or processes in a conventional system may be aggregated from a plurality of monitors in the system. For example, a decision tree using one of the colors green, yellow, and red to indicate the status of individual processes or tasks may be displayed, where the display is monitored by the human administrators. Accordingly, all actions depend on human consideration and decision while watching the color scheme. Moreover, if the human administrator is not present to monitor the display, no action will take place regardless of the color scheme. Monitoring rules may be established to take action according to, for example, the log message traffic and/or monitoring state. However, these actions are generic in nature and do not follow a strategy that, for example, takes into account business responsibilities or system capabilities. For example, in the conventional system, a monitoring rule may simply kill or disable a process based on a given state of the process, without taking into account the states of the other processes. Moreover, all decisions and actions taken by the conventional system are of a local component scope, where a local component may be an application being used by a user. This is because automating actions in a conventional system beyond the local component scope would be considered dangerous and unreliable.
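The local scope of such a monitoring rule may be illustrated with the following Python sketch. The memory-based classification, the 80% and 100% cut-offs, and all names (Status, ProcessState, classify, local_rule, apply_rules) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Status(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


@dataclass
class ProcessState:
    name: str
    memory_used_mb: int
    memory_limit_mb: int


def classify(process: ProcessState) -> Status:
    """Map one process to a color using assumed cut-offs of 80% and 100%."""
    ratio = process.memory_used_mb / process.memory_limit_mb
    if ratio >= 1.0:
        return Status.RED
    if ratio >= 0.8:
        return Status.YELLOW
    return Status.GREEN


def local_rule(process: ProcessState) -> str:
    """Generic rule: kill a red process. Only this process's state is read."""
    return "kill" if classify(process) is Status.RED else "none"


def apply_rules(processes: List[ProcessState]) -> Dict[str, str]:
    """Apply the rule to each process in isolation from all the others."""
    return {p.name: local_rule(p) for p in processes}
```

In this sketch, local_rule receives a single ProcessState and therefore cannot weigh, for instance, whether the red process is more important than a yellow neighbor that could release memory instead.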
Conventional systems try to overcome the limitations of an automated local scope by using load balancers, which judge the overall state of a farm of machines and assign requests for program starts or data processing to one of the machines using, for example, the CPU load value and other parameters. For instance, the load balancers may schedule a next user request for a new program to start with a different machine. This decision is taken at system uptime. However, load balancers do not actively improve the state of the monitored machines using a multi-parameter proactive approach. Instead, load balancers assume that their load calculation model is correct, even if the model fails and overloads the machine. The consequences of the load calculation model, such as weak or good performance, are not predictable, but still they are taken as correct and valid input for the next scheduling step in conventional systems.
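A minimal Python sketch of this scheduling behavior follows, assuming a balancer that selects the machine with the lowest estimated CPU load and then updates its own estimate with a fixed cost per request. The constant ASSUMED_LOAD_PER_REQUEST and the function schedule are hypothetical.

```python
from typing import Dict, List

# Assumed cost model: each request adds a fixed 10 percentage points of CPU load.
ASSUMED_LOAD_PER_REQUEST = 10


def schedule(requests: int, cpu_load_percent: Dict[str, int]) -> List[str]:
    """Assign each request to the machine with the lowest estimated load.

    The estimate is updated from the balancer's own model only; the load
    that a request actually causes is never measured or fed back.
    """
    estimated = dict(cpu_load_percent)  # work on a copy of the initial readings
    assignments = []
    for _ in range(requests):
        target = min(estimated, key=estimated.get)
        assignments.append(target)
        estimated[target] += ASSUMED_LOAD_PER_REQUEST
    return assignments
```

If a real request costs far more than the assumed amount, the estimate for that machine stays too low and the sketch keeps sending work to it, mirroring how a model that is taken as correct can overload a machine.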
Conventional systems may also attempt garbage collection, which is a form of memory reallocation, at the local level to overcome the scalability limitations of an automated local scope. However, garbage collection can degrade the performance, and therefore the scalability, of the use cases temporarily and dramatically.
While conventional systems may be scalable, conventional systems are limited in their degree of scalability and are not designed to improve scalability. For example, conventional systems may generally only operate within a local scope of actions or only minimally take into account a global scope when operating, where a global scope may incorporate the actions of multiple processes or components.
Therefore, conventional solutions are inflexible and obey each configured threshold individually. The conventional solutions do not judge each threshold in view of the entire set of thresholds, even if the thresholds in the set are related to each other. The standard mitigation for faulty threshold values in conventional systems is for the human administrator to manually repair the problem, where the same types of manual repairs may be carried out repeatedly because conventional solutions cannot determine whether the current thresholds are correct or inadequate.
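The individual treatment of thresholds may be sketched in Python as follows. The metric names and limit values in THRESHOLDS, and the function violated, are hypothetical and serve only to illustrate the behavior.

```python
from typing import Dict, List

# Hypothetical, independently configured limits.
THRESHOLDS = {
    "cpu_percent": 90,
    "memory_percent": 85,
    "open_sessions": 200,
}


def violated(metrics: Dict[str, int]) -> List[str]:
    """Return the names of the thresholds exceeded by the given metrics.

    Each threshold is compared with its own metric only; no check looks at
    how the metrics relate to one another or whether a limit is well chosen.
    """
    return sorted(
        name for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0) > limit
    )
```

With these assumed values, a system at 89% CPU, 84% memory, and 199 open sessions reports no violation at all, although the combined state is close to exhaustion on every axis, and nothing in the sketch can indicate that the configured limits themselves are inadequate.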