A system may spend significant resources on performing actions to achieve a positive result. Such actions are often performed on entities, and the achievement of a positive result often depends on characteristics of those entities. For example, a software update system may spend significant resources on performing a particular update on computing nodes in a data center. Whether an update of a particular computing node is successful depends largely on the characteristics of the computing node, the entity on which the action is performed.
In a system with limited resources and a high likelihood of a negative result, it is critical to minimize the number of actions and, thus, the resources spent on those actions. One way to minimize the number of actions is to accurately predict the result of the action on entities before performing the action. The prediction of such a result is usually associated with a probability describing the certainty of attaining the result. The system may then choose to perform actions on those entities for which the result is more likely to be attained, while ignoring other entities for which the result is less likely to be attained. Accordingly, the system achieves positive results more efficiently than it would by randomly selecting entities on which to perform actions.
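The selection step described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `select_entities`, the `budget` parameter, and the node identifiers and probability values are all hypothetical, and the sketch assumes a probability of a positive result has already been predicted for each entity.

```python
from typing import Dict, List


def select_entities(probabilities: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` entity ids, highest predicted probability first.

    `probabilities` maps a (hypothetical) entity id to the predicted
    probability of a positive result for that entity.
    """
    if budget <= 0:
        return []
    # Rank entities from most to least likely to yield a positive result.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # Spend the limited action budget on the top-ranked entities only.
    return [entity_id for entity_id, _ in ranked[:budget]]


# Illustrative values only; not taken from any real system.
predicted = {"node-a": 0.92, "node-b": 0.35, "node-c": 0.78, "node-d": 0.10}
print(select_entities(predicted, budget=2))  # prints ['node-a', 'node-c']
```

With a budget of two actions, the sketch acts on the two entities most likely to produce a positive result and ignores the rest, rather than selecting entities at random.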
However, predicting results for entities is computationally intensive, especially when there is a large number of entities. Each entity for which the result is unknown needs to be compared with other entities for which the result is known. With a large number of entities with unknown results, the number of comparisons may grow rapidly. Scaling such computations to millions of entities places a significant burden on a computer system performing the determination of probabilities.
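As a rough illustration of how the comparison workload scales, the sketch below counts comparisons under the simplifying assumption that every entity with an unknown result is compared once against every entity with a known result. That assumption and the function name `comparison_count` are illustrative and not taken from any particular system; other comparison schemes could scale differently.

```python
def comparison_count(num_unknown: int, num_known: int) -> int:
    """Count comparisons, assuming each unknown-result entity is compared
    once with each known-result entity (an illustrative assumption)."""
    return num_unknown * num_known


# Hypothetical entity counts, chosen only to show the growth in workload.
for num_unknown, num_known in [(1_000, 100), (1_000_000, 10_000)]:
    total = comparison_count(num_unknown, num_known)
    print(f"{num_unknown:,} unknown x {num_known:,} known = {total:,} comparisons")
```

Even under this simple assumption, moving from thousands to millions of entities raises the workload from a hundred thousand comparisons to ten billion.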
Nevertheless, even a strenuous computation may be worth the computing resources it consumes if the accuracy of the prediction saves significantly more resources. For example, when a new software update is to be deployed to a data center containing a large number of heterogeneous computing nodes, resources would be spent efficiently by first deploying the update on a small number of computing nodes and then deploying it on those computing nodes of the data center that are predicted to have positive results, as long as the predicted results are accurate. The alternative approach of deploying the software update on all computing nodes in the data center (perhaps thousands or tens of thousands) and monitoring the performance of all of those computing nodes is very resource intensive, particularly because the monitoring may uncover that a significant number of nodes are underperforming after the deployment, and the update deployed on the underperforming nodes may need to be rolled back. Such rollbacks consume significant computational resources and cause major disruption for workloads executing in the data center.
Another example in which an accurate prediction of results may save critical resources is the allocation of sales resources to accounts in a business organization. The success of a business organization depends largely on the effectiveness of the organization's sales team. One aspect that affects the overall effectiveness of a sales team is the manner in which sales resources (that is, the individual members of the sales team, commonly referred to as sales representatives) are allocated or assigned to the various customer accounts of the business organization. An accurate prediction of a positive (or negative) result for customer accounts allows the sales resources to be effectively targeted to the appropriate customer accounts and increases the likelihood that the organization becomes a seller of products to the customer represented by the customer account. If the number of customer accounts is several orders of magnitude larger than the available sales resources, then the business organization is unable to target each and every account and, even if such a methodology is attempted, much of the sales resources may be spent without any tangible result. Accordingly, an accurate prediction of a positive result for a subset of customer accounts, based on comparisons of accounts with unknown results against accounts with known results, would increase the effectiveness of the sales resources.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.