Embodiments of the invention are directed to techniques which may be used as part of a predictive modeling analysis. More specifically, embodiments of the invention provide methods and systems for determining a node in a classification and regression tree (CART) to use for a predictive modeling analysis.
In large scale computing deployments, one common resiliency problem is solving what is referred to as “soft failures,” where a computing system does not crash, but simply stops working correctly or slows down to a point of being effectively non-functional. Predictive analysis is a technique used to identify when a current set of sampled metrics for a computing system indicates that a future event is likely to occur (e.g., to predict when a soft failure is likely to occur). For example, predictive analysis may be used to evaluate changes over time in an amount of shared memory available to multiple virtual machine instances on a server to identify when a memory usage trend indicates that an out-of-memory condition is developing on that server. Predictive analysis relies on historical data and mathematical modeling to predict expected future activity based on the historical data and mathematical modeling of a system in order to determine when trends indicate a system problem (or other modeled event) could occur.
One approach for predictive analysis tools (i.e., a software application configured to perform a predictive failure analysis) is to use classification and regression trees (CART) to identify and isolate trends in a set of data. As is known, a CART tree splits a set of data into a series of nodes, which become more and more granular farther down the CART tree. Each branch of the tree ends in a terminal (or leaf) node when no further gain is achieved by splitting the data into nodes with fewer elements of data. Classification and regression trees are an established technology.
However, when using CART regression trees for predictive modeling, a predictive analysis tool needs to determine how far to traverse down the CART tree in order to identify a node at which to base a prediction. Frequently, traversing a CART branch all the way to the terminal node results in a set of data that is too small to use for generating a predictions, i.e., for performing a predictive failure analysis.