The present invention, in some embodiments thereof, relates to training ensembles of randomized decision trees systems and methods and, more specifically, but not exclusively, training ensembles of randomized decision trees using multiple large training data sets over distributed streaming processing nodes, for example graphic processing units (GPU), multi-core central processing units (CPU) and processing nodes clusters.
Randomized decision tree is a tree like model used for evaluating decisions and predictions and their possible consequences and outcomes. Randomized decision trees are used in various fields, for example financial applications, medical data analysis and gaming applications.
To achieve best results in decision making, randomized decision trees need to be trained and built to employ the best decision (in a statistical sense) and classification path for input data based on data attributes. The randomized decision tree will be better trained as multiple large data sets are driven into it during the training process.
Randomized decision trees training techniques are available in a plurality of implementation methods, for example depth first, breadth first or hybrid depth first/breadth first. Depth first means a branch is built through all its nodes all the way to its leafs and then moving on to the next branch while breadth first means building nodes at a tree level across all branches and moving down one level at a time.