The present disclosure relates generally to automated data analysis, and more specifically, to an automated intelligent data navigation and prediction tool for automated training of models.
A task of a data scientist may be to select a “best” or most suitable model (i.e., a learning or other analytic algorithm) to apply to a given data set. However, determining a best model for a given data set may be daunting, since the number of analytic algorithms available to data scientists is quite large (e.g., counting across platforms, there are easily hundreds of available analytic algorithms). Further, the amount of data in typical modern data sets is also quite large. Given the large number of available analytic algorithms and the size of a given data set, it may be infeasible to apply all models to the full data set within a reasonable amount of time and at a reasonable expense. Data scientists must instead focus their effort on only the most promising models.
For example, training a single algorithm on a data set with one million samples is a time-consuming process that can take days. Consequently, when testing multiple analytic algorithms on this same one-million-sample data set, selecting a “best” or most suitable model simply cannot be performed within a short time.