Automated modeling systems implement automated modeling algorithms that are trained using large volumes of training data. Automated modeling algorithms can use modeling techniques such as logistic regression and neural networks. The training data for training automated modeling algorithms can be generated by, or can otherwise indicate, certain electronic transactions or circumstances. In a training process, this training data is analyzed by one or more computing devices of an automated modeling system. The training data is grouped into attributes that are provided as inputs to the automated modeling system. The automated modeling system can use this analysis to learn from and make predictions regarding similar electronic transactions or circumstances. For example, the automated modeling system uses the attributes to learn how to generate predictive outputs involving transactions or other circumstances similar to those represented in the training data.
In one example, automated modeling algorithms that predict real property values use training data involving properties that differ along numerous attributes, where these differences in attributes can impact the predictive output of the automated modeling algorithm. Data-segmentation operations can be used to transform raw data into training data segments based on differences or similarities in the raw data. The segmented data is used for training the automated modeling algorithm.
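As a purely illustrative sketch of such a data-segmentation operation, the following Python example groups raw property records into training data segments using one categorical attribute and one numerically measured attribute. The record fields, the sample values, the 1,400 square-foot threshold, and the function names `segment_key` and `segment_records` are all assumptions introduced for illustration and do not describe any particular system.

```python
from collections import defaultdict

# Hypothetical raw records; field names and values are illustrative only.
RAW_RECORDS = [
    {"property_type": "condo", "square_feet": 850, "price": 210_000},
    {"property_type": "house", "square_feet": 2400, "price": 540_000},
    {"property_type": "condo", "square_feet": 1600, "price": 390_000},
    {"property_type": "house", "square_feet": 1500, "price": 365_000},
]


def segment_key(record, size_threshold=1400):
    """Assumed rule: pair the categorical type with a numeric size band."""
    size_band = "large" if record["square_feet"] >= size_threshold else "small"
    return (record["property_type"], size_band)


def segment_records(records, key_fn=segment_key):
    """Group raw records into training data segments keyed by key_fn."""
    segments = defaultdict(list)
    for record in records:
        segments[key_fn(record)].append(record)
    return dict(segments)


if __name__ == "__main__":
    # Print each segment key with the number of records it contains.
    for key, rows in sorted(segment_records(RAW_RECORDS).items()):
        print(key, len(rows))
```

In this sketch, each resulting segment could be used to train a separate model or could be supplied as a grouped input to a single model. With only two attributes the grouping rule is easy to write by hand; the number of candidate rules grows quickly as more attributes are added.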
The accuracy with which an automated modeling algorithm learns to make predictions of future actions can depend on how the training data is segmented prior to training the automated modeling algorithm. However, certain data-segmentation operations may decrease the accuracy with which an automated modeling algorithm generates predictive outputs or otherwise simulates decision-making processes. For instance, automated modeling algorithms involving real property data could be hindered by the large number of attributes relevant to the model. When very large numbers of categorically and numerically measured attributes are presented for data segmentation, existing techniques are unable to segment these types of training data in a manner that yields precise and accurate predictive outputs.