One common characteristic of many modern datasets is high dimensionality coupled with a low signal-to-noise ratio due to a potentially large number of irrelevant variables. Quantifying data quality issues using statistical metrics such as missing rate and cardinality is the first task in the predictive modelling of a dataset, and variable (feature) transformation aimed at increasing model performance is accordingly a significant part of a predictive modelling workflow. However, high dimensionality precludes an interactive variable-by-variable analysis and transformation. To handle this issue of scale, practitioners typically address data quality issues iteratively: for example, variables with a high rate of missing values are first identified and treated, and variables with high skew are then identified and treated. However, this approach precludes the effective use of prescriptions that can treat multiple data quality problems at the same time, and it is prone to significant bias, especially when imputation is applied to variables with a high missing rate. Automated data preprocessing via meta-learning is another potential solution to the scale issue. However, current meta-learning systems use dataset features based solely on individual data quality metrics and do not take interactions between data quality metrics into account; as a result, they struggle to retain sufficient information describing the dataset, which is critical for meta-learning-based approaches.
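The iterative, one-issue-at-a-time workflow described above can be sketched as follows. This is a minimal illustration, not the authors' method: the `quality_profile` helper, the toy dataset, and the thresholds (0.3 missing rate, absolute skewness of 1.0) are all hypothetical choices made for the example.

```python
import pandas as pd

def quality_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-variable statistical data quality metrics."""
    return pd.DataFrame({
        "missing_rate": df.isna().mean(),                 # fraction of missing values
        "cardinality": df.nunique(),                      # distinct non-null values
        "skewness": df.select_dtypes("number").skew(),    # NaN for non-numeric columns
    })

# Toy dataset: x1 is highly skewed, x3 has a high missing rate.
df = pd.DataFrame({
    "x1": [1.0, 2.0, None, 4.0, 100.0],
    "x2": ["a", "b", "a", None, "c"],
    "x3": [None, None, None, 1.0, 2.0],
})
profile = quality_profile(df)

# Iterative treatment: each data quality issue is flagged in a separate pass,
# so interactions between issues (e.g. a variable that is both highly missing
# and highly skewed) are never considered jointly.
high_missing = list(profile.index[profile["missing_rate"] > 0.3])
high_skew = list(profile.index[profile["skewness"].abs() > 1.0])
```

In this sketch, `x3` would be flagged for imputation in the first pass and `x1` for a skew-reducing transformation in the second, which mirrors the sequential treatment, and the bias risk, that the text attributes to this approach.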