Machine learning combines techniques from statistics and artificial intelligence to create algorithms that can learn from empirical data and generalize to solve problems in various domains such as natural language processing, financial fraud detection, terrorism threat level detection, human health diagnosis and the like. In recent years, more and more raw data that can potentially be utilized for machine learning models is being collected from a large variety of sources, such as sensors of various kinds, web server logs, social media services, financial transaction records, security cameras, and the like.
Traditionally, expertise in statistics and in artificial intelligence has been a prerequisite for developing and using machine learning models. For many business analysts and even for highly qualified subject matter experts, the difficulty of acquiring such expertise is sometimes too high a barrier to be able to take full advantage of the large amounts of data potentially available to make improved business predictions and decisions. Furthermore, many machine learning techniques can be computationally intensive, and in at least some cases it can be hard to predict exactly how much computing power may be required for various phases of the techniques. Given such unpredictability, it may not always be advisable or viable for business organizations to build out their own machine learning computational facilities.
The quality of the results obtained from machine learning algorithms may depend on how well the empirical data used for training the models captures key relationships among different variables represented in the data, and on how effectively and efficiently these relationships can be identified. Depending on the nature of the problem that is to be solved using machine learning, very large data sets may have to be analyzed in order to be able to make accurate predictions, especially predictions of relatively infrequent but significant events. For example, in financial fraud detection applications, where the number of fraudulent transactions is typically a very small fraction of the total number of transactions, identifying factors that can be used to label a transaction as fraudulent may potentially require analysis of millions of transaction records, each representing dozens or even hundreds of variables. Constraints on raw input data set size, cleansing or normalizing large numbers of potentially incomplete or error-containing records, and/or on the ability to extract representative subsets of the raw data also represent barriers that are not easy to overcome for many potential beneficiaries of machine learning techniques. For many machine learning problems, transformations may have to be applied on various input data variables before the data can be used effectively to train models. In some traditional machine learning environments, the mechanisms available to apply such transformations may be less than optimal—e.g., similar transformations may sometimes have to be applied one by one to many different variables of a data set, potentially requiring a lot of tedious and error-prone work.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood, that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.