1. Field of the Invention
This invention relates to machine learning, and particularly to systems, methods and computer program products for supervised dimensionality reduction with mixed-type features and labels.
2. Description of Background
A common problem in many applications of statistical data analysis is learning an accurate predictive model from very high-dimensional data. Examples include predicting network latency and/or bandwidth between two points based on the observed latency and/or bandwidth between other pairs of points; predicting end-to-end connectivity in a wireless or sensor network; and, in general, predicting the end-to-end performance of a transaction in a distributed system given other measurements, such as the observed end-to-end performance of other transactions. In particular, systems management and autonomic computing applications that require self-healing capabilities need fast, online predictions from high-dimensional data volumes, e.g., for selecting the best route in overlay networks and sensor networks, or for selecting the best server from which to download a file in content-distribution systems. The problem there is to predict quickly and accurately the latency or bandwidth of a particular end-to-end connection, given high-dimensional data recording the previous end-to-end performance of a large number of connections, such as the previous file download history or previous connectivity in the network.
There are many other examples of learning from very high-dimensional data, including but not limited to customer response prediction in online advertising, predicting the presence of a disease from DNA microarray data, and predicting a person's emotional state from her or his fMRI data. However, learning from very high-dimensional data presents several challenges, including computational burden and overfitting. Moreover, one may be interested not only in learning a ‘black-box’ predictor from high-dimensional data, but also in identifying predictive structures in the data, i.e., in building an interpretable predictive model.
A common approach to handling high-dimensional data is to apply a dimensionality reduction technique before learning a predictor (a classification or regression model), i.e., to transform the original high-dimensional data, represented by an N×D matrix X (where N is the number of samples and D is the number of input variables, called features, i.e., the dimensionality of the input), into a low-dimensional space whose coordinate axes correspond to so-called hidden variables. A straightforward approach is then to learn a predictor on top of the low-dimensional representation, given the labels Y (an N-dimensional vector in the case of a single prediction problem, or an N×M matrix when M prediction problems are solved simultaneously, i.e., when there are M class labels to predict). More sophisticated state-of-the-art approaches instead combine learning a predictor with learning the mapping to the low-dimensional space; performing dimensionality reduction and predictor learning simultaneously can lead to better results than performing the two steps separately. This approach is usually referred to as supervised dimensionality reduction (SDR). FIG. 1 graphically depicts a hidden-variable model 10 for SDR, where X_i denote observed variables, Y_1, …, Y_M denote the class labels for the M prediction problems, and U_i denote hidden variables. State-of-the-art SDR approaches make different assumptions about the nature of the relationship between the vector X and the vector U, which defines the dimensionality reduction part, and the relationship between U and Y, which defines the prediction part.
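To make the contrast between the two-step approach and SDR concrete, the following is a minimal sketch that assumes squared-error (Gaussian) relationships on both sides of the hidden-variable model; it is an illustration only, not the method of this invention. The function names `two_stage` and `joint_sdr`, the hidden dimension `L`, and the trade-off weight `alpha` are hypothetical. `two_stage` fixes the mapping X → U by PCA before the labels are ever seen, whereas `joint_sdr` chooses U to minimize ||X − UV||² + alpha·||Y − UW||², so the labels influence the low-dimensional representation.

```python
import numpy as np


def two_stage(X, Y, L):
    """Unsupervised reduction (PCA via SVD), then a least-squares predictor on U."""
    Xc = X - X.mean(axis=0)
    # Top-L right singular vectors define the projection X -> U; Y plays no role here.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:L].T                      # D x L projection matrix
    U = Xc @ P                        # N x L hidden-variable representation
    # The predictor is fitted only after the mapping has been fixed.
    W, *_ = np.linalg.lstsq(U, Y, rcond=None)
    return U, W


def joint_sdr(X, Y, L, alpha=1.0, iters=50, seed=0):
    """Alternating least squares for min ||X - U V||^2 + alpha * ||Y - U W||^2.

    Hypothetical illustrative SDR objective; each step cannot increase the loss.
    """
    N = X.shape[0]
    Y = Y.reshape(N, -1)              # treat a single label vector as an N x 1 matrix
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((N, L))   # random initial hidden variables
    a = np.sqrt(alpha)
    losses = []
    for _ in range(iters):
        # Fix U: best reconstruction loadings V (L x D) and predictor weights W (L x M).
        V, *_ = np.linalg.lstsq(U, X, rcond=None)
        W, *_ = np.linalg.lstsq(U, Y, rcond=None)
        # Fix V, W: U solves one stacked least-squares problem U [V, aW] ~ [X, aY],
        # so both the features and the labels shape the hidden representation.
        A = np.hstack([V, a * W])     # L x (D + M)
        B = np.hstack([X, a * Y])     # N x (D + M)
        U = np.linalg.lstsq(A.T, B.T, rcond=None)[0].T
        losses.append(np.sum((X - U @ V) ** 2) + alpha * np.sum((Y - U @ W) ** 2))
    return U, V, W, losses
```

In this sketch, setting `alpha = 0` removes the labels from the update of U, and `joint_sdr` reduces to an unsupervised (uncentered) low-rank factorization of X followed by a separate fit of W, i.e., back to a two-step scheme; increasing `alpha` pushes U toward directions that are predictive of Y.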