The availability of multiple data streams allows multimodal biomarkers to be used to improve the performance of predictors for disease prognosis and diagnosis. Discriminative features may be identified from multiple feature views acquired from different modalities. For example, patients likely to experience prostate cancer (CaP) biochemical recurrence (BcR) may be identified using features acquired from a histology feature view and a proteomics feature view. Similarly, CaP grades may be predicted using features acquired from T2-weighted (T2w) magnetic resonance imaging (MRI) and dynamic contrast-enhanced (DCE) MRI.
Multiple feature views increase the number of candidate features from which discriminative features must be selected, which makes selecting a useful set of discriminative features a challenging problem. Some conventional methods of identifying discriminative features employ canonical correlation analysis (CCA), while other conventional methods use supervised multi-view CCA (SMVCCA). CCA addresses the problem of fusing features acquired from multiple modalities by finding a correlated metaspace that maximizes the signal, which is likely to be common to data from multiple feature views (e.g., modalities), while minimizing noise, which is more likely to be modality-specific.
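The CCA step described above can be sketched for two feature views as follows. This is a minimal illustration on synthetic data, not the method of any particular conventional system: the function names `inv_sqrt` and `cca`, the ridge term `reg`, and the simulated views are all assumptions introduced for the example. The sketch whitens each view and takes a singular value decomposition of the whitened cross-covariance, so that the singular values are the canonical correlations.

```python
import numpy as np

def inv_sqrt(c):
    # Inverse square root of a symmetric positive-definite matrix.
    vals, vecs = np.linalg.eigh(c)
    return (vecs / np.sqrt(vals)) @ vecs.T

def cca(x, y, reg=1e-6):
    # Hypothetical helper: two-view CCA via whitening and SVD.
    # reg is a small ridge term (an assumption) that keeps the
    # covariance matrices invertible.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(kx @ cxy @ ky, full_matrices=False)
    # Columns of wx and wy project each view into the shared metaspace;
    # s holds the canonical correlations in decreasing order.
    return kx @ u, ky @ vt.T, s

# Synthetic views: one shared signal plus view-specific noise features.
rng = np.random.default_rng(0)
n = 500
shared = rng.normal(size=(n, 1))
x = np.hstack([shared + 0.3 * rng.normal(size=(n, 1)),
               rng.normal(size=(n, 3))])
y = np.hstack([rng.normal(size=(n, 2)),
               shared + 0.3 * rng.normal(size=(n, 1))])

wx, wy, corrs = cca(x, y)
print("canonical correlations:", np.round(corrs, 3))
```

In this synthetic case the first canonical correlation should be high because both views carry the shared signal, while the remaining correlations reflect only chance alignment of the noise features.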
Conventional methods that employ SMVCCA combine the principles of CCA with linear discriminant analysis (LDA) to find a subspace that maximizes the multi-view signal. SMVCCA also attempts to preserve class discriminability with respect to the provided class labels. While SMVCCA improves on CCA, both conventional methods are sub-optimal in practice when employed to select a useful set of discriminative features. Conventional correlation-based methods do not guarantee that the selected features are positively correlated, and they often need a pre-feature selection step to reduce redundant features in each feature view.
Conventional SMVCCA is limited because latent components in the metaspace can be negatively correlated. Negatively correlated features are less interpretable in clinical practice and weaken the positive dependency between the data and the associated class labels. SMVCCA also requires a pre-feature selection step to reduce redundant features, which increases both the time needed to select features and the complexity of conventional systems. In addition, SMVCCA emphasizes the correlations shared by all modalities while neglecting modality-specific information. In some instances, a first modality may provide modality-specific information that is more useful than shared features. Conventional methods may ignore such useful modality-specific information due to a bias towards the modality with the greater number of features.
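One way the sign problem can arise is illustrated by the following sketch, which uses plain two-view CCA rather than SMVCCA and relies entirely on synthetic data. The helper `cca_qr` and the simulated views are hypothetical and introduced only for this example. When a feature in one view is anti-correlated with a feature in the other view, sign-unconstrained CCA assigns the two features weights of opposite sign.

```python
import numpy as np

def cca_qr(a, b):
    # Hypothetical helper: two-view CCA via QR decomposition and SVD.
    # Assumes both centered views have full column rank.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    qa, ra = np.linalg.qr(a)
    qb, rb = np.linalg.qr(b)
    u, s, vt = np.linalg.svd(qa.T @ qb, full_matrices=False)
    # Map the orthonormal-basis directions back to feature weights.
    wa = np.linalg.solve(ra, u)
    wb = np.linalg.solve(rb, vt.T)
    return wa, wb, s

rng = np.random.default_rng(1)
n = 400
shared = rng.normal(size=n)
# The first feature of view_b carries the shared signal with flipped sign.
view_a = np.column_stack([shared + 0.2 * rng.normal(size=n),
                          rng.normal(size=n)])
view_b = np.column_stack([-shared + 0.2 * rng.normal(size=n),
                          rng.normal(size=n)])

wa, wb, corrs = cca_qr(view_a, view_b)
print("first canonical correlation:", round(float(corrs[0]), 3))
print("weight on signal feature, view A:", wa[0, 0])
print("weight on signal feature, view B:", wb[0, 0])
```

The two printed weights have opposite signs even though the canonical correlation itself is positive. Nothing in the unconstrained formulation prevents such mixed-sign weights, which is the interpretability concern raised above.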
Conventional CCA methods have been modified to use a sparse non-negative approach. However, sparse non-negative CCA frameworks can only calculate the projection of a single feature view at a time. Such frameworks therefore lack group sparsity and are difficult to extend to multiple feature views. Conventional methods for selecting a set of discriminative features also either neglect view information or address only class separability with a group lasso, as described in Ye, J., Liu, J.: Sparse methods for biomedical data, SIGKDD 14(1) (2012) 4-15.
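For readers unfamiliar with the group lasso mentioned above, the following sketch shows the penalty and its proximal step (block soft-thresholding) on a toy weight vector. It is a generic illustration, not the formulation of the cited reference: the function names, the unweighted form of the penalty, the grouping of features, and the threshold `lam` are all assumptions. Each group could, for example, hold the features of one feature view.

```python
import numpy as np

def group_lasso_penalty(w, groups):
    # Sum of the Euclidean norms of each group's weights (unweighted form).
    return sum(np.linalg.norm(w[g]) for g in groups)

def group_soft_threshold(w, groups, lam):
    # Proximal operator of lam * group_lasso_penalty: each group is
    # shrunk toward zero as a block, and removed entirely when its
    # norm does not exceed lam.
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * w[g]
    return out

# Toy example: two groups of features (hypothetical grouping).
w = np.array([3.0, 4.0, 0.1, -0.2, 0.1])
groups = [[0, 1], [2, 3, 4]]

penalty = group_lasso_penalty(w, groups)
shrunk = group_soft_threshold(w, groups, lam=1.0)
print("penalty:", penalty)
print("after thresholding:", shrunk)
```

The first group has norm 5 and is scaled by 0.8, while the second group has norm below the threshold and is zeroed as a whole. This block-wise behavior is what is meant by group sparsity: features are kept or discarded together rather than one at a time.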